Driver Policies¶
Driver policies control how the application interacts with the Cassandra cluster during normal operation and failure scenarios. These policies are the primary mechanism through which developers configure failure handling behavior.
Developer Responsibility for Failure Handling¶
Unlike traditional databases where failure handling is largely abstracted away, Cassandra drivers expose failure scenarios directly to the application. The developer is responsible for configuring appropriate responses to failures.
This design is intentional: Cassandra's distributed architecture means that "failure" is nuanced. A node being slow is different from a node being down. A write timeout does not mean the write failed—it may have succeeded on some replicas. The driver cannot make assumptions about what the application considers acceptable behavior.
| Failure Type | What Happened | Driver's Question | Developer Must Decide |
|---|---|---|---|
| Read timeout | Some replicas didn't respond in time | Retry or fail? | Is stale data acceptable? Retry elsewhere? |
| Write timeout | Coordinator didn't get enough acknowledgments | Retry or fail? | Is duplicate write acceptable? Is operation idempotent? |
| Unavailable | Not enough replicas alive to satisfy CL | Retry or fail? | Lower consistency acceptable? Wait and retry? |
| Node down | Node unreachable | Where to route? When to retry connection? | Failover strategy? Recovery timing? |
Default policies exist but are generic. Production applications must evaluate each policy against their specific requirements for consistency, latency, and availability.
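The write-timeout row is worth making concrete: the driver surfaces the failure as an exception, and only application code can decide whether re-execution is safe. A minimal Java 4.x sketch, using a hypothetical orders table and values:
// Hypothetical statement: an INSERT with a fixed primary key and fixed
// values is idempotent, so re-executing it after a timeout is safe.
SimpleStatement insert = SimpleStatement
    .newInstance("INSERT INTO orders (id, total) VALUES (?, ?)", orderId, total)
    .setIdempotent(true);
try {
  session.execute(insert);
} catch (WriteTimeoutException e) {
  // The write may already have succeeded on some replicas;
  // idempotence makes a second attempt harmless.
  session.execute(insert);
}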
Policy Overview¶
| Policy | Question It Answers | Default Behavior |
|---|---|---|
| Load Balancing | Which node should handle this request? | Round-robin across local datacenter, token-aware |
| Retry | Should a failed request be retried? | Retry read timeouts once, don't retry write timeouts |
| Reconnection | How quickly to reconnect after node failure? | Exponential backoff (1 s base; 60 s max in Java 4.x, 10 min in Python) |
| Speculative Execution | Should redundant requests be sent? | Disabled |
Default Policy Behavior¶
Understanding default behavior is essential before customizing policies.
Java Driver Defaults (v4.x)¶
| Policy | Default Implementation | Behavior |
|---|---|---|
| Load Balancing | `DefaultLoadBalancingPolicy` | Token-aware, prefers local DC, round-robin within replicas |
| Retry | `DefaultRetryPolicy` | Retries a read timeout if enough replicas responded; never retries a write timeout (except batch-log writes) |
| Reconnection | `ExponentialReconnectionPolicy` | Base: 1 second, max: 60 seconds |
| Speculative Execution | None | Disabled; must be explicitly enabled |
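These defaults come from the driver's bundled reference.conf and can be read back from a live session, which is a quick way to verify what is actually in effect; a small sketch:
// Inspect the effective defaults on a running 4.x session
DriverExecutionProfile profile =
    session.getContext().getConfig().getDefaultProfile();
System.out.println(profile.getString(DefaultDriverOption.RETRY_POLICY_CLASS));
System.out.println(profile.getDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY));
System.out.println(profile.getDuration(DefaultDriverOption.RECONNECTION_MAX_DELAY));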
Python Driver Defaults¶
| Policy | Default Implementation | Behavior |
|---|---|---|
| Load Balancing | `TokenAwarePolicy(DCAwareRoundRobinPolicy())` | Token-aware wrapping DC-aware round-robin |
| Retry | `RetryPolicy` | Retries a read timeout once on the same host; retries unavailable once on the next host |
| Reconnection | `ExponentialReconnectionPolicy` | Base: 1 second, max: 600 seconds |
| Speculative Execution | None | Disabled |
Failure Scenarios¶
Understanding common failure scenarios helps in selecting appropriate policies.
Scenario 1: Single Node Failure¶
A single node becomes unreachable while the rest of the cluster stays healthy.
Policy involvement:
- Load Balancing: Provides fallback nodes when the preferred node fails
- Retry: Determines whether the connection failure triggers a retry on another node
- Reconnection: Schedules background reconnection attempts to the failed node
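To make this scenario observable, the Java 4.x driver accepts a node state listener at session build time; a sketch with illustrative logging:
// Log up/down transitions so failover and recovery timing are visible
CqlSession session = CqlSession.builder()
    .withLocalDatacenter("dc1")
    .withNodeStateListener(new NodeStateListenerBase() {
      @Override
      public void onDown(Node node) {
        System.out.println("Node down: " + node.getEndPoint());
      }
      @Override
      public void onUp(Node node) {
        System.out.println("Node back up: " + node.getEndPoint());
      }
    })
    .build();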
Scenario 2: Read Timeout (Partial Response)¶
Some, but not enough, replicas respond before the coordinator's read timeout expires.
Policy involvement:
- Retry: Decides whether to retry based on how many replicas responded
- Speculative Execution: Could have sent a parallel request to another node to avoid the timeout
Scenario 3: Write Timeout (Dangerous)¶
The coordinator times out waiting for enough replica acknowledgments of a write.
Critical consideration: The write may have succeeded on one or more replicas even though the acknowledgment was lost. Retrying a non-idempotent write therefore risks applying it twice (e.g., double-incrementing a counter).
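With the Java 4.x driver, the standard defense is to mark such statements non-idempotent, which stops both retries and speculative re-execution; a sketch with a hypothetical page_views counter table:
// Counter increments are not idempotent: applying one twice double-counts.
SimpleStatement increment = SimpleStatement
    .newInstance("UPDATE page_views SET hits = hits + 1 WHERE page = ?", pageId)
    .setIdempotent(false); // the driver surfaces the timeout instead of retrying
session.execute(increment);
Note that 4.x statements are non-idempotent by default (basic.request.default-idempotence = false), so this call documents intent; the risky move is blanket-enabling default idempotence for a workload that includes writes like this one.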
Scenario 4: Network Partition¶
The client can reach only a subset of the cluster; the remaining nodes are alive but unreachable from the driver.
Policy involvement:
- Load Balancing: Must route only to reachable nodes
- Reconnection: Attempts to reconnect to partitioned nodes
- Retry: Handles Unavailable errors when the CL cannot be met with reachable nodes
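When too few replicas are reachable, the coordinator rejects the request up front with an UnavailableException. Whether to degrade consistency is an application decision; a sketch for a read path that can tolerate stale data (table and names are hypothetical):
SimpleStatement read =
    SimpleStatement.newInstance("SELECT balance FROM accounts WHERE id = ?", accountId);
try {
  // Preferred: consistent read within the local DC
  session.execute(read.setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM));
} catch (UnavailableException e) {
  // Partition: LOCAL_QUORUM cannot be met with reachable replicas.
  // Only degrade if stale data is acceptable for this read.
  session.execute(read.setConsistencyLevel(DefaultConsistencyLevel.ONE));
}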
Multi-Datacenter Configuration¶
Multi-DC deployments require careful policy configuration to ensure correct behavior during normal operation and DC failures.
Local Datacenter Configuration¶
Always configure the local datacenter explicitly. This is the most critical setting for multi-DC deployments.
// Java - REQUIRED for multi-DC
CqlSession session = CqlSession.builder()
.withLocalDatacenter("dc1")
.build();
# Python - REQUIRED for multi-DC
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Keep token awareness (the default) while pinning the local DC
cluster = Cluster(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc1'))
)
Multi-DC Request Routing¶
With the configuration above, every request is routed to replicas in the local datacenter first; remote datacenters receive traffic only when failover is explicitly enabled.
DC Failover Behavior¶
| Configuration | Normal Operation | Local DC Down |
|---|---|---|
| `LOCAL_QUORUM` + local DC only | Routes to local DC | All requests fail |
| `LOCAL_QUORUM` + remote DC allowed | Routes to local DC | Fails over to remote DC |
| `QUORUM` | May route anywhere | Continues if global quorum available |
Multi-DC Policy Configuration¶
// Java - Multi-DC with controlled failover
// (driver 4.x: the load balancing policy is tuned through configuration)
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    // Permit failover to at most 2 nodes per remote DC
    .withInt(DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_MAX_NODES_PER_REMOTE_DC, 2)
    // Required so that LOCAL_* consistency levels are allowed to fail over
    .withBoolean(DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_ALLOW_FOR_LOCAL_CONSISTENCY_LEVELS, true)
    .build();

CqlSession session = CqlSession.builder()
    .withLocalDatacenter("dc1")
    .withConfigLoader(loader)
    .build();
Consistency Level Implications¶
| Consistency Level | Multi-DC Behavior | DC Failure Impact |
|---|---|---|
| `LOCAL_ONE` | Local DC only | Fails if local DC down |
| `LOCAL_QUORUM` | Local DC only | Fails if local DC down |
| `QUORUM` | Global quorum | May succeed with one DC down |
| `EACH_QUORUM` | Quorum in every DC | Fails if any DC down |
| `ALL` | Every replica | Fails if any node down |
Recommendation: Use LOCAL_QUORUM for most operations. Configure load balancer to allow remote DC failover only when acceptable for the use case.
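The session-wide default consistency level is itself driver configuration; a sketch using the Java 4.x programmatic config loader:
// Make LOCAL_QUORUM the default consistency level for this session
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withString(DefaultDriverOption.REQUEST_CONSISTENCY, "LOCAL_QUORUM")
    .build();
CqlSession session = CqlSession.builder()
    .withLocalDatacenter("dc1")
    .withConfigLoader(loader)
    .build();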
Why Policies Matter¶
Default policies are designed for general use cases but may not match specific application requirements:
Load Balancing Examples¶
| Scenario | Default Behavior | Problem |
|---|---|---|
| Multi-DC deployment | May route to remote DC | High latency if local DC not configured |
| Heterogeneous hardware | Equal distribution | Overloads weaker nodes |
| Batch analytics | Token-aware routing | Optimal for OLTP, but analytics may prefer round-robin |
Retry Examples¶
| Scenario | Default Behavior | Problem |
|---|---|---|
| Non-idempotent writes | May retry on timeout | Potential duplicate writes |
| Overloaded cluster | Retry immediately | Amplifies load, worsens situation |
| Read timeout | Retry same node | Node may still be slow |
Reconnection Examples¶
| Scenario | Default Behavior | Problem |
|---|---|---|
| Brief network blip | Exponential backoff | Slow recovery for transient issues |
| Node replacement | Standard reconnection | May attempt reconnection to decommissioned node |
| Rolling restart | Backoff after each node | Cascading delays |
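For the brief-network-blip row above, one option is to trade exponential backoff for the built-in constant policy, accepting that a genuinely dead node will then be re-polled at the same short interval indefinitely; a sketch:
// Reconnect at a fixed, short interval instead of backing off
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withString(DefaultDriverOption.RECONNECTION_POLICY_CLASS, "ConstantReconnectionPolicy")
    .withDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY, Duration.ofSeconds(1))
    .build();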
Policy Interactions¶
Policies do not operate in isolation; they interact during request execution. The load balancing policy computes the query plan (an ordered list of candidate nodes), the retry policy decides whether a failed request moves on to the next node in that plan, and the reconnection policy works in the background on any node that has been marked down. If speculative execution is enabled, additional copies of the request are sent concurrently to subsequent nodes in the same plan before the first attempt completes.
Configuration Approach¶
Explicit Configuration¶
Do not rely on defaults for production deployments. Configure each policy explicitly:
// Java driver example - explicit policy configuration
// (in driver 4.x, policies are supplied through the driver configuration,
//  shown here with the programmatic config loader)
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    // Load balancing: token-aware, local DC; slow-replica avoidance
    // (basic.load-balancing-policy.slow-replica-avoidance) is on by default
    .withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "dc1")
    // Retry: a custom policy class (see the sketch below)
    .withString(DefaultDriverOption.RETRY_POLICY_CLASS, "com.example.CustomRetryPolicy")
    // Reconnection: exponential backoff, 1 s base, 5 min cap
    .withString(DefaultDriverOption.RECONNECTION_POLICY_CLASS, "ExponentialReconnectionPolicy")
    .withDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY, Duration.ofSeconds(1))
    .withDuration(DefaultDriverOption.RECONNECTION_MAX_DELAY, Duration.ofMinutes(5))
    // Speculative execution: up to 2 executions, 100 ms apart
    .withString(DefaultDriverOption.SPECULATIVE_EXECUTION_POLICY_CLASS, "ConstantSpeculativeExecutionPolicy")
    .withInt(DefaultDriverOption.SPECULATIVE_EXECUTION_MAX, 2)
    .withDuration(DefaultDriverOption.SPECULATIVE_EXECUTION_DELAY, Duration.ofMillis(100))
    .build();

CqlSession session = CqlSession.builder()
    .withConfigLoader(loader)
    .build();
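The com.example.CustomRetryPolicy name above is a placeholder. A minimal sketch of what such a class could look like, assuming the 4.x RetryPolicy interface (the two-argument constructor is required because the driver instantiates the class reflectively from configuration):
public class CustomRetryPolicy implements RetryPolicy {

  public CustomRetryPolicy(DriverContext context, String profileName) {
    // No state needed for this sketch.
  }

  @Override
  public RetryDecision onReadTimeout(Request request, ConsistencyLevel cl,
      int blockFor, int received, boolean dataPresent, int retryCount) {
    // Retry once on the next node if at least one replica answered
    return (retryCount == 0 && received > 0)
        ? RetryDecision.RETRY_NEXT : RetryDecision.RETHROW;
  }

  @Override
  public RetryDecision onWriteTimeout(Request request, ConsistencyLevel cl,
      WriteType writeType, int blockFor, int received, int retryCount) {
    // Never retry writes: they may already have been applied
    return RetryDecision.RETHROW;
  }

  @Override
  public RetryDecision onUnavailable(Request request, ConsistencyLevel cl,
      int required, int alive, int retryCount) {
    // A different coordinator may have a healthier view of the cluster
    return retryCount == 0 ? RetryDecision.RETRY_NEXT : RetryDecision.RETHROW;
  }

  @Override
  public RetryDecision onRequestAborted(Request request, Throwable error, int retryCount) {
    return RetryDecision.RETHROW;
  }

  @Override
  public RetryDecision onErrorResponse(Request request, CoordinatorException error, int retryCount) {
    return RetryDecision.RETHROW;
  }

  @Override
  public void close() {}
}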
Per-Statement Override¶
Some policies can be overridden per statement. In driver 4.x this is done by pointing the statement at a different execution profile rather than attaching a policy object to it:
// Run one query under a stricter profile (no retries)
SimpleStatement statement = SimpleStatement.builder("SELECT * FROM users WHERE id = ?")
    .addPositionalValue(userId)
    .setExecutionProfileName("no-retries") // profile defined in configuration
    .build();
This allows different behavior for different query types (e.g., strict no-retry for non-idempotent writes).
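The no-retries profile name is illustrative. It could be declared in application.conf or built programmatically, as in this sketch (com.example.NoRetryPolicy is a hypothetical class that always rethrows):
// Define the "no-retries" execution profile alongside the defaults
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .startProfile("no-retries")
    .withString(DefaultDriverOption.RETRY_POLICY_CLASS, "com.example.NoRetryPolicy")
    .endProfile()
    .build();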
Policy Recommendations by Use Case¶
| Use Case | Load Balancing | Retry | Reconnection | Speculative Execution |
|---|---|---|---|---|
| OLTP (low latency) | Token-aware, local DC | Conservative (reads only) | Fast base (500ms) | Enable for reads |
| Batch/Analytics | Round-robin or token-aware | Aggressive retry | Standard | Disable |
| Multi-DC Active-Active | Token-aware, local DC, failover enabled | Per-DC retry | Standard | Local DC only |
| Write-heavy | Token-aware | No retry for writes | Standard | Disable |
| Read-heavy | Token-aware | Retry reads | Standard | Enable |
OLTP Application Configuration¶
// Low-latency OLTP configuration (driver 4.x, programmatic config)
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    // Conservative retry - the default policy only retries when it is safe
    .withString(DefaultDriverOption.RETRY_POLICY_CLASS, "DefaultRetryPolicy")
    // Fast reconnection for quick recovery
    .withDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY, Duration.ofMillis(500))
    .withDuration(DefaultDriverOption.RECONNECTION_MAX_DELAY, Duration.ofMinutes(2))
    // Speculative execution for tail latency
    // (fires only for statements marked idempotent)
    .withString(DefaultDriverOption.SPECULATIVE_EXECUTION_POLICY_CLASS, "ConstantSpeculativeExecutionPolicy")
    .withInt(DefaultDriverOption.SPECULATIVE_EXECUTION_MAX, 2)
    .withDuration(DefaultDriverOption.SPECULATIVE_EXECUTION_DELAY, Duration.ofMillis(50))
    .build();

CqlSession session = CqlSession.builder()
    // Token-aware, local-DC routing with slow-replica avoidance is the default
    .withLocalDatacenter("dc1")
    .withConfigLoader(loader)
    .build();
Multi-DC Active-Active Configuration¶
// Multi-DC with controlled failover (driver 4.x, programmatic config)
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    // Allow failover to at most 2 nodes per remote DC
    .withInt(DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_MAX_NODES_PER_REMOTE_DC, 2)
    // Standard retry
    .withString(DefaultDriverOption.RETRY_POLICY_CLASS, "DefaultRetryPolicy")
    // Standard reconnection
    .withDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY, Duration.ofSeconds(1))
    .withDuration(DefaultDriverOption.RECONNECTION_MAX_DELAY, Duration.ofMinutes(5))
    // No speculative execution across DCs (latency difference too high)
    .build();

CqlSession session = CqlSession.builder()
    .withLocalDatacenter("dc1")
    .withConfigLoader(loader)
    .build();
Section Contents¶
- Load Balancing Policy — Node selection and request distribution
- Retry Policy — Handling failed requests and error classification
- Reconnection Policy — Recovery after node failures
- Speculative Execution Policy — Reducing tail latency through redundant requests