Driver Best Practices¶
This page consolidates production configuration recommendations for Cassandra drivers; the examples use the DataStax Java driver 4.x API.
Session Management¶
Single Session per Application¶
Create one session and reuse it throughout the application lifecycle:
// CORRECT: Single session, created once
public class CassandraConfig {
private static CqlSession session;
public static synchronized CqlSession getSession() {
if (session == null) {
session = CqlSession.builder()
.withLocalDatacenter("dc1")
.build();
}
return session;
}
public static void shutdown() {
if (session != null) {
session.close();
}
}
}
// WRONG: Session per request
public User getUser(UUID id) {
try (CqlSession session = CqlSession.builder().build()) { // Expensive!
return mapToUser(session.execute(...).one());
}
}
| Aspect | Single Session | Session per Request |
|---|---|---|
| Connection overhead | Once at startup | Every request |
| Metadata discovery | Once | Every request |
| Prepared statement cache | Shared | Rebuilt each time |
| Resource usage | Predictable | Unbounded |
Graceful Shutdown¶
Close the session cleanly on application shutdown:
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
session.close(); // Waits for in-flight requests
}));
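If shutdown must finish within a fixed budget, the asynchronous close API can bound the wait. A minimal sketch, assuming Java 9+ (for orTimeout) and a 30-second budget:
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    session.closeAsync().toCompletableFuture()
        .orTimeout(30, TimeUnit.SECONDS)   // assumed budget; tune per workload
        .exceptionally(t -> {
            session.forceCloseAsync();     // abort whatever didn't drain in time
            return null;
        })
        .join();
}));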
Connection Configuration¶
Contact Points¶
Provide multiple contact points for initial connection:
CqlSession session = CqlSession.builder()
.addContactPoint(new InetSocketAddress("10.0.1.1", 9042))
.addContactPoint(new InetSocketAddress("10.0.1.2", 9042))
.addContactPoint(new InetSocketAddress("10.0.1.3", 9042))
.withLocalDatacenter("dc1")
.build();
The driver only needs one successful connection to discover the full cluster topology, but multiple contact points provide redundancy during startup.
Local Datacenter¶
Always configure local datacenter explicitly in multi-DC deployments:
// REQUIRED for multi-DC
.withLocalDatacenter("dc1")
Omitting it can silently route requests to a remote datacenter, adding cross-DC latency to every query.
Connection Pool Sizing¶
Default pool settings work for most workloads. Adjust only when:
- Measured stream exhaustion occurs
- Throughput exceeds tens of thousands of requests/second per node
- Monitoring shows pool-related bottlenecks
// Only if needed based on measurements
// (driver 4.x: pool sizes are fixed per distance and set through the configuration)
DriverConfigLoader poolLoader = DriverConfigLoader.programmaticBuilder()
    .withInt(DefaultDriverOption.CONNECTION_POOL_LOCAL_SIZE, 2)
    .withInt(DefaultDriverOption.CONNECTION_POOL_REMOTE_SIZE, 1)
    .build();
CqlSession session = CqlSession.builder()
    .withConfigLoader(poolLoader)
    .withLocalDatacenter("dc1")
    .build();
Query Execution¶
Use Prepared Statements¶
Prepare all production queries:
// Prepare once at startup
private final PreparedStatement selectUser = session.prepare(
"SELECT * FROM users WHERE user_id = ?");
// Execute with bound values
public User getUser(UUID userId) {
Row row = session.execute(selectUser.bind(userId)).one();
return mapToUser(row);
}
Benefits:
- Reduced parsing overhead
- Token-aware routing (bound partition key values let the driver send each request to a replica)
- Protection against CQL injection
Set Appropriate Consistency Levels¶
Choose consistency level based on requirements:
BoundStatement statement = selectUser.bind(userId)
    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM); // Explicit per-statement override
| Use Case | Recommended CL |
|---|---|
| Strong consistency reads | LOCAL_QUORUM |
| Strong consistency writes | LOCAL_QUORUM |
| Eventually consistent reads | LOCAL_ONE |
| Analytics/reporting | ONE |
| Cross-DC consistency | QUORUM or EACH_QUORUM |
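Per-statement settings like the one above take precedence over the driver-wide default. A minimal sketch of setting that default through the configuration, assuming LOCAL_QUORUM as the baseline:
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    // basic.request.consistency applies to every statement that doesn't override it
    .withString(DefaultDriverOption.REQUEST_CONSISTENCY, "LOCAL_QUORUM")
    .build();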
Set Query Timeouts¶
Configure appropriate timeouts:
BoundStatement statement = selectUser.bind(userId)
    .setTimeout(Duration.ofSeconds(5)); // Query-specific timeout, overrides the default
| Timeout Type | Recommendation |
|---|---|
| Read timeout | 5-10 seconds (longer than expected P99) |
| Write timeout | 10-30 seconds (allow for hints, batches) |
| Connection timeout | 5 seconds |
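The per-statement timeout above overrides basic.request.timeout, the driver's single client-side request timeout in 4.x (the server distinguishes read and write timeouts separately in cassandra.yaml). A minimal sketch of setting driver-wide defaults, assuming a 10-second request budget from the table:
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(10))
    .withDuration(DefaultDriverOption.CONNECTION_CONNECT_TIMEOUT, Duration.ofSeconds(5))
    .build();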
Error Handling¶
Handle Specific Exceptions¶
try {
    session.execute(statement);
} catch (NoNodeAvailableException e) {
    // All nodes down or unreachable - trip a circuit breaker or fail fast
    log.error("Cluster unavailable", e);
    throw new ServiceUnavailableException();
} catch (ReadTimeoutException e) {
    // Replica(s) didn't respond in time - reads may be safe to retry
    log.warn("Read timeout: received {}/{} required",
        e.getReceived(), e.getBlockFor());
} catch (WriteTimeoutException e) {
    // The write may or may not have been applied
    log.error("Write timeout for {}: received {}/{}",
        e.getWriteType(), e.getReceived(), e.getBlockFor());
    // DO NOT retry non-idempotent writes automatically
} catch (UnavailableException e) {
    // Not enough replicas alive to satisfy the consistency level
    log.warn("Unavailable: alive {}/{} required",
        e.getAlive(), e.getRequired());
}
Idempotency Marking¶
Mark idempotent operations explicitly:
// Safe to retry (4.x statements are immutable; each setter returns a new instance)
BoundStatement readStatement = selectUser.bind(userId)
    .setIdempotent(true);
// NOT safe to retry
BoundStatement counterStatement = updateCounter.bind(pageId)
    .setIdempotent(false);
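Idempotence can also be declared once at prepare time; in driver 4.x, bound statements inherit the flag from the statement that was prepared:
PreparedStatement selectUser = session.prepare(
    SimpleStatement.newInstance("SELECT * FROM users WHERE user_id = ?")
        .setIdempotent(true));
// Every selectUser.bind(...) is now marked idempotent automatically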
Policy Configuration¶
Production Policy Template¶
// Driver 4.x sets policies through its configuration rather than builder methods
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    // Load balancing: token-aware with DC awareness (the 4.x default)
    .withString(DefaultDriverOption.LOAD_BALANCING_POLICY_CLASS,
        "DefaultLoadBalancingPolicy")
    // Retry: conservative, respects idempotency (the 4.x default)
    .withString(DefaultDriverOption.RETRY_POLICY_CLASS, "DefaultRetryPolicy")
    // Reconnection: exponential backoff
    .withString(DefaultDriverOption.RECONNECTION_POLICY_CLASS,
        "ExponentialReconnectionPolicy")
    .withDuration(DefaultDriverOption.RECONNECTION_BASE_DELAY, Duration.ofSeconds(1))
    .withDuration(DefaultDriverOption.RECONNECTION_MAX_DELAY, Duration.ofMinutes(5))
    // Speculative execution: disabled by default
    // Enable only for idempotent, latency-sensitive queries
    .build();
CqlSession session = CqlSession.builder()
    .addContactPoints(contactPoints)
    .withLocalDatacenter("dc1")
    .withConfigLoader(loader)
    .build();
Per-Query Policy Override¶
In driver 4.x, per-query policy overrides go through execution profiles defined in the configuration and selected per statement:
// Latency-sensitive read: select a profile that enables speculative execution
Statement<?> fastRead = selectUser.bind(userId)
    .setIdempotent(true)
    .setExecutionProfileName("fast-read");
// Non-idempotent write: the default retry policy never retries a statement
// marked non-idempotent after a write timeout, so no profile is needed
Statement<?> counterUpdate = incrementCounter.bind(pageId)
    .setIdempotent(false);
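The "fast-read" profile name above is an example and must exist in the configuration. A minimal sketch of defining it programmatically; the delay and execution count are assumptions to tune against measured latencies:
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .startProfile("fast-read")
    .withString(DefaultDriverOption.SPECULATIVE_EXECUTION_POLICY_CLASS,
        "ConstantSpeculativeExecutionPolicy")
    .withInt(DefaultDriverOption.SPECULATIVE_EXECUTION_MAX, 2)      // assumed
    .withDuration(DefaultDriverOption.SPECULATIVE_EXECUTION_DELAY,
        Duration.ofMillis(100))                                     // assumed
    .endProfile()
    .build();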
Monitoring¶
Essential Metrics¶
Monitor these driver metrics:
| Metric Category | Key Metrics |
|---|---|
| Latency | Request latency percentiles (P50, P95, P99) |
| Throughput | Requests per second |
| Errors | Error rate by type (timeout, unavailable, etc.) |
| Connections | Open connections per node |
| Pool | In-flight requests, available streams |
| Retries | Retry rate, retry success rate |
| Speculative | Trigger rate, win rate |
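Driver metrics are disabled by default in 4.x and enabled individually in the configuration. A minimal sketch covering a few of the metrics above; the metric names follow the driver's reference configuration, so verify them against your driver version:
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
    .withStringList(DefaultDriverOption.METRICS_SESSION_ENABLED,
        Arrays.asList("cql-requests", "cql-client-timeouts"))
    .withStringList(DefaultDriverOption.METRICS_NODE_ENABLED,
        Arrays.asList("pool.open-connections", "pool.in-flight",
            "retries.total", "speculative-executions"))
    .build();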
Health Checks¶
Implement application health checks:
public boolean isHealthy() {
try {
// Simple query to verify connectivity
session.execute("SELECT now() FROM system.local");
return true;
} catch (Exception e) {
    // Broad catch is deliberate here: any failure means "unhealthy"
    return false;
}
}
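A variant that avoids issuing a query is to inspect the driver's node metadata; a sketch that reports healthy when at least one node is known to be up:
public boolean isHealthyByMetadata() {
    return session.getMetadata().getNodes().values().stream()
        .anyMatch(node -> node.getState() == NodeState.UP);
}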
Logging¶
Configure appropriate driver logging:
<!-- Log connection events -->
<logger name="com.datastax.oss.driver.internal.core.pool" level="INFO"/>
<!-- Log retries and speculative execution -->
<logger name="com.datastax.oss.driver.internal.core.retry" level="DEBUG"/>
<!-- Reduce noise from metadata refresh -->
<logger name="com.datastax.oss.driver.internal.core.metadata" level="WARN"/>
Common Anti-Patterns¶
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Session per request | Massive overhead | Single shared session |
| Unprepared statements in loops | Parsing overhead, no token-aware routing | Prepare and reuse |
| Ignoring local datacenter | Cross-DC latency | Configure explicitly |
| Retrying non-idempotent writes | Data corruption | Mark idempotency, custom retry |
| Unbounded IN clauses | Prepared statement cache churn | Fixed sizes or pagination |
| Synchronous calls in async context | Thread pool exhaustion | Use async API consistently (see the sketch after this table) |
| No timeout configuration | Requests hang indefinitely | Set explicit timeouts |
| Catching generic Exception | Hides specific error handling | Catch specific exceptions |
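For the "synchronous calls in async context" row, the 4.x asynchronous API returns a CompletionStage; a minimal sketch that stays non-blocking end to end, reusing selectUser and mapToUser from above:
CompletionStage<AsyncResultSet> stage = session.executeAsync(selectUser.bind(userId));
stage.thenApply(AsyncResultSet::one)   // first row, or null if none
    .thenApply(this::mapToUser)
    .whenComplete((user, error) -> {
        if (error != null) {
            log.error("Async read failed", error);
        }
    });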
Checklist¶
Before deploying to production:
- [ ] Single session instance shared across application
- [ ] Local datacenter configured explicitly
- [ ] All queries use prepared statements
- [ ] Consistency levels set explicitly
- [ ] Timeouts configured appropriately
- [ ] Idempotent operations marked
- [ ] Error handling for specific exception types
- [ ] Driver metrics exported to monitoring
- [ ] Health check endpoint implemented
- [ ] Graceful shutdown configured
- [ ] Connection pool sized appropriately (if non-default)
- [ ] Retry policy reviewed for workload
- [ ] Speculative execution evaluated (if latency-sensitive)
Related Documentation¶
- Connection Management — Connection pooling details
- Policies — Policy configuration reference
- Prepared Statements — Statement preparation and caching