Prepared Statements¶
Prepared statements are the recommended method for executing CQL queries in production applications. They provide performance benefits, security protection, and enable token-aware routing.
How Prepared Statements Work¶
Prepared statements separate query parsing from execution:
Simple Statement (every execution)¶
Prepared Statement¶
PREPARE phase (once):
EXECUTE phase (every request):
Performance Benefits¶
Reduced Server-Side Overhead¶
| Operation | Simple Statement | Prepared Statement |
|---|---|---|
| Parse query | Every request | Once |
| Validate schema | Every request | Once |
| Create plan | Every request | Once |
| Execute | Every request | Every request |
For high-throughput workloads, the parsing overhead is significant:
Throughput comparison (10,000 queries/sec):
Simple statements:
10,000 × (parse + validate + plan + execute)
CPU overhead: ~30% spent on parsing
Prepared statements:
1 × (parse + validate + plan)
10,000 × (execute only)
CPU overhead: <5% for statement handling
Token-Aware Routing¶
Prepared statements enable token-aware routing because the driver knows the partition key structure:
Without prepared statements, the driver cannot determine partition key values from embedded query strings.
Prepared Statement Lifecycle¶
Preparation¶
// Prepare once (typically at application startup)
PreparedStatement prepared = session.prepare(
"SELECT * FROM users WHERE user_id = ?");
The driver:
- Sends PREPARE request to one node
- Receives prepared statement ID and metadata
- Caches the prepared statement locally
- Automatically re-prepares on other nodes as needed
Execution¶
// Execute many times with different values
BoundStatement bound = prepared.bind(userId);
ResultSet results = session.execute(bound);
The driver:
- Looks up cached prepared statement
- Serializes bound values
- Sends EXECUTE request (not the query string)
- Routes token-aware if partition key bound
Automatic Re-Preparation¶
If a node restarts or does not have the prepared statement, the driver automatically re-prepares:
This is transparent to the application.
Binding Values¶
Positional Binding¶
PreparedStatement prepared = session.prepare(
"INSERT INTO users (id, name, email) VALUES (?, ?, ?)");
// Bind by position
BoundStatement bound = prepared.bind(
userId, // position 0
"Alice", // position 1
"[email protected]" // position 2
);
Named Binding¶
PreparedStatement prepared = session.prepare(
"INSERT INTO users (id, name, email) VALUES (:id, :name, :email)");
// Bind by name
BoundStatement bound = prepared.bind()
.setUuid("id", userId)
.setString("name", "Alice")
.setString("email", "[email protected]");
Named binding is more readable and less error-prone for queries with many parameters.
Null Values¶
Explicitly bind null values:
// Correct: explicit null
bound.setString("middle_name", null);
// Incorrect: unbound value
// Leaves value unset, may cause errors
Caching Prepared Statements¶
Driver-Side Cache¶
Drivers maintain a cache of prepared statements:
Driver Prepared Statement Cache
| Query String | Prepared ID | Metadata |
|---|---|---|
SELECT * FROM users WHERE id=? |
0x8a3f... |
[id, name, email] |
INSERT INTO events (...) VALUES |
0x2b7c... |
[partition_id, event_id] |
UPDATE users SET name=? WHERE |
0x9d1e... |
[name, id] |
Cache Lookup
Cache lookup: O(1) by query string hash
Application-Level Caching¶
Prepare statements once and reuse:
// GOOD: Prepare once, reuse
public class UserRepository {
private final PreparedStatement selectUser;
private final PreparedStatement insertUser;
public UserRepository(CqlSession session) {
this.selectUser = session.prepare(
"SELECT * FROM users WHERE id = ?");
this.insertUser = session.prepare(
"INSERT INTO users (id, name) VALUES (?, ?)");
}
public User getUser(UUID id) {
return session.execute(selectUser.bind(id))...;
}
}
// BAD: Prepare every request
public User getUser(UUID id) {
// Prepares the same statement repeatedly!
PreparedStatement ps = session.prepare(
"SELECT * FROM users WHERE id = ?");
return session.execute(ps.bind(id))...;
}
The driver caches prepared statements, so re-preparing is not catastrophic, but it adds unnecessary overhead.
Schema Changes and Prepared Statements¶
When schema changes, prepared statements may become invalid:
Handling Schema Changes¶
| Driver Behavior | Description |
|---|---|
| Automatic re-prepare | Driver detects schema change, re-prepares |
| Metadata refresh | Driver updates column metadata |
| Application notification | Some drivers emit events for schema changes |
Best practice: Prepare statements at startup and handle re-preparation transparently. Avoid caching result metadata assumptions.
Batch Statements¶
Prepared statements can be used in batches:
PreparedStatement insertEvent = session.prepare(
"INSERT INTO events (partition_id, event_id, data) VALUES (?, ?, ?)");
BatchStatement batch = BatchStatement.newInstance(BatchType.UNLOGGED)
.add(insertEvent.bind(partitionId, event1Id, data1))
.add(insertEvent.bind(partitionId, event2Id, data2))
.add(insertEvent.bind(partitionId, event3Id, data3));
session.execute(batch);
Important: Batches should contain statements for the same partition. Cross-partition batches have significant performance overhead.
Prepared Statement Limits¶
Cassandra limits prepared statements per node:
| Parameter | Default | Description |
|---|---|---|
| prepared_statements_cache_size_mb | 10MB | Memory for prepared statement cache |
When cache is full, least-recently-used statements are evicted:
Avoiding Cache Churn¶
| Anti-Pattern | Problem |
|---|---|
| Dynamic query generation | Thousands of unique queries fill cache |
| String concatenation in queries | Each variation is separate statement |
| Unbounded IN clauses | IN (?, ?, ?, ...) with varying count |
// BAD: Dynamic IN clause (each size is different prepared statement)
String query = "SELECT * FROM users WHERE id IN (" +
String.join(",", Collections.nCopies(ids.size(), "?")) + ")";
// BETTER: Fixed batch size or multiple queries
// Or use token-range queries for large sets
Best Practices¶
| Practice | Rationale |
|---|---|
| Prepare at startup | Amortize preparation cost, fail fast on errors |
| Reuse prepared statements | Avoid redundant cache lookups |
| Use named parameters | More readable, less error-prone |
| Bind all values explicitly | Avoid unbound value errors |
| Use for all production queries | Performance and token-aware routing |
| Avoid dynamic query generation | Prevents cache churn |
Related Documentation¶
- Load Balancing Policy — Token-aware routing with prepared statements
- CQL Reference — Query syntax