Cassandra Driver Guide¶

This guide covers connecting applications to Apache Cassandra using official drivers, with setup and basic usage for popular programming languages.

Available Drivers¶

Language	Driver	Status	Repository
Java	Apache Cassandra Java Driver	Production	GitHub
Python	Apache Cassandra Python Driver	Production	GitHub
Node.js	DataStax Node.js Driver	Production	GitHub
Go	GoCQL	Production	GitHub
C#/.NET	Apache Cassandra C# Driver	Production	GitHub
C/C++	Apache Cassandra C++ Driver	Production	GitHub
Ruby	Apache Cassandra Ruby Driver	Production	GitHub
PHP	Apache Cassandra PHP Driver	Production	GitHub

Quick Start Examples¶

Java¶

<!-- pom.xml -->
<dependency>
    <groupId>com.datastax.oss</groupId>
    <artifactId>java-driver-core</artifactId>
    <version>4.17.0</version>
</dependency>

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.*;
import java.net.InetSocketAddress;

public class QuickStart {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .build()) {

            ResultSet rs = session.execute("SELECT release_version FROM system.local");
            Row row = rs.one();
            System.out.println("Cassandra version: " + row.getString("release_version"));
        }
    }
}

Python¶

pip install cassandra-driver

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

row = session.execute("SELECT release_version FROM system.local").one()
print(f"Cassandra version: {row.release_version}")

cluster.shutdown()

Node.js¶

npm install cassandra-driver

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1'
});

async function run() {
  await client.connect();
  const result = await client.execute('SELECT release_version FROM system.local');
  console.log('Cassandra version:', result.rows[0].release_version);
  await client.shutdown();
}

run();

Go¶

go get github.com/gocql/gocql

package main

import (
    "fmt"
    "log"
    "github.com/gocql/gocql"
)

func main() {
    cluster := gocql.NewCluster("127.0.0.1")
    cluster.Keyspace = "system"
    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    var version string
    if err := session.Query("SELECT release_version FROM local").Scan(&version); err != nil {
        log.Fatal(err)
    }
    fmt.Println("Cassandra version:", version)
}

Connection Configuration¶

Essential Settings¶

Drivers are configured with these connection settings (the keyspace is optional, and the port usually defaults to 9042):

Setting	Description	Example
Contact Points	Initial nodes to connect to	`["10.0.0.1", "10.0.0.2"]`
Local Datacenter	Preferred DC for routing	`"dc1"`
Port	CQL native port	`9042`
Keyspace	Default keyspace (optional)	`"my_app"`
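The settings in the table can be collected and checked in one place before they are passed to a driver. The sketch below is only an illustration: `ConnectionSettings` and its field names are hypothetical and not part of any driver API, and the defaults assume port 9042 and an optional keyspace as listed in the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical container for the settings in the table above."""
    contact_points: List[str]
    local_datacenter: str
    port: int = 9042                 # default CQL native port
    keyspace: Optional[str] = None   # optional default keyspace

    def __post_init__(self):
        # Fail early on values that no driver could use
        if not self.contact_points:
            raise ValueError("at least one contact point is required")
        if not self.local_datacenter:
            raise ValueError("local datacenter must be set")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")


settings = ConnectionSettings(
    contact_points=["10.0.0.1", "10.0.0.2"],
    local_datacenter="dc1",
    keyspace="my_app",
)
print(settings)
```

Listing more than one contact point lets the driver still bootstrap when one of the initial nodes is down.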

Authentication¶

# Python example with authentication
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

auth_provider = PlainTextAuthProvider(
    username='app_user',
    password='app_password'
)

cluster = Cluster(
    ['127.0.0.1'],
    auth_provider=auth_provider
)

SSL/TLS¶

# Python example with SSL
from cassandra.cluster import Cluster
import ssl

ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ssl_context.load_verify_locations('/path/to/ca.crt')
ssl_context.load_cert_chain(
    certfile='/path/to/client.crt',
    keyfile='/path/to/client.key'
)

cluster = Cluster(
    ['127.0.0.1'],
    ssl_context=ssl_context
)

Load Balancing Policies¶

Token-Aware (Recommended)¶

Routes queries to the node that owns the data:

// Java - Token-aware is default in v4+
CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
    .withLocalDatacenter("dc1")
    .build();

# Python
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

cluster = Cluster(
    ['127.0.0.1'],
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='dc1')
    )
)
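To make the idea of "the node that owns the data" concrete, here is a toy token ring. It is a simplified sketch, not the driver's implementation: `ToyRing` and `token_for` are hypothetical names, MD5 stands in for Cassandra's Murmur3 partitioner, and each node is assumed to own a single token (no virtual nodes, no replication).

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    # Stand-in hash; Cassandra's default partitioner uses Murmur3 instead.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyRing:
    """Maps a partition key to the node owning the next token on the ring."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; sort by token for binary search
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        token = token_for(partition_key)
        # First node whose token is >= the key's token, wrapping around
        index = bisect.bisect_left(self._tokens, token) % len(self._ring)
        return self._ring[index][1]


ring = ToyRing({"node-a": 2**62, "node-b": 2**63, "node-c": 2**64 - 1})
print(ring.owner("user-42"))
```

A token-aware policy performs a lookup of this kind on the client, so the request goes straight to a replica instead of through an extra coordinator hop.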

DC-Aware Round Robin¶

Prefers nodes in the local datacenter:

// Node.js
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'dc1',
  policies: {
    loadBalancing: new cassandra.policies.loadBalancing.DCAwareRoundRobinPolicy('dc1')
  }
});

Prepared Statements¶

Always use prepared statements for:

- Better performance (parsed once)
- Protection against CQL injection
- Type safety

Example¶

# Python
import uuid

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')

# Prepare once
insert_stmt = session.prepare("""
    INSERT INTO users (user_id, username, email)
    VALUES (?, ?, ?)
""")

# Execute many times
session.execute(insert_stmt, [uuid.uuid4(), 'john_doe', 'john_doe@example.com'])
session.execute(insert_stmt, [uuid.uuid4(), 'jane_doe', 'jane_doe@example.com'])

// Java
PreparedStatement prepared = session.prepare(
    "INSERT INTO users (user_id, username, email) VALUES (?, ?, ?)"
);

BoundStatement bound = prepared.bind(
    UUID.randomUUID(), "john_doe", "john_doe@example.com"
);
session.execute(bound);
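The "prepare once, execute many times" pattern can be enforced with a small cache keyed by the query string. This is a minimal sketch under stated assumptions: `PreparedStatementCache` is a hypothetical helper, not a driver class, and `FakeSession` is a stand-in so the example runs without a cluster. The only thing it relies on is that the session object has a `prepare(query)` method, as in the examples above.

```python
class PreparedStatementCache:
    """Hypothetical helper: prepares each distinct query string only once."""

    def __init__(self, session):
        # `session` is any object with a prepare(query) method,
        # such as the driver session used in the examples above.
        self._session = session
        self._statements = {}

    def get(self, query: str):
        statement = self._statements.get(query)
        if statement is None:
            statement = self._session.prepare(query)
            self._statements[query] = statement
        return statement


class FakeSession:
    """Stand-in for a real session so the sketch runs without a cluster."""

    def __init__(self):
        self.prepare_calls = 0

    def prepare(self, query):
        self.prepare_calls += 1
        return f"prepared<{query}>"


fake = FakeSession()
cache = PreparedStatementCache(fake)
insert_cql = "INSERT INTO users (user_id, username, email) VALUES (?, ?, ?)"
first = cache.get(insert_cql)
second = cache.get(insert_cql)
print(fake.prepare_calls)  # 1: the second lookup reuses the cached statement
```

The sketch is not thread-safe; a multi-threaded application would need a lock around the lookup, or could simply prepare its statements once at startup.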

Consistency Levels¶

Set consistency per query:

# Python
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

stmt = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)
session.execute(stmt, [user_id])

// Java
session.execute(
    SimpleStatement.newInstance("SELECT * FROM users WHERE user_id = ?", userId)
        .setConsistencyLevel(DefaultConsistencyLevel.QUORUM)
);

Consistency Level Reference¶

Level	Reads	Writes	Use Case
`ONE`	One replica	One replica	Non-critical data, lowest latency
`QUORUM`	Majority	Majority	Default for most apps
`LOCAL_QUORUM`	Local majority	Local majority	Multi-DC deployments
`ALL`	All replicas	All replicas	Highest consistency
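The "majority" in the table depends on the replication factor (RF): a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed RF. The function names below are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read touches at least one replica that saw the latest write."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                            # 2
print(overlaps(quorum(rf), quorum(rf), rf))  # True: QUORUM reads + QUORUM writes
print(overlaps(1, 1, rf))                    # False: ONE reads + ONE writes
```

With RF = 3, `QUORUM` tolerates one unavailable replica while still giving overlapping reads and writes, which is why it is a common default.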

Async Operations¶

For high-throughput applications:

Python (asyncio)¶

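from cassandra.cluster import Cluster
import asyncio
import uuid

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')

def as_awaitable(response_future):
    # execute_async returns a driver ResponseFuture, which asyncio cannot
    # await directly. Bridge it to an asyncio future; driver callbacks run
    # on another thread, so hand results over with call_soon_threadsafe.
    loop = asyncio.get_running_loop()
    aio_future = loop.create_future()
    response_future.add_callbacks(
        lambda rows: loop.call_soon_threadsafe(aio_future.set_result, rows),
        lambda exc: loop.call_soon_threadsafe(aio_future.set_exception, exc)
    )
    return aio_future

async def insert_user(user_id, username, email):
    # Returns immediately; calling future.result() here would block the event loop
    future = session.execute_async(
        "INSERT INTO users (user_id, username, email) VALUES (%s, %s, %s)",
        [user_id, username, email]
    )
    return await as_awaitable(future)

# Run multiple inserts concurrently
async def main():
    tasks = [
        insert_user(uuid.uuid4(), f'user{i}', f'user{i}@example.com')
        for i in range(100)
    ]
    await asyncio.gather(*tasks)

asyncio.run(main())
cluster.shutdown()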

Java (CompletionStage)¶

CompletionStage<AsyncResultSet> future = session.executeAsync(
    "SELECT * FROM users WHERE user_id = ?", userId
);

future.thenAccept(resultSet -> {
    Row row = resultSet.one();
    System.out.println("Username: " + row.getString("username"));
});

Node.js (Promise-based)¶

// Execute multiple queries concurrently
const queries = [
  client.execute('SELECT * FROM users WHERE user_id = ?', [userId1]),
  client.execute('SELECT * FROM users WHERE user_id = ?', [userId2]),
  client.execute('SELECT * FROM users WHERE user_id = ?', [userId3])
];

const results = await Promise.all(queries);
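Firing every query at once works for a handful of requests, but large batches are usually capped so that the client does not flood the cluster. The sketch below shows one way to bound in-flight work with `asyncio.Semaphore`. `run_bounded` is a hypothetical helper, and `fake_query` is a placeholder for an awaitable driver call so that the example runs without a cluster.

```python
import asyncio


async def run_bounded(coroutine_factories, limit):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with semaphore:
            return await factory()

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(f) for f in coroutine_factories))


async def fake_query(i):
    # Placeholder for an awaitable driver call
    await asyncio.sleep(0.01)
    return i * 2


async def demo():
    factories = [lambda i=i: fake_query(i) for i in range(10)]
    return await run_bounded(factories, limit=3)


results = asyncio.run(demo())
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Passing factories rather than already-created coroutines means no request starts until the semaphore admits it.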

Connection Pooling¶

Drivers maintain connection pools automatically. Key settings:

Python¶

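from cassandra.cluster import Cluster
from cassandra.policies import HostDistance

cluster = Cluster(['127.0.0.1'])

# Set pool size per host.
# Note: these setters only apply to native protocol v1/v2. With protocol
# v3 or later the driver keeps a single connection per host and these
# calls raise UnsupportedOperation.
cluster.set_core_connections_per_host(HostDistance.LOCAL, 4)
cluster.set_max_connections_per_host(HostDistance.LOCAL, 10)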

Java¶

// application.conf
datastax-java-driver {
  advanced.connection {
    pool {
      local.size = 4
      remote.size = 2
    }
  }
}

Retry Policies¶

Handle transient failures:

# Python
from cassandra.policies import RetryPolicy
from cassandra.cluster import Cluster

class CustomRetryPolicy(RetryPolicy):
    def on_read_timeout(self, query, consistency, required, received, data_retrieved, retry_num):
        if retry_num < 3:
            return self.RETRY, consistency
        return self.RETHROW, None

cluster = Cluster(
    ['127.0.0.1'],
    default_retry_policy=CustomRetryPolicy()
)
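Driver retry policies act inside the driver. Some applications add a second, application-level retry around the whole call, typically with a delay between attempts. The following is a generic sketch of that pattern and not part of any driver: `retry_with_backoff` and its parameters are hypothetical, and `TimeoutError` stands in for whichever driver exceptions are treated as transient.

```python
import random
import time


def retry_with_backoff(operation, retryable, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a retryable exception wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff with jitter: about 0.1s, 0.2s, 0.4s, ...
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


attempts = []


def flaky():
    # Fails twice, then succeeds, to simulate a transient problem
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


print(retry_with_backoff(flaky, retryable=(TimeoutError,), sleep=lambda s: None))  # ok
```

Retrying is only safe for idempotent operations; a timed-out write may already have been applied, so repeating a non-idempotent statement such as a counter update can apply it twice.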

Error Handling¶

Common exceptions to handle:

Exception	Cause	Action
`NoHostAvailable`	No nodes reachable	Check connectivity
`ReadTimeout`	Read took too long	Retry or check data model
`WriteTimeout`	Write took too long	Retry or check cluster health
`Unavailable`	Not enough replicas	Check cluster health
`InvalidRequest`	Invalid query (for example, an unknown table or column)	Fix query

Example¶

from cassandra import ReadTimeout, Unavailable
from cassandra.cluster import NoHostAvailable

try:
    session.execute("SELECT * FROM users")
except ReadTimeout:
    print("Query timed out - consider adjusting timeout or data model")
except Unavailable as e:
    print(f"Not enough replicas: required={e.required_replicas}, alive={e.alive_replicas}")
except NoHostAvailable as e:
    print(f"Cannot connect to any host: {e.errors}")

Best Practices¶

Do¶

✅ Use prepared statements for repeated queries
✅ Set appropriate consistency levels
✅ Use token-aware load balancing
✅ Handle exceptions gracefully
✅ Close sessions and clusters on shutdown
✅ Use async for high-throughput workloads

Don't¶

❌ Create new sessions for each query
❌ Use ALLOW FILTERING in production
❌ Ignore connection pool settings
❌ Use ALL consistency unnecessarily
❌ Ignore timeouts and retries

Driver Documentation¶

For detailed driver documentation, refer to the official repositories:

Java Driver - Apache Cassandra Java Driver
Python Driver - Apache Cassandra Python Driver
Node.js Driver - DataStax Node.js Driver
Go Driver - GoCQL

Next Steps¶

After connecting the application:

Data Modeling - Design effective schemas
CQL Reference - Full query language reference
Performance Tuning - Optimize application performance