Asynchronous Connections

Cassandra drivers use asynchronous, non-blocking I/O to achieve high throughput with minimal resources. This architecture enables thousands of concurrent requests using a small number of connections.


Traditional RDBMS Connection Model

The Synchronous RPC Pattern

Traditional relational databases (PostgreSQL, MySQL, Oracle, SQL Server) use a synchronous, thread-per-connection model inherited from RPC (Remote Procedure Call) patterns:

[Diagram: each application thread sends a query and blocks on its own connection until the database worker responds]

In this model:

  1. One thread per request: Each database query requires a dedicated application thread
  2. Thread blocks until response: The thread cannot do other work while waiting
  3. One connection per active thread: The connection is exclusively held during query execution
  4. Server-side resources: Database creates a worker process/thread for each connection
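
In code, the thread-per-query pattern looks like the sketch below (conceptual; `connect` stands in for any blocking DB-API style driver):

# Conceptual thread-per-query execution (blocking driver)
import threading

def run_query(connect, sql):
    conn = connect()              # dedicated connection for this thread
    cur = conn.cursor()
    cur.execute(sql)              # thread blocks here until the server responds
    rows = cur.fetchall()
    conn.close()
    return rows

# 500 concurrent queries require 500 threads and 500 connections:
threads = [threading.Thread(target=run_query, args=(connect, "SELECT 1"))
           for _ in range(500)]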

Problems with the Traditional Model

Resource Exhaustion:

| Resource | Consumption | Impact |
|---|---|---|
| Application threads | 1 per concurrent query | 500 queries = 500 threads = 500 MB+ memory |
| TCP connections | 1 per thread | File descriptor limits, server connection limits |
| Database workers | 1 per connection | max_connections parameter limits concurrency |
| Context switches | 2 per request (app + DB) | CPU overhead scales with concurrency |

Connection Pool Challenges:

Pool size too small:
  - Threads block waiting for connections
  - Request queuing and timeouts
  - Underutilized server capacity

Pool size too large:
  - Memory wasted on idle connections
  - Database overwhelmed during bursts
  - Connection limits exhausted

Stale Connection Problem:

Traditional connection pools suffer from stale connections:

[Diagram: client checks out a pooled connection the server has silently closed; the first use fails and the connection must be replaced]

Stale connections occur when:

  • Network interruption: Firewall drops idle connection, load balancer timeout
  • Server restart: Database recycled but client unaware
  • Idle timeout: Connection exceeds server's wait_timeout
  • TCP keepalive failure: Underlying socket becomes invalid
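
A common mitigation is test-on-borrow validation, sketched below (the pool API and probe query are illustrative):

# Conceptual test-on-borrow validation for a traditional pool
def acquire_validated(pool, max_attempts=3):
    for _ in range(max_attempts):
        conn = pool.acquire()
        try:
            conn.execute("SELECT 1")   # cheap liveness probe
            return conn                # connection is verified usable
        except ConnectionError:
            pool.discard(conn)         # stale: drop it and try another
    raise RuntimeError("no healthy connection available")

The probe adds a round trip per checkout, which is why many pools validate only connections that have been idle past a threshold.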

Connection Recycling Overhead:

| Operation | Typical Latency | Notes |
|---|---|---|
| TCP handshake | 0.5-1 ms (same DC) | 3-way handshake |
| TLS negotiation | 2-5 ms | Certificate exchange, key derivation |
| Authentication | 1-3 ms | Credential validation |
| Session setup | 1-2 ms | Set timezone, charset, schema |
| Total | 5-15 ms | Per new connection |

Scaling Limitations:

Traditional Scaling:
  1000 requests/sec @ 10ms latency = 10 connections needed
  10000 requests/sec @ 10ms latency = 100 connections needed
  100000 requests/sec @ 10ms latency = 1000 connections needed ← hits limits

Database limits:
  PostgreSQL: max_connections default 100
  MySQL: max_connections default 151
  Oracle: processes parameter limits

Application limits:
  Thread pool sizing
  Memory for thread stacks
  Context switch overhead

Connection Explosion Under Load

During traffic spikes, applications often exhaust connection pools. New connection creation adds latency, and if the database hits max_connections, requests fail entirely. This creates a cascading failure pattern where increased load leads to decreased capacity.


Cassandra's Asynchronous Model

Stream Multiplexing Architecture

Cassandra drivers use a fundamentally different approach: multiplexed asynchronous connections with stream IDs:

[Diagram: many application requests multiplexed over a single connection, each tagged with a stream ID and completed out of order]

Key Differences:

| Aspect | Traditional RDBMS | Cassandra |
|---|---|---|
| Connection utilization | 1 query at a time | 32,768 concurrent queries |
| Threads required | 1 per query | 1-2 for all I/O |
| Connection count | 100s needed | 2-8 per node sufficient |
| Response ordering | Sequential | Multiplexed (any order) |
| Memory per connection | ~1 MB (thread) | ~64 KB (buffers) |
| New connection cost | Paid frequently | Paid rarely (persistent) |

Why This Works for Cassandra

  1. No server-side session state: Unlike RDBMS, Cassandra connections have no transaction context, cursors, or prepared statement binding that requires connection affinity

  2. Protocol-level multiplexing: The CQL binary protocol includes stream IDs (0-32767), allowing response matching without connection-per-request (see the frame header sketch after this list)

  3. Stateless request handling: Each request contains all necessary context (consistency level, timestamp, etc.), enabling any server thread to handle any request

  4. Designed for distribution: Cassandra expects many nodes, so minimizing per-node connection overhead is essential
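
The stream ID from point 2 lives in the fixed 9-byte header of every CQL frame (protocol v3+). A minimal encoding sketch:

# CQL v4 frame header: version, flags, stream ID, opcode, body length
import struct

REQUEST_V4 = 0x04      # version byte for a v4 request frame
OPCODE_QUERY = 0x07    # QUERY opcode

def encode_header(stream_id, body):
    # ">BBhBI": big-endian u8 version, u8 flags, i16 stream, u8 opcode, u32 length
    return struct.pack(">BBhBI", REQUEST_V4, 0x00, stream_id, OPCODE_QUERY, len(body))

The response carries the same stream ID, which is all the driver needs to route it back to the waiting caller.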

Asynchronous I/O Benefits

Traditional synchronous model:

Thread 1: send request → [blocked waiting 5ms] → receive response → process
Thread 2: send request → [blocked waiting 5ms] → receive response → process
Thread 3: send request → [blocked waiting 5ms] → receive response → process

Cassandra asynchronous model:

Event Loop: send Q1 → send Q2 → send Q3 → send Q4 → send Q5 →
            receive R3 → receive R1 → receive R5 → receive R2 → receive R4

| Aspect | Synchronous | Asynchronous |
|---|---|---|
| Threads per request | 1 | 0 (shared) |
| Memory per connection | ~1 MB (thread stack) | ~64 KB (buffers) |
| Context switches | Per request | Per I/O event |
| Concurrent requests | Limited by threads | Limited by protocol (32K) |
| Latency overhead | Thread scheduling | Minimal |
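
A conceptual asyncio sketch of the asynchronous side (`session.execute_async` stands in for a driver's async API):

# Conceptual: many in-flight queries, one event loop, zero per-query threads
import asyncio

async def fetch_all(session, queries):
    # all queries are written to the socket immediately;
    # responses complete in whatever order the server returns them
    futures = [session.execute_async(q) for q in queries]
    return await asyncio.gather(*futures)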

Connection Efficiency

A single Cassandra connection can handle the same throughput that would require 100+ PostgreSQL connections. For a 9-node cluster, 8 connections per node (72 total) can sustain hundreds of thousands of requests per second.


Connection Architecture

Connection Components

[Diagram: connection components - event loop, non-blocking socket, and the pending-request map keyed by stream ID]

Event Loop

The event loop is the core of asynchronous I/O:

  1. Selector/Epoll - Monitors sockets for read/write readiness
  2. Event Dispatch - Routes I/O events to handlers
  3. Task Execution - Runs callbacks and completions

Most drivers use platform-native I/O multiplexing:

  - Linux: epoll
  - macOS: kqueue
  - Windows: IOCP
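
Python's standard selectors module wraps whichever of these the platform provides. A minimal read-readiness loop:

# Minimal event loop: wait for readable sockets, dispatch to handlers
import selectors

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on macOS

def register(sock, handler):
    sel.register(sock, selectors.EVENT_READ, handler)

def run_loop():
    while True:
        for key, _events in sel.select():   # blocks until a socket is ready
            key.data(key.fileobj)           # invoke the registered handler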

Non-Blocking Sockets

Connections use non-blocking TCP sockets:

Socket Configuration:
  - TCP_NODELAY: Disable Nagle's algorithm (reduce latency)
  - SO_KEEPALIVE: Detect dead connections
  - Non-blocking mode: Never block on read/write
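
These options map directly onto the standard socket API:

# Typical driver socket setup (standard library calls)
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)   # disable Nagle's algorithm
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)   # detect dead peers
sock.setblocking(False)                                      # never block on read/write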

Stream Multiplexing

Stream Allocation

The CQL protocol supports up to 32,768 concurrent streams per connection (protocol v3+). Stream allocation:

[Diagram: stream ID lifecycle - allocated on request, matched on response, then released for reuse]

Stream Management Strategies

Sequential Allocation:

Streams: 1, 2, 3, 4, ... (wrap at max)
Pros: Simple, predictable
Cons: Head-of-line blocking if a stream stalls

Pooled Allocation:

Free list: [5, 12, 7, 23, ...]
Allocate: pop from list
Release: push to list
Pros: Fast allocation, no fragmentation
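
A sketch of the pooled strategy:

# Conceptual pooled stream allocator backed by a free list
class StreamAllocator:
    def __init__(self, max_streams=32768):
        self._free = list(range(max_streams))   # every stream ID starts free

    def allocate(self):
        if not self._free:
            raise RuntimeError("no free streams")   # caller should back off
        return self._free.pop()

    def release(self, stream_id):
        self._free.append(stream_id)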

In-Flight Request Limits

Drivers limit concurrent requests per connection:

| Configuration | Typical Value | Purpose |
|---|---|---|
| Max requests per connection | 1024-2048 | Prevent overload |
| High watermark | 80% of max | Trigger backpressure |
| Low watermark | 50% of max | Resume after backpressure |

Backpressure Behavior

When the high watermark is reached, new requests are queued or rejected until in-flight requests drop below the low watermark. This prevents overwhelming individual connections.
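
A sketch of the watermark logic, using the thresholds from the table above:

# Conceptual watermark-based gating for one connection
class ConnectionGate:
    def __init__(self, max_requests=1024):
        self.in_flight = 0
        self.high = int(max_requests * 0.8)   # trigger backpressure
        self.low = int(max_requests * 0.5)    # resume threshold
        self.paused = False

    def on_submit(self):
        self.in_flight += 1
        if self.in_flight >= self.high:
            self.paused = True                # queue or reject new requests

    def on_complete(self):
        self.in_flight -= 1
        if self.paused and self.in_flight <= self.low:
            self.paused = False               # accept new requests again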


Connection Pooling

Pool Architecture

Each node maintains a connection pool:

[Diagram: per-node connection pool - a small set of multiplexed connections shared by all requests to that node]

Pool Configuration

| Parameter | Description | Default |
|---|---|---|
| Core connections | Minimum maintained connections | 1 |
| Max connections | Maximum connections to create | 8 (local), 2 (remote) |
| Max requests/connection | Concurrent requests limit | 1024 |

Remote Datacenter Connections

Remote datacenter pools typically use fewer connections (1-2) since cross-DC requests should be rare in normal operation. This conserves resources while maintaining fallback capability.

Connection Selection

When routing a request to a pool:

  1. Least-loaded selection - Choose connection with fewest in-flight requests
  2. Round-robin - Cycle through connections (simpler)
  3. Random - Random selection (load spreads naturally)
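
Least-loaded selection is a one-liner if each connection tracks its in-flight count:

# Conceptual least-loaded connection selection
def select_connection(connections):
    return min(connections, key=lambda c: c.in_flight)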

Pool Scaling

Pools grow and shrink based on demand:

Scale Up:

if all_connections_at_max_requests and pool_size < max_connections:
    create_new_connection()

Scale Down:

if connection_idle_time > threshold and pool_size > core_connections:
    close_idle_connection()

Request Lifecycle

Submission

# Conceptual flow (not an actual driver API)
import asyncio

async def execute(query):
    # 1. Select node via load balancer
    node = load_balancer.select(query)

    # 2. Get connection from pool
    connection = await pool.acquire(node)

    # 3. Allocate stream
    stream_id = connection.allocate_stream()

    # 4. Create response future, keyed by the stream ID
    future = asyncio.get_running_loop().create_future()
    connection.register(stream_id, future)

    # 5. Encode and send
    frame = encode_query(stream_id, query)
    connection.write(frame)

    # 6. Return future (completes when the response arrives)
    return future

Response Handling

# Conceptual flow
def on_data_received(connection, data):
    # 1. Decode frame
    frame = decode_frame(data)

    # 2. Find waiting future
    future = connection.get_pending(frame.stream_id)

    # 3. Release stream
    connection.release_stream(frame.stream_id)

    # 4. Complete future
    if frame.is_error():
        future.set_exception(decode_error(frame))
    else:
        future.set_result(decode_result(frame))

Pipelining

Multiple requests can be sent before receiving responses:

Time →
Client: [Q1][Q2][Q3][Q4]----------------→
Server: ------[R2][R1][R4][R3]----------→

Without pipelining (would require):
Client: [Q1]----[Q2]----[Q3]----[Q4]---→
Server: ----[R1]----[R2]----[R3]----[R4]→

Pipelining benefits:
  - Better network utilization
  - Reduced per-request latency
  - Amortized TCP overhead


Backpressure

Request-Side Backpressure

When the system is overloaded, drivers apply backpressure:

[Diagram: driver applying backpressure - new requests are queued or rejected while in-flight counts stay above the watermark]

Response-Side Backpressure

If the application cannot process responses fast enough:

  1. Buffer responses - Accumulate in memory (risky)
  2. Pause reading - Stop reading from socket (TCP backpressure)
  3. Drop connection - Last resort

Backpressure Signals

| Signal | Meaning | Action |
|---|---|---|
| Queue full | Too many pending requests | Slow down submissions |
| High in-flight | Approaching connection limit | Consider more connections |
| Read buffer full | Processing too slow | Increase consumer capacity |

Ignoring Backpressure

Applications that ignore backpressure signals risk memory exhaustion, request timeouts, and cascading failures. Design applications to handle BusyConnectionException or equivalent errors gracefully.
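
A sketch of graceful handling (the exception name matches the Java driver's; the retry helper is illustrative):

# Conceptual: bounded retry with jittered backoff on a "busy" signal
import random
import time

class BusyConnectionException(Exception):
    """Stand-in for the driver's backpressure error."""

def execute_with_backpressure(session, query, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return session.execute(query)
        except BusyConnectionException:
            # back off before retrying; jitter avoids synchronized retry storms
            time.sleep(min(0.1 * 2 ** attempt, 2.0) * random.uniform(0.5, 1.0))
    raise RuntimeError("backpressure persisted; shed load upstream")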


Connection Health

Heartbeats

Drivers send periodic heartbeats to detect dead connections:

Heartbeat interval: 30 seconds (typical)
Mechanism: OPTIONS request or protocol-level ping
Timeout: If no response, mark connection unhealthy
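
A conceptual heartbeat task (the connection methods here are illustrative stand-ins):

# Conceptual heartbeat loop for a single connection
import asyncio

async def heartbeat(connection, interval=30.0, timeout=5.0):
    while connection.is_open:
        await asyncio.sleep(interval)
        try:
            # an OPTIONS request is a cheap request/response round trip
            await asyncio.wait_for(connection.send_options(), timeout)
        except (asyncio.TimeoutError, ConnectionError):
            connection.mark_unhealthy()   # triggers reconnection
            return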

Health Checks

| Check | Frequency | Failure Action |
|---|---|---|
| Heartbeat | 30 s | Mark unhealthy, reconnect |
| Read timeout | Per request | Retry on different connection |
| Write failure | Immediate | Close connection |
| Protocol error | Immediate | Close connection |

Reconnection

When connections fail:

  1. Mark connection dead - Remove from active pool
  2. Schedule reconnection - Exponential backoff
  3. Notify load balancer - May affect node status

Reconnection schedule:
  Attempt 1: immediate
  Attempt 2: 1 second
  Attempt 3: 2 seconds
  Attempt 4: 4 seconds
  ...
  Max delay: 60 seconds
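
The schedule above is plain exponential backoff with a cap:

# Exponential backoff matching the schedule above (delays in seconds)
def reconnect_delay(attempt, base=1.0, max_delay=60.0):
    if attempt == 1:
        return 0.0                                # first attempt is immediate
    return min(base * 2 ** (attempt - 2), max_delay)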

Memory Management

Buffer Allocation

Efficient buffer management is critical for performance:

Read Buffers:

Per-connection read buffer: 64KB typical
Allocation: Pre-allocated or pooled
Growth: Expand for large frames

Write Buffers:

Per-connection write buffer: 64KB typical
Coalescing: Batch small writes
Flushing: On buffer full or explicit flush

Object Pooling

Drivers pool frequently allocated objects:

| Object | Pooling Benefit |
|---|---|
| Frames | Avoid allocation per request |
| Byte buffers | Reduce GC pressure |
| Futures/Promises | Reuse completion objects |
| Row objects | Minimize allocation during iteration |

Memory Budgeting

Total driver memory usage:

Memory = (connections × connection_overhead) +
         (in_flight_requests × request_overhead) +
         (result_sets × result_overhead)

Example:
  50 connections × 128KB = 6.4MB
  5000 in-flight × 2KB = 10MB
  Result buffers = variable
  Total: ~20-50MB typical

Threading Models

Single-Threaded Event Loop

Some drivers use a single I/O thread:

Pros: Simple, no synchronization needed
Cons: CPU-bound work blocks I/O
Pattern: Node.js driver

Thread Pool

Others use thread pools:

I/O threads: Handle network operations
Worker threads: Execute callbacks
Pros: Better CPU utilization
Cons: Synchronization complexity
Pattern: Java driver

Hybrid

Modern drivers often combine approaches:

I/O threads: 1 per N connections
Callback threads: Configurable pool
User code: Application threads

Performance Characteristics

Latency Components

| Component | Typical Range | Notes |
|---|---|---|
| Stream allocation | <1 μs | Lock-free in good implementations |
| Frame encoding | 1-10 μs | Depends on query complexity |
| Buffer copy | 1-5 μs | Zero-copy when possible |
| Syscall overhead | 1-10 μs | Batching amortizes cost |
| Network latency | 100 μs - 10 ms | Same DC vs. cross-DC |

Throughput Factors

Maximum throughput depends on:

Max throughput = connections × streams_per_connection × (1 / avg_latency)

Example:
  8 connections × 1024 streams × (1 / 0.005s) = 1.6M requests/sec theoretical

Practical limits:
  - Server capacity
  - Network bandwidth
  - Serialization CPU
  - GC pauses

Optimization Techniques

| Technique | Benefit |
|---|---|
| Connection warming | Avoid cold-start latency |
| Request coalescing | Reduce syscalls |
| Zero-copy buffers | Minimize CPU |
| Prepared statements | Reduce encoding work |