Query Pagination¶

Cassandra implements server-side pagination to handle result sets that are too large to return in a single response. This mechanism protects both clients and servers from memory exhaustion while enabling efficient iteration through large datasets.

Pagination Model¶

Why Pagination?¶

Without pagination, a query returning millions of rows would:

Exhaust server heap building the response
Consume excessive network bandwidth
Overwhelm client memory
Create unpredictable latency

Pagination addresses these issues by:

Limiting rows per response
Streaming results incrementally
Maintaining predictable memory usage
Enabling progress tracking

Page Mechanics¶

Protocol Support¶

Request Parameters¶

The QUERY and EXECUTE messages include pagination parameters:

Parameter	Type	Description
page_size	int	Maximum rows per page
paging_state	bytes	Resume token (null for first page)

Response Metadata¶

The RESULT message includes pagination state:

Result Metadata Flags:
  0x0002 - HAS_MORE_PAGES: More pages available

Paging State:
  Present when HAS_MORE_PAGES is set
  Opaque token for resumption

Page Size Behavior¶

page_size Value	Behavior
Positive	Return at most N rows
Zero	Use server default
Negative	Protocol error
Omitted	Use server default (5000 in Cassandra)

Paging State¶

Structure¶

The paging state is an opaque token containing:

Paging State Contents:
  - Partition key of next row
  - Clustering key of next row
  - Remaining row count in partition
  - Query-specific metadata

Clients must treat this as opaque—the format may change between versions.

State Lifecycle¶

State Validity¶

Paging state is valid only:

For the same query (same CQL, parameters)
Against the same keyspace/table
Within reasonable time (data may change)
With compatible protocol version

Invalid state results in: - Undefined behavior - Potential duplicate/missing rows - Protocol errors in some cases

Pagination Patterns¶

Automatic Pagination (Driver-Managed)¶

Most drivers provide automatic pagination:

# Conceptual - Python driver style
result = session.execute("SELECT * FROM large_table")

# Driver fetches pages automatically as you iterate
for row in result:
    process(row)
    # Driver transparently fetches next page when needed

How It Works: 1. Execute query with page_size 2. Receive first page 3. As application iterates, track position 4. When approaching page end, fetch next page 5. Continue until no more pages

Manual Pagination¶

For explicit control:

# Conceptual - manual pagination
page_state = None
while True:
    result = session.execute(
        query,
        paging_state=page_state,
        fetch_size=1000
    )

    for row in result.current_page():
        process(row)

    page_state = result.paging_state
    if page_state is None:
        break  # No more pages

Stateless Pagination (Token-Based)¶

For APIs needing stable pagination across requests:

-- Use token() for stable, resumable pagination
SELECT * FROM users
WHERE token(user_id) > token(last_seen_id)
LIMIT 100;

Advantages:

Stable across schema changes
No paging state to manage
Supports arbitrary resume points

Disadvantages:

Requires partition key in results
Manual position tracking
Not suitable for all query types

API Pagination

For REST APIs or other stateless interfaces, token-based pagination is often preferable since it does not require storing paging state between requests.

Server-Side Behavior¶

Page Building¶

The coordinator builds pages by:

Sending read requests to replicas
Collecting rows up to page_size
Tracking position for resumption
Including continuation state in response

Memory Management¶

Per-Query Memory:
  - Read buffer: page_size × avg_row_size
  - Merge buffer: RF × page_size (worst case)
  - Response buffer: page_size × row_size

Example (page_size=5000, RF=3, 1KB rows):
  - Read: 5MB
  - Merge: 15MB worst case
  - Response: 5MB
  Total: ~10-25MB per concurrent query

Coordinator vs Replica Pages¶

Pagination involves two levels:

Replica-Level: - Replica reads data from SSTables/memtables - Returns chunks to coordinator - Size controlled by internal settings

Coordinator-Level: - Aggregates from multiple replicas - Applies consistency level - Returns page to client - Size controlled by page_size

Consistency Considerations¶

Read Consistency¶

Page requests are independent queries:

Page 1: Read at QUORUM, sees data version V1
Page 2: Read at QUORUM, sees data version V2 (concurrent write)

Implications:

Pages may reflect different snapshots
Concurrent writes can cause duplicates or gaps
Strong consistency requires application-level handling

No Snapshot Isolation

Pagination does not provide snapshot isolation. Each page fetch is an independent query that may see different data versions if concurrent writes occur.

Duplicate/Missing Rows¶

Concurrent modifications can cause:

Scenario	Result
Row inserted in scanned range	May appear in later page
Row deleted in unscanned range	Correctly excluded
Row moved (PK change)	May appear twice or not at all
Row updated	Different values in different pages

Mitigation Strategies¶

Timestamps - Filter by write time
Versioning - Include version column
Idempotent processing - Handle duplicates gracefully
Snapshot isolation - Accept eventual consistency

Performance Characteristics¶

Latency¶

Factor	Impact
Page size	Larger = higher latency per page
Row size	Larger = more network time
Consistency level	Higher = more coordination
Cluster load	Affects all pages equally

Throughput¶

Rows per second = page_size / page_latency

Example:
  page_size = 5000
  page_latency = 50ms
  throughput = 100,000 rows/second

Page Size Selection¶

Use Case	Recommended Size	Rationale
Interactive UI	20-100	Low latency per page
Batch processing	1000-5000	Balance latency/overhead
Export/migration	5000-10000	Maximize throughput
Low memory	100-500	Minimize buffer usage

Advanced Patterns¶

Parallel Pagination¶

Process multiple token ranges concurrently:

# Conceptual - parallel token range scanning
ranges = get_token_ranges()  # Split full ring

async def scan_range(start, end):
    query = f"SELECT * FROM table WHERE token(pk) >= {start} AND token(pk) < {end}"
    async for row in session.execute_async(query):
        yield row

# Process ranges in parallel
results = await asyncio.gather(*[scan_range(r.start, r.end) for r in ranges])

Resumable Processing¶

Save state for crash recovery:

# Save checkpoint
def save_checkpoint(paging_state, processed_count):
    store.put("checkpoint", {
        "paging_state": base64.encode(paging_state),
        "count": processed_count
    })

# Resume from checkpoint
def load_checkpoint():
    cp = store.get("checkpoint")
    return base64.decode(cp["paging_state"]), cp["count"]

Pagination with Aggregation¶

Aggregations must scan all data:

-- This scans all matching rows (no pagination benefit)
SELECT COUNT(*) FROM events WHERE date = '2024-01-15';

-- Better: paginate and count client-side for huge tables
SELECT * FROM events WHERE date = '2024-01-15';
-- Then count in application

Limitations¶

No Random Access¶

Cannot jump to arbitrary page:

Page N requires: state from page N-1
Cannot request: "Give me page 5 directly"

Workaround: Use token ranges for partitioned random access.

No Page Count¶

Total pages unknown until complete:

Cannot determine: "How many pages total?"
Can only know: "Are there more pages?"

Workaround: Estimate from row count statistics.

State Expiration¶

Long-running pagination may fail:

Compaction removes data
TTL expires rows
Schema changes invalidate state

Workaround: Use token-based pagination for long operations.

Cross-Partition Limits¶

Pagination across partitions has limits:

-- May not return exactly page_size rows
SELECT * FROM table LIMIT 1000;  -- Scans until 1000 rows found

The server may return fewer rows than page_size if: - Coordinator aggregation complexity - Cross-partition tombstone handling - Internal batch size limits

Monitoring Pagination¶

Metrics to Track¶

Metric	Indicates
Pages per query	Query result size
Page fetch latency	Server performance
Timeout rate	Overloaded cluster
Empty pages	Tombstone issues

Tracing Pagination¶

Enable query tracing to see pagination:

TRACING ON;
SELECT * FROM large_table;

Trace shows: - Rows read per page - Replicas contacted - Time per operation

CQL Protocol - Protocol pagination parameters
Async Connections - Async iteration
Failure Handling - Handling page failures