sstabledump¶

Exports SSTable contents as JSON for inspection, debugging, and data analysis.

Synopsis¶

sstabledump [options] <sstable_file>

Description¶

sstabledump reads an SSTable file and outputs its contents in JSON format. This tool is essential for:

Debugging data issues - Inspect actual stored values
Analyzing tombstones - Find deletion markers causing issues
Examining partition structure - Understand data layout
Data recovery - Extract data from corrupted or orphaned SSTables
Schema change analysis - See how data was stored before schema changes

The output includes all rows, cells, tombstones, and TTL information stored in the SSTable.

Cassandra Must Be Stopped

For consistent results, Cassandra should be stopped before running sstabledump. The tool can run while Cassandra is active, but results may be inconsistent if compaction occurs.

How It Works¶

Arguments¶

Argument	Description
`sstable_file`	Path to the SSTable Data.db file

Options¶

Option	Description
`-d`	Output each row as a separate JSON object (JSONL format)
`-e`	Only output keys (no values)
`-k <key>`	Only output data for the specified partition key
`-x <key>`	Exclude the specified partition key
`-t`	Include only tombstones in output
`-l`	Limit output to first N partitions

Output Format¶

Standard JSON Output¶

[
  {
    "partition" : {
      "key" : [ "user123" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 48,
        "clustering" : [ "2024-01-15" ],
        "liveness_info" : { "tstamp" : "2024-01-15T10:30:00.000Z" },
        "cells" : [
          { "name" : "email", "value" : "[email protected]" },
          { "name" : "name", "value" : "John Doe" },
          { "name" : "status", "value" : "active" }
        ]
      }
    ]
  }
]

Row-per-Line Format (`-d`)¶

{"partition":{"key":["user123"],"position":0},"rows":[...]}
{"partition":{"key":["user456"],"position":512},"rows":[...]}
{"partition":{"key":["user789"],"position":1024},"rows":[...]}

Keys Only Format (`-e`)¶

[
  {
    "partition" : {
      "key" : [ "user123" ],
      "position" : 0
    }
  },
  {
    "partition" : {
      "key" : [ "user456" ],
      "position" : 512
    }
  }
]

Examples¶

Dump Entire SSTable¶

# Dump to stdout
sstabledump /var/lib/cassandra/data/my_keyspace/my_table-abc123/nb-1-big-Data.db

# Save to file
sstabledump /var/lib/cassandra/data/my_keyspace/my_table-abc123/nb-1-big-Data.db > dump.json

Dump Specific Partition¶

# Dump only data for partition key "user123"
sstabledump -k "user123" /path/to/sstable-Data.db

# For composite partition keys
sstabledump -k "region:us-east:tenant:acme" /path/to/sstable-Data.db

Dump Keys Only¶

# List all partition keys in SSTable
sstabledump -e /path/to/sstable-Data.db

# Count partitions
sstabledump -e /path/to/sstable-Data.db | grep -c '"partition"'

Dump Tombstones Only¶

# Find all tombstones (deletions) in SSTable
sstabledump -t /path/to/sstable-Data.db

Row-per-Line for Processing¶

# Output suitable for jq processing
sstabledump -d /path/to/sstable-Data.db | jq '.partition.key'

# Count rows per partition
sstabledump -d /path/to/sstable-Data.db | jq '.rows | length'

Limit Output Size¶

# Dump only first 10 partitions
sstabledump -l 10 /path/to/sstable-Data.db

Exclude Specific Partition¶

# Dump all except partition "problem_key"
sstabledump -x "problem_key" /path/to/sstable-Data.db

Common Use Cases¶

Finding Tombstones¶

#!/bin/bash
# find_tombstones.sh - Identify tables with tombstone issues

DATA_DIR="/var/lib/cassandra/data"
KEYSPACE="$1"
TABLE="$2"

for sstable in ${DATA_DIR}/${KEYSPACE}/${TABLE}-*/*-Data.db; do
    tombstone_count=$(sstabledump -t "$sstable" 2>/dev/null | grep -c '"type" : "range_tombstone"')
    if [ "$tombstone_count" -gt 0 ]; then
        echo "$sstable: $tombstone_count tombstones"
    fi
done

Analyzing Partition Sizes¶

#!/bin/bash
# partition_sizes.sh - Find large partitions

SSTABLE="$1"

sstabledump -d "$SSTABLE" | while read line; do
    key=$(echo "$line" | jq -r '.partition.key[0]')
    row_count=$(echo "$line" | jq '.rows | length')
    echo "$key: $row_count rows"
done | sort -t: -k2 -n -r | head -20

Data Recovery Script¶

#!/bin/bash
# recover_partition.sh - Extract specific partition data

SSTABLE="$1"
PARTITION_KEY="$2"
OUTPUT_FILE="$3"

echo "Extracting partition '$PARTITION_KEY' from $SSTABLE"
sstabledump -k "$PARTITION_KEY" "$SSTABLE" > "$OUTPUT_FILE"

if [ -s "$OUTPUT_FILE" ]; then
    echo "Data saved to $OUTPUT_FILE"
    rows=$(jq '.[0].rows | length' "$OUTPUT_FILE")
    echo "Found $rows rows"
else
    echo "No data found for partition key '$PARTITION_KEY'"
fi

Comparing SSTables¶

#!/bin/bash
# compare_sstables.sh - Compare partition keys between SSTables

SSTABLE1="$1"
SSTABLE2="$2"

# Extract keys from both
sstabledump -e "$SSTABLE1" | jq -r '.[].partition.key[0]' | sort > /tmp/keys1.txt
sstabledump -e "$SSTABLE2" | jq -r '.[].partition.key[0]' | sort > /tmp/keys2.txt

echo "Keys only in first SSTable:"
comm -23 /tmp/keys1.txt /tmp/keys2.txt

echo "Keys only in second SSTable:"
comm -13 /tmp/keys1.txt /tmp/keys2.txt

echo "Keys in both:"
comm -12 /tmp/keys1.txt /tmp/keys2.txt | wc -l

Understanding Output Fields¶

Partition Object¶

{
  "partition" : {
    "key" : [ "partition_key_value" ],
    "position" : 0,
    "deletion_info" : {              // Present if partition deleted
      "marked_deleted" : "timestamp",
      "local_delete_time" : "timestamp"
    }
  }
}

Row Object¶

{
  "type" : "row",
  "position" : 48,
  "clustering" : [ "clustering_value1", "clustering_value2" ],
  "liveness_info" : {
    "tstamp" : "2024-01-15T10:30:00.000Z",
    "ttl" : 86400,                   // TTL in seconds
    "expires_at" : "2024-01-16T10:30:00.000Z",
    "expired" : false
  },
  "deletion_info" : { ... },         // Present if row deleted
  "cells" : [ ... ]
}

Cell Object¶

{
  "name" : "column_name",
  "value" : "cell_value",
  "tstamp" : "2024-01-15T10:30:00.000Z",
  "ttl" : 86400,                     // If TTL set
  "expires_at" : "2024-01-16T10:30:00.000Z",
  "deletion_info" : { ... }          // If cell deleted
}

Tombstone Types¶

// Cell tombstone
{
  "name" : "column_name",
  "deletion_info" : {
    "marked_deleted" : "2024-01-15T10:30:00.000Z",
    "local_delete_time" : "2024-01-15T10:30:00.000Z"
  }
}

// Range tombstone
{
  "type" : "range_tombstone_bound",
  "start" : {
    "type" : "inclusive",
    "clustering" : [ "start_value" ]
  },
  "end" : {
    "type" : "inclusive",
    "clustering" : [ "end_value" ]
  },
  "deletion_info" : { ... }
}

// Partition tombstone
{
  "partition" : {
    "key" : [ "partition_key" ],
    "deletion_info" : {
      "marked_deleted" : "timestamp",
      "local_delete_time" : "timestamp"
    }
  }
}

Output Processing with jq¶

Extract All Partition Keys¶

sstabledump /path/to/sstable-Data.db | jq -r '.[].partition.key[0]'

Count Rows Per Partition¶

sstabledump /path/to/sstable-Data.db | jq '.[] | {key: .partition.key[0], rows: (.rows | length)}'

Find Expired TTL Data¶

sstabledump /path/to/sstable-Data.db | jq '.[].rows[].cells[] | select(.expired == true)'

Extract Specific Column Values¶

sstabledump /path/to/sstable-Data.db | jq '.[].rows[].cells[] | select(.name == "email") | .value'

Find Large Partitions¶

sstabledump /path/to/sstable-Data.db | jq '[.[] | {key: .partition.key[0], rows: (.rows | length)}] | sort_by(.rows) | reverse | .[0:10]'

Troubleshooting¶

Out of Memory¶

# Large SSTables can exhaust memory

# Option 1: Increase heap
export JVM_OPTS="-Xmx8G"
sstabledump /path/to/sstable-Data.db

# Option 2: Use streaming output
sstabledump -d /path/to/sstable-Data.db | head -100

# Option 3: Dump specific partition only
sstabledump -k "specific_key" /path/to/sstable-Data.db

Permission Denied¶

# Run as cassandra user
sudo -u cassandra sstabledump /var/lib/cassandra/data/.../nb-1-big-Data.db

"Unable to read SSTable" Error¶

# SSTable may be corrupted - verify first
sstableverify keyspace table

# If corrupted, cannot dump - scrub first
sstablescrub keyspace table

Composite Key Formatting¶

# For composite partition keys, use colon separator
# Schema: PRIMARY KEY ((region, tenant), date)
sstabledump -k "us-east:acme" /path/to/sstable-Data.db

# Check key format in output first
sstabledump -e -l 1 /path/to/sstable-Data.db

Performance Considerations¶

SSTable Size	Approximate Time	Memory Usage
100 MB	~5 seconds	~500 MB
1 GB	~30 seconds	~2 GB
10 GB	~5 minutes	~8 GB+
100 GB	Not recommended	Excessive

For large SSTables, use: - -k to dump specific partitions - -l to limit output - -d for streaming output - -e for keys only

Best Practices¶

sstabledump Guidelines

Use -d for large files - Enables streaming processing
Target specific partitions - Use -k when possible
Pipe to jq - Process JSON efficiently
Redirect to file - Avoid terminal buffer issues
Stop Cassandra - For consistent results
Check memory - Large SSTables need significant heap
Use for diagnostics - Not for bulk data export

Cautions

Output can be very large (gigabytes)
Memory usage scales with SSTable size
Sensitive data will be exposed in output
Not suitable for production data export

Command	Relationship
sstablemetadata	View SSTable statistics
sstablepartitions	Find large partitions
sstableexpiredblockers	Find tombstone blockers
sstableutil	List SSTable files
sstableverify	Verify SSTable before dump