sstableutil¶
Lists SSTable files belonging to a table, including active, temporary, and obsolete files.
Synopsis¶
sstableutil [options] <keyspace> <table>
Description¶
sstableutil provides a comprehensive listing of all SSTable files associated with a table. It identifies not only active SSTables but also temporary files from in-progress operations and obsolete files awaiting cleanup.
This tool is useful for:
- Identifying SSTable files before running other SSTable tools
- Diagnosing cleanup issues - Finding orphaned or obsolete files
- Understanding storage layout - Seeing all files for a table
- Pre-operation verification - Confirming SSTable paths before maintenance
Safe to Run While Cassandra Is Active
sstableutil can safely run while Cassandra is active. It performs read-only filesystem operations.
Arguments¶
| Argument | Description |
|---|---|
keyspace |
Name of the keyspace containing the table |
table |
Name of the table to list SSTables for |
Options¶
| Option | Description |
|---|---|
-c, --cleanup |
Only list obsolete files (cleanup candidates) |
-d, --debug |
Enable debug output |
-o, --oplog |
Include transaction log files |
-s, --snapshot <name> |
List SSTables in a specific snapshot |
-t, --type <type> |
Filter by SSTable type: tmp, final, all |
Examples¶
List All SSTables¶
# List all SSTable files for a table
sstableutil my_keyspace my_table
List Active SSTables Only¶
# Only show final (active) SSTables
sstableutil -t final my_keyspace my_table
List Temporary Files¶
# Show temporary files from in-progress operations
sstableutil -t tmp my_keyspace my_table
Find Obsolete Files¶
# List files that should be cleaned up
sstableutil --cleanup my_keyspace my_table
List Snapshot SSTables¶
# List SSTables in a specific snapshot
sstableutil --snapshot my_snapshot my_keyspace my_table
Include Transaction Logs¶
# Also show transaction log files
sstableutil --oplog my_keyspace my_table
Output Format¶
Standard Output¶
Listing files for my_keyspace.my_table
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-CompressionInfo.db
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-Data.db
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-Digest.crc32
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-Filter.db
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-Index.db
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-Statistics.db
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-Summary.db
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-1-big-TOC.txt
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-2-big-CompressionInfo.db
/var/lib/cassandra/data/my_keyspace/my_table-a1b2c3d4/nb-2-big-Data.db
...
SSTable Components¶
Each SSTable consists of multiple component files:
| Component | Extension | Description |
|---|---|---|
| Data | -Data.db |
Actual row data |
| Index | -Index.db |
Partition index |
| Filter | -Filter.db |
Bloom filter |
| Statistics | -Statistics.db |
SSTable metadata |
| Summary | -Summary.db |
Index summary |
| Compression | -CompressionInfo.db |
Compression metadata |
| Digest | -Digest.crc32 |
Data checksum |
| TOC | -TOC.txt |
Table of contents |
Understanding SSTable Names¶
Naming Convention¶
<version>-<generation>-<format>-<component>.<extension>
Example: nb-1-big-Data.db
│ │ │ │
│ │ │ └── Component type
│ │ └─────── Format (big = standard)
│ └────────── Generation number
└───────────── Version (nb = 4.0)
Version Codes¶
| Code | Cassandra Version |
|---|---|
| jb | 2.0.x |
| ka | 2.1.x |
| la | 2.2.x |
| ma | 3.0.x |
| mb | 3.11.x |
| nb | 4.0.x |
| nc | 4.1.x |
| oa | 5.0.x |
Common Use Cases¶
Find SSTable Paths for Other Tools¶
# Get Data.db paths for sstablemetadata
sstableutil my_keyspace my_table | grep "Data.db$"
# Use with sstablemetadata
for f in $(sstableutil my_keyspace my_table | grep "Data.db$"); do
sstablemetadata "$f"
done
Count SSTables¶
# Count number of SSTables for a table
sstableutil my_keyspace my_table | grep -c "Data.db$"
Calculate Total Size¶
#!/bin/bash
# sstable_size.sh - Calculate total SSTable size
KEYSPACE="$1"
TABLE="$2"
total_size=0
for f in $(sstableutil "$KEYSPACE" "$TABLE"); do
size=$(stat -c%s "$f" 2>/dev/null)
total_size=$((total_size + size))
done
echo "Total size: $((total_size / 1024 / 1024)) MB"
Find Orphaned Files¶
#!/bin/bash
# find_orphaned.sh - Find files not tracked by Cassandra
KEYSPACE="$1"
TABLE="$2"
DATA_DIR="/var/lib/cassandra/data"
# Get tracked files
tracked=$(sstableutil "$KEYSPACE" "$TABLE" | sort)
# Get all files in directory
all_files=$(find ${DATA_DIR}/${KEYSPACE}/${TABLE}-*/ -type f ! -name "*.log" | sort)
echo "Files not tracked by Cassandra:"
comm -13 <(echo "$tracked") <(echo "$all_files")
Cleanup Verification¶
#!/bin/bash
# verify_cleanup.sh - Check for files needing cleanup
KEYSPACE="$1"
TABLE="$2"
echo "Checking for obsolete files in ${KEYSPACE}.${TABLE}..."
obsolete=$(sstableutil --cleanup "$KEYSPACE" "$TABLE")
if [ -z "$obsolete" ]; then
echo "No obsolete files found"
else
echo "Obsolete files found:"
echo "$obsolete"
echo ""
echo "Run 'nodetool cleanup' or wait for automatic cleanup"
fi
Snapshot Verification¶
#!/bin/bash
# verify_snapshot.sh - Verify snapshot contents
KEYSPACE="$1"
TABLE="$2"
SNAPSHOT="$3"
echo "SSTables in snapshot '$SNAPSHOT' for ${KEYSPACE}.${TABLE}:"
sstableutil --snapshot "$SNAPSHOT" "$KEYSPACE" "$TABLE" | grep "Data.db$"
count=$(sstableutil --snapshot "$SNAPSHOT" "$KEYSPACE" "$TABLE" | grep -c "Data.db$")
echo ""
echo "Total SSTables in snapshot: $count"
Pre-Tool Verification¶
#!/bin/bash
# pre_tool_check.sh - Verify SSTables before running offline tools
KEYSPACE="$1"
TABLE="$2"
echo "SSTable verification for ${KEYSPACE}.${TABLE}"
echo "============================================="
# Check for active SSTables
echo ""
echo "Active SSTables:"
sstableutil -t final "$KEYSPACE" "$TABLE" | grep "Data.db$" | wc -l
# Check for temporary files
echo ""
echo "Temporary files:"
tmp_count=$(sstableutil -t tmp "$KEYSPACE" "$TABLE" | wc -l)
if [ "$tmp_count" -gt 0 ]; then
echo "WARNING: $tmp_count temporary files found"
echo "An operation may be in progress"
sstableutil -t tmp "$KEYSPACE" "$TABLE"
else
echo "None"
fi
# Check for obsolete files
echo ""
echo "Obsolete files:"
obsolete_count=$(sstableutil --cleanup "$KEYSPACE" "$TABLE" | wc -l)
if [ "$obsolete_count" -gt 0 ]; then
echo "WARNING: $obsolete_count obsolete files found"
sstableutil --cleanup "$KEYSPACE" "$TABLE"
else
echo "None"
fi
SSTable File Types¶
Final (Active) SSTables¶
Active SSTables currently being used by Cassandra:
sstableutil -t final my_keyspace my_table
Temporary Files¶
Files from in-progress operations (compaction, streaming):
sstableutil -t tmp my_keyspace my_table
Temporary files may indicate: - Active compaction - Active streaming - Incomplete operation (if Cassandra crashed)
Obsolete Files¶
Files marked for deletion but not yet removed:
sstableutil --cleanup my_keyspace my_table
Obsolete files are cleaned up:
- Automatically after compaction
- When nodetool cleanup is run
- After repair operations
Transaction Logs¶
Transaction logs track pending SSTable operations:
sstableutil --oplog my_keyspace my_table
Transaction Log Files¶
| File Pattern | Purpose |
|---|---|
*.log |
Pending operation log |
*_txn_*.log |
Transaction in progress |
Transaction logs ensure atomicity of SSTable operations. They should be automatically cleaned up after operations complete.
Troubleshooting¶
No SSTables Found¶
# Check if table exists
cqlsh -e "DESCRIBE TABLE my_keyspace.my_table;"
# Check data directory
ls -la /var/lib/cassandra/data/my_keyspace/
# Table may be empty - check with flush first
nodetool flush my_keyspace my_table
sstableutil my_keyspace my_table
Permission Denied¶
# Run as cassandra user
sudo -u cassandra sstableutil my_keyspace my_table
# Or check permissions
ls -la /var/lib/cassandra/data/my_keyspace/my_table-*/
Stale Temporary Files¶
If temporary files persist after Cassandra restart:
# Check for stale temp files
sstableutil -t tmp my_keyspace my_table
# If Cassandra is stopped and these are from crashed operations:
# They can usually be safely deleted
# But verify Cassandra is stopped first!
Best Practices¶
sstableutil Guidelines
- Use before other tools - Verify paths before running offline tools
- Check for temp files - Before maintenance, ensure no operations in progress
- Regular cleanup checks - Monitor for accumulating obsolete files
- Verify snapshots - Confirm snapshot contents before restores
- Script integration - Use in automation scripts for path discovery
- Safe operation - Can run while Cassandra is active
Safe Operation
sstableutil is completely read-only and safe to run at any time, whether Cassandra is running or not.
Related Commands¶
| Command | Relationship |
|---|---|
| sstablemetadata | Get metadata for listed SSTables |
| sstabledump | Dump data from listed SSTables |
| sstableverify | Verify listed SSTables |
| nodetool cleanup | Remove obsolete files |
| nodetool compact | Compact SSTables |