nodetool snapshot¶
Creates a snapshot (hard-link backup) of one or more tables.
Synopsis¶
nodetool [connection_options] snapshot [options] [--] [keyspace ...]
Description¶
nodetool snapshot creates a point-in-time copy of SSTable files using filesystem hard links. Snapshots are instantaneous, require minimal additional disk space initially, and serve as the foundation for Cassandra backups.
Why Snapshots Matter¶
Cassandra continuously reorganizes data on disk through compaction, which merges SSTable files and deletes the originals. Without snapshots, there is no way to recover data as it existed at a specific point in time. A snapshot freezes a consistent view of the data that can be:
- Restored locally if data is accidentally deleted or corrupted
- Copied off-node for disaster recovery
- Used for cloning to create test environments from production data
What Snapshots Capture¶
| Captured | Not Captured |
|---|---|
| All committed SSTable data | Uncommitted data in memtables (unless flushed first) |
| Table schema definitions | Commit log files |
| Secondary index data | Configuration files |
| Materialized view data | System keyspace data (unless explicitly included) |
Arguments¶
| Argument | Description |
|---|---|
| keyspace | Keyspace(s) to snapshot. If omitted, all keyspaces are snapshotted |
Options¶
| Option | Description |
|---|---|
| -t, --tag | Name/tag for the snapshot |
| -cf, --column-family | Table(s) to snapshot (comma-separated) |
| -sf, --skip-flush | Skip flushing memtables before the snapshot |
| -kt, --kt-list | Comma-separated list of keyspace.table names to snapshot |
| --ttl | Time-to-live for the snapshot (auto-deletion) |
Examples¶
Snapshot All Keyspaces¶
nodetool snapshot -t full_backup_20240115
Creates a snapshot of all keyspaces.
Snapshot Specific Keyspace¶
nodetool snapshot -t my_backup my_keyspace
Snapshot Specific Table¶
nodetool snapshot -t users_backup -cf users my_keyspace
Snapshot Multiple Tables¶
nodetool snapshot -t tables_backup -kt my_keyspace.users,my_keyspace.orders
Snapshot with TTL (Auto-Delete)¶
nodetool snapshot -t temp_backup --ttl 24h my_keyspace
Snapshot TTL
Available in Cassandra 4.1+. The snapshot is automatically deleted after the specified duration.
Format: <number><unit> where unit is s (seconds), m (minutes), h (hours), d (days).
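For example, a snapshot meant to expire after one week could be given a 7-day TTL (sketch using the duration format above; the tag and keyspace are illustrative):
nodetool snapshot -t weekly_20240115 --ttl 7d my_keyspace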
Snapshot Location¶
Snapshots are stored within each table's data directory:
/var/lib/cassandra/data/<keyspace>/<table>-<uuid>/snapshots/<tag>/
Example:
/var/lib/cassandra/data/my_keyspace/users-a1b2c3d4/snapshots/my_backup/
├── nb-1-big-Data.db
├── nb-1-big-Index.db
├── nb-1-big-Filter.db
├── nb-1-big-CompressionInfo.db
├── nb-1-big-Statistics.db
├── nb-1-big-Digest.crc32
├── nb-1-big-TOC.txt
└── manifest.json
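To locate every directory belonging to a particular snapshot tag on a node, a find over the data directory works (sketch assuming the default /var/lib/cassandra/data path):
# List all snapshot directories for the tag "my_backup"
find /var/lib/cassandra/data -type d -path "*/snapshots/my_backup"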
When to Use¶
Before Destructive Operations¶
Always Snapshot First
Take snapshots before:
- Schema changes (ALTER TABLE, DROP)
- Bulk deletes
- Major compaction
- Version upgrades
- Data migrations
# Before dropping a column
nodetool snapshot -t before_schema_change my_keyspace
ALTER TABLE my_keyspace.users DROP old_column;
Regular Backups¶
# Daily backup with date tag
nodetool snapshot -t daily_$(date +%Y%m%d) my_keyspace
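To run this on a schedule, a cron entry along the following lines can be used (hypothetical timing and nodetool path; note that % must be escaped as \% inside crontab):
# Daily snapshot at 02:00, tagged with the current date
0 2 * * * /usr/bin/nodetool snapshot -t daily_$(date +\%Y\%m\%d) my_keyspace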
Before Upgrades¶
nodetool snapshot -t pre_upgrade_4.1
When NOT to Use¶
Without Flushing First¶
Flush Before Snapshot
By default, nodetool snapshot flushes memtables first. If using -sf (skip flush), recent writes will NOT be included:
# WRONG - May miss recent data
nodetool snapshot -sf -t my_backup
# CORRECT - Ensures all data is captured
nodetool flush my_keyspace
nodetool snapshot -t my_backup my_keyspace
Relying Solely on Snapshots¶
Snapshots Are Not Complete Backups
Snapshots alone are insufficient because they:
- Exist only on the local node
- Are lost if the disk fails
- Don't include commit logs
Use snapshots as part of a complete backup strategy that copies data off-node.
How Snapshots Work¶
Understanding Hard Links¶
A hard link is a filesystem feature that allows multiple directory entries to point to the same physical data on disk. Unlike a copy, a hard link does not duplicate the data—it creates another reference to the existing data blocks.
┌─────────────────────────────────────────────────────────────────┐
│ DISK STORAGE │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Actual Data Blocks (100 MB) │ │
│ │ [SSTable file contents] │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ▲ ▲ │
│ │ │ │
│ Reference 1 Reference 2 │
│ │ │ │
│ ┌─────────────────┴───┐ ┌──────────┴─────────────────┐ │
│ │ data/ks/tbl/ │ │ data/ks/tbl/snapshots/ │ │
│ │ nb-1-big-Data.db │ │ backup/nb-1-big-Data.db │ │
│ │ (original file) │ │ (hard link - same data) │ │
│ └─────────────────────┘ └────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Total disk usage: 100 MB (not 200 MB)
Key properties of hard links:
| Property | Behavior |
|---|---|
| Creation speed | Instantaneous (just adds a directory entry) |
| Initial space | Zero additional space (same data blocks) |
| Data persistence | Data remains until ALL references are deleted |
| Independence | Each link is equal; there is no "original" vs "copy" |
This is why snapshots are fast and space-efficient: creating a snapshot of 500 GB of data takes seconds and uses no additional disk space at the moment of creation.
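The same behaviour can be reproduced with ordinary files on any POSIX filesystem; this illustrative sketch is not Cassandra-specific:
# Create a file, hard-link it, and observe that both names share one inode
echo "sstable contents" > original.db
ln original.db snapshot-link.db      # no data is copied, only a new directory entry
ls -li original.db snapshot-link.db  # same inode number, link count of 2
rm original.db                       # the data blocks survive via the remaining link
cat snapshot-link.db                 # still prints "sstable contents"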
Space Usage Over Time¶
The space efficiency of hard links has a time dimension. When compaction runs, Cassandra deletes the original SSTable files—but the snapshot's hard links keep the data alive:
| Time | Event | Disk State | Space Impact |
|---|---|---|---|
| T=0 | Snapshot created | Both original and snapshot reference same blocks | +0 MB |
| T+1 | Compaction merges SSTables | Original files deleted, snapshot links remain | +0 MB |
| T+2 | Data blocks now only referenced by snapshot | Snapshot "owns" the data exclusively | Full size now attributed to snapshot |
Example timeline:
Day 1: Take snapshot of 100 GB table
- Snapshot size shown: ~0 (hard links to active SSTables)
Day 3: Compaction runs, creates new SSTables, deletes old ones
- Snapshot size shown: 100 GB (now holds exclusive references)
- Active table size: 95 GB (new compacted SSTables)
- Total disk usage: 195 GB
Snapshot Space Growth
Old snapshots accumulate disk space as compaction removes the original SSTables they reference. A week-old snapshot may consume as much space as the original data. Monitor with nodetool listsnapshots and clean up regularly.
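The hard-link count of a snapshot's files shows whether the active table still shares those blocks; a count of 1 means the snapshot is now their sole owner. A sketch assuming GNU stat and the default data directory:
# Print link count and path for each Data component in the snapshot
find /var/lib/cassandra/data/my_keyspace/users-*/snapshots/my_backup -name "*-Data.db" -exec stat -c "%h %n" {} \;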
Disk Space Management¶
Check Snapshot Sizes¶
nodetool listsnapshots
Output:
Snapshot name Keyspace name Column family name True size Size on disk
my_backup my_keyspace users 1.5 GB 1.5 GB
my_backup my_keyspace orders 2.3 GB 2.3 GB
old_backup my_keyspace users 1.2 GB 1.2 GB
Check via tablestats¶
nodetool tablestats my_keyspace.users | grep "Space used by snapshots"
Manual Space Check¶
du -sh /var/lib/cassandra/data/*/*/snapshots/*
Complete Backup Workflow¶
Step 1: Flush Memtables¶
nodetool flush my_keyspace
Step 2: Create Snapshot¶
nodetool snapshot -t backup_$(date +%Y%m%d_%H%M%S) my_keyspace
Step 3: Copy Off-Node¶
# Find snapshot files
find /var/lib/cassandra/data/my_keyspace -path "*/snapshots/backup_*" -type f
# Copy to backup location
rsync -av /var/lib/cassandra/data/my_keyspace/*/snapshots/backup_*/ /backup/location/
Step 4: Clean Up Local Snapshot¶
nodetool clearsnapshot -t backup_20240115_120000 my_keyspace
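The four steps can be combined into one script. This is a sketch that assumes the default data directory and a hypothetical /backup/location destination; adjust the keyspace, paths, and retention for your environment:
#!/usr/bin/env bash
set -euo pipefail

KEYSPACE="my_keyspace"
TAG="backup_$(date +%Y%m%d_%H%M%S)"
DEST="/backup/location"

nodetool flush "$KEYSPACE"                    # Step 1: flush memtables
nodetool snapshot -t "$TAG" "$KEYSPACE"       # Step 2: create the snapshot

# Step 3: copy the snapshot off-node, preserving the <table>-<uuid> layout
for dir in /var/lib/cassandra/data/"$KEYSPACE"/*/snapshots/"$TAG"; do
    [ -d "$dir" ] || continue
    table_dir=$(basename "$(dirname "$(dirname "$dir")")")
    mkdir -p "$DEST/$TAG/$table_dir"
    rsync -a "$dir"/ "$DEST/$TAG/$table_dir"/
done

nodetool clearsnapshot -t "$TAG" "$KEYSPACE"  # Step 4: remove the local snapshot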
Snapshot for Schema Backup¶
Snapshots include a schema.cql file containing the table definition:
cat /var/lib/cassandra/data/my_keyspace/users-*/snapshots/my_backup/schema.cql
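If the table was dropped, its definition can be re-created from this file before restoring data (cqlsh sketch; the host is illustrative):
# Re-create the table from the saved definition
cqlsh node1 -f /var/lib/cassandra/data/my_keyspace/users-*/snapshots/my_backup/schema.cql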
Common Issues¶
"Snapshot already exists"¶
ERROR: Snapshot my_backup already exists
Solutions:
- Use a different tag name
- Clear existing snapshot: nodetool clearsnapshot -t my_backup
Snapshot Takes Too Long¶
If a snapshot is slow, it is usually waiting for the memtable flush to complete:
# Check flush activity
nodetool tpstats | grep -i flush
# Use skip-flush if memtables already flushed
nodetool flush my_keyspace
nodetool snapshot -sf -t my_backup my_keyspace
Disk Space Full¶
Snapshots may prevent space reclamation after compaction:
# Check snapshot sizes
nodetool listsnapshots
# Clear old snapshots
nodetool clearsnapshot -t old_backup
Best Practices¶
Snapshot Guidelines
- Use meaningful tags - Include date and purpose
- Flush first - Unless using skip-flush intentionally
- Copy off-node - Snapshots don't protect against disk failure
- Clean up regularly - Remove old snapshots to reclaim space
- Document retention - Define how long to keep snapshots
- Automate - Script snapshot creation and cleanup
Naming Convention¶
# Include date, time, and purpose
nodetool snapshot -t pre_upgrade_20240115_1430
nodetool snapshot -t daily_backup_20240115
nodetool snapshot -t before_schema_change_users_20240115
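Date-stamped tags also make age-based cleanup scriptable. A sketch assuming GNU date and the daily_* convention above:
# Clear daily snapshots whose embedded YYYYMMDD stamp is older than 7 days
cutoff=$(date -d "7 days ago" +%Y%m%d)
nodetool listsnapshots | awk '/^daily_/ {print $1}' | sort -u | while read -r tag; do
    snap_date=$(echo "$tag" | grep -o '[0-9]\{8\}' | head -n1)   # extract the date stamp from the tag
    if [ -n "$snap_date" ] && [ "$snap_date" -lt "$cutoff" ]; then
        nodetool clearsnapshot -t "$tag"
    fi
done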
Restoring from Snapshots¶
Snapshots can be restored in two ways:
Local Restore (Same Node, Same Schema)¶
Copy snapshot files back to the table's data directory and refresh:
# 1. Stop writes to the table (optional but recommended)
# 2. Copy the snapshot files (all SSTable components) back into the table directory
cp /var/lib/cassandra/data/my_keyspace/users-*/snapshots/my_backup/* \
   /var/lib/cassandra/data/my_keyspace/users-*/
# 3. Refresh to load the restored files
nodetool refresh my_keyspace users
Cross-Node Restore (Different Topology)¶
Use sstableloader to stream snapshot data to any cluster. sstableloader takes the table name from the last directory in the path and the keyspace name from its parent, so copy the snapshot contents into a <keyspace>/<table> directory first (for example /backup/my_keyspace/users/) and stream from there:
sstableloader -d node1,node2,node3 \
    /backup/my_keyspace/users/
For complete restore procedures, see Backup and Restore.
Related Commands¶
| Command | Relationship |
|---|---|
| clearsnapshot | Remove snapshots |
| listsnapshots | List existing snapshots |
| flush | Flush memtables before snapshot |
| tablestats | Check snapshot space usage |
| refresh | Load restored SSTable files |
Related Documentation¶
- Backup and Restore Overview - Complete backup strategies
- Restore Procedures - Detailed restore scenarios