CQL Data Types¶

Choosing the right data type affects storage efficiency, query performance, and available operations. BIGINT vs INT might not matter for small datasets, but at scale the 4 bytes per row add up. TIMEUUID vs UUID determines whether time-range queries are possible. Frozen collections serialize as blobs; unfrozen ones allow partial updates.

CQL's type system covers the basics (integers, text, timestamps), identifiers (UUID, TIMEUUID for time-ordered unique IDs), collections (lists, sets, maps), and complex types (user-defined types, tuples). Cassandra 5.0 added VECTOR for embedding storage.

This reference covers each type, when to use it, and the trade-offs involved.

Behavioral Guarantees¶

What Data Types Guarantee¶

Values are validated against type constraints at write time
Type conversion failures are reported immediately (no silent truncation)
Integer overflow causes write rejection (not wrap-around)
TEXT stores UTF-8 encoded data; invalid UTF-8 is rejected
TIMEUUID values contain embedded timestamps extractable via toTimestamp()
FROZEN collections are serialized atomically and compared byte-for-byte

What Data Types Do NOT Guarantee¶

Undefined Behavior

The following behaviors are undefined and must not be relied upon:

Floating point precision: FLOAT and DOUBLE are approximate; exact equality comparisons may fail
Collection ordering across versions: Internal sort order for sets and map keys may vary between Cassandra versions
VARINT/DECIMAL performance: Arbitrary precision types have higher computation and storage costs
Empty vs null: Empty string ('') and empty collections are distinct from null, but behavior may vary
Large value handling: Very large text or blob values (approaching 2GB) may cause memory pressure

Type Comparison Contract¶

Type	Comparison Semantics
Numeric (int, bigint, etc.)	Numerical ordering
TEXT, VARCHAR	UTF-8 byte ordering
ASCII	ASCII byte ordering
TIMESTAMP	Chronological ordering (milliseconds since epoch)
TIMEUUID	Chronological ordering (time component)
UUID	Lexicographical ordering
BLOB	Byte ordering
BOOLEAN	false < true
Collections (frozen)	Element-by-element comparison

Null Handling Contract¶

Operation	Behavior
INSERT with null	Creates tombstone for column
NULL in primary key	Rejected (invalid)
NULL in clustering column	Rejected (invalid)
NULL in collection element	Behavior varies by collection type
Comparison with NULL	NULL is not equal to NULL

Storage Size Contract¶

Type	Fixed/Variable	Size
TINYINT	Fixed	1 byte
SMALLINT	Fixed	2 bytes
INT	Fixed	4 bytes
BIGINT	Fixed	8 bytes
FLOAT	Fixed	4 bytes
DOUBLE	Fixed	8 bytes
UUID/TIMEUUID	Fixed	16 bytes
TEXT, BLOB	Variable	Length prefix + data
VARINT, DECIMAL	Variable	Value-dependent
Collections	Variable	Element count + elements

Version-Specific Behavior¶

Version	Behavior
2.0+	Collections (list, set, map)
2.1+	User-defined types (UDTs), tuple type
2.2+	TINYINT, SMALLINT, DATE, TIME types
3.0+	Non-frozen UDTs and collections
4.0+	DURATION type
5.0+	VECTOR type for ML embeddings (CEP-30)

Type Categories¶

Category	Types	Use Case
Numeric	int, bigint, smallint, tinyint, varint, float, double, decimal	Numbers and calculations
Text	text, varchar, ascii	String data
Temporal	timestamp, date, time, duration	Date and time
Identifiers	uuid, timeuuid	Unique identifiers
Binary	blob	Raw bytes
Boolean	boolean	True/false
Network	inet	IP addresses
Collections	list, set, map	Multiple values
Complex	tuple, UDT, frozen	Structured data
Vector	vector	ML embeddings (5.0+)

Numeric Types¶

Integer Types¶

Type	Size	Range	Example
`TINYINT`	1 byte	-128 to 127	`127`
`SMALLINT`	2 bytes	-32,768 to 32,767	`32767`
`INT`	4 bytes	-2³¹ to 2³¹-1	`2147483647`
`BIGINT`	8 bytes	-2⁶³ to 2⁶³-1	`9223372036854775807`
`VARINT`	Variable	Arbitrary precision	`123456789012345678901234567890`

CREATE TABLE metrics (
    id UUID PRIMARY KEY,
    count_small TINYINT,
    count_medium INT,
    count_large BIGINT,
    count_huge VARINT
);

INSERT INTO metrics (id, count_small, count_medium, count_large, count_huge)
VALUES (uuid(), 100, 1000000, 9223372036854775807, 123456789012345678901234567890);

Floating Point Types¶

Type	Size	Precision	Example
`FLOAT`	4 bytes	~7 digits	`3.14159`
`DOUBLE`	8 bytes	~15 digits	`3.141592653589793`
`DECIMAL`	Variable	Arbitrary	`123.456789012345678901234567890`

CREATE TABLE measurements (
    id UUID PRIMARY KEY,
    temperature FLOAT,
    precise_value DOUBLE,
    currency DECIMAL
);

INSERT INTO measurements (id, temperature, precise_value, currency)
VALUES (uuid(), 23.5, 3.141592653589793, 1234567.89);

Best Practices: - Use INT/BIGINT for counters and IDs - Use DECIMAL for financial data (exact precision) - Use DOUBLE for scientific calculations - Avoid FLOAT unless space is critical

Text Types¶

Type	Description	Max Size
`TEXT`	UTF-8 encoded string	2GB
`VARCHAR`	Alias for TEXT	2GB
`ASCII`	US-ASCII string	2GB

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    username TEXT,
    bio TEXT,
    legacy_code ASCII
);

-- Text supports Unicode
INSERT INTO users (user_id, username, bio)
VALUES (uuid(), '日本語ユーザー', 'こんにちは世界 🌍');

Best Practices: - Use TEXT for all string data (UTF-8 support) - Only use ASCII for legacy compatibility - Don't store large text in frequently queried columns

Temporal Types¶

TIMESTAMP¶

Milliseconds since Unix epoch (January 1, 1970 UTC).

CREATE TABLE events (
    event_id UUID PRIMARY KEY,
    event_time TIMESTAMP
);

-- Various timestamp formats
INSERT INTO events (event_id, event_time) VALUES
    (uuid(), '2024-01-15 10:30:00'),
    (uuid(), '2024-01-15 10:30:00+0000'),
    (uuid(), '2024-01-15T10:30:00Z'),
    (uuid(), 1705315800000);  -- Milliseconds

-- Using functions
INSERT INTO events (event_id, event_time)
VALUES (uuid(), toTimestamp(now()));

DATE¶

Calendar date without time component.

CREATE TABLE daily_stats (
    stat_date DATE PRIMARY KEY,
    count INT
);

INSERT INTO daily_stats (stat_date, count)
VALUES ('2024-01-15', 100);

-- Using function
INSERT INTO daily_stats (stat_date, count)
VALUES (toDate(now()), 200);

TIME¶

Time of day in nanoseconds since midnight.

CREATE TABLE schedules (
    id UUID PRIMARY KEY,
    start_time TIME,
    end_time TIME
);

INSERT INTO schedules (id, start_time, end_time)
VALUES (uuid(), '09:00:00', '17:30:00.123456789');

DURATION¶

A period of time (months, days, nanoseconds).

CREATE TABLE subscriptions (
    user_id UUID PRIMARY KEY,
    duration DURATION
);

-- Various formats
INSERT INTO subscriptions (user_id, duration) VALUES
    (uuid(), '1y2mo'),           -- 1 year, 2 months
    (uuid(), '3w4d'),            -- 3 weeks, 4 days
    (uuid(), '12h30m'),          -- 12 hours, 30 minutes
    (uuid(), '89h4m48s');        -- 89 hours, 4 minutes, 48 seconds

-- ISO 8601 format
INSERT INTO subscriptions (user_id, duration)
VALUES (uuid(), 'P1Y2M3DT4H5M6S');  -- 1 year, 2 months, 3 days, 4 hours, 5 minutes, 6 seconds

Note: DURATION cannot be used in primary keys.

Identifier Types¶

UUID¶

128-bit universally unique identifier.

CREATE TABLE items (
    item_id UUID PRIMARY KEY,
    name TEXT
);

-- Generate random UUID
INSERT INTO items (item_id, name)
VALUES (uuid(), 'Widget');

-- Use specific UUID
INSERT INTO items (item_id, name)
VALUES (550e8400-e29b-41d4-a716-446655440000, 'Gadget');

TIMEUUID¶

Time-based UUID (version 1) that includes timestamp.

CREATE TABLE events (
    partition_id TEXT,
    event_id TIMEUUID,
    data TEXT,
    PRIMARY KEY ((partition_id), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);

-- Generate time-based UUID
INSERT INTO events (partition_id, event_id, data)
VALUES ('partition1', now(), 'event data');

-- Extract timestamp from timeuuid
SELECT dateOf(event_id), unixTimestampOf(event_id), data FROM events;

-- Query by time range using timeuuid
SELECT * FROM events
WHERE partition_id = 'partition1'
  AND event_id >= minTimeuuid('2024-01-01 00:00:00')
  AND event_id < maxTimeuuid('2024-01-02 00:00:00');

When to use: - UUID: Random unique identifiers - TIMEUUID: When time-ordering and uniqueness are needed

Binary Type¶

BLOB¶

Binary Large Object - arbitrary bytes.

CREATE TABLE files (
    file_id UUID PRIMARY KEY,
    file_name TEXT,
    content BLOB
);

-- Insert hex-encoded binary
INSERT INTO files (file_id, file_name, content)
VALUES (uuid(), 'test.bin', 0x48656c6c6f20576f726c64);

-- Insert using textAsBlob
INSERT INTO files (file_id, file_name, content)
VALUES (uuid(), 'hello.txt', textAsBlob('Hello World'));

Functions: - textAsBlob(text) - Convert text to blob - blobAsText(blob) - Convert blob to text - intAsBlob(int) / blobAsInt(blob) - Integer conversions

Boolean Type¶

CREATE TABLE settings (
    user_id UUID PRIMARY KEY,
    notifications_enabled BOOLEAN,
    dark_mode BOOLEAN
);

INSERT INTO settings (user_id, notifications_enabled, dark_mode)
VALUES (uuid(), true, false);

Network Type¶

INET¶

IPv4 or IPv6 address.

CREATE TABLE access_logs (
    log_id TIMEUUID PRIMARY KEY,
    client_ip INET,
    request TEXT
);

INSERT INTO access_logs (log_id, client_ip, request) VALUES
    (now(), '192.168.1.100', 'GET /api/users'),
    (now(), '2001:0db8:85a3:0000:0000:8a2e:0370:7334', 'POST /api/data');

Collection Types¶

Collections store multiple values in a single column. Cassandra provides three collection types: LIST, SET, and MAP. Collections are designed for small, bounded datasets that are typically read together with the row.

Collection Architecture¶

Collections are stored as a set of cells within the row, not as a separate data structure:

Row Storage:
┌──────────────┬─────────────────────────────────────────────────┐
│ Primary Key  │ Cells                                           │
├──────────────┼─────────────────────────────────────────────────┤
│ user_id=123  │ name="Alice" │ tags[0]="a" │ tags[1]="b" │ ... │
└──────────────┴─────────────────────────────────────────────────┘

Each collection element is stored as an individual cell with its own timestamp. This enables:

Partial updates without reading the entire collection
Per-element TTL
Last-write-wins conflict resolution at element level

Collections Are Not Tables

Collections are optimized for small datasets (tens to hundreds of elements). For large or unbounded data, use a separate table with a clustering column instead of a collection.

When to Use Collections¶

Use Case	Recommended Type	Example
Ordered items with duplicates	LIST	Recent search queries, action history
Unique tags or categories	SET	User roles, product tags, permissions
Key-value properties	MAP	User preferences, metadata, attributes
Large or unbounded data	Separate table	Order line items, comments, events

LIST¶

Ordered collection that maintains insertion order and allows duplicate values. Elements are accessed by index position.

Characteristics:

Maintains insertion order
Allows duplicate values
Index-based access (0-indexed)
Append and prepend operations
Read-before-write required for index updates

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    phone_numbers LIST<TEXT>,
    recent_searches LIST<TEXT>
);

-- Insert list
INSERT INTO users (user_id, phone_numbers, recent_searches)
VALUES (uuid(), ['+1-555-0100', '+1-555-0101'], ['cassandra', 'database']);

-- Append to end (efficient)
UPDATE users SET phone_numbers = phone_numbers + ['+1-555-0102']
WHERE user_id = ?;

-- Prepend to beginning (efficient)
UPDATE users SET recent_searches = ['nosql'] + recent_searches
WHERE user_id = ?;

-- Remove by value (removes all occurrences)
UPDATE users SET recent_searches = recent_searches - ['database']
WHERE user_id = ?;

-- Update by index (requires read-before-write internally)
UPDATE users SET phone_numbers[0] = '+1-555-9999'
WHERE user_id = ?;

-- Delete by index
DELETE phone_numbers[1] FROM users WHERE user_id = ?;

LIST Anti-Patterns

Index-based updates: list[i] = value requires reading the list first, creating a race condition under concurrent writes
Large lists: Reading a list reads all elements; large lists cause latency spikes
Frequent middle insertions: No efficient way to insert at arbitrary positions

When to Use LIST:

Small, bounded ordered sequences (< 100 elements)
Append/prepend-only patterns (logs, history)
When duplicates are meaningful
When order matters

SET¶

Unordered collection of unique elements. Elements are stored in sorted order internally (by element value).

Characteristics:

Unique elements only (duplicates ignored)
Sorted internally by element value
Efficient add/remove operations
No index-based access
Idempotent additions

CREATE TABLE articles (
    article_id UUID PRIMARY KEY,
    tags SET<TEXT>,
    liked_by SET<UUID>
);

-- Insert set
INSERT INTO articles (article_id, tags)
VALUES (uuid(), {'technology', 'programming', 'cassandra'});

-- Add elements (idempotent - adding existing element has no effect)
UPDATE articles SET tags = tags + {'database', 'nosql'}
WHERE article_id = ?;

-- Remove elements
UPDATE articles SET tags = tags - {'programming'}
WHERE article_id = ?;

-- Check membership with CONTAINS (requires index or ALLOW FILTERING)
SELECT * FROM articles WHERE tags CONTAINS 'cassandra' ALLOW FILTERING;

-- With secondary index
CREATE INDEX ON articles (tags);
SELECT * FROM articles WHERE tags CONTAINS 'cassandra';

When to Use SET:

Tags, categories, labels
Many-to-many relationships (small cardinality)
Permissions or roles
When uniqueness is required
When order doesn't matter

MAP¶

Collection of key-value pairs. Keys must be unique; values can be any type.

Characteristics:

Unique keys
Keys sorted internally
Direct key-based access
Efficient single-key updates
Keys and values can be different types

CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY,
    preferences MAP<TEXT, TEXT>,
    scores MAP<TEXT, INT>,
    metadata MAP<TEXT, FROZEN<some_udt>>
);

-- Insert map
INSERT INTO user_profiles (user_id, preferences, scores)
VALUES (uuid(),
    {'theme': 'dark', 'language': 'en', 'timezone': 'UTC'},
    {'level': 5, 'points': 1000, 'achievements': 42});

-- Update specific key (efficient, no read required)
UPDATE user_profiles SET preferences['theme'] = 'light'
WHERE user_id = ?;

-- Add multiple keys
UPDATE user_profiles SET preferences = preferences + {'notifications': 'on', 'beta': 'true'}
WHERE user_id = ?;

-- Remove specific key
DELETE preferences['timezone'] FROM user_profiles
WHERE user_id = ?;

-- Remove multiple keys
UPDATE user_profiles SET preferences = preferences - {'beta', 'notifications'}
WHERE user_id = ?;

-- Query by key (requires index or ALLOW FILTERING)
CREATE INDEX ON user_profiles (KEYS(preferences));
SELECT * FROM user_profiles WHERE preferences CONTAINS KEY 'theme';

-- Query by value
CREATE INDEX ON user_profiles (VALUES(preferences));
SELECT * FROM user_profiles WHERE preferences CONTAINS 'dark';

-- Query by entry (key-value pair)
CREATE INDEX ON user_profiles (ENTRIES(preferences));
SELECT * FROM user_profiles WHERE preferences['theme'] = 'dark';

When to Use MAP:

Configuration or preferences
Dynamic attributes (schema-less properties)
Counters by category
Localized content (language → text)
Metadata storage

Collection Storage and Performance¶

Cell-Per-Element Storage¶

Each collection element is stored as a separate cell:

Collection	Cell Key	Cell Value
`LIST<TEXT>`	UUID (timeuuid)	Element value
`SET<TEXT>`	Element value	Empty
`MAP<K,V>`	Key	Value

This means:

Reading: All elements are read together (no partial reads)
Writing: Individual elements can be added/removed without reading
Size: Collection overhead scales with element count

Performance Implications¶

Operation	LIST	SET	MAP
Read entire collection	O(n)	O(n)	O(n)
Append/prepend	O(1)	N/A	N/A
Add element	O(1)	O(1)	O(1)
Remove by value	O(n)	O(1)	O(1)
Update by index	O(n)*	N/A	N/A
Update by key	N/A	N/A	O(1)

*Requires read-before-write

Collection Restrictions¶

Hard Limits

Limit	Value	Notes
Maximum element size	64 KB	Per individual element
Maximum elements	2 billion	Practical limit much lower
Maximum collection size	2 GB	Total serialized size
Primary key usage	Frozen only	Non-frozen cannot be in PK

Practical Limits

Recommended max elements: ~100 for optimal performance
Query impact: Large collections cause GC pressure and latency
Tombstones: Removing elements creates tombstones until compaction
No pagination: Cannot partially read a collection

Operations Not Supported¶

No sorting: Cannot ORDER BY collection contents
No aggregation: Cannot SUM/AVG collection elements
No filtering on non-indexed: CONTAINS requires index or ALLOW FILTERING
No atomic pop: Cannot atomically read-and-remove
No intersection/union: Set operations must be done client-side

Frozen Collections¶

The FROZEN modifier serializes the entire collection as a single blob value.

-- Non-frozen (default): individual element updates possible
tags SET<TEXT>

-- Frozen: entire collection replaced on update
tags FROZEN<SET<TEXT>>

Aspect	Non-Frozen	Frozen
Element updates	Individual	Replace entire collection
Storage	Cell per element	Single blob
Primary key	Not allowed	Allowed
Nested collections	Not allowed	Required for nesting
Equality comparison	Not supported	Supported

-- Frozen required for primary key
CREATE TABLE user_groups (
    members FROZEN<SET<UUID>> PRIMARY KEY,
    group_name TEXT
);

-- Frozen required for nested collections
CREATE TABLE complex_data (
    id UUID PRIMARY KEY,
    matrix LIST<FROZEN<LIST<INT>>>,
    nested_map MAP<TEXT, FROZEN<MAP<TEXT, INT>>>
);

When to Use Frozen

Collection is part of primary key
Nested collections (always required)
Collection is always replaced entirely
Need equality comparison (WHERE col = {frozen_value})

Collection Anti-Patterns¶

Avoid These Patterns

Unbounded collections

-- BAD: followers can grow infinitely
followers SET<UUID>

-- GOOD: use a separate table
CREATE TABLE followers (
    user_id UUID,
    follower_id UUID,
    PRIMARY KEY ((user_id), follower_id)
);

Large collections

-- BAD: storing all order items in collection
items LIST<FROZEN<order_item>>

-- GOOD: use clustering column
CREATE TABLE order_items (
    order_id UUID,
    item_id UUID,
    ...,
    PRIMARY KEY ((order_id), item_id)
);

Concurrent list index updates

-- BAD: race condition
UPDATE users SET history[0] = 'new' WHERE user_id = ?;

-- GOOD: append-only
UPDATE users SET history = history + ['new'] WHERE user_id = ?;

Using collections for querying

-- BAD: scanning all rows
SELECT * FROM products WHERE tags CONTAINS 'sale' ALLOW FILTERING;

-- GOOD: denormalize to separate table
CREATE TABLE products_by_tag (
    tag TEXT,
    product_id UUID,
    PRIMARY KEY ((tag), product_id)
);

Best Practices¶

Keep collections small: Target < 100 elements
Use frozen for immutable data: Better performance when not updating elements
Prefer SET over LIST: Unless order or duplicates matter
Avoid index updates on LIST: Use append/prepend only
Consider denormalization: Large collections → separate table
Index strategically: CONTAINS queries need secondary index
Monitor tombstones: Element deletes create tombstones

Tuple Type¶

Fixed-length ordered collection of typed elements.

CREATE TABLE locations (
    location_id UUID PRIMARY KEY,
    coordinates TUPLE<DOUBLE, DOUBLE>,    -- (latitude, longitude)
    address TUPLE<TEXT, TEXT, TEXT, TEXT>  -- (street, city, state, zip)
);

INSERT INTO locations (location_id, coordinates, address)
VALUES (uuid(),
    (37.7749, -122.4194),
    ('123 Main St', 'San Francisco', 'CA', '94102'));

-- Access tuple elements (0-indexed)
SELECT coordinates, address FROM locations;

User-Defined Types (UDT)¶

Custom composite types.

-- Create UDT
CREATE TYPE address (
    street TEXT,
    city TEXT,
    state TEXT,
    zip_code TEXT,
    country TEXT
);

CREATE TYPE phone (
    country_code TEXT,
    number TEXT,
    type TEXT
);

-- Use UDT in table
CREATE TABLE customers (
    customer_id UUID PRIMARY KEY,
    name TEXT,
    billing_address FROZEN<address>,
    shipping_address FROZEN<address>,
    phones LIST<FROZEN<phone>>
);

-- Insert with UDT
INSERT INTO customers (customer_id, name, billing_address, phones)
VALUES (uuid(), 'John Doe',
    {street: '123 Main St', city: 'NYC', state: 'NY', zip_code: '10001', country: 'USA'},
    [{country_code: '+1', number: '555-0100', type: 'mobile'}]);

-- Query UDT fields
SELECT name, billing_address.city, billing_address.state FROM customers;

Frozen Type¶

Makes collections and UDTs immutable (required for use in primary keys).

-- Frozen collection in primary key
CREATE TABLE user_groups (
    group_members FROZEN<SET<UUID>>,
    group_name TEXT,
    PRIMARY KEY (group_members)
);

-- Frozen UDT
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    shipping_address FROZEN<address>  -- Entire UDT is replaced on update
);

Frozen vs Non-Frozen: - Frozen: Entire value serialized together, atomic updates only - Non-Frozen: Individual elements can be updated

Vector Type (Cassandra 5.0+)¶

Fixed-size array of floats for ML embeddings.

CREATE TABLE embeddings (
    id UUID PRIMARY KEY,
    content TEXT,
    embedding VECTOR<FLOAT, 384>  -- 384-dimensional vector
);

-- Insert vector
INSERT INTO embeddings (id, content, embedding)
VALUES (uuid(), 'Sample text', [0.1, 0.2, 0.3, ...]);  -- 384 values

-- Create SAI index for vector search
CREATE INDEX ON embeddings (embedding) USING 'sai';

-- Approximate nearest neighbor search
SELECT * FROM embeddings
ORDER BY embedding ANN OF [0.1, 0.2, 0.3, ...]
LIMIT 10;

Type Conversion Functions¶

Function	Description
`toTimestamp(timeuuid)`	TIMEUUID → TIMESTAMP
`toDate(timestamp)`	TIMESTAMP → DATE
`toUnixTimestamp(timeuuid)`	TIMEUUID → BIGINT
`dateOf(timeuuid)`	TIMEUUID → TIMESTAMP
`textAsBlob(text)`	TEXT → BLOB
`blobAsText(blob)`	BLOB → TEXT
`typeAsBlob(value)`	Any type → BLOB

Next Steps¶

DDL Commands - Schema management