Cassandra’s time-series modeling doesn’t inherently prevent hot partitions; you have to design the partition key to avoid them.
Let’s see how this works with some data. Imagine we’re tracking sensor readings from IoT devices.
{
  "device_id": "sensor-123",
  "timestamp": "2023-10-27T10:00:00Z",
  "reading": 25.5,
  "unit": "celsius"
}
{
  "device_id": "sensor-123",
  "timestamp": "2023-10-27T10:01:00Z",
  "reading": 25.7,
  "unit": "celsius"
}
{
  "device_id": "sensor-456",
  "timestamp": "2023-10-27T10:00:00Z",
  "reading": 70.1,
  "unit": "fahrenheit"
}
A common, naive approach is to use device_id as the partition key and timestamp as the clustering key.
CREATE TABLE sensor_readings (
    device_id text,
    timestamp timestamp,
    reading float,
    unit text,
    PRIMARY KEY (device_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
If you have one very chatty device, say device-id-super-active, all of its data lands in a single partition that grows without bound. Every write and read for that device hits the same replicas, creating a hot spot. This is the problem we’re solving.
The core idea is to distribute writes and reads across multiple partitions even for the same device. We achieve this by introducing a "time bucket" or "shard" into the partition key. This bucket divides time into manageable chunks, ensuring that data for a single device is spread across multiple partitions based on the time bucket.
Here’s the improved schema:
CREATE TABLE sensor_readings_sharded (
    device_id text,
    time_bucket text, -- e.g., "2023-10-27-10" for hourly buckets
    timestamp timestamp,
    reading float,
    unit text,
    -- The inner parentheses make (device_id, time_bucket) a composite partition key
    PRIMARY KEY ((device_id, time_bucket), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Now, the partition key is (device_id, time_bucket). If we choose hourly buckets, all data for device-id-super-active within a given hour will go to one partition, but data from different hours will go to different partitions. This spreads the load.
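To make the effect concrete, here is a minimal sketch that counts how many rows fall into each partition under both keys. The workload (one reading per minute for three hours) and the helper name `hourly_bucket` are assumptions made for illustration, not part of any schema or driver.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hourly_bucket(ts: datetime) -> str:
    """Format a timestamp as an hourly bucket, e.g. '2023-10-27-10'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d-%H")


# Hypothetical workload: one reading per minute for three hours.
start = datetime(2023, 10, 27, 10, 0, tzinfo=timezone.utc)
readings = [start + timedelta(minutes=i) for i in range(180)]

# Naive key: every row for the device shares one partition.
naive = Counter(("sensor-123",) for _ in readings)

# Bucketed key: rows are grouped by (device_id, hourly bucket).
bucketed = Counter(("sensor-123", hourly_bucket(ts)) for ts in readings)

print(len(naive), max(naive.values()))        # 1 partition holding 180 rows
print(len(bucketed), max(bucketed.values()))  # 3 partitions of 60 rows each
```

The naive partition keeps growing for as long as the device reports, while each bucketed partition stops growing once its hour has passed.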
Let’s say our time_bucket is derived from the timestamp. For an hourly bucket, we can format the timestamp like YYYY-MM-DD-HH.
from datetime import datetime, timezone

def get_time_bucket(timestamp_str, bucket_size_hours=1):
    # Parse the ISO 8601 string and normalize to UTC so buckets are consistent
    dt_obj = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
    dt_obj = dt_obj.astimezone(timezone.utc)
    if bucket_size_hours == 1:
        return dt_obj.strftime("%Y-%m-%d-%H")
    elif bucket_size_hours == 24:
        return dt_obj.strftime("%Y-%m-%d")
    # Add more granularities as needed
    return dt_obj.strftime("%Y-%m-%d-%H")  # Default to hourly

# Example usage:
timestamp_str = "2023-10-27T10:15:30Z"
print(f"Hourly bucket: {get_time_bucket(timestamp_str, bucket_size_hours=1)}")
print(f"Daily bucket: {get_time_bucket(timestamp_str, bucket_size_hours=24)}")
When writing data, you’d calculate the time_bucket and insert it:
-- Assuming device_id='sensor-123', timestamp='2023-10-27T10:15:30Z', reading=26.0
INSERT INTO sensor_readings_sharded (device_id, time_bucket, timestamp, reading, unit)
VALUES ('sensor-123', '2023-10-27-10', '2023-10-27T10:15:30Z', 26.0, 'celsius');
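In application code you would usually derive the bucket at write time instead of typing it by hand. The following is a minimal sketch of that step; `INSERT_CQL` and `build_insert_params` are hypothetical names, and the code only builds the bound values without connecting to a cluster.

```python
from datetime import datetime, timezone

# Parameterized form of the INSERT; "?" is the placeholder style
# used by prepared statements.
INSERT_CQL = (
    "INSERT INTO sensor_readings_sharded "
    "(device_id, time_bucket, timestamp, reading, unit) "
    "VALUES (?, ?, ?, ?, ?)"
)


def build_insert_params(device_id, timestamp_str, reading, unit):
    """Return the bound values for INSERT_CQL, deriving the hourly bucket."""
    ts = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
    ts = ts.astimezone(timezone.utc)
    time_bucket = ts.strftime("%Y-%m-%d-%H")
    return (device_id, time_bucket, ts, reading, unit)


params = build_insert_params("sensor-123", "2023-10-27T10:15:30Z", 26.0, "celsius")
print(params[1])  # 2023-10-27-10
```

With the DataStax Python driver, these values would typically be bound to a statement created with `session.prepare(INSERT_CQL)` and run with `session.execute`; that wiring is left out here because it needs a live cluster.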
For reads, you typically query within a specific time range. Because of the sharding, a range may span several time_bucket partitions. A query that omits time_bucket no longer specifies the full partition key, so Cassandra rejects it unless you add ALLOW FILTERING, which forces a scan and is a performance killer. You avoid that by naming every bucket explicitly.
If you want readings for sensor-123 for the last three hours (for the last 24 hours you would list 24 hourly buckets instead):
SELECT * FROM sensor_readings_sharded
WHERE device_id = 'sensor-123'
AND time_bucket IN ('2023-10-27-08', '2023-10-27-09', '2023-10-27-10'); -- assuming current time is 10:XX
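The bucket list for an IN clause can be computed from the requested time range. Here is a minimal sketch, assuming hourly buckets in UTC; `buckets_for_range` is a hypothetical helper and the two-hour window ending at 10:30 is only an example.

```python
from datetime import datetime, timedelta, timezone


def buckets_for_range(start: datetime, end: datetime) -> list:
    """List every hourly bucket that overlaps [start, end], oldest first."""
    # Snap to the top of the hour so the first partial bucket is included.
    current = start.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    end = end.astimezone(timezone.utc)
    buckets = []
    while current <= end:
        buckets.append(current.strftime("%Y-%m-%d-%H"))
        current += timedelta(hours=1)
    return buckets


now = datetime(2023, 10, 27, 10, 30, tzinfo=timezone.utc)
print(buckets_for_range(now - timedelta(hours=2), now))
# ['2023-10-27-08', '2023-10-27-09', '2023-10-27-10']
```

For wide ranges the list gets long, so an application may prefer to issue one query per bucket and merge the results; which approach works better depends on the workload.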
The device_id and time_bucket together form the partition key, so Cassandra routes each query to the replicas that own those specific partitions. By choosing an appropriate time_bucket size (e.g., hourly, daily), you cap how large any one partition can grow and ensure that a device’s data is spread across multiple nodes over time. At any given moment, all writes for a device still target its current bucket, so the bucket should be small enough that one partition can absorb the device’s peak write rate. The timestamp acts as the clustering key, allowing efficient sorting and range scans within a specific (device_id, time_bucket) partition.
The surprising part is that the same device can have its data spread across many partitions, and this is not only acceptable but the desired outcome for performance. The time_bucket is not part of the logical "identity" of a single reading; it’s purely a physical partitioning strategy.
The next concept to consider is how to handle the lifecycle of these time buckets. You’ll eventually want to drop old data, and doing so efficiently requires understanding how to drop entire partitions.