Cassandra’s user-defined types (UDTs) let you embed structured data within your tables, behaving much like a struct or an object in other programming languages.
Let’s see how this works with a concrete example. Imagine we’re building a system to track user profiles, and each user has an address. Instead of having separate columns for street, city, state, and zip_code, we can group these into a single UDT.
First, we define the UDT:
CREATE TYPE IF NOT EXISTS address (
street text,
city text,
state text,
zip_code text
);
This address UDT is now a reusable data structure. We can then use it as the type for a column in a table:
CREATE TABLE IF NOT EXISTS users (
user_id uuid PRIMARY KEY,
name text,
email text,
billing_address address,
shipping_address address
);
Notice that we can even use the same UDT multiple times within a single table for different columns.
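For comparison, the same columns can be declared as frozen, which is the only form Cassandra releases before 3.6 accept for UDT columns. The sketch below uses a hypothetical table name, users_frozen, that is not part of the schema above. A frozen UDT is treated as one opaque value that can only be written as a whole.

```cql
-- Hypothetical variant of the users table with frozen address columns.
CREATE TABLE IF NOT EXISTS users_frozen (
    user_id uuid PRIMARY KEY,
    name text,
    email text,
    billing_address frozen<address>,
    shipping_address frozen<address>
);
```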
Now, let’s insert some data. When inserting, you write the UDT value as a literal: curly braces containing the field names and their values. It looks like a map literal, except that the field names are unquoted identifiers:
INSERT INTO users (user_id, name, email, billing_address, shipping_address) VALUES (
uuid(),
'Alice Wonderland',
'alice@example.com',
{street: '123 Main St', city: 'Fictionalville', state: 'CA', zip_code: '90210'},
{street: '456 Oak Ave', city: 'Anothercity', state: 'NY', zip_code: '10001'}
);
When you query this data, the UDT will be returned as a map-like structure. You can access individual fields within the UDT using dot notation:
SELECT user_id, name, billing_address.city, shipping_address.street FROM users WHERE user_id = <some_uuid>;
This query would return something like:
 user_id                              | name             | billing_address.city | shipping_address.street
--------------------------------------+------------------+----------------------+-------------------------
 f47ac10b-58cc-4372-a567-0e02b9c39572 | Alice Wonderland | Fictionalville       | 456 Oak Ave
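Selecting the column without a field name returns the whole UDT value. As a small sketch using the same placeholder UUID, the query below fetches the complete billing address alongside one of its fields. In cqlsh the full value is typically displayed as a curly-brace list of field: value pairs.

```cql
-- Fetch the entire UDT and a single field side by side.
SELECT billing_address, billing_address.zip_code
FROM users
WHERE user_id = <some_uuid>;
```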
The primary benefit here is data organization and consistency. By defining an address UDT, you ensure that every address in your system has the same structure. This reduces errors from typos in column names (like city vs. cty) and makes your schema more readable and maintainable. It also simplifies application code; instead of managing multiple fields for an address, you manage a single address object.
Under the hood, how a UDT is stored depends on whether the column is frozen. A non-frozen UDT column, like the two declared above (supported since Cassandra 3.6), is stored as individual fields, so updating one field writes only that field. A frozen UDT column (declared as frozen<address>) is serialized as a single value and can only be replaced as a whole. Changing the UDT definition itself requires some care. ALTER TYPE can add a field, in which case existing rows simply return null for it, and it can rename a field, which breaks any query or application code that still uses the old name. CQL has no statement for removing a field from a type, so choose your fields deliberately.
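As an illustration of evolving the type, the statements below use ALTER TYPE to add and rename fields. The field name country and the new name postal_code are made up for this example.

```cql
-- Add a field; rows written earlier return null for it.
ALTER TYPE address ADD country text;

-- Rename a field; queries and application code that use zip_code must be updated.
ALTER TYPE address RENAME zip_code TO postal_code;
```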
A common pitfall is assigning a new UDT value when you only intend to change a single field. For instance, if you wanted to update only the street of the billing_address, this statement does not do what you might expect:
-- This succeeds, but it replaces the whole value: city, state, and zip_code become null
UPDATE users SET billing_address = {street: '789 Pine Ln'} WHERE user_id = <some_uuid>;
Assigning a UDT literal always replaces the entire value, and any field omitted from the literal is set to null. Because billing_address is a non-frozen column, you can instead set the one field directly (Cassandra 3.6 and later):
-- Only street is written; the other fields are left untouched
UPDATE users SET billing_address.street = '789 Pine Ln' WHERE user_id = <some_uuid>;
If the column were declared as frozen<address>, per-field updates would be rejected. In that case you must fetch the existing value, modify the field in your application, and write back the entire UDT:
-- Required for frozen<address> columns: supply every field
UPDATE users SET billing_address = {street: '789 Pine Ln', city: 'Fictionalville', state: 'CA', zip_code: '90210'} WHERE user_id = <some_uuid>;
This behavior is often surprising because a partial literal looks like a partial update, yet it silently erases the fields you left out.
The next step in managing complex data in Cassandra often involves using collections like maps and sets alongside or instead of UDTs, depending on whether the structure is fixed or dynamic.
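As a brief sketch of combining the two, a UDT can serve as the element type of a collection, provided it is wrapped in frozen. The column name previous_addresses and the sample address are assumptions for illustration, not part of the schema above.

```cql
-- Hypothetical column: a list of past addresses. UDTs inside collections must be frozen.
ALTER TABLE users ADD previous_addresses list<frozen<address>>;

-- Append one address to the list.
UPDATE users
SET previous_addresses = previous_addresses + [{street: '12 Elm St', city: 'Oldtown', state: 'TX', zip_code: '73301'}]
WHERE user_id = <some_uuid>;
```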