BigQuery’s nested data structures are so powerful because they allow you to model complex, hierarchical data directly within a relational database, blurring the lines between NoSQL and SQL.

Let’s see this in action. Imagine you’re tracking user activity on a website. Each user might perform multiple actions, and each action could have associated details. A traditional relational approach might involve separate tables for users and actions, linked by foreign keys. With BigQuery’s nested structures, we can represent this more elegantly.

Here’s a BigQuery schema definition for such a scenario:

[
  {
    "name": "user_id",
    "type": "STRING",
    "mode": "REQUIRED"
  },
  {
    "name": "user_info",
    "type": "RECORD",
    "mode": "NULLABLE",
    "fields": [
      {"name": "name", "type": "STRING"},
      {"name": "email", "type": "STRING"}
    ]
  },
  {
    "name": "events",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {"name": "timestamp", "type": "TIMESTAMP"},
      {"name": "event_type", "type": "STRING"},
      {"name": "details", "type": "RECORD",
        "fields": [
          {"name": "page_url", "type": "STRING"},
          {"name": "button_clicked", "type": "STRING"}
        ]
      }
    ]
  }
]

In this schema:

  • user_id is a simple string.
  • user_info is a RECORD (struct) containing name and email.
  • events is a REPEATED RECORD (an array of structs). Each element in this array represents a single event performed by the user.
  • Within each event, there’s another nested RECORD called details, which can hold specific information about that event, like the page_url or button_clicked.

Now, let’s populate this with some sample data and query it.

Sample Data Insertion (using bq load or INSERT statements):

INSERT INTO your_dataset.user_activity (user_id, user_info, events) VALUES
(
  'user123',
  STRUCT('Alice Smith' AS name, 'alice@example.com' AS email),
  [
    STRUCT(TIMESTAMP '2023-10-27 10:00:00 UTC' AS timestamp, 'page_view' AS event_type, STRUCT('homepage' AS page_url, NULL AS button_clicked) AS details),
    STRUCT(TIMESTAMP '2023-10-27 10:05:15 UTC' AS timestamp, 'click' AS event_type, STRUCT('homepage' AS page_url, 'signup_button' AS button_clicked) AS details)
  ]
),
(
  'user456',
  STRUCT('Bob Johnson' AS name, 'bob@example.com' AS email),
  [
    STRUCT(TIMESTAMP '2023-10-27 11:15:30 UTC' AS timestamp, 'page_view' AS event_type, STRUCT('/products' AS page_url, NULL AS button_clicked) AS details)
  ]
);

Querying Nested Data:

To access fields within a RECORD, you use dot notation. For REPEATED fields, you use UNNEST to flatten the array into rows.

Let’s find all 'click' events and the user who performed them:

SELECT
  ua.user_id,
  ua.user_info.name,
  event.timestamp,
  event.event_type,
  event.details.button_clicked
FROM
  your_dataset.user_activity AS ua,
  UNNEST(ua.events) AS event
WHERE
  event.event_type = 'click'
  AND event.details.button_clicked IS NOT NULL;

This query first selects from user_activity aliased as ua. Then, it UNNESTs the events array, making each event an individual row associated with its parent user_id. We can then access ua.user_info.name and event.details.button_clicked directly.

The result would look something like this:

user_id name timestamp event_type button_clicked
user123 Alice Smith 2023-10-27 10:05:15.000000+00 click signup_button

You can also query specific nested fields within the user_info struct without unnesting:

SELECT
  user_info.name,
  user_info.email
FROM
  your_dataset.user_activity
WHERE
  user_id = 'user123';

This would return:

name email
Alice Smith alice@example.com

The real power comes when you start aggregating data across these nested structures. For instance, to count how many events each user performed:

SELECT
  ua.user_id,
  ARRAY_LENGTH(ua.events) AS event_count
FROM
  your_dataset.user_activity AS ua;

Or, to count the number of page views per user:

SELECT
  ua.user_id,
  COUNT(event.event_type) AS page_view_count
FROM
  your_dataset.user_activity AS ua,
  UNNEST(ua.events) AS event
WHERE
  event.event_type = 'page_view'
GROUP BY
  ua.user_id;

The mental model to keep in mind is that RECORDs (structs) are like objects or dictionaries, and REPEATED RECORDs are like lists or arrays of those objects. BigQuery handles the storage and retrieval efficiently, but when you need to operate on individual elements of an array, UNNEST is your primary tool for bringing them into the relational context of rows.

When you construct a STRUCT or an array of STRUCTs in your INSERT statements or SELECT queries, you are explicitly defining these nested relationships. The STRUCT(value AS field_name, ...) syntax is key for creating the inner objects, and [STRUCT(...), STRUCT(...)] is for creating the arrays of these objects.

A common pitfall is forgetting to UNNEST when you want to filter or aggregate based on data within a repeated field. Treating a REPEATED field as a single value will lead to unexpected results or errors, as BigQuery expects you to explicitly expand the array into individual records for processing.

The next step in mastering nested data is often handling complex filtering and joins involving these structures, or exploring functions like ARRAY_AGG to re-aggregate data back into nested formats.

Want structured learning?

Take the full Bigquery course →