BigQuery’s wildcard table feature lets you query multiple tables with a single SQL statement, but it’s not just a convenience; it’s a fundamental mechanism for handling time-series or sharded data efficiently.

Imagine you have a massive dataset of user activity, logged daily into separate tables named user_activity_20230101, user_activity_20230102, and so on. Instead of writing a UNION ALL query for each day, you can use a wildcard.

SELECT
  COUNT(*)
FROM
  `your_project.your_dataset.user_activity_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230105'

This query will scan only the tables within your_dataset that start with user_activity_ and whose suffix falls between 20230101 and 20230105. The _TABLE_SUFFIX pseudo-column is key here. It’s automatically populated by BigQuery with the part of the table name that matched the wildcard.

The primary problem this solves is managing and querying data that’s naturally partitioned by date or some other identifier. Without wildcards, you’d face:

  • Manual Table Management: Constantly updating queries as new data arrives.
  • Performance Issues: UNION ALL queries can become unwieldy and harder for BigQuery to optimize.
  • Increased Complexity: Maintaining a long list of tables in a single query is error-prone.

The internal mechanism is surprisingly simple yet powerful. When you use a wildcard like *, BigQuery doesn’t immediately expand it into a list of all matching tables. Instead, it uses the _TABLE_SUFFIX (or _PARTITIONTIME for ingestion-time partitioned tables) as a filter before it even considers scanning data. This means that if you have millions of tables but only want to query a few, BigQuery will only list and process those specific few, avoiding a massive overhead.

Let’s look at a more practical example, simulating a common scenario for event data.

Scenario: You’re tracking website events (page views, clicks) and log them daily.

Table Naming Convention: website_events_YYYYMMDD

Goal: Find the total number of page views in January 2023.

Configuration:

  • Project ID: my-gcp-project
  • Dataset ID: analytics
  • Table Prefix: website_events_

Query:

SELECT
  COUNT(*) AS total_page_views
FROM
  `my-gcp-project.analytics.website_events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
  AND event_type = 'page_view'

This query is incredibly efficient. BigQuery first identifies all tables in my-gcp-project.analytics that match the website_events_* pattern. Then, it applies the _TABLE_SUFFIX filter to narrow down the list to only those tables whose suffix is within the January 2023 range. Finally, it scans only those selected tables and filters for event_type = 'page_view'. The cost and performance are directly proportional to the amount of data in the selected tables, not the total number of tables matching the prefix.

The real power comes in understanding what _TABLE_SUFFIX can represent. It’s not just fixed-width dates. If your tables are named events_2023_01_01, events_2023_01_02, the wildcard events_2023_01_* would work, and _TABLE_SUFFIX would contain 01, 02, etc. However, filtering on these suffixes requires them to be compatible with string comparisons. For date-based filtering, it’s best to stick to the YYYYMMDD or YYYYMM format for _TABLE_SUFFIX to leverage range filters effectively.

A common misconception is that using a wildcard automatically means you’re scanning all tables. This is only true if you don’t provide a filter on _TABLE_SUFFIX or if your filter is so broad it includes everything. BigQuery is smart enough to prune tables based on the _TABLE_SUFFIX condition before it starts processing. This is why using a WHERE _TABLE_SUFFIX BETWEEN '...' AND '...' clause is crucial for performance and cost savings.

When you use a wildcard in a FROM clause, BigQuery creates a "flattened" view of the data across the selected tables. It doesn’t actually merge the tables into one giant temporary table. Instead, it manages the read operations across the relevant underlying tables. This is why you can query petabytes of data spread across thousands of tables as if they were one, without the physical overhead of merging.

The next step in managing large datasets efficiently is understanding how to combine wildcard tables with partitioning and clustering for even finer-grained control and performance.

Want structured learning?

Take the full Bigquery course →