INFORMATION_SCHEMA in BigQuery is a set of read-only system views that expose metadata about your BigQuery objects. You reach a view by qualifying it with a dataset, a region, or a project, depending on the view.
Let’s see it in action. Imagine you have a dataset named my_dataset in the project my_project and, within it, a table called customers. You can query INFORMATION_SCHEMA to get details about this table:
SELECT
  column_name,
  data_type,
  is_nullable
FROM
  my_project.my_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE
  table_name = 'customers';
This query will return a list of all columns in the customers table, their data types, and whether they can contain NULL values.
The core problem INFORMATION_SCHEMA solves is providing a programmatic way to understand your BigQuery data landscape. Without it, you would rely on the BigQuery UI or external tools to get table schemas, row counts, or dataset details, which makes automation and data governance difficult. INFORMATION_SCHEMA exposes this metadata directly as queryable views, allowing you to:
- Discover and document: Automatically generate a catalog of your tables, views, and their schemas.
- Monitor usage and costs: Track table sizes, creation times, and last modified dates to manage storage costs.
- Enforce governance: Write queries to identify tables with sensitive data, check for missing partitions, or verify data quality.
- Automate data pipelines: Use metadata to dynamically build ETL jobs or data validation scripts.
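As one illustration of the governance use case, the following sketch flags columns whose names suggest personal data. The name patterns (email, phone, ssn) are assumptions chosen for this example, and my_project.my_dataset is the placeholder dataset used earlier. A name match is only a hint, not proof that a column holds sensitive data.

```sql
-- Hypothetical governance check: list columns whose names hint at personal data.
-- The regex patterns are illustrative assumptions; adapt them to your naming conventions.
SELECT
  table_name,
  column_name,
  data_type
FROM
  my_project.my_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE
  REGEXP_CONTAINS(LOWER(column_name), r'email|phone|ssn')
ORDER BY
  table_name,
  ordinal_position;
```

The result could feed a review list or a documentation job; the same pattern works for any naming rule you want to audit.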
When you query a view like INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.COLUMNS, BigQuery doesn’t read a physical table that you own. Instead, it resolves the view at query time against its own catalog of datasets, tables, columns, and other objects, so the results reflect the current state of your project. The WHERE clauses you use filter this metadata just like in any other SQL query.
Think of it as a live, queryable API for your BigQuery project’s structure. Each INFORMATION_SCHEMA view corresponds to a specific type of metadata:
- INFORMATION_SCHEMA.SCHEMATA: Lists the datasets (schemas) in your project.
- INFORMATION_SCHEMA.TABLES: Lists all tables and views within a dataset. You can filter by table_type to see only BASE TABLE or VIEW.
- INFORMATION_SCHEMA.COLUMNS: Details about columns in tables and views.
- INFORMATION_SCHEMA.PARTITIONS: Information about table partitions.
- INFORMATION_SCHEMA.ROUTINES: Details about stored procedures and user-defined functions (UDFs).
- INFORMATION_SCHEMA.JOBS: Information about jobs run in the project, with a limited history (roughly the past 180 days). It must be qualified with a region. The INFORMATION_SCHEMA.JOBS_BY_* variants, such as JOBS_BY_USER and JOBS_BY_ORGANIZATION, change the scope of the results rather than the retention period.
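To make two of these views concrete, here is a minimal sketch that reuses the placeholder names my_project.my_dataset and customers from the earlier example. The second query assumes customers is a partitioned table; for an unpartitioned table it would return a single row with a NULL partition_id.

```sql
-- List only views in the dataset, newest first.
SELECT
  table_name,
  table_type,
  creation_time
FROM
  my_project.my_dataset.INFORMATION_SCHEMA.TABLES
WHERE
  table_type = 'VIEW'
ORDER BY
  creation_time DESC;

-- Inspect the partitions of one table (assumes customers is partitioned).
SELECT
  partition_id,
  total_rows,
  last_modified_time
FROM
  my_project.my_dataset.INFORMATION_SCHEMA.PARTITIONS
WHERE
  table_name = 'customers'
ORDER BY
  partition_id;
```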
You can also query INFORMATION_SCHEMA for specific regions if your data is regional. For example, to see tables in the us-east1 region:
SELECT
  table_catalog,
  table_schema,
  table_name,
  creation_time
FROM
  `region-us-east1`.INFORMATION_SCHEMA.TABLES
WHERE
  table_schema = 'my_dataset';
Here, table_catalog refers to the project ID. The region-us-east1 prefix tells BigQuery to look in the INFORMATION_SCHEMA specific to that region.
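Row counts and storage sizes are exposed through the region-qualified TABLE_STORAGE view. The sketch below assumes the same us-east1 region and my_dataset placeholder as above, and that you have permission to read storage metadata for the project.

```sql
-- Row counts and logical size per table in one dataset, largest first.
-- Region and dataset names are the placeholders used in this article.
SELECT
  table_schema,
  table_name,
  total_rows,
  total_logical_bytes
FROM
  `region-us-east1`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE
  table_schema = 'my_dataset'
ORDER BY
  total_logical_bytes DESC;
```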
The most surprising thing is how granularly you can control which INFORMATION_SCHEMA you’re querying. Most people know about querying INFORMATION_SCHEMA within their current project or dataset. However, you can also query INFORMATION_SCHEMA for other projects and even other regions within those projects, provided you have the necessary permissions. This allows for cross-project metadata discovery and governance without needing to aggregate metadata into a central location. For instance, to see tables in a dataset called finance_data in project another-company-project located in europe-west2:
SELECT
  table_catalog,
  table_schema,
  table_name
FROM
  `another-company-project`.`region-europe-west2`.INFORMATION_SCHEMA.TABLES
WHERE
  table_schema = 'finance_data';
This capability is incredibly powerful for managing large, distributed data environments.
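A cross-project inventory can be sketched by combining the same view from several projects. This example assumes both projects keep data in europe-west2 (a single query runs in one location), that you hold metadata read permissions on each project, and that my_project is a hypothetical second project alongside another-company-project.

```sql
-- Hypothetical inventory across two projects in the same region.
-- Project IDs are placeholders; table_catalog identifies the source project.
SELECT
  table_catalog,
  table_schema,
  table_name,
  table_type
FROM
  `another-company-project`.`region-europe-west2`.INFORMATION_SCHEMA.TABLES
UNION ALL
SELECT
  table_catalog,
  table_schema,
  table_name,
  table_type
FROM
  `my_project`.`region-europe-west2`.INFORMATION_SCHEMA.TABLES
ORDER BY
  table_catalog,
  table_schema,
  table_name;
```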
After mastering INFORMATION_SCHEMA for metadata discovery, you’ll likely want to explore how to leverage it for more advanced data quality checks and automated data governance policies.