BigQuery’s cross-project querying is surprisingly flexible: you can read data in one project directly from another without moving it. It is often reduced to a simple SELECT * FROM project.dataset.table, but IAM and, in locked-down environments, network configuration decide whether that query actually works.
Let’s see it in action. Imagine project-A has a dataset named sales_data with a table daily_reports. You’re working in project-B and want to query this data.
    -- In project-B's BigQuery UI or client tool
    SELECT
        date,
        region,
        SUM(revenue) AS total_revenue
    FROM
        `project-A.sales_data.daily_reports`
    WHERE
        date >= '2023-01-01'
    GROUP BY
        date,
        region
    ORDER BY
        date DESC;
This query, executed from project-B, reads data from project-A without any ETL. The magic here is BigQuery’s ability to reference tables using their fully qualified names, including the project ID.
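Because a fully qualified name works anywhere a table name does, the same idea extends to joins across projects. The sketch below combines project-A's shared table with a table that lives in project-B. The table `project-B.reference.region_targets` and its columns `region` and `revenue_target` are hypothetical, invented for illustration, and the query assumes that table holds one row per region.

```sql
-- Sketch: join project-A's shared table with a table local to project-B.
-- `project-B.reference.region_targets` and its columns are hypothetical.
SELECT
    r.region,
    SUM(r.revenue) AS total_revenue,
    ANY_VALUE(t.revenue_target) AS revenue_target
FROM
    `project-A.sales_data.daily_reports` AS r
JOIN
    `project-B.reference.region_targets` AS t
    ON t.region = r.region
WHERE
    r.date >= '2023-01-01'
GROUP BY
    r.region
ORDER BY
    total_revenue DESC;
```

The identity running this needs read access to both tables; neither table is copied into the other project.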
The core problem this solves is data siloing. Organizations often have distinct projects for different teams, applications, or environments (dev, staging, prod). Cross-project querying breaks down these silos, enabling unified analytics and reporting without the overhead of data duplication or complex data pipelines. It’s about treating data as a shared asset, governed by access controls.
Internally, when you query project-A.sales_data.daily_reports from project-B, BigQuery acts as an intermediary. It receives the query from project-B’s context. To access the data in project-A, it performs an authorized request on behalf of the user or service account running the query in project-B. This requires specific IAM permissions in project-A to be granted to the identity running the query in project-B. The data itself remains resident in project-A, and only the query results are streamed back to project-B for processing or display.
The primary levers you control are:
- IAM Permissions: This is the gatekeeper. The identity (user or service account) executing the query in project-B must have read access to the table in project-A.
- Project/Dataset/Table Naming: Using the fully qualified name project-id.dataset-id.table-id is essential.
- Network Configuration: While less common for basic cross-project queries, if your project-B environment has restrictive VPC Service Controls or network firewalls, these can impede BigQuery’s ability to access resources in project-A.
Let’s dive into IAM, the most critical aspect. For the query above to succeed, the user or service account associated with project-B needs permissions on project-A. Specifically, within project-A, the identity from project-B needs at least the BigQuery Data Viewer role (roles/bigquery.dataViewer) granted on the specific dataset (sales_data) or the table (daily_reports), or even at the project level.
To grant this, you’d navigate to project-A’s IAM page in the GCP console. You’d add a principal (the email address of the user or service account from project-B) and assign them the BigQuery Data Viewer role.
For example, if your service account in project-B is my-sa@project-b.iam.gserviceaccount.com, you would go to project-A’s IAM page, click "Grant Access", enter my-sa@project-b.iam.gserviceaccount.com as the new principal, and select "BigQuery Data Viewer" as the role. Granted at the project level like this, the role lets my-sa read every dataset in project-A. For finer-grained control, grant the role on a specific dataset or table within project-A instead.
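The dataset-level grant can also be expressed in SQL with BigQuery's GRANT statement rather than through the console. This is a minimal sketch, assuming it is run by someone who is allowed to change access on the sales_data dataset in project-A; the service account is the example one from the text.

```sql
-- Run by an administrator of project-A's sales_data dataset.
-- Grants read access on the whole dataset to the project-B service account.
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `project-A.sales_data`
TO "serviceAccount:my-sa@project-b.iam.gserviceaccount.com";

-- Narrower alternative: grant on the single table only.
GRANT `roles/bigquery.dataViewer`
ON TABLE `project-A.sales_data.daily_reports`
TO "serviceAccount:my-sa@project-b.iam.gserviceaccount.com";
```

Either statement is enough for the example query; use the table-level form when the rest of the dataset should stay private.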
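A common pitfall is assuming that because an identity can use BigQuery in project-B, it can automatically see data in other projects. Permissions are explicit and attach to resources: the identity needs permission to run query jobs in project-B (for example through roles/bigquery.jobUser) and, separately, read access to the data in project-A.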
Beyond IAM, consider VPC Service Controls. If project-A sits inside a service perimeter, requests that cross the perimeter boundary are blocked even when IAM would allow them. You’ll need to permit the access explicitly, for example with ingress and egress rules, a perimeter bridge between the two projects’ perimeters, or an access level that covers the identity used in project-B and is referenced by project-A’s service perimeter.
The actual data transfer during a cross-project query is handled by BigQuery’s internal network. You don’t need to configure explicit peering or VPNs between projects for BigQuery to work, unless you’re using VPC Service Controls and need to explicitly bridge perimeters. BigQuery’s infrastructure handles the communication.
One subtle point often overlooked is how BigQuery resolves the project in the fully qualified name. When you run a query from project-B, and the query references project-A.dataset.table, BigQuery first resolves project-A to its project number. Then, it checks if the identity running the query (e.g., your user account or a service account) has been granted the necessary permissions within project-A. It’s not about project-B having permissions on project-A; it’s about the identity used in project-B having permissions in project-A. This distinction is crucial when troubleshooting.
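When troubleshooting, it can help to list who has been granted access on the dataset itself. The sketch below queries the INFORMATION_SCHEMA.OBJECT_PRIVILEGES view in project-A. It assumes the dataset lives in the US multi-region (hence `region-us`) and that you have enough access in project-A to read the view. As far as I know, the view shows only grants made directly on the object, so a role inherited from the project level would not appear here.

```sql
-- Run with access to project-A; the `region-us` qualifier is an assumption
-- about where sales_data is located.
SELECT
    object_name,
    object_type,
    privilege_type,
    grantee
FROM
    `project-A`.`region-us`.INFORMATION_SCHEMA.OBJECT_PRIVILEGES
WHERE
    object_name = 'sales_data';
```

If the identity used in project-B is missing from the grantee column, check the project-level IAM bindings in project-A next.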
The next hurdle you’ll likely encounter is managing costs. Although the data isn’t duplicated, under on-demand pricing each query is still charged for the bytes it scans in project-A’s tables. That charge goes to the project where the query job runs, so here it lands on the billing account attached to project-B, while project-A continues to pay for storing the data. Make sure your queries scan only the data they need by filtering early and using partitioning effectively.
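To see how much data recent cross-project queries actually processed, you can inspect the job history of the project that ran them. This is a sketch, assuming the jobs ran in the US multi-region and that you have permission to list jobs in project-B; the seven-day window and the row limit are arbitrary choices.

```sql
-- Run in project-B: recent query jobs that referenced any table in project-A,
-- largest scans first.
SELECT
    job_id,
    user_email,
    total_bytes_processed,
    total_bytes_billed
FROM
    `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
    creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    AND job_type = 'QUERY'
    AND EXISTS (
        SELECT 1
        FROM UNNEST(referenced_tables) AS t
        WHERE t.project_id = 'project-A'
    )
ORDER BY
    total_bytes_processed DESC
LIMIT 20;
```

Jobs near the top of this list are the first candidates for tighter filters or partition pruning.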