Under on-demand pricing, BigQuery doesn’t bill you for the size of the tables you query or for the rows you get back; it bills you for the bytes your query processes, which depends on the columns the query has to read.

Let’s see this in action. Imagine a table my_project.my_dataset.my_table with a single column data of type STRING.

-- This query references no columns, so it processes almost no data
SELECT COUNT(*)
FROM `my_project.my_dataset.my_table`;

This query, even if my_table is terabytes in size, will cost next to nothing. With no columns referenced and no filter, BigQuery can typically answer it from table metadata, and the query validator usually reports 0 bytes processed.

Now, consider this:

-- This query processes the entire content of the 'data' column for every row
SELECT SUM(LENGTH(data))
FROM `my_project.my_dataset.my_table`;

This query, on the other hand, has to read the data column in full for every single row in the table. If data contains large strings, this query could be extremely expensive, even though it returns a single number. The difference lies in which columns BigQuery has to read to answer the query, not in how big the table is or how small the result is.
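To make the contrast concrete, here is a rough Python sketch of how many bytes the data column contributes. It assumes the logical size BigQuery documents for STRING values (2 bytes plus the UTF-8 encoded length). The sample rows and function names are made up for illustration, and real billing also applies rounding and minimum charges that are ignored here.

```python
def string_logical_bytes(value: str) -> int:
    """Logical size of one STRING value: 2 bytes plus its UTF-8 length."""
    return 2 + len(value.encode("utf-8"))


def column_logical_bytes(values: list[str]) -> int:
    """Bytes a query must process to read a whole STRING column."""
    return sum(string_logical_bytes(v) for v in values)


# Hypothetical sample of the 'data' column (three rows).
sample_rows = ["short", "a" * 1_000, "héllo"]

# SELECT SUM(LENGTH(data)) has to read every value in the column.
bytes_for_sum_length = column_logical_bytes(sample_rows)

# SELECT COUNT(*) references no column at all.
bytes_for_count_star = 0

print(bytes_for_sum_length, bytes_for_count_star)
```

Scaled up to billions of rows, the first number grows with the total length of the strings, while the second stays at zero.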

This distinction is crucial because many common BigQuery patterns, like SELECT * or filtering on columns that aren’t partitioned or clustered, can inadvertently lead to massive data processing. The core problem BigQuery solves is making massive datasets queryable with minimal infrastructure management. The on-demand cost model, therefore, charges for the data each query has to read, as a proxy for the work done, rather than for servers you provision.

The levers you control are primarily:

  1. Column Selection: Only SELECT the columns you absolutely need. Avoid SELECT *.
  2. Data Filtering: Use WHERE clauses effectively, especially on partition and cluster columns.
  3. Data Types: Use the most compact data types. An INT64 value is always 8 bytes, while a STRING is 2 bytes plus its UTF-8 length, so IDs stored as long strings cost more to read.
  4. Data Partitioning and Clustering: Structure your tables to align with query patterns.
  5. Query Optimization: Write queries that minimize the amount of data read and processed.
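As a rough illustration of how column selection, filtering, and partitioning combine, the sketch below models a date-partitioned table as a mapping from partition to per-column byte counts. The table layout, column names, and sizes are all hypothetical. The model captures only one idea: bytes processed is the sum, over the partitions that survive pruning, of the columns the query references.

```python
# Hypothetical table: partition date -> column name -> bytes stored for that column.
TABLE = {
    "2024-01-01": {"user_id": 8_000, "event": 20_000, "payload": 900_000},
    "2024-01-02": {"user_id": 8_000, "event": 22_000, "payload": 950_000},
    "2024-01-03": {"user_id": 8_000, "event": 21_000, "payload": 870_000},
}


def estimated_bytes(table, columns=None, partitions=None):
    """Sum the bytes of the referenced columns over the partitions read.

    columns=None models SELECT * (every column);
    partitions=None models a query with no partition filter.
    """
    total = 0
    for partition, column_sizes in table.items():
        if partitions is not None and partition not in partitions:
            continue  # partition pruned by the WHERE clause
        for name, size in column_sizes.items():
            if columns is None or name in columns:
                total += size
    return total


select_star = estimated_bytes(TABLE)
two_columns = estimated_bytes(TABLE, columns={"user_id", "event"})
one_day_two_columns = estimated_bytes(
    TABLE, columns={"user_id", "event"}, partitions={"2024-01-02"}
)

print(select_star, two_columns, one_day_two_columns)
```

In this toy table, dropping the wide payload column saves far more than the partition filter does, which is typical when one column dominates the row size.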

The one thing most people don’t realize is that a WHERE clause on its own usually does not shrink the bill. BigQuery stores data by column, so it reads only the columns a query references, but it reads each of those columns in full unless partition pruning or clustering lets it skip blocks. This is why SELECT column_a FROM my_table WHERE column_b = 'value' can be much cheaper than SELECT * FROM my_table WHERE column_b = 'value': the first is billed for column_a and column_b only, the second for every column in the table. The filter reduces the rows returned, not the bytes read, unless column_b is a partition or cluster column.
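Under on-demand pricing, a rough way to compare these two queries is to add up the full size of every column each one references. The sketch below does that in Python. The column sizes are hypothetical, column_c stands in for "everything else in the table", and the price per TiB is an assumed list price that should be checked against current pricing for your region. It also assumes column_b is neither a partition nor a cluster column, and it ignores the free tier and minimum charges.

```python
TIB = 2**40

# Hypothetical column sizes in bytes for my_table.
COLUMN_BYTES = {
    "column_a": 50 * 2**30,   # 50 GiB
    "column_b": 10 * 2**30,   # 10 GiB
    "column_c": 940 * 2**30,  # 940 GiB of wide payload data
}

# Assumed on-demand list price in USD per TiB; verify before relying on it.
ASSUMED_USD_PER_TIB = 6.25


def referenced_bytes(selected, filtered, column_bytes=COLUMN_BYTES):
    """Full size of every column named in the SELECT list or the WHERE clause."""
    referenced = set(selected) | set(filtered)
    return sum(column_bytes[name] for name in referenced)


def on_demand_usd(num_bytes, usd_per_tib=ASSUMED_USD_PER_TIB):
    """Convert bytes processed into an approximate on-demand charge."""
    return num_bytes / TIB * usd_per_tib


# SELECT column_a FROM my_table WHERE column_b = 'value'
narrow = referenced_bytes(selected=["column_a"], filtered=["column_b"])

# SELECT * FROM my_table WHERE column_b = 'value'
wide = referenced_bytes(selected=COLUMN_BYTES.keys(), filtered=["column_b"])

print(f"narrow: {narrow / 2**30:.0f} GiB, about ${on_demand_usd(narrow):.2f}")
print(f"wide:   {wide / 2**30:.0f} GiB, about ${on_demand_usd(wide):.2f}")
```

Note that the filter column is counted in both cases: the WHERE clause changes which rows come back, not how much of column_b has to be read.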

The next concept you’ll run into is understanding how BigQuery’s slot allocation and concurrency affect query performance and cost, especially when using BigQuery BI Engine.
