BigQuery export jobs to Google Cloud Storage don’t actually write data directly to GCS.

Let’s say you want to export a table named my_project.my_dataset.my_table to a CSV file in gs://my-bucket/my-folder/my_file.csv.

Here’s how you’d kick off that export using bq command-line tool:

bq extract \
--destination_format CSV \
--field_delimiter ',' \
--print_header \
my_project.my_dataset.my_table \
gs://my-bucket/my-folder/my_file.csv

Or, if you wanted multiple files, you’d use a wildcard:

bq extract \
--destination_format CSV \
--field_delimiter ',' \
--print_header \
my_project.my_dataset.my_table \
gs://my-bucket/my-folder/my_file_*.csv

BigQuery will then spin up a cluster of worker nodes to read your table data. Each worker reads a portion of the data and writes it to a temporary location. Once all workers have finished, BigQuery aggregates these temporary files and then writes them as your final output files to the specified GCS location. This distributed processing is why exporting large tables can be very fast.

The destination_format flag is crucial. You can choose from CSV, NEWLINE_DELIMITED_JSON, AVRO, or PARQUET. Each has its own advantages: CSV is human-readable, JSON is good for nested data, AVRO is a compact binary format, and PARQUET is optimized for columnar storage and analytics.

The field_delimiter is only applicable for CSV exports. The default is a comma, but you can specify others like \t for tab-delimited.

--print_header adds a header row to your CSV or JSON export, which is usually what you want for usability.

When you specify a wildcard like my_file_*.csv, BigQuery will create multiple files, each starting with my_file_ and ending with .csv, with a numerical suffix. This is how BigQuery handles very large exports, breaking them into manageable chunks. The number of files generated depends on the size of the table and BigQuery’s internal partitioning.

The bq extract command is asynchronous. It returns immediately after initiating the job. You can monitor the job’s progress in the BigQuery UI or by using bq ls -j.

The most surprising thing about this process is that BigQuery doesn’t just dump your data into GCS; it performs a complex distributed read-and-write operation, effectively re-shuffling and re-partitioning your data on the fly before it lands in your bucket. This is why it’s so performant, but also why you can’t, for instance, export a table that’s currently being written to.

You can also perform exports using the BigQuery API or client libraries in various programming languages, which provides more programmatic control.

The next thing you’ll likely encounter is figuring out how to efficiently load that exported data into another system, or perhaps dealing with the nuances of schema evolution when exporting and re-importing data.

Want structured learning?

Take the full Bigquery course →