BigQuery scheduled queries are actually just a clever application of Cloud Scheduler and Pub/Sub, not a distinct, built-in BigQuery feature.
Let’s see this in action. Imagine you have a daily report that needs to run at 3 AM PST.
-- daily_sales_report.sql
SELECT
DATE(order_timestamp) AS order_date,
SUM(order_total) AS total_sales,
COUNT(DISTINCT order_id) AS distinct_orders
FROM
`your_project.your_dataset.orders`
WHERE
DATE(order_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
1
ORDER BY
1;
Here’s how you’d set this up in GCP:
-
Create a Pub/Sub Topic: This is the message bus that will trigger our execution.
- Go to the Google Cloud Console -> Pub/Sub -> Topics.
- Click "Create Topic".
- Name it something like
bigquery-daily-reports-topic.
-
Create a Cloud Scheduler Job: This job will publish a message to our Pub/Sub topic at the desired time.
- Go to the Google Cloud Console -> Cloud Scheduler.
- Click "Create Job".
- Name:
run-daily-sales-report - Description:
Triggers the daily sales report BigQuery query. - Frequency:
0 3 * * *(This is cron syntax for 3:00 AM every day). - Timezone:
America/Los_Angeles(for PST). - Target type:
Pub/Sub - Topic: Select
bigquery-daily-reports-topicfrom the dropdown. - Message body: This is where we tell BigQuery what to run. The format is a JSON payload.
{ "query": "SELECT\n DATE(order_timestamp) AS order_date,\n SUM(order_total) AS total_sales,\n COUNT(DISTINCT order_id) AS distinct_orders\nFROM\n `your_project.your_dataset.orders`\nWHERE\n DATE(order_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)\nGROUP BY\n 1\nORDER BY\n 1;", "destinationTable": { "projectId": "your_project", "datasetId": "your_dataset", "tableId": "daily_sales_summary" }, "writeDisposition": "WRITE_TRUNCATE" }query: Your SQL statement.destinationTable: Where to put the results.writeDisposition:WRITE_TRUNCATEwill overwrite the table each day.WRITE_APPENDwould add to it.
- Service Account: Ensure this service account has
bigquery.jobs.createandbigquery.tables.updateDatapermissions on the target project/dataset.
-
Create a Cloud Function (or other Pub/Sub subscriber): This function listens to the Pub/Sub topic and actually executes the BigQuery job.
- Go to the Google Cloud Console -> Cloud Functions.
- Click "Create Function".
- Environment:
1st gen(or2nd genif you prefer). - Function name:
bigquery-executor - Trigger type:
Cloud Pub/Sub - Cloud Pub/Sub topic: Select
bigquery-daily-reports-topic. - Runtime: Choose a language, e.g., Python 3.9.
- Entry point:
execute_bigquery(this is the name of your function in the code). - Inline editor: Paste the code.
Python Example (
main.py):import base64 import json from google.cloud import bigquery def execute_bigquery(event, context): """Background Cloud Function to be triggered by Pub/Sub. Args: event (dict): The dictionary with data specific to this type of event. The `data` field contains the PubsubMessage message.json. context (google.cloud.functions.Context): Metadata of triggering event. """ pubsub_message = base64.b64decode(event['data']).decode('utf-8') message_data = json.loads(pubsub_message) query = message_data['query'] destination_table_info = message_data.get('destinationTable') write_disposition = message_data.get('writeDisposition', 'WRITE_APPEND') # Default to append if not specified client = bigquery.Client() job_config = bigquery.QueryJobConfig() if destination_table_info: job_config.destination = client.dataset(destination_table_info['datasetId']).table(destination_table_info['tableId']) job_config.write_disposition = write_disposition print(f"Executing query: {query}") print(f"Destination: {destination_table_info}") print(f"Write Disposition: {write_disposition}") try: query_job = client.query(query, job_config=job_config) query_job.result() # Waits for the job to complete. print(f"Query finished successfully. Job ID: {query_job.job_id}") except Exception as e: print(f"An error occurred: {e}") # In a real-world scenario, you'd want more robust error handling # and potentially send notifications. raiserequirements.txt:google-cloud-bigquery- Service Account: The Cloud Function’s service account needs
pubsub.subscriberpermission on the topic andbigquery.jobs.createandbigquery.tables.updateDataon the target BigQuery project.
The mental model here is that Cloud Scheduler is the alarm clock, Pub/Sub is the messenger, and Cloud Functions is the worker who reads the message and tells BigQuery to do its job. You’re not configuring BigQuery directly; you’re orchestrating external GCP services to invoke BigQuery.
The most surprising thing is that the "destinationTable" and "writeDisposition" parameters within the Pub/Sub message body are the only way to direct the output of a scheduled query. If they aren’t present, the query runs, but its results are discarded unless you explicitly select them into a temporary table or perform some other action within the query itself.
This setup provides immense flexibility, allowing you to chain queries, run complex ETL workflows, and integrate with other GCP services. The next logical step is to explore how to pass dynamic parameters to your scheduled queries, perhaps by including date variables or other configuration in the Pub/Sub message body.