BigQuery ranks at number three in top Google Cloud spending. Given it is a high contributor to cloud costs, users are looking for ways to cut costs by optimizing their expensive queries. However, pinpointing which specific queries are driving the highest costs can be a challenge. This is where labels come into play.
By leveraging labels on query jobs you can filter to see the spending of each query job, thereby gaining insight into which queries may need to be optimized.
BigQuery pricing is relatively simple, primarily consisting of compute pricing (the cost to process queries) and storage pricing (the cost to store data). It gets a little more complicated if you consider additional features or the free tier. For our purposes, we’re just going to focus on on-demand compute pricing.
|$6.25 per TiB||1 free TiB per month|
For companies running many queries, these costs can add up and users look to understand ways to cut costs on costly queries. The challenge lies in the fact that BigQuery does not provide a direct way to view the cost associated with each query. While you can see the overall cost incurred during a particular period or try to estimate it yourself, breaking down these costs on a per-query basis is not readily available through the native BigQuery interface.
Labels can be applied to query jobs and provide an excellent way to filter and gain insights into your query costs. The difference between query jobs and queries in BigQuery are as follows:
- Query: The SQL statement or request that you submit to the system to retrieve, manipulate, or analyze data.
- Query Job: The complete process of executing a query, which includes the submission of the query, its processing by BigQuery’s infrastructure, and the retrieval of the results.
So by adding a label to a query job you can track the total cost associated with queries in that query job label. However, be mindful of some limitations when determining if it is the best solution.
- Non-Retroactive: You cannot apply the labels retroactively as they can’t be added or updated for pending, running, or completed jobs.
- Manual Labeling: While adding labels is straightforward, it can be a tedious process for large amounts of query jobs because you will need to add labels to query jobs individually.
Labels are formatted
KEY:VALUE, where keys must be unique. Keys and values “can contain only lowercase letters, numeric characters, underscores, and dashes”.
Labels can be added from the bq command-line tool or API.
Before you can start adding labels, ensure that you have been granted the necessary access permission, the BigQuery User (
roles/bigquery.user) IAM role.
Add Via API
jobs.insert method and populate the
labels property in the job configuration.
Add Via bq
--label flag to your bq query` command.
Example - Add Via bq
This query retrieves NYC 311 calls from a public dataset on a given date. By adding the
--label flag, the query will be identified with the “purpose” key and “6-15-311-analysis” value.
bq query --nouse_legacy_sql --label=purpose:6-15-311-analysis \ 'SELECT COUNT(*) as `311_count` FROM `bigquery-public-data.new_york_311.311_service_requests` WHERE DATE(`created_date`) = DATE(2021, 06, 15)'
View Labels In Vantage
After applying you will be able to see the cost for queries with those filters.
Note there may be a two day delay in seeing resources appear.
Optimizing query costs in BigQuery is essential for cost control and efficient resource utilization. The ability to add labels to your query jobs provides a valuable tool for tracking and managing costs. By identifying the most expensive queries, you can proactively optimize them and cut unnecessary expenses.