by Vantage Team
Snowflake is famous for inventing the cloud data warehouse, a scalable “Data Cloud” with usage-based pricing. Users only pay when queries are running or data is moving. A classic example of the benefits of pay-as-you-go pricing comes from the company itself:
We have one customer that, every Friday afternoon, spins up a massive number of virtual warehouses to do their processing as a billing company. They spin up thousands of nodes, it runs for several hours, and then they spin them back down again.
The trade-off with usage-based pricing is that costs are variable, even if they are lower overall. In Snowflake's case, a rogue query can rack up a large bill, and every query is a target for optimization to lower costs.
By pulling together multiple tables and information about Snowflake’s pricing model, we can calculate cost per query. For advanced users, we can also employ query tags to get the most granular cost tracking available for the Data Cloud.
Some math is needed to find out how much each query costs in Snowflake. As described in the Vantage “Costs per Query” documentation:
Snowflake Compute is billed only when a query is running, rounded up to the hour; however, if multiple queries are running simultaneously, the cost is still only reported for that hour.
Therefore, we have to combine query execution time, warehouse size, and some other details to get the cost of each query. Snowflake offers an example query that ties together these metrics:
SET credit_price = 3.00;

SELECT
    (QH.execution_time / (1000 * 3600)) * WS.credits * $credit_price
        AS relative_performance_cost
FROM query_history QH
JOIN warehouse_size WS
    ON WS.wh_size = QH.warehouse_size;
This query pulls from the QUERY_HISTORY table, part of the ACCOUNT_USAGE schema in the warehouse. The WAREHOUSE_SIZE table is a generated helper table that maps each warehouse size to the number of Snowflake credits it consumes. Compute is billed per credit (e.g., $3 per credit), so a query's cost depends on how many credits its warehouse size consumes:
SELECT wh_size, credits
FROM (
    SELECT 'XSMALL'  AS wh_size, 1   AS credits UNION ALL
    SELECT 'SMALL'   AS wh_size, 2   AS credits UNION ALL
    SELECT 'MEDIUM'  AS wh_size, 4   AS credits UNION ALL
    SELECT 'LARGE'   AS wh_size, 8   AS credits UNION ALL
    SELECT 'XLARGE'  AS wh_size, 16  AS credits UNION ALL
    SELECT '2XLARGE' AS wh_size, 32  AS credits UNION ALL
    SELECT '3XLARGE' AS wh_size, 64  AS credits UNION ALL
    SELECT '4XLARGE' AS wh_size, 128 AS credits
)
These credits are mapped to warehouse sizes per hour. For example, the XLARGE warehouse consumes 16 credits for a full hour of usage. By combining the EXECUTION_TIME of each query with the warehouse size it ran on, we can get the cost in dollars of each query.
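As a minimal sketch of the arithmetic, the calculation can be expressed in Python. The credit numbers mirror the warehouse-size table above; the function name and the $3.00 credit price are illustrative assumptions, not Snowflake's official billing code.

```python
# Hypothetical sketch: dollar cost of a query from its execution time,
# the warehouse size it ran on, and a per-credit price.
CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8,
    "XLARGE": 16, "2XLARGE": 32, "3XLARGE": 64, "4XLARGE": 128,
}

def query_cost(execution_time_ms: float, warehouse_size: str,
               credit_price: float = 3.00) -> float:
    """Cost in dollars: (hours run) * (credits per hour) * ($ per credit)."""
    hours = execution_time_ms / (1000 * 3600)
    return hours * CREDITS_PER_HOUR[warehouse_size] * credit_price

# A 90-second query on an XLARGE warehouse at $3.00 per credit:
# (90,000 / 3,600,000) * 16 * 3.00 = $1.20
print(round(query_cost(90_000, "XLARGE"), 2))
```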
A few other details are needed to rank Snowflake queries by cost. Many queries share the same structure but differ only in the values in their WHERE clause or the order of their columns, so similar queries are often grouped together before ranking.
When a warehouse first spins up, you are charged for one minute’s worth of credits—even if the query only runs for 30 seconds. Opendoor engineering accounted for this in the query they built to track costs, as does the console. Additionally, Snowflake employs a global query cache, which can have a dramatic impact on costs. These features are covered in this deep dive on Snowflake Compute costs.
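The one-minute minimum can be folded into the same arithmetic. This is an illustrative sketch under the assumptions above, not Snowflake's official billing logic:

```python
# Sketch of the one-minute minimum: a warehouse that resumes to run a
# query is billed for at least 60 seconds of credits.
CREDITS_PER_HOUR = {"XSMALL": 1, "XLARGE": 16}  # subset of the table above

def billed_cost(execution_time_ms: float, warehouse_size: str,
                credit_price: float = 3.00) -> float:
    billed_seconds = max(execution_time_ms / 1000, 60)  # 60-second minimum
    return (billed_seconds / 3600) * CREDITS_PER_HOUR[warehouse_size] * credit_price

# A 30-second query on a freshly resumed XSMALL warehouse is billed
# as 60 seconds: (60 / 3600) * 1 * 3.00 = $0.05
print(round(billed_cost(30_000, "XSMALL"), 2))
```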
The method above gets cost per query, per warehouse, per user, and so forth, but often, we are interested in tracking per-unit costs in Snowflake. Per-unit costs are the costs for an entire unit of the business—like all the queries run by the production backend service, or all queries run by a data science team, or all the queries run for a consulting client.
Query tags are a nifty feature for tracking per-unit costs in Snowflake. Just like in AWS, tags are useful for mapping business costs to cloud spend. Snowflake says:
Tags enable data stewards to monitor sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or decentralized data governance management approach.
Resource usage is our use case for query tags here. In any query session—for example, writing queries in the Snowflake console or using the Python connector—it is possible to set a QUERY_TAG for the session.
ALTER SESSION SET QUERY_TAG = 'Monthly Close Books Process';
Then, all queries subsequently run in that session will be tagged with Monthly Close Books Process. Tags can also be set by user and by account.
ALTER USER vantage SET QUERY_TAG = 'Cloud Costs';
Setting the query tag at the session level is the most granular tagging possible and overrides any tags set at the user or account level.
Grouping by Query Tag in the console.
Query tags appear in the same QUERY_HISTORY table we used for cost per query, so tags can be used as another dimension to group costs by, along with organization, user, and warehouse.
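To sketch the rollup, here is a small Python example that mimics a GROUP BY on query tag over QUERY_HISTORY-style rows. The rows and the $3.00 credit price are illustrative assumptions:

```python
# Hypothetical sketch: sum estimated per-query costs by query tag,
# the same grouping a GROUP BY query_tag would do in the warehouse.
from collections import defaultdict

CREDITS_PER_HOUR = {"XSMALL": 1, "LARGE": 8}  # subset of the table above
CREDIT_PRICE = 3.00  # illustrative $ per credit

rows = [  # (query_tag, execution_time_ms, warehouse_size) -- made-up data
    ("Monthly Close Books Process", 1_800_000, "LARGE"),
    ("Monthly Close Books Process", 600_000, "LARGE"),
    ("Cloud Costs", 120_000, "XSMALL"),
]

costs = defaultdict(float)
for tag, ms, size in rows:
    costs[tag] += (ms / 3_600_000) * CREDITS_PER_HOUR[size] * CREDIT_PRICE

for tag, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: ${cost:.2f}")
```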
The advent of near-infinitely scalable cloud data warehouses has given organizations more control over how they consume their data. With Snowflake, usage of the platform maps more tightly to a company's natural business processes than an on-premises data warehouse does. With the right tables, engineering teams can drill down to an individual query to track the cost of using data for the business.
For more granular tracking, Snowflake has query tags, which are supported in the Snowflake integration with Vantage. Whichever method and tools you choose, tracking cost per query in Snowflake is a data engineering efficiency practice that is also great for the bottom line.