Views are virtual tables that represent the results of a query over one or more source tables. While views do not store data themselves, they present data in a readable manner that can simplify complex queries. One common use case for views is to control data security and data access. In the medical industry, for example, views along with other access control measures can limit who sees sensitive patient data, particularly in tables that multiple departments access. Views are also useful for reporting and aggregation—easily calculate monthly sales totals or other analytics on the fly.
In Snowflake, you can create a couple of view types, including:
- Non-Materialized Views. Referred to as simply “views” by Snowflake, non-materialized views are a representation of the data generated by an underlying query. These views do not store data, but they offer a dynamic way to control access, simplify queries, and provide immediate insights.
- Materialized Views: Materialized views store pre-computed results—which accelerates query response times. These views are suitable for queries that require significant processing.
Materialized views offer the key to striking the elusive query performance balance. However, data engineers and analysts need to weigh the cost implications of materialized views and determine whether the potential performance gains make up for some often-forgotten costs.
We’ve heard a few voices shed light on some of these costs, including Sam Lambert— CEO of PlanetScale—who shared his insights on these increased rates.
Snowflake charges 10x for materialized views.— Sam Lambert (@isamlambert) August 3, 2023
That's $30/hr. 😐 pic.twitter.com/gCtNLT7q6e
This exploration will weigh the benefits and trade-offs of using materialized views in Snowflake, particularly from a cost perspective. These considerations can help you determine how to best optimize your data warehouse infrastructure while maintaining the ever-important cost-performance balance.
Snowflake Views (Non-Materialized)
A regular—or non-materialized view—offers specific perspective on your data. Views have some of the following key features, as well as some limitations.
Data storage. Non-materialized views do not store data. They are queries that represent the underlying data from a source table.
Usage costs. You incur costs only when the view is queried.
Non-materialized views have a few limitations as well, including:
Source table changes. If a source table changes, those changes may not automatically extend to the view. In other words, changing a column on the source table could cause a view to become invalid.
Performance variations. Performance is dependent on the complexity of the underlying data. Every time you query the view, the underlying query logic is re-executed on the source table.
Create a Snowflake View
In this example, you manage a data warehouse for a company that makes educational video content. The warehouse contains a
user_video_activity table that is updated in real time. This table contains a row for each instance of user activity when accessing videos. Video creators would like to access quick statistics showing how often their videos are watched. They would like to see these statistics in near real time. The
user_video_activity table also contains a lot of columns that are likely irrelevant to the creator’s needs, such as
To enhance data security and streamline access, you create a view for creators that selects the name of the video and shows total watch count, grouped by course and then video. Each instance of user video activity is an instance where that particular video was watched. This view excludes any personal user information from the base table, such as
CREATE OR REPLACE VIEW video_watch_summary AS SELECT course_id, video_id, video_title, COUNT(*) AS total_watch_count FROM user_video_activity GROUP BY course_id, video_id, video_title;
The creator can then query on this view each time they want to see video statistics, providing them with a streamlined version of the data they are looking for.
SELECT * FROM video_watch_summary; +-----------+-----------+----------------------+-------------------+ | COURSE_ID | VIDEO_ID | VIDEO_TITLE | TOTAL_WATCH_COUNT | +-----------+-----------+----------------------+-------------------+ | 101 | 1001 | Introduction | 3200 | | 101 | 1002 | Installing Python | 3300 | | 101 | 1003 | Python Loops | 3506 | | 102 | 2001 | Getting Started | 1067 | | 102 | 2002 | Advanced Tips | 1102 | +-----------+-----------+----------------------+-------------------+
Snowflake Materialized View
A materialized view in Snowflake stores the underlying query results so that when the view is queried, the results are already pre-computed. When you have to consistently run a query, these time-savings can add up. Like non-materialized views, materialized views have a number of benefits, including:
Performance. Speeds up heavy aggregation or processing complex queries, especially in the context of large datasets.
Maintenance. Snowflake performs maintenance of the views using a background service that is “less error-prone” and “more efficient” than if you were to manually maintain the data.
Complex data. Materialized views can efficiently process complex data, such as semi-structured data.
There are some limitations to materialized views, including:
Storage. Materialized views use storage space in order to store the pre-computed results and are therefore prone to additional storage costs.
Number of tables queried. The materialized view can query on only one table.
Join limitations. Joins—including self-joins—are not supported.
Snowflake Materialized View Cost
One of the major factors data engineers and data analysts have to consider is the cost implications of materialized views. Materialized views store pre-computed data; therefore, they incur storage costs. The Snowflake documentation even cautions users:
…[B]efore creating any materialized views, you should consider whether the costs are offset by the savings from re-using these results frequently enough.
Another important factor to note is each time rows are inserted into the source table, background maintenance occurs, and those new rows are then inserted into the materialized view. The below table outlines Snowflake credits per compute-hour charged for maintenance processes.
|Feature||Snowflake managed compute||Cloud services|
|Materialized views maintenance||10||5|
|Materialized views maintenance in secondary databases||2||1|
With these considerations in mind, it’s important to think about the difference between both view types and the use cases for each. The table below outlines some of the differences previously discussed.
|Aspect||Materialized Views||Non-Materialized (Regular) Views|
|Query Performance||Better performance due to pre-computed results||Slower; executes query on the fly|
|Storage Costs||Additional storage costs incurred for pre-computed data||No additional storage required|
|Compute Costs||Incurred during refresh; when the base table is updated, background maintenance occurs||Incurred each time the view is queried|
|Data Refreshes||Necessary to keep data up to date||No need for refreshes|
|Use Cases||Complex queries requiring performance
The source table has data that doesn’t frequently change
|Frequently changing data
Queries with results that you do not use often
The most important factor when deciding whether a materialized view is “worth it” comes down to the structure of the underlying data: do the results of the view frequently change, and do you use those results very often?
Create a Snowflake Materialized View
Going back to the video tutorials example, let’s consider a scenario where a materialized view would make more sense. Because the
user_video_activity table is loaded in real time, the data is constantly changing. Every time a user accesses a video, a new record is created, and the
total_watch_count metric is recalculated. The table adds a row for each instance of user activity, and those instances of activity are counted in the watch statistics.
Another table in the warehouse,
video_ratings_comments, contains video ratings as well as user feedback and comments and is updated once a day. You would like to create a materialized view that shows the top 10 videos by average rating.
CREATE MATERIALIZED VIEW top_rated_videos AS SELECT video_id, AVG(video_rating) AS average_rating FROM video_ratings_comments GROUP BY video_id ORDER BY average_rating DESC LIMIT 10;
Querying the materialized view would yield the following results.
SELECT * FROM top_rated_videos; +------------+-------------------+ | VIDEO_ID | AVERAGE_RATING | +------------+-------------------+ | 1001 | 4.8 | | 1005 | 4.7 | | 1100 | 4.7 | | 1003 | 4.6 | | 1007 | 4.5 | | 1002 | 4.4 | | 1008 | 4.3 | | 1004 | 4.2 | | 1009 | 4.1 | | 1006 | 4.0 | +------------+-------------------+
A materialized view is useful in this instance because it contains a small query result size (10 records). In addition, you avoid the issue of frequently changing data because the data is updated once a day, so the aggregations do not need to be constantly recalculated.
Optimizing Snowflake Performance and Cost
In the first example, you would not use a materialized view because the cost implications of a materialized view would likely outweigh its performance benefits—the data is constantly changing. In cases where materialized views come with cost considerations that outweigh their performance benefits, you can consider alternative strategies for optimizing both data warehouse performance and costs.
There are many available optimization strategies—with only a few referenced here.
- One option is to use the Query History feature in Snowsight to analyze execution times and warehouse load.
- You can also run an example query, such as the Top n longest-running queries to view the performance of queries over a designated timeframe.
- Consider using the Search Optimization Service to improve performance on certain query types.
- Consider using dynamic tables. When you need to transform data within your pipeline, a dynamic table can be a cost-effective option that also improves query performance. Unlike materialized views, dynamic tables can be based on complex queries, such as joins.
- Use query tags to map business costs to cloud spend, allowing you to track per-unit costs in Snowflake and aggregate query costs based on individual business units or services.
- You can review incurred Cloud Services costs and consider ways to reduce these costs, such as selectively cloning your databases for development, ETL, or backup rather than creating full clones.
In addition, consider using various cost optimization tools to gain insights into the categories that are driving costs, such as storage and compute. These tools can help you make informed decisions on cost-effective strategies and enhance the overall efficiency of your data warehouse.
The example below from the Vantage console illustrates the breakdown of accrued weekly costs. The data is grouped by categories, like Storage and Materialized Views. These analytics can empower you to make data-driven decisions and find the right balance between performance and cost-effectiveness in your Snowflake environment.
Views are valuable tools for streamlining data analysis as well as increasing data security. While materialized views can significantly boost query performance, they come with some potentially hefty cost implications. Therefore, it’s essential to carefully consider your specific use case. Striking the right balance between cost and performance is crucial. By leveraging Snowflake’s cost analysis features as well as other cost optimization tools, you can make informed decisions to ensure your data warehouse operates both efficiently and cost-effectively.