Snowflake Summit 2023: Dynamic Tables

Teepika R M
6 min read · Jul 8, 2023

Snowflake: Revolutionizing Data Storage, Processing, and Analytics in the Cloud

Snowflake is not your ordinary traditional data warehouse. It is a cloud-native platform that brings together flexibility and high performance to transform how data is stored, processed, and utilized for complex analytics. By leveraging the power of the cloud, Snowflake eliminates the need for organizations to invest in and maintain their own infrastructure, reducing upfront costs and operational complexities.

One of the standout features of Snowflake is its seamless integration with popular cloud providers such as AWS, Azure, and Google Cloud. This allows users to leverage their preferred cloud infrastructure and take advantage of the scalability and reliability offered by these platforms. With Snowflake, you can effortlessly handle the ever-increasing volumes of data generated by businesses without the need for significant upfront investments.

One of the key architectural advantages of Snowflake is its unique separation of storage and compute layers. Unlike traditional systems where the scaling of one component is tightly coupled with the other, Snowflake decouples storage and compute resources. This means that compute resources can be dynamically provisioned based on the specific requirements of your workload, regardless of the data volume. This flexibility ensures optimal performance and cost efficiency, allowing you to scale compute resources independently as needed.

One of Snowflake’s key architectural advantages — separation of storage and compute
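
As a quick sketch of what this separation looks like in practice, compute is provisioned as virtual warehouses that can be created, resized, and suspended independently of the data they query. The warehouse name analytics_wh and the size and auto-suspend settings below are illustrative assumptions, not values from this article.

-- Create a small warehouse for ad-hoc analytics (illustrative settings).
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60      -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;

-- Scale compute up for a heavier workload without touching storage.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Suspend compute when the workload is done; the stored data is unaffected.
ALTER WAREHOUSE analytics_wh SUSPEND;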

What is Snowflake?

To be short and crisp, Snowflake is a cloud data warehouse and analytics platform.

Databricks and Snowflake: why is this dynamic duo capturing the attention of the industry?

Both Databricks and Snowflake are built for the cloud and come with the inherent advantages of cloud systems, such as scalability, elasticity, and cost-effectiveness. They both offer user-friendly interfaces and make a team’s transition from traditional to cloud platforms as effortless as possible. They also integrate seamlessly with a wide range of data sources, analytics tools, and machine learning frameworks. Databricks and Snowflake combine cloud-native technology, user-friendly experiences, and seamless interoperability, which makes them a popular choice for organizations that need to work smoothly with other systems. The significant attention these platforms have gained reflects their growing importance.

Dynamic Tables — an interesting feature announced at Snowflake Summit

Dynamic Table Highlights

Dynamic tables let developers define data processing requirements as data transformation queries without explicitly specifying the schema in advance. This gives the utmost flexibility when handling dynamic data requirements. Picture this: the underlying data continuously changes its format, schema, or data source; with a dynamic table, developers are free from making code changes to accommodate these modifications. This makes dynamic tables well suited for handling diverse data effectively. We simply define the transformation logic for the desired result set, and the system itself handles the scheduling, dependencies, and execution of the tasks needed to produce the output.

To put it simply, the query given for a dynamic table is materialized, and the result set of that query becomes the table’s content. The table is refreshed automatically on a regular schedule, based on the maximum lag (the target lag) that can be tolerated in reflecting changes. Since dynamic tables are born from the queries that define them, we cannot subsequently change the table content using DML (inserting, updating, or deleting rows). But fear not: automated refreshes re-materialize the dynamic table to reflect changes in the source tables.

-- Create a landing table to store raw JSON data.
CREATE OR REPLACE TABLE raw (var VARIANT);

-- Create a dynamic table containing the names of office
-- visitors from the raw data. Try to keep the data up to
-- date within 1 minute of real time.
CREATE OR REPLACE DYNAMIC TABLE names
  TARGET_LAG = '1 minute'
  WAREHOUSE = mywh
  AS
    SELECT
      var:id::int        AS id,
      var:fname::string  AS first_name,
      var:lname::string  AS last_name
    FROM raw;

The above statement creates a dynamic table whose data is populated from the source table “raw”. The SELECT query extracts the id, first_name, and last_name of each office visitor from the raw JSON. The dynamic table is then refreshed automatically roughly every minute to keep it on par with its source table.
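
To see the pipeline in action, you can load a few JSON records into the landing table and then query the dynamic table after a refresh has run. This is a minimal sketch with made-up sample values; the results appear only once a refresh completes within the target lag.

-- Load a couple of sample JSON records into the landing table.
-- PARSE_JSON is used in a SELECT because it is not a constant expression.
INSERT INTO raw
  SELECT PARSE_JSON('{"id": 1, "fname": "Ada", "lname": "Lovelace"}')
  UNION ALL
  SELECT PARSE_JSON('{"id": 2, "fname": "Grace", "lname": "Hopper"}');

-- After the next refresh (within the 1-minute target lag),
-- the dynamic table reflects the new rows.
SELECT * FROM names ORDER BY id;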

An immediate question that pops to mind: “How does it differ from materialized views?”

In data systems, views are commonly used to encapsulate frequently accessed queries involving complex join and aggregation operations, avoiding the need to repeat that logic everywhere it is used. Materialized views are similar, with one key distinction: the result set behind a materialized view’s definition is physically stored, allowing for faster data retrieval. In Snowflake, both materialized views and dynamic tables aim to enhance query performance by optimizing data access and processing. However, three important differences set materialized views apart from dynamic tables in Snowflake.

  • Materialized views are designed to enhance query performance seamlessly, with optimization applied automatically in the background. This means that you don’t need to explicitly specify the materialized views in your queries. The query optimizer can automatically rewrite queries to utilize the materialized views, resulting in improved performance. On the other hand, dynamic tables are designed to effortlessly handle streaming data changes, such as new inserts, updates, and deletes, in a data pipeline. They provide a flexible and efficient way to incorporate real-time data updates into your data processing workflows.
  • Materialized views can be defined on only a single base table, while dynamic tables can be created from complex queries that involve multiple tables with joins and aggregations (see the sketch after this list).
  • Materialized views are continuously updated to match the data in the source tables, ensuring they are always current. On the other hand, dynamic tables reflect changes from the source tables based on the specified target lag time.
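
To illustrate the second difference, here is a sketch of a dynamic table defined over a join plus an aggregation. The orders and customers tables, their columns, and the 5-minute lag are hypothetical and only serve to show the shape of such a definition.

-- Hypothetical source tables for this sketch.
CREATE OR REPLACE TABLE customers (customer_id INT, region STRING);
CREATE OR REPLACE TABLE orders (order_id INT, customer_id INT, amount NUMBER(12,2));

-- A dynamic table built from a join and an aggregation,
-- which a single-table materialized view cannot express.
CREATE OR REPLACE DYNAMIC TABLE sales_by_region
  TARGET_LAG = '5 minutes'
  WAREHOUSE = mywh
  AS
    SELECT
      c.region,
      COUNT(o.order_id) AS order_count,
      SUM(o.amount)     AS total_sales
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region;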

Understanding how dynamic table refreshes keep the table up to date

What is a dynamic table refresh? It is the process by which changes in the underlying database objects are reflected in the dynamic table.

There are two approaches for refreshing dynamic tables: Incremental and Full. Whenever feasible, incremental refreshes are utilized, specifically for queries that support this type of refresh. The automated refresh process analyzes the table queries and calculates the changes, which are then merged into the dynamic table. In cases where incremental refresh is not applicable, a full refresh is performed. During a full refresh, the automated process computes the results for the dynamic table query and replaces the current result entirely with the newly computed result set.
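
To check which refresh mode was actually used, Snowflake exposes refresh history through an information-schema table function. The query below assumes the names dynamic table from the earlier example; the exact set of columns returned can vary between Snowflake releases, so treat it as a sketch.

-- Inspect recent refreshes of the "names" dynamic table.
-- REFRESH_ACTION reports whether a refresh was incremental or full.
SELECT name, state, refresh_action, refresh_start_time
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(NAME => 'NAMES'))
ORDER BY refresh_start_time DESC;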

Snapshot isolation is maintained at refresh time across related tables

Snapshot Isolation

Snapshot isolation is a concept in database systems that ensures the consistency and integrity of data during concurrent transactions. It provides each transaction with a consistent view of the database as if it were operating on a snapshot of the data taken at the start of the transaction, regardless of other concurrent transactions that may be modifying the data.

With respect to dynamic tables, snapshot isolation is maintained to ensure the data remains consistent and isolated during the refresh process. When a dynamic table is refreshed, its data is updated based on the changes in the source tables. This refresh process can involve reading and modifying data concurrently with other transactions accessing the same tables. The automatic refresh process refreshes the dynamic tables so that the data is consistent across the whole chain of dynamic tables: it chooses a schedule that satisfies the target lag of all the source and downstream dynamic tables. This allows queries to see the same “snapshot” of data at any given point in time.
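
A simple way to picture such a chain is a downstream dynamic table that reads from an upstream one. The last_name_counts table below is a hypothetical addition on top of the earlier names example; Snowflake coordinates the refreshes of both tables so that the chain presents a consistent snapshot of the source data.

-- A downstream dynamic table that reads from the "names" dynamic table.
-- Refreshes across the chain are scheduled so both tables stay consistent.
CREATE OR REPLACE DYNAMIC TABLE last_name_counts
  TARGET_LAG = '2 minutes'
  WAREHOUSE = mywh
  AS
    SELECT last_name, COUNT(*) AS visitor_count
    FROM names
    GROUP BY last_name;

An upstream dynamic table can also declare TARGET_LAG = DOWNSTREAM, in which case its refresh cadence is driven entirely by the dynamic tables that depend on it.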


Teepika R M

AWS Certified Big Data Specialty | Linux Certified Kubernetes Application Developer | Hortonworks Certified Spark Developer | Hortonworks Certified Hadoop Developer