min read

Introduction to materialized views

An introduction to materialized views

The output of any SQL query is logically an in-memory table
The output of a SQL query, and what we normally think of as a result set is really a virtual table. It has columns and rows, just like a database table, but it is temporary in nature, usually existing only in memory for a short time. This concept extends from relational algebra, upon which SQL is built. All SQL can be broken down into relational algebra, which is convenient because that means there are lots of transformations that can be done to it, all without changing the meaning or the output. The output of all relational algebraic statements is a table, so conceptually so too is the output of a SQL statement.

MySQL even includes a SQL statement that makes this perfectly clear: CREATE TABLE .. AS SELECT (CTAS). The results of the SELECT portion of the statement are stored in a table. Storing the results of a SQL statement into a table (even a temporary table) is called materializing the results.

Views are similar to regular SQL statements.
A view is a SQL statement which acts like a table. When you execute a query on a view, the result set is ephemeral, generated at run-time for consumption and then immediately discarded.

Views are not generally considered to be a performance optimization because:

The contents of the result set are computed each time the view is accessed.
If multiple statements access the same view repeatedly or concurrently, then this computation is likely to be expensive
If the view accesses large quantities of data, then the computation is likely expensive.
Views containing grouping, aggregation, sorting, distinct or other conditions must be fully computed and stored in a temporary table before they can be accessed, which is very expensive.

What is a materialized view (MV)?
A materialized view is similar to a regular view, in that it represents the result set of a query, but the contents are stored (materialized!) as a real table.Â So a MV is similar to a table created with the CTAS command described above. This similarity is fairly superficial though. While both store results in a table, the MV represents the results of a SQL statement at a specific point in time, essentially a consistent snapshot of the query result. It is not possible to create multiple different tables via CTAS and have them all be transactionally consistent with one another, unless you stop all database write activity.

Materialized views can be used to enhance performance by acting as a cache. Further, the cost of a cache miss is lower because incremental refresh is faster than recomputing the contents from scratch.

The contents of the result set are updated periodically, not each time the view is accessed.
If multiple statements access the same view repeatedly or concurrently it is not likely to be very expensive.
If the view is large, accessing the table will be considerably cheaper
You can add indexes to the MV.
Since the data is already joined together and pre-aggregated, CPU and memory usage may be reduced compared to computing the results.
You can often generate multiple different blocks of content from the summarized data in one or more views

Materialized views must be refreshed.
Refreshing a MV brings it up to date to reflect the changes in the database since the view was either first created or last refreshed, whichever is more recent. More importantly a MV can can be refreshed to a specific point in time, not just “now”. This means that you can maintain multiple MVs and keep them synced to the same point in time.

‍

There are two different methods by which a MV can be refreshed:

The first is the complete refresh method, which rebuilds the entire contents of the view from scratch. This is the less desirable method:

During a complete refresh, the view contents are completely recalculated, which could be very expensive
On some databases, the contents of the view are not available during complete refresh
During refresh the view may occupy twice as much space (similar to ALTER TABLE)
Supports all SQL syntax (like outer join) but can’t be refreshed to a specific point in time.
Performs no better than CTAS, but gives a convenient method of creating and refreshing the materialized results

The second is refresh method is incremental refresh. This method updates the view. It usually examines only the rows which have changed since the view was last refreshed.

Incremental refresh has obvious benefits which include:

Refreshing large views is orders of magnitude faster than complete refresh.
When updating the view, only a subset of the database rows must be examined
The rows examined are related to the rows which have changed since the last refresh
The view can be refreshed forward to a specific transactional point in time
Multiple views can be rolled forward to the exact same consistent point in time, with respect to committed transactions
The processing can be done on a dedicated server.

And some drawbacks:

Not all SQL syntax supported (no outer join), no non-deterministic functions, etc.
There is some (low) overhead for change-data capture.
Some extra storage is used for the changes and deltas.

‍

Latest Articles

View All Articles

Star schema optimization in LeapDB

LeapDB includes "star schema optimization" which is a technique for improving query performance on star schema queries that filter out rows based on filters in dimension tables. This reduces IO for star schema queries and provides opportunities for query parallelization.

Tech

5

min read

Using a materialized view to accelerate multiple queries on a large dataset

I use my benchmark queries on the BTS Ontime data set in a materialized view context and compare performance of the queries when using the materialized view against querying the base table.

Query Optimisation

8

min read

WARP vs InnoDB on a wide 75GiB table

A comparison between InnoDB and WARP on two decades of the BTS ontime flight data, a 109 column wide table.

Tech

8

min read

Introduction to materialized views

Justin Swanhart

Latest Articles

Star schema optimization in LeapDB

Using a materialized view to accelerate multiple queries on a large dataset

WARP vs InnoDB on a wide 75GiB table