Documentation

Materialized Views

A database view is a database object that acts like a saved SQL statement and behaves like a table. When the view is accessed the SQL which underlies the view is executed and rows are returned for processing by other SQL statements. A materialized view is like a normal view but the results are stored in an actual table. This means that the SQL which underlies the view does not need to be executed, instead rows can be returned immediately. For example, a query that takes 11 minutes to execute as a normal view may be accessed in a few milliseconds when the view is materialized.

Many database management systems which support materialized views can incrementally update views when the underlying data changes. This is called refreshing the materialized view. LeapDB materialized views are asynchronously refreshed on demand (or on a schedule) which means that the contents of a materialized view may be stale with respect to the underlying data sources of the view. If desired, the view may be refreshed before read which will ensure that it is as up-to-date as possible before reading data.
‍

LeapDB supports two different types of materialized views.

COMPLETE refresh materialized views

COMPLETE refresh materialized views are the most flexible materialized view type. These materialized views are similar to database systems that support materialized views but do not support incrementally refreshable materialized views, such as PostgreSQL. When the materialized view is refreshed, the SQL statement is executed and the contents replace the contents of the view atomically when the refresh completes. This means that the contents of a COMPLETE refresh materialized view is available during refresh. This differs from some other databases with materialized views. In those systems, the view contents disappear during refresh because the table is truncated before being repopulated. COMPLETE refresh materialized views support all SELECT syntax so they are the most flexible, but they are expensive to refresh. A query that takes 11 minutes to execute will take 11 minutes to refresh.

Setup Instructions

Purchase the service from the marketplace.

Follow the set up instructions.

The remote MySQL (the ones LeapDB replicates data from) needs the following settings:

binlog_format = ROW
gtid-mode = ON
enforce_gtid_consistency = ON

You can change the above in RDS by editing the pameter group.

The remote MySQL needs a schema and user for LeapDB.

Here are the commands:

CREATE DATABASE leapdb;
CREATE TABLE leapdb.mview_signal ( signal_id bigint unsigned NOT NULL AUTO_INCREMENT, signal_time timestamp NULL DEFAULT NULL, PRIMARY KEY (signal_id) ) ENGINE=InnoDB;
CREATE USER leapdb@'%' IDENTIFIED BY '[Insert the password here]';
GRANT REPLICATION CLIENT, REPLICATION SLAVE on *.* to leapdb@'%';
GRANT INSERT ON leapdb.* to leapdb@'%';
GRANT CREATE TEMPORARY TABLES ON leapdb.* to leapdb@'%';
GRANT SELECT ON *.* to leapdb@'%';
GRANT LOCK TABLES on *.* to leapdb@'%';
GRANT RELOAD ON *.* to leapdb@'%';

Two-phase refresh mechanism

INCREMENTAL refresh algorithm supports two different refresh mechanisms, the COMPUTE mechanism and the APPLY mechanism. Each materialized view has an associated `viewname_delta` table where viewname is the name of the materialized view table. It is possible to COMPUTE changes periodically but not APPLY them to the view. This is convenient for example if you have a materialized view which you refresh once per day. By using a COMPUTE refresh periodically (for example every hour) most or all of the changes for the view will be pre-computed before you APPLY the refresh.

INCREMENTAL refresh materialized views

The more interesting type of materialized view is the INCREMENTAL refresh materialized view. These views are updated efficiently when refreshed based on the underlying rows changed in the view.

LeapDB includes a Change Data Capture component (LeapCDC) which can collect table change logs, also known as materialized view logs. The term change log and materialized view log are used interchangeabley. In order to create a INCREMENTAL refresh materialized view, a materialized view log must be created on each table used in the view.

Once a change log is created on a table, any row changes made in the source table (be it local or remote) will be collected into the table's change log. The logs are stored in the `leapdb` schema and their names are encoded. See the `leapdb`.`mvlogs` table to see which tables have materialized view logs on them and the table change log name for the table. Note that there must ALWAYS be a change log on the leapdb.mview_signal table, and this changelog is created at database installation.

After change logs are created for each table used in a SQL statement (this must be done only once for each table) the CREATE INCREMENTAL MATERIALIZED VIEW (see below) command can be used to materialize a query using the INCREMENTAL refresh algorithm. INCREMENTAL refresh materialized views only support INNER JOIN operations and support all of the MySQL aggregation functions except GROUP_CONCAT. It is possible to create COMPLETE refresh materialized views "on top of" INCREMENTAL refresh materialized views, and other INCREMENTAL materialized views, but care must be taken to refresh views used by other views to consistent points in time (see the refresh command below) in order to get correct results.

Refreshing materialized views

CALL leapdb.refresh2 ('schema', 'view_name');
CALL leapdb.refresh (mvid, 'COMPUTE'|'APPLY'|'BOTH', unit_of_work_id);
CALL leapdb.refresh_all('COMPUTE|APPLY|BOTH', NULL)

The second call, refresh is the more flexible. The first parameter is the materialized view id for the materialized view you wish to refresh. You can obtain this id either by examining the `leapdb`.`mview` table or by using the leapdb.get_id('schema','view_name') function. The second parameter is the refresh algorithm to use. If you specify COMPUTE then a COMPUTE type refresh will be executed and the computed rows will be placed into the materialized view's delta table. The 'APPLY' algorithim will only apply already computed changes stored in the _delta table to the view. 'BOTH' will both COMPUTE and APPLY the view. The third parameter may be NULL, if it is, the latest committed transaction that has been captured will be used for the refresh algorithm. If it i not NULL the refresh will be done to that specific unit_of_work_id.

The first call, to refresh2 is the simpler of the two. It takes a schema name and materialized view name and applies the BOTH algorithm up to the latest point in time.

The third call, refresh_all refreshes all

Purging Materialized View Logs

Materialized view logs must be purged of data that is no longer needed to refresh materialized views. This does not happen by default. But only when the leapdb.purge_mvlogs stored procedure is executed.

CALL leapdb.purge_mvlogs();

The default retention_interval = 10 days

Having longer log retention wastes space and could slow down refresh in materialized views. On the other hand, if you choose too short a log retention and you purge without first refreshing your incrementally refreshable materialized views, and you lose those changes, and you will need to drop and create the materialized views.

To override the retention_interval to 14 days for example run this command:

UPDATE leapdb.mview_settings SET '14 days' WHERE setting_value = 'retention_interval' or whatever interval you prefer.

You can schedule the purging to happen regularly by creating a mysql event. For example:
‍

CREATE EVENT leapdb_purge_event
ON SCHEDULE
EVERY 1 MINUTE
COMMENT "Leapdb mv log purge"
DO CALL leapdb.purge_mvlogs();
‍

UNIT OF WORK ID

The third parameter is the "unit of work id" which is a transaction identifier in the LeapDB database system. The closest unit_of_work to a specified timestamp can be obtained using the leapdb.uow_from_dtime ('YYYY-MM-DD HH:MM:SS') function. For example to get the most recent transaction id collected, use leapdb.uow_from_dtime (CURRENT_TIMESTAMP()); You may examine the leapdb.mview_uow table to obtain other unit of work identifiers.

Remote Queries / Database Links

If you want to manually set up additional remote servers, here are the two steps in either order:

Create the foreign wrapper

LeapDB supports table level replication, which is implemented using remote queries. A remote query is a query that accesses a single database link. A database link is created with the CREATE SERVER statement using the "mysql" foreign data wrapper. For example:
CREATE SERVER <database_link_name> FOREIGN DATA WRAPPER mysql OPTIONS( HOST '192.168.1.100', USER 'leapdb', PASSWORD 'a password' );

Remote queries use the @DATABASE_LINK syntax. A query that uses database links may access only other tables on the same link. It is not currently possible to access a mix of local and remote queries in a single.

During LeapDB server set up a remote server can be set up. This remote server is configured for use with LeapCDC and creates a database link for usage in queries.

Run the commands from the setup process on the remote server.

Setting up additional CDC clients

Execute the script /usr/local/leapdb/scripts/setup_remote_cdc.sh

It will prompt you for username password hostname details of the server you want to CDC (change data capture) data from, and then the script will automatically start the CDC and schedule a cron job so that it will restart on a reboot.

The foreign wrapper step is to enable the syntax for remote queries to that server and the setting up additional CDC clients step is so that LeapDB can use CDC to populate your materialized view logs.
‍
‍

Remote query example

SELECT D_Year, count(*) cnt_alias
FROM ssb.lineorder@DATABASE_LINK
JOIN ssb.dim_date@DATABASE_LINK
WHERE OrderDateKey between 19980101 and 20010101
GROUP BY D_Year;

MATERIALIZED VIEW LOG SYNTAX

CREATE MATERIALIZED VIEW LOG ON example_schema.example_table@remote;

Tips:

1) Every table in a materialized view needs a view log created for it.
2) You have to do this before you create the actual incremental materialized view.
3) You need to do this on remote tables and also on local tables.
4) If you create a materialized view that you are using to create another materialized view, you also have to make the log.
‍

MATERIALIZED VIEW SYNTAX

To create a COMPLETE refreshable materialized view the syntax is:
CREATE MATERIALIZED VIEW <view_name> AS
SELECT ... ;

To create a INCREMENTAL refreshable materialized view the syntax is:
CREATE INCREMENTAL MATERIALIZED VIEW <view_name> AS
SELECT ... ;

Replicating a table with a remote materialized view

CREATE INCREMENTAL MATERIALIZED VIEW example
AS
SELECT column1, column2, column3
FROM example_schema.example_table@remote;

Note that while the syntax will allow it, one must not incrementally materialize remote queries with joins. These queries should use either COMPLETE refresh or you should replicate each table using a remote query, then create a new materialized view which joins the replicated tables together! In the future LeapDB may prevent remote queries with joins from being incrementally materialized.

Note that the keyword "remove" above is the default name of the remote system. You can give it a different name.

DROP MATERIALIZED VIEW SYNTAX

DROP MATERIALIZED VIEW example_schema.example_table@remote;

This drops a materialized view.

‍

DROP MATERIALIZED VIEW LOG SYNTAX

DROP MATERIALIZED VIEW LOG example_schema.example_table@remote;

This drops a materialized view log.

‍

DROP DATABASE SYNTAX

CALL leapdb.drop_database('test');
‍
This drops drops a schema and also all of the special leapdb processes associated with a schema. If it's a normal schema without any materialized views and without any materialized view logs, then you can use the normal DROP DATABASE test; But for a leapdb schema you need to use this command so that it's completely dropped.

‍

RENAME MATERIALIZED VIEW SYNTAX

RENAME MATERIALIZED VIEW from_schema.from_table to to_schema.to_table
‍
This renames a materialized view.

‍

SHOW MATERIALIZED VIEW LOGS SYNTAX

show materialized view logs;

Before you make a materialized view, you need to have materialized view logs for all of the base tables. This command provides a handy list of the logs that you are currently maintaining in the current schema.

You can access the mvlogs table in the leapdb schema to see all of the logs from all schemas.

‍

SHOW MATERIALIZED VIEW SYNTAX

SELECT leapdb.get_sql( leapdb.get_id('the schema', 'the materialized view name') );

‍

LIMITATIONS

LEFT JOINS, RIGHT JOINS, and OUTER JOINS are not supported. If you need to materialize a query with a left join you need to either find a way to avoid the left join or maybe you can create a materialized view without the join and then left join into the materialized view. Inner joins are supported.

GROUP_CONCAT AVERAGE, MIN, MAX are supported but take more space and can be a little bit slower than the other aggregate functions.

DISTINCT is not supported. But you can use GROUP BY wherever you would use DISTINCT.

The query for the materialize view must be deterministic which means you cannot use functions that return different results each time they run like NOW() and RAND() etc.

Sub queries work but it's complicated. We recommend that you either understand how they work or avoid them. Sub queries to base tables are not supported although it will work and give correct results so long as the base table does not change. Sub queries to other materialized views will work but you need to refresh those other materialized views before refreshing your materialize view and the cascade update will not do that for you.

Comments with SQL hints are ignored.

Aggregate functions must stand alone although u can put formulas inside of them. For example, SUM(a * b) is good. SUM(a) * SUM(b) is bad.

\e "Edit command with $EDITOR" on the mysql client can only support single leapdb statements because of a peculiarity of the mysql client.

If you kill a "create materialized view" statement you will need to run a "drop materialized view" statement before re-running the create materialized view statement.

You can have multiple tables in a view but only one remote table and so you must create materialized views out of your remote tables if you need all of those remote tables in a single materialized view. The trick is to make a materialized view out of materialized views.

In order to skip one or more transactions on the source server, you can manually update the gtids and gtid fields on the leapdb.binlog_consumer_status. This operation should only be done if some statement is causing an error in the binary log consumer (LeapCDC).

DateTime values cannot be 0000-00-00. This is not a valid date in the Gregorian calendar, so, hopefully you don't need it. Often you'll be better served with NULL. But, regardless this value is not allowed in LeapDB remote tables.

Support will always help you with limitations.
‍

Documentation

- Materialized Views

- Setup Instructions

- COMPLETE refresh materialized views

- INCREMENTAL refresh materialized views

- Two-phase refresh mechanism

- LeapDB has two different refresh functions

- Purging Materialized View Logs

- UNIT OF WORK ID

- Remote Queries / Database links

- Remote query example

- MATERIALIZED VIEW SYNTAX

- Replicating a table with a remote materialized view

- MATERIALIZED VIEW LOG SYNTAX

- DROP MATERIALIZED VIEW SYNTAX

- DROP MATERIALIZED VIEW LOG SYNTAX

- DROP DATABASE SYNTAX (for schemas with materialized views)

- RENAME MATERIALIZED VIEW SYNTAX

- SHOW MATERIALIZED VIEW LOGS SYNTAX

- SHOW MATERIALIZED VIEW SYNTAX

- LIMITATIONS