When designing application data stores, it’s natural for developers and database experts to focus more on how the data is stored and less on how it is retrieved. While the format and type of data certainly affect the choice of a data store, querying and retrieval can present unique challenges as well.
For instance, if you want to retrieve a list of all sales of a given item in a store while excluding all other details tied to this item, you may have to query different entities whenever you need that data. A more efficient way of achieving the same outcome is through materialized views.
In this post, we’ll explain what a materialized view is and highlight its different use cases in your data management strategy, as well as best practices for using them. Let’s get started.
What is a materialized view?
Materialized views are pre-computed data sets responsible for replicating databases to other websites, eliminating the need to run expensive queries every time you retrieve data from a database. Since materialized views store their output data, querying them is faster than running queries directly on base relations (tables). Their quicker response means they expedite data analysis.
To better understand the value of materialized views, consider the following example:
Suppose you have three base tables: Staff, Items, and Sales. The Items table holds records of different items. The Sales table contains the sales records for each item using a foreign key, itemId, in the Sales table.
You can derive a materialized view with sales_by_item, which aggregates the total sales per item from either the Sales table or through a JOIN statement that references both the Items and Sales tables, in case you need more data in the view.
When to Use Materialized Views
In real-world systems, such as enterprise applications, materialized views are beneficial in four key areas:
- Easing network loads
- Creating a mass deployment environment
- Enabling data subsetting
- Enabling disconnected computing
Let’s explore each of these use cases to understand how and why materialized views support these goals.
Easing Network Loads
Organizations with operations in many locations rely on networks for communication and collaboration. This means they want to keep network traffic to a minimum, so as not to require costly network upgrades to accommodate increasing traffic loads.
Read-only materialized views can help organizations with multiple locations distribute network loads by replicating databases in various geographical areas closer to the organization's offices without introducing data conflicts in the source database.
Instead of an entire company accessing data on a single database server, multiple servers can be placed in different locations. Instead of duplicating actual data, which increases storage costs, materialized views operate from the distributed servers. This means faster response times to queries and less network traffic because queries go to the local server.
You update materialized views through an efficient batch process from a single source site or main materialized view site. They have low network requirements and dependencies because they replicate only a point in time, which requires only periodic refreshes.
Through the use of multitier materialized views, you can even create materialized views based on other materialized views. This lets you distribute user load to an even greater extent because clients can access materialized view sites instead of the source sites. A materialized view can be a subset of a source table, or a source materialized view to further decrease the amount of data replicated.
Creating Mass Deployment Environments
The most effective mass deployment tools and technologies help database administrators deliver database infrastructure and data rapidly across different sites while maintaining data quality.
Materialized views support mass deployments because they’re easy to define using deployment templates. Using deployment templates, you can specify the structure of a materialized view’s environment once and then implement the template in multiple sites.
It’s also possible to adjust template parameters, customizing each materialized view environment. This approach is instrumental when sharing specific data types with different data consumers.
Mass deployments through materialized views can rapidly distribute information to field technicians, remote inventory sites, retail stores, and mobile sales forces.
Enable Data Subsetting
You can use materialized views to replicate a subset of data from source database instances through row-level or column-level subsetting, effectively reducing the replication costs to site-specific data. The alternative replications would require complete table replication, which isn't efficient due to the lack of specificity required for the data.
Data subsetting allows copies containing only portions of the entire database or data specific to certain sites. For example, given an organizational database containing employees in multiple departments, you can use materialized views to replicate data specific to one department for internal use. You can enforce some degree of security through materialized views and subsetting by exposing relevant data to authorized persons only.
Enable Disconnected Computing
Since materialized views are derivations of (mostly) live databases, they don’t require dedicated network connections to work. The standard practice when using localized applications (on desktops or laptops) is manual refreshing of materialized views, even though it’s possible to automate the process through job scheduling. Manually refreshing your materialized view on-demand (as opposdd to using a data stream) is an ideal solution for sales applications running on a personal computer.
For example, a developer can integrate the replication management API for refresh on-demand into a sales application. When the salesperson has completed the day's orders, the salesperson simply dials up the network and uses the replication management API to transfer the orders to the primary office. A network connection is necessary during the refresh process only.
Best Practices for Materialized Views
As with standard queries, it’s necessary to optimize materialized views — especially for performance. The base relations of most databases are constantly changing, often in response to different application-level transactions.
With materialized views, performance can quickly become sub-optimal if the view requires constant updating to reflect changes in base relations. You can achieve this by redefining the initial expressions that generated the materialized views.
Still, such an approach has high cost and performance implications — especially when dealing with large-scale enterprise applications. Therefore, the strategies for updating materialized views must be efficient enough to mitigate potential performance issues.
Creating Materialized Views
When creating materialized views, consider prioritizing the resource-intensive processes. This enables these views to store the type of data that reduces the frequency of performing complex queries that would otherwise consume a lot of time and computing power.
Similarly, it’s possible to derive more than one materialized view from the same base relation. For example, one view can hold a table’s most recent data, while the second has unusual data from the same relation. Using a non-materialized view, you can join the two materialized views initially created. The resulting view can reveal data anomalies indicative of critical issues such as denial of service (DoS) attacks targeting databases, especially in live environments.
Consider building a materialized data view for unusual data only in databases where base relations are unclustered, or you already specified columns containing unusual data within the base tables’ clustering keys. In addition, you should consider materialized keys for unusual data only if it’s easy to isolate from a given relation and you use it regularly. Otherwise, the cost of maintaining materialized views for rarely-used data can outweigh the benefits.
Optimizing Base Table Operations
As noted earlier, a common challenge when working with materialized views is possible inconsistencies between them and the base table, which is regularly updated. To counter this issue, consider batching the data manipulation language (DML) operations on the base table.
For example, using the DELETE operation is common to trim old data from tables. If materialized views are based on such tables, updating them to reflect such changes is necessary. Frequent updates may increase background performance and storage costs. To contain such costs, consider executing batch delete operations on tables weekly or monthly or on a schedule that balances performance and price.
For other procedures such as INSERT, MERGE, and UPDATE, batching them on base relations in your databases can also reduce the maintenance costs for materialized views derived from such tables.
Beyond creating materialized views, you can also create view logs and custom on-demand view refreshes. There are far more complex operations you can define in Oracle and other databases for materialized views, as this handy guide demonstrates.
Streamline data analysis with materialized views.
The value materialized views add to database-driven systems can’t be overemphasized. They’re especially handy in reducing the performance bottlenecks and costs incurred through running native queries on databases, especially in production environments where stored data can be expansive.
Materialized views can help organizations powered by data-intensive applications by easing network loads through distributed database replicas, supporting the creation of mass deployment environments, facilitating data subsetting, and disconnected computing for localized applications.