// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

#![deny(missing_docs)]

//! # datafusion-materialized-views
//!
//! `datafusion-materialized-views` provides algorithms and core functionality for working with materialized views in [DataFusion](https://datafusion.apache.org/).
//!
//! ## Key Features
//!
//! - **Incremental View Maintenance**: Efficiently tracks dependencies between Hive-partitioned tables and their materialized views, allowing users to determine which partitions need to be refreshed when source data changes. This is achieved via UDTFs such as `mv_dependencies` and `stale_files`.
//! - **Query Rewriting**: Implements a view matching optimizer that rewrites queries to automatically leverage materialized views when beneficial, based on the techniques described in [*Optimizing Queries Using Materialized Views: A Practical, Scalable Solution*](https://dsg.uwaterloo.ca/seminars/notes/larson-paper.pdf).
//! - **Pluggable Metadata Sources**: Supports custom metadata sources for incremental view maintenance, with default support for object store metadata via the `FileMetadata` and `RowMetadataRegistry` components.
//! - **Extensible Table Abstractions**: Defines traits such as `ListingTableLike` and `Materialized` to abstract over Hive-partitioned tables and materialized views, enabling custom implementations and easy registration for use in the maintenance and rewriting logic.
//!
//! ## Typical Workflow
//!
//! 1. **Define and Register Views**: Define a custom table type that implements the `Materialized` trait, and register it using `register_materialized`.
//! 2. **Metadata Initialization**: Set up `FileMetadata` and `RowMetadataRegistry` to track file-level and row-level metadata.
//! 3. **Dependency Tracking**: Use the `mv_dependencies` UDTF to generate build graphs for materialized views, and `stale_files` to identify partitions that require recomputation.
//! 4. **Query Optimization**: Enable the query rewriting optimizer to transparently rewrite queries to use materialized views where possible.
//!
//! ## Example
//!
//! See the README and integration tests for a full walkthrough of setting up and maintaining a materialized view, including dependency tracking and query rewriting.
//!
//! ## Limitations
//!
//! - Currently supports only Hive-partitioned tables in object storage, with the smallest update unit being a file.
//! - Future work may generalize to other storage backends and partitioning schemes.
//!
//! ## References
//!
//! - [Optimizing Queries Using Materialized Views: A Practical, Scalable Solution](https://dsg.uwaterloo.ca/seminars/notes/larson-paper.pdf)
//! - [DataFusion documentation](https://datafusion.apache.org/)

/// Code for incremental view maintenance against Hive-partitioned tables.
///
/// An example of a Hive-partitioned table is the [`ListingTable`](datafusion::datasource::listing::ListingTable).
/// By analyzing the fragment of the materialized view query pertaining to the partition columns,
/// we can derive a build graph that relates the files of a materialized view to the files of the tables it depends on.
///
/// Two central traits are defined:
///
/// * [`ListingTableLike`](materialized::ListingTableLike): a trait that abstracts Hive-partitioned tables in object storage;
/// * [`Materialized`](materialized::Materialized): a materialized `ListingTableLike` defined by a user-provided query.
///
/// Note that all implementations of `ListingTableLike` and `Materialized` must be registered using the
/// [`register_listing_table`](materialized::register_listing_table) and
/// [`register_materialized`](materialized::register_materialized) functions respectively;
/// otherwise the tables may not be detected by the incremental view maintenance code,
/// including components such as [`FileMetadata`](materialized::file_metadata::FileMetadata),
/// [`RowMetadataRegistry`](materialized::row_metadata::RowMetadataRegistry), or the
/// [`mv_dependencies`](materialized::dependencies::mv_dependencies) UDTF.
///
/// By default, `ListingTableLike` is implemented for [`ListingTable`](datafusion::datasource::listing::ListingTable).
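///
/// As a rough illustration of the Hive partitioning layout referred to above, the sketch below
/// shows how partition column values are typically encoded in the directory segments of a file path.
/// It uses only the standard library. The helper `hive_partition_values` and the example path are
/// hypothetical: they are not part of this crate's API and do not reflect how the crate itself
/// parses paths.
///
/// ```rust
/// // Hypothetical helper: collects `(column, value)` pairs from `key=value` directory segments.
/// fn hive_partition_values(path: &str) -> Vec<(&str, &str)> {
///     // Drop the file name; only the directory segments carry partition values.
///     let dirs = path.rsplit_once('/').map_or("", |(dirs, _file)| dirs);
///     dirs.split('/')
///         .filter_map(|segment| segment.split_once('='))
///         .collect()
/// }
///
/// fn main() {
///     // Hypothetical file in a table partitioned by `year` and `month`.
///     let values = hive_partition_values("sales/year=2024/month=01/part-0.parquet");
///     assert_eq!(values, vec![("year", "2024"), ("month", "01")]);
/// }
/// ```
///
/// Files that share the same partition values belong to the same partition, which is why
/// dependencies can be tracked per file.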
pub mod materialized;

/// An implementation of Query Rewriting, an optimization that rewrites queries to make use of materialized views.
///
/// The implementation is based heavily on [this paper](https://dsg.uwaterloo.ca/seminars/notes/larson-paper.pdf),
/// *Optimizing Queries Using Materialized Views: A Practical, Scalable Solution*.
pub mod rewrite;

/// Configuration options for materialized view related features.
#[derive(Debug, Clone)]
pub struct MaterializedConfig {
    /// Whether or not query rewriting should exploit this materialized view.
    pub use_in_query_rewrite: bool,
}

impl Default for MaterializedConfig {
    fn default() -> Self {
        Self {
            use_in_query_rewrite: true,
        }
    }
}