§datafusion-materialized-views
datafusion-materialized-views provides robust algorithms and core functionality for working with materialized views in DataFusion.
§Key Features
- Incremental View Maintenance: Efficiently tracks dependencies between Hive-partitioned tables and their materialized views, allowing users to determine which partitions need to be refreshed when source data changes. This is achieved via UDTFs such as `mv_dependencies` and `stale_files`.
- Query Rewriting: Implements a view matching optimizer that rewrites queries to automatically leverage materialized views when beneficial, based on the techniques described in the paper.
- Pluggable Metadata Sources: Supports custom metadata sources for incremental view maintenance, with default support for object store metadata via the `FileMetadata` and `RowMetadataRegistry` components.
- Extensible Table Abstractions: Defines traits such as `ListingTableLike` and `Materialized` to abstract over Hive-partitioned tables and materialized views, enabling custom implementations and easy registration for use in the maintenance and rewriting logic.
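As a rough illustration of one check a view-matching optimizer performs, the toy below tests range subsumption on a single column: a view can answer a query only if the view's filter admits every row the query's filter admits, and when the query's filter is narrower it is re-applied on top of the view as a compensating filter. `Range`, `Match`, and `match_view` are hypothetical names; this covers only one of several conditions (joins, grouping, and output columns must also match) and is not the crate's optimizer or API.

```rust
/// Hypothetical filter on one integer column: `lo <= col <= hi`.
/// `None` means that side is unbounded.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    lo: Option<i64>,
    hi: Option<i64>,
}

impl Range {
    /// Conservative containment test: returns true only if every value
    /// admitted by `inner` is also admitted by `self`.
    fn contains(&self, inner: &Range) -> bool {
        let lo_ok = match (self.lo, inner.lo) {
            (None, _) => true,        // view unbounded below
            (Some(_), None) => false, // query reaches further down
            (Some(a), Some(b)) => a <= b,
        };
        let hi_ok = match (self.hi, inner.hi) {
            (None, _) => true,        // view unbounded above
            (Some(_), None) => false, // query reaches further up
            (Some(a), Some(b)) => b <= a,
        };
        lo_ok && hi_ok
    }
}

/// Outcome of matching a query's filter against a view's filter.
#[derive(Debug, PartialEq)]
enum Match {
    Exact,       // scan the view as-is
    NeedsFilter, // scan the view, then re-apply the query's narrower filter
    NoMatch,     // the view may be missing rows the query needs
}

fn match_view(view: &Range, query: &Range) -> Match {
    if view == query {
        Match::Exact
    } else if view.contains(query) {
        Match::NeedsFilter
    } else {
        Match::NoMatch
    }
}

fn main() {
    let view = Range { lo: Some(2020), hi: None };     // view: year >= 2020
    let q1 = Range { lo: Some(2023), hi: Some(2024) }; // year BETWEEN 2023 AND 2024
    let q2 = Range { lo: Some(2018), hi: Some(2024) }; // needs rows before 2020
    // prints "NeedsFilter NoMatch"
    println!("{:?} {:?}", match_view(&view, &q1), match_view(&view, &q2));
}
```

The test errs on the side of rejecting a view: it never reports a match when the view could be missing rows, which is the property a rewrite must preserve to return correct results.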
§Typical Workflow
- Define and Register Views: Implement a custom table type that implements the `Materialized` trait, and register it using `register_materialized`.
- Metadata Initialization: Set up `FileMetadata` and `RowMetadataRegistry` to track file-level and row-level metadata.
- Dependency Tracking: Use the `mv_dependencies` UDTF to generate build graphs for materialized views, and `stale_files` to identify partitions that require recomputation.
- Query Optimization: Enable the query rewriting optimizer to transparently rewrite queries to use materialized views where possible.
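The dependency-tracking step can be pictured with a toy model of staleness detection. This sketch does not call the crate's `mv_dependencies` or `stale_files`; the `Deps` map, the `stale_partitions` function, and the integer timestamps are hypothetical stand-ins for a build graph and file metadata. A materialized-view partition is treated as stale if it was never built, or if any source file it depends on was modified after its last build.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical build graph: MV partition path -> source partition paths it reads.
type Deps = BTreeMap<String, Vec<String>>;

/// Returns the MV partitions that must be recomputed.
/// Toy model only: deleted source files are not detected.
fn stale_partitions(
    deps: &Deps,
    source_mtimes: &BTreeMap<String, Vec<u64>>, // source partition -> file mtimes
    built_at: &BTreeMap<String, u64>,          // MV partition -> last build time
) -> BTreeSet<String> {
    deps.iter()
        .filter(|(target, sources)| match built_at.get(*target) {
            // Never built: always stale.
            None => true,
            // Built: stale if any dependency has a file newer than the build.
            Some(&built) => sources.iter().any(|src| {
                source_mtimes
                    .get(src)
                    .map_or(false, |ts| ts.iter().any(|&t| t > built))
            }),
        })
        .map(|(target, _)| target.clone())
        .collect()
}

fn main() {
    let mut deps = Deps::new();
    deps.insert(
        "mv/year=2024".into(),
        vec!["src/year=2024/month=01".into(), "src/year=2024/month=02".into()],
    );
    deps.insert("mv/year=2025".into(), vec!["src/year=2025/month=01".into()]);

    let mut mtimes = BTreeMap::new();
    mtimes.insert("src/year=2024/month=01".to_string(), vec![100, 120]);
    mtimes.insert("src/year=2024/month=02".to_string(), vec![300]); // written after the build
    mtimes.insert("src/year=2025/month=01".to_string(), vec![150]);

    let mut built = BTreeMap::new();
    built.insert("mv/year=2024".to_string(), 200);
    built.insert("mv/year=2025".to_string(), 200);

    // prints {"mv/year=2024"}
    println!("{:?}", stale_partitions(&deps, &mtimes, &built));
}
```

Only `mv/year=2024` needs recomputation, because one of its source files changed after the last build; the 2025 partition can be left alone.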
§Example
See the README and integration tests for a full walkthrough of setting up and maintaining a materialized view, including dependency tracking and query rewriting.
§Limitations
- Currently supports only Hive-partitioned tables in object storage, with the smallest update unit being a file.
- Future work may generalize to other storage backends and partitioning schemes.
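To make the Hive-partitioning limitation concrete, the sketch below shows how a Hive-style object path encodes partition values as `key=value` directory segments, so each data file maps to exactly one partition directory. The helpers `partition_dir` and `hive_partitions` are hypothetical and not part of the crate; they ignore details such as percent-encoded partition values.

```rust
/// Directory containing a file: the path with its final segment removed.
fn partition_dir(path: &str) -> &str {
    path.rsplit_once('/').map_or("", |(dir, _)| dir)
}

/// Hive-style `key=value` segments of a file's directory, in path order.
fn hive_partitions(path: &str) -> Vec<(String, String)> {
    partition_dir(path)
        .split('/')
        .filter_map(|seg| seg.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let file = "s3://bucket/sales/year=2024/month=01/part-0.parquet";
    // prints s3://bucket/sales/year=2024/month=01
    println!("{}", partition_dir(file));
    // prints [("year", "2024"), ("month", "01")]
    println!("{:?}", hive_partitions(file));
}
```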
§References
§Modules
- materialized: Code for incremental view maintenance against Hive-partitioned tables.
- rewrite: An implementation of Query Rewriting, an optimization that rewrites queries to make use of materialized views.
§Structs
- MaterializedConfig: Configuration options for materialized view related features.