This module implements a dependency analysis algorithm for materialized views, heavily based on the ListingTableLike trait.
Note that materialized views may depend on tables that are not ListingTableLike, as long as they have custom metadata explicitly installed
into the RowMetadataRegistry. However, materialized views themselves must implement ListingTableLike, as
implied by the type bound Materialized: ListingTableLike.
The dependency analysis in a nutshell involves analyzing the fragment of the materialized view’s logical plan corresponding to partition columns (or row metadata columns more generally). This logical fragment is then used to generate a dependency graph between physical partitions of the materialized view and its source tables. This gives rise to two natural phases of the algorithm:
- Inexact Projection Pushdown: We aggressively prune the logical plan to only include partition columns (or row metadata columns more generally) of the materialized view and its sources.
This is similar to pushing down a top-level projection on the materialized view’s partition columns. However, “inexact” means that we do not preserve duplicates, order,
or even set equality of the original query.
- Formally, let P be the (exact) projection operator. If A is the original plan and A’ is the result of “inexact” projection pushdown, we have PA ⊆ A’.
- This means that in the final output, we may have dependencies that do not exist in the original query. However, we will never miss any dependencies.
- Dependency Graph Construction: Once we have the pruned logical plan, we can construct a dependency graph between the physical partitions of the materialized view and its sources.
After the projection pushdown phase, every table scan only contains row metadata columns, so we replace the table scan with an equivalent scan to a
RowMetadataSource. This operation is also neither duplicate-preserving nor order-preserving. Then, additional metadata is “pushed up” through the plan to the root, where it can be unnested to give a list of source files for each output row. The output rows are then transformed into object storage paths to generate the final graph.
The transformation is complex, and we give a full walkthrough in the documentation for mv_dependencies_plan.
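The containment property PA ⊆ A’ can be illustrated with a toy model. The sketch below is not the crate's implementation: the `Row` type, the function names, and the choice to drop a join predicate on a non-partition column are all assumptions made for illustration. It compares the exact projection of a join onto partition columns with the result of projecting first and joining afterwards.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy row: one partition column plus a non-partition join key.
#[derive(Clone, Debug)]
pub struct Row {
    pub partition: &'static str,
    pub key: u32,
}

/// Exact answer (PA): evaluate the join on `key`, then project the result
/// onto the partition columns of both sides.
pub fn exact_projection(
    left: &[Row],
    right: &[Row],
) -> BTreeSet<(&'static str, &'static str)> {
    let mut out = BTreeSet::new();
    for l in left {
        for r in right {
            if l.key == r.key {
                out.insert((l.partition, r.partition));
            }
        }
    }
    out
}

/// Inexact pushdown (A'): project each input onto its partition column first.
/// In this toy model the `key` column is no longer available, so the join
/// predicate is dropped and the join degrades to a cross product.
pub fn inexact_pushdown(
    left: &[Row],
    right: &[Row],
) -> BTreeSet<(&'static str, &'static str)> {
    let left_parts: BTreeSet<&'static str> = left.iter().map(|r| r.partition).collect();
    let right_parts: BTreeSet<&'static str> = right.iter().map(|r| r.partition).collect();
    let mut out = BTreeSet::new();
    for l in &left_parts {
        for r in &right_parts {
            out.insert((*l, *r));
        }
    }
    out
}
```

For any inputs, `exact_projection(..).is_subset(&inexact_pushdown(..))` holds in this model: the pruned plan may report partition pairs that never actually join, but it cannot omit a pair that does.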
Functions
- mv_dependencies - A table function that, for a given materialized view, lists all the output data objects (build targets) generated during its construction or refresh, as well as all the source data objects (dependencies) it relies on.
- mv_dependencies_plan - Returns a logical plan that, when executed, lists expected build targets for this materialized view, together with the dependencies for each target.
- stale_files - A table function that shows which files need to be regenerated.
Checks last_modified timestamps from the file metadata table and deems a target stale if any of its sources are newer than it.
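The staleness rule described for stale_files can be sketched in isolation. The following is a minimal illustration using only the standard library, not the crate's table function: the `stale_targets` name, the map-based inputs, and the integer timestamps are hypothetical. Treating a target with no recorded timestamp as stale is also an assumption of this sketch.

```rust
use std::collections::BTreeMap;

/// Returns the targets that need to be regenerated.
///
/// `deps` maps each build target to its source files, and `last_modified`
/// maps a file path to a timestamp (hypothetical units, e.g. epoch seconds).
pub fn stale_targets(
    deps: &BTreeMap<String, Vec<String>>,
    last_modified: &BTreeMap<String, u64>,
) -> Vec<String> {
    deps.iter()
        .filter(|(target, sources)| match last_modified.get(*target) {
            // Assumption: a target with no timestamp has not been built yet.
            None => true,
            // Stale if any source with a known timestamp is newer than the target.
            Some(target_time) => sources.iter().any(|source| {
                last_modified
                    .get(source)
                    .map_or(false, |source_time| source_time > target_time)
            }),
        })
        .map(|(target, _)| target.clone())
        .collect()
}
```

Because `BTreeMap` iterates in key order, the returned list is sorted by target path, which keeps the output deterministic.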