§DataFusion-DuckLake
A DataFusion extension that adds support for DuckLake, an integrated data lake and catalog format.
§Overview
DuckLake uses:
- Catalog Database: SQL database (DuckDB, SQLite, PostgreSQL, MySQL) storing metadata as SQL tables
- Data Storage: Apache Parquet files stored on disk/object storage
This extension provides read-only access to DuckLake catalogs through DataFusion’s catalog and table provider interfaces.
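To make the split between catalog and storage concrete, the following is a minimal standalone sketch, not this crate's API. It assumes that the catalog records, for each table, the paths of its Parquet files, and that a path may be either absolute or relative to a base data path. The names `Catalog`, `DataFileEntry`, `data_path`, `path_is_relative`, and `files_for_table` are hypothetical and chosen only for illustration.

```rust
/// Hypothetical stand-in for one catalog metadata row describing a data file.
struct DataFileEntry {
    table: String,
    path: String,
    /// Assumption: the catalog flags whether `path` is relative to the base data path.
    path_is_relative: bool,
}

/// Hypothetical in-memory stand-in for the catalog database.
struct Catalog {
    /// Base location of the Parquet files (local directory or object-store prefix).
    data_path: String,
    files: Vec<DataFileEntry>,
}

impl Catalog {
    /// Return the full locations of the Parquet files registered for `table`.
    /// The catalog is only read here; nothing is written back.
    fn files_for_table(&self, table: &str) -> Vec<String> {
        let base = self.data_path.trim_end_matches('/');
        self.files
            .iter()
            .filter(|f| f.table == table)
            .map(|f| {
                if f.path_is_relative {
                    // Relative paths are joined onto the base data path.
                    format!("{}/{}", base, f.path)
                } else {
                    // Absolute paths are used unchanged.
                    f.path.clone()
                }
            })
            .collect()
    }
}
```

In this sketch a query engine would first ask the catalog which files belong to a table and only then open the Parquet files themselves, which mirrors the metadata/data separation described above.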
§Example
use std::sync::Arc;
use datafusion::prelude::*;
use datafusion_ducklake::{DuckLakeCatalog, DuckdbMetadataProvider};
// `example` is a hypothetical wrapper so that `?` and `.await` are valid;
// call it from whichever async runtime you use.
async fn example() -> Result<(), Box<dyn std::error::Error>> {
    // Create a DataFusion session context
    let ctx = SessionContext::new();
    // Create a DuckDB metadata provider
    let provider = DuckdbMetadataProvider::new("path/to/catalog.ducklake")?;
    // Register a DuckLake catalog with the provider
    let catalog = DuckLakeCatalog::new(provider)?;
    ctx.register_catalog("ducklake", Arc::new(catalog));
    // Query tables from the catalog
    let df = ctx.sql("SELECT * FROM ducklake.main.my_table").await?;
    df.show().await?;
    Ok(())
}
Re-exports§
pub use catalog::DuckLakeCatalog;
pub use error::DuckLakeError;
pub use metadata_provider::MetadataProvider;
pub use schema::DuckLakeSchema;
pub use table::DuckLakeTable;
pub use table_functions::register_ducklake_functions;
pub use metadata_provider_duckdb::DuckdbMetadataProvider;
Modules§
- catalog: DuckLake catalog provider implementation
- column_rename: Custom execution plan for renaming columns
- delete_filter: Custom execution plan for filtering deleted rows
- encryption: Encryption support for reading encrypted Parquet files in DuckLake
- error: Error types for the DuckLake DataFusion extension
- information_schema: Information schema implementation for DuckLake catalog metadata
- metadata_provider
- metadata_provider_duckdb
- path_resolver: Path resolution utilities for DuckLake
- schema: DuckLake schema provider implementation
- table: DuckLake table provider implementation
- table_changes: Table changes (CDC) functionality for DuckLake
- table_deletions: Table deletions functionality for DuckLake
- table_functions: User-Defined Table Functions (UDTFs) for DuckLake catalog metadata
- types: Type mapping from DuckLake types to Arrow types
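
The delete_filter module is described above as filtering deleted rows. As a rough illustration of that idea only, the sketch below assumes that deletions are recorded as row positions within a data file and that a reader drops those positions while scanning. The function `filter_deleted` is hypothetical and operates on plain slices; the crate's actual execution plan works on DataFusion record batches and is not reproduced here.

```rust
use std::collections::HashSet;

/// Keep only the rows whose position within the file is not marked as deleted.
/// `deleted_positions` holds zero-based row positions (an assumption of this sketch).
fn filter_deleted<T: Clone>(rows: &[T], deleted_positions: &HashSet<usize>) -> Vec<T> {
    rows.iter()
        .enumerate()
        // Drop every row whose position appears in the deleted set.
        .filter(|(pos, _)| !deleted_positions.contains(pos))
        .map(|(_, row)| row.clone())
        .collect()
}
```

Because the data files themselves are never rewritten in this model, a read-only consumer can present the current table contents by applying such a filter at scan time.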