Bazof
Query tables in object storage as of a given event time.
Bazof is a lakehouse format with time-travel capabilities.
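As a rough illustration of what "as of event time" means, here is a minimal standard-library sketch. It is not Bazof's implementation or API: the `Row` struct, its field names, the integer event-time representation, and the `scan_as_of` and `sample_rows` functions are all hypothetical and exist only for this example. The sketch assumes a key-value table where an as-of scan returns, for each key, the value with the latest event time at or before the requested time.

```rust
use std::collections::BTreeMap;

/// One versioned key-value record. The field names are hypothetical and
/// are not taken from the Bazof specification.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub key: String,
    pub value: String,
    /// Event time as seconds since the Unix epoch (assumed representation).
    pub event_time: i64,
}

/// Returns, for each key, the value with the greatest event time that is
/// less than or equal to `as_of`. Keys with no such row are omitted.
/// When two rows for a key share an event time, the later one in `rows` wins.
pub fn scan_as_of(rows: &[Row], as_of: i64) -> BTreeMap<String, String> {
    let mut latest: BTreeMap<&str, &Row> = BTreeMap::new();
    for row in rows.iter().filter(|r| r.event_time <= as_of) {
        let key = row.key.as_str();
        // Replace the stored row only if this one is at least as recent.
        let is_newer = latest
            .get(key)
            .map_or(true, |seen| row.event_time >= seen.event_time);
        if is_newer {
            latest.insert(key, row);
        }
    }
    latest
        .into_iter()
        .map(|(k, r)| (k.to_string(), r.value.clone()))
        .collect()
}

/// Sample data, listed in write order. The last row is late-arriving:
/// it is written last but carries an earlier event time.
pub fn sample_rows() -> Vec<Row> {
    let row = |key: &str, value: &str, event_time: i64| Row {
        key: key.to_string(),
        value: value.to_string(),
        event_time,
    };
    vec![
        row("a", "a1", 100),
        row("b", "b1", 150),
        row("a", "a2", 200),
        row("b", "b0", 120),
    ]
}
```

With these sample rows, `scan_as_of(&sample_rows(), 130)` maps `a` to `a1` and `b` to `b0`, while a scan as of 160 maps `b` to `b1`. Because the filter is on event time rather than write order, the late-arriving `b0` row is visible to the earlier query even though it was written last.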
Project Structure
The Bazof project is organized as a Rust workspace with multiple crates:
- bazof: The core library providing the lakehouse format functionality
- bazof-cli: A CLI utility demonstrating how to use the library
- bazof-datafusion: DataFusion integration for SQL queries
Getting Started
To build all projects in the workspace:
Using the CLI
The bazof-cli crate provides a command-line interface for interacting with Bazof:
# Scan a table (current version)
# Scan a table as of a specific event time
DataFusion Integration
The bazof-datafusion crate provides integration with Apache DataFusion, allowing you to:
- Register Bazof tables in a DataFusion context
- Run SQL queries against Bazof tables
- Perform time-travel queries using the AsOf functionality
Example
use BazofTableProvider;
use *;
async
Run the example:
If you install the CLI with cargo install --path crates/bazof-cli, you can run it directly with:
Project Roadmap
Bazof is under development. The goal is to implement a data lakehouse with the following capabilities:
- Atomic, non-concurrent writes (single writer)
- Consistent reads
- Schema evolution
- Event time travel queries
- Handling late-arriving data
- Integration with an execution engine
Milestone 0
- Script/tool for generating a sample key-value data set
- Key-value reader
- DataFusion table provider
Milestone 1
- Support for multiple columns
- Single-row key-value writer
- Document spec
- Delta -> snapshot compaction
- Metadata validity checks
Milestone 2
- Streaming in scan
- Schema definition and evolution
- Late-arriving data support