apiary-query
DataFusion-based SQL query engine for the Apiary distributed data processing framework.
Overview
apiary-query wraps Apache DataFusion to provide SQL query capabilities over Apiary's Parquet-based storage:
- ApiaryQueryContext: wraps DataFusion's SessionContext with Apiary namespace resolution (hive.box.frame)
- Custom SQL commands: USE hive.box, SHOW HIVES, SHOW BOXES, SHOW FRAMES, and DESCRIBE frame
- Cell pruning: pushes WHERE predicates down and skips Parquet cells whose column-level statistics show that no row can match
- Projection pushdown: reads only the columns the query needs
- Distributed queries: cache-aware query planning that assigns cells to nodes based on locality and capacity
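Cell pruning can be illustrated with a minimal min/max check. This is a sketch under stated assumptions, not apiary-query's implementation: `ColumnStats`, `Predicate`, `may_contain`, and `prune_cells` are hypothetical names, and only integer comparisons are modeled. A cell is skipped only when its statistics prove that no row can satisfy the predicate; missing statistics keep the cell.

```rust
use std::collections::HashMap;

/// Hypothetical per-column min/max statistics for one Parquet cell.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical simple comparison predicate: `column <op> value`.
#[derive(Debug, Clone)]
pub enum Predicate {
    Eq(String, i64),
    Gt(String, i64),
    Lt(String, i64),
}

/// Returns false only when the statistics prove that no row in the cell can
/// satisfy the predicate. A column without statistics keeps the cell.
pub fn may_contain(stats: &HashMap<String, ColumnStats>, pred: &Predicate) -> bool {
    match pred {
        Predicate::Eq(col, v) => stats.get(col).map_or(true, |s| s.min <= *v && *v <= s.max),
        Predicate::Gt(col, v) => stats.get(col).map_or(true, |s| s.max > *v),
        Predicate::Lt(col, v) => stats.get(col).map_or(true, |s| s.min < *v),
    }
}

/// Keeps the indices of the cells that might contain matching rows.
pub fn prune_cells(cells: &[HashMap<String, ColumnStats>], pred: &Predicate) -> Vec<usize> {
    cells
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_contain(stats, pred))
        .map(|(i, _)| i)
        .collect()
}
```

Pruning stays conservative by design: a wrongly kept cell only costs extra I/O, while a wrongly skipped cell would silently drop rows.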
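The README does not spell out the planner's exact policy, so the following is only one plausible greedy scheme for "cache-aware, locality- and capacity-based" assignment. `Node` and `assign_cells` are hypothetical, and capacity is assumed to be a simple per-query cell count. Each cell goes to a node that already caches it if that node has room, and otherwise to the node with the most spare capacity.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical view of a worker node for planning purposes.
pub struct Node {
    pub id: String,
    /// Maximum number of cells this node should take in one query.
    pub capacity: usize,
    /// Cell identifiers already present in the node's local cache.
    pub cached: HashSet<String>,
}

/// Greedy assignment returning cell id -> node id.
/// Cells that fit on no node are left out of the plan.
pub fn assign_cells(cells: &[&str], nodes: &[Node]) -> HashMap<String, String> {
    let mut load: Vec<usize> = vec![0; nodes.len()];
    let mut plan = HashMap::new();
    for &cell in cells {
        // First choice: a node that caches the cell and still has room.
        let local = (0..nodes.len())
            .find(|&i| nodes[i].cached.contains(cell) && load[i] < nodes[i].capacity);
        // Fallback: the node with the most remaining capacity.
        let chosen = local.or_else(|| {
            (0..nodes.len())
                .filter(|&i| load[i] < nodes[i].capacity)
                .max_by_key(|&i| nodes[i].capacity - load[i])
        });
        if let Some(i) = chosen {
            load[i] += 1;
            plan.insert(cell.to_string(), nodes[i].id.clone());
        }
    }
    plan
}
```

Preferring cached copies avoids refetching cells from storage, and the capacity check keeps one node with a warm cache from absorbing the whole query.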
Usage
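```rust
use apiary_query::ApiaryQueryContext;

let ctx = ApiaryQueryContext::new(/* ... */).await?;

// Standard SQL
let results = ctx.sql("SELECT ...").await?;

// Custom commands (hive, box, and frame are placeholders)
ctx.sql("USE hive.box").await?;
ctx.sql("SHOW FRAMES").await?;
ctx.sql("DESCRIBE frame").await?;
```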
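How USE hive.box affects later frame names can be sketched as follows. This assumes, without confirmation from the crate, that a one-part name means a frame in the active box, a two-part name means box.frame in the active hive, and a three-part name is fully qualified. `ActiveNamespace`, `use_namespace`, and `resolve` are hypothetical names, not the ApiaryQueryContext API.

```rust
/// Hypothetical session state set by `USE hive.box`.
#[derive(Debug, Default)]
pub struct ActiveNamespace {
    pub hive: Option<String>,
    pub box_name: Option<String>, // `box` is a reserved word in Rust
}

impl ActiveNamespace {
    /// Applies `USE hive.box`; `target` is the text after `USE`.
    pub fn use_namespace(&mut self, target: &str) -> Result<(), String> {
        match target.split('.').collect::<Vec<_>>().as_slice() {
            [hive, bx] if !hive.is_empty() && !bx.is_empty() => {
                self.hive = Some(hive.to_string());
                self.box_name = Some(bx.to_string());
                Ok(())
            }
            _ => Err(format!("expected hive.box, got {target:?}")),
        }
    }

    /// Resolves a frame reference to a fully qualified (hive, box, frame).
    pub fn resolve(&self, name: &str) -> Result<(String, String, String), String> {
        let parts: Vec<&str> = name.split('.').collect();
        let missing = || format!("no active namespace for {name:?}; run USE hive.box first");
        match parts.as_slice() {
            [h, b, f] => Ok((h.to_string(), b.to_string(), f.to_string())),
            [b, f] => {
                let h = self.hive.clone().ok_or_else(missing)?;
                Ok((h, b.to_string(), f.to_string()))
            }
            [f] => {
                let h = self.hive.clone().ok_or_else(missing)?;
                let b = self.box_name.clone().ok_or_else(missing)?;
                Ok((h, b, f.to_string()))
            }
            _ => Err(format!("invalid frame reference {name:?}")),
        }
    }
}
```

Fully qualified hive.box.frame names resolve without any prior USE, so they are the safest choice in scripts that may run in a fresh session.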
Supported SQL
- SELECT with projections, filters, joins, and subqueries
- Aggregations: GROUP BY, AVG, SUM, COUNT, MIN, MAX
- USE hive.box to set the active namespace
- SHOW HIVES, SHOW BOXES, SHOW FRAMES for discovery
- DESCRIBE frame for schema inspection
- DELETE and UPDATE are rejected with clear error messages (append-only model)
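Separating the custom commands from ordinary SQL, and rejecting writes under the append-only model, might look like the minimal sketch below. It is not apiary-query's parser: `Command` and `classify` are hypothetical, keywords are matched case-insensitively on whitespace-split words, and everything else is assumed to be handed to DataFusion unchanged.

```rust
/// Hypothetical classification of a statement before it reaches DataFusion.
#[derive(Debug, PartialEq)]
pub enum Command {
    Use(String),
    ShowHives,
    ShowBoxes,
    ShowFrames,
    Describe(String),
    /// Anything else is passed through as ordinary SQL.
    Sql(String),
}

pub fn classify(stmt: &str) -> Result<Command, String> {
    let trimmed = stmt.trim().trim_end_matches(';').trim();
    let words: Vec<&str> = trimmed.split_whitespace().collect();
    // Upper-cased copies for keyword matching; `words` keeps original case.
    let upper: Vec<String> = words.iter().map(|w| w.to_ascii_uppercase()).collect();
    let kw: Vec<&str> = upper.iter().map(String::as_str).collect();
    match kw.as_slice() {
        ["USE", _] => Ok(Command::Use(words[1].to_string())),
        ["SHOW", "HIVES"] => Ok(Command::ShowHives),
        ["SHOW", "BOXES"] => Ok(Command::ShowBoxes),
        ["SHOW", "FRAMES"] => Ok(Command::ShowFrames),
        ["DESCRIBE", _] => Ok(Command::Describe(words[1].to_string())),
        // Writes that modify existing rows conflict with append-only storage.
        ["DELETE", ..] | ["UPDATE", ..] => Err(format!(
            "{} is not supported: Apiary storage is append-only",
            kw[0]
        )),
        _ => Ok(Command::Sql(trimmed.to_string())),
    }
}
```

Rejecting DELETE and UPDATE up front gives the user a specific error instead of a generic planner failure further down the pipeline.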
License
Apache License 2.0 — see LICENSE for details.
Part of the Apiary project.