apiary-query 0.1.2-alpha

DataFusion-based SQL query engine for the Apiary distributed data processing framework
Documentation

apiary-query

DataFusion-based SQL query engine for the Apiary distributed data processing framework.

Overview

apiary-query wraps Apache DataFusion to provide SQL query capabilities over Apiary's Parquet-based storage:

  • ApiaryQueryContext — Wraps DataFusion's SessionContext with Apiary namespace resolution (hive.box.frame)
  • Custom SQL commandsUSE hive.box, SHOW HIVES, SHOW BOXES, SHOW FRAMES, and DESCRIBE frame
  • Cell pruning — Pushes WHERE predicates down to skip Parquet cells using column-level statistics
  • Projection pushdown — Reads only the columns needed by the query
  • Distributed queries — Cache-aware query planning that assigns cells to nodes based on locality and capacity

Usage

use apiary_query::ApiaryQueryContext;

let ctx = ApiaryQueryContext::new(storage, registry).await?;

// Standard SQL
let results = ctx.sql("SELECT region, SUM(amount) FROM sales GROUP BY region").await?;

// Custom commands
ctx.sql("USE my_hive.my_box").await?;
ctx.sql("SHOW FRAMES").await?;
ctx.sql("DESCRIBE my_frame").await?;

Supported SQL

  • SELECT with projections, filters, joins, and subqueries
  • Aggregations: GROUP BY, AVG, SUM, COUNT, MIN, MAX
  • USE hive.box to set the active namespace
  • SHOW HIVES, SHOW BOXES, SHOW FRAMES for discovery
  • DESCRIBE frame for schema inspection
  • DELETE and UPDATE are blocked with clear error messages (append-only model)

License

Apache License 2.0 — see LICENSE for details.

Part of the Apiary project.