rig-ballista
Apache Ballista + DataFusion + Iceberg companion crate for rig-compose. Scaffolding; iceberg-rust integration pending toolchain verification.
Overview
rig-ballista is the metadata-catalog seam for future Ballista, DataFusion, and Iceberg integration. The crate currently ships a domain-neutral MetadataCatalog<S> trait, file-stat data types, storage errors, and a thread-safe InMemoryMetadataCatalog<S> reference implementation.
It does not yet depend on Ballista, DataFusion, Iceberg, or rig-compose; those dependencies remain planned and are commented out in Cargo.toml until the upstream combination is verified on stable Rust.
Why It Exists
Agent planners need a way to prune column-store files using per-file sketches before paying to materialize data. That pruning seam should not force every rig-compose consumer to compile Ballista, Arrow, Parquet, DataFusion, and Iceberg, nor should those crates leak into the public API.
rig-ballista isolates that future query-engine boundary behind MetadataCatalog<S>, where S is the caller's own sketch type.
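As a rough illustration of the pruning idea above, the following std-only sketch filters files by a caller-owned sketch type before any data is read. Everything here is a hypothetical stand-in: the `FileId` and `FileStats<S>` shapes, the `MinMax` sketch, and the `prune` helper are assumptions made for the example and are not taken from the crate's source.

```rust
// Hypothetical stand-ins for illustration only. The real FileId is described
// as UUID-backed, and the real FileStats<S> fields may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileId(u64);

#[derive(Debug, Clone)]
struct FileStats<S> {
    file_id: FileId,
    partition: String,
    sketch: S,
}

// A caller-owned sketch (assumption): min/max of one numeric column.
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: i64,
    max: i64,
}

impl MinMax {
    // True when the file could contain a value in the closed range [lo, hi].
    fn overlaps(&self, lo: i64, hi: i64) -> bool {
        self.min <= hi && lo <= self.max
    }
}

// Keep only files in `partition` whose sketch passes the caller's predicate.
// No file contents are touched; only the per-file metadata is inspected.
fn prune<S>(
    files: &[FileStats<S>],
    partition: &str,
    keep: impl Fn(&S) -> bool,
) -> Vec<FileId> {
    files
        .iter()
        .filter(|f| f.partition == partition && keep(&f.sketch))
        .map(|f| f.file_id)
        .collect()
}

// Small fixture used to exercise `prune`.
fn example_files() -> Vec<FileStats<MinMax>> {
    vec![
        FileStats {
            file_id: FileId(1),
            partition: "2024-01".to_string(),
            sketch: MinMax { min: 0, max: 9 },
        },
        FileStats {
            file_id: FileId(2),
            partition: "2024-01".to_string(),
            sketch: MinMax { min: 5, max: 15 },
        },
        FileStats {
            file_id: FileId(3),
            partition: "2024-02".to_string(),
            sketch: MinMax { min: 10, max: 30 },
        },
    ]
}
```

Calling `prune(&example_files(), "2024-01", |s| s.overlaps(10, 20))` keeps only the second file: the first cannot contain a value in the range, and the third sits in another partition. Because `prune` is generic over `S`, the same helper would work with any other sketch shape the caller chooses.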
Status
- Crate version: 0.1.0.
- Rust edition: 2024.
- MSRV: 1.88 during the placeholder phase.
- Status: scaffolding. MetadataCatalog<S> and InMemoryMetadataCatalog<S> ship today; the Iceberg plus Ballista-backed catalog is pending toolchain verification.
- PlaceholderCatalog is deprecated and retained only for source compatibility with the initial scaffolding release.
- No direct rig-core or rig-compose dependency currently exists. Planned dependencies are intentionally commented out in Cargo.toml.
Feature Flags
rig-ballista currently defines no feature flags. just check still runs clippy, tests, and docs with --all-features so newly introduced flags are covered by default.
Key Types
- src/catalog.rs: MetadataCatalog<S>, the async read-only catalog trait. It exposes list_files(partition) and get(file_id).
- src/catalog.rs: FileId, a stable UUID-backed file identifier.
- src/catalog.rs: FileStats<S>, the per-file partition tag plus caller-defined sketch payload.
- src/catalog.rs: StorageError, including NotFound(FileId) and Backend(#[source] Box<dyn Error + Send + Sync>) with the StorageError::backend(err) constructor.
- src/catalog.rs: InMemoryMetadataCatalog<S>, a DashMap-backed reference implementation for tests and offline harnesses.
- src/lib.rs: PlaceholderCatalog, deprecated since 0.1.1 in favor of InMemoryMetadataCatalog.
The public surface deliberately keeps Iceberg, Ballista, DataFusion, Arrow, and object-store types out of signatures.
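To make the shape of that surface easier to picture, here is a minimal std-only sketch of a read-only catalog with `list_files` and `get`. It is a simplified stand-in under stated assumptions, not the crate's code: the trait here is synchronous rather than async, the map is guarded by `RwLock<HashMap>` rather than `DashMap`, and the names `SyncMetadataCatalog` and `InMemorySyncCatalog`, the `&str` partition argument, and the return types are all hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-in; the real FileId is described as UUID-backed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FileId(u128);

// Partition tag plus a caller-defined sketch payload.
#[derive(Debug, Clone, PartialEq)]
struct FileStats<S> {
    partition: String,
    sketch: S,
}

// Only the not-found case is modelled in this sketch.
#[derive(Debug, PartialEq)]
enum StorageError {
    NotFound(FileId),
}

// Synchronous analogue of a read-only catalog trait (signatures are assumed).
trait SyncMetadataCatalog<S> {
    fn list_files(&self, partition: &str) -> Result<Vec<FileId>, StorageError>;
    fn get(&self, file_id: FileId) -> Result<FileStats<S>, StorageError>;
}

// Thread-safe in-memory implementation for tests and offline harnesses.
struct InMemorySyncCatalog<S> {
    files: RwLock<HashMap<FileId, FileStats<S>>>,
}

impl<S: Clone> InMemorySyncCatalog<S> {
    fn new() -> Self {
        Self { files: RwLock::new(HashMap::new()) }
    }

    // Writes live on the concrete type, so the trait itself stays read-only.
    fn insert(&self, id: FileId, stats: FileStats<S>) {
        self.files.write().expect("lock poisoned").insert(id, stats);
    }
}

impl<S: Clone> SyncMetadataCatalog<S> for InMemorySyncCatalog<S> {
    fn list_files(&self, partition: &str) -> Result<Vec<FileId>, StorageError> {
        let files = self.files.read().expect("lock poisoned");
        let mut ids: Vec<FileId> = files
            .iter()
            .filter(|(_, stats)| stats.partition == partition)
            .map(|(id, _)| *id)
            .collect();
        // HashMap iteration order is unspecified, so sort for stable output.
        ids.sort_by_key(|id| id.0);
        Ok(ids)
    }

    fn get(&self, file_id: FileId) -> Result<FileStats<S>, StorageError> {
        self.files
            .read()
            .expect("lock poisoned")
            .get(&file_id)
            .cloned()
            .ok_or(StorageError::NotFound(file_id))
    }
}
```

In this sketch an unknown partition yields an empty list while an unknown file ID yields `NotFound`; whether the real trait draws the same line is an assumption. Note that nothing in the signatures mentions a query engine, which is the property the paragraph above describes.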
Integration With Rig
Today, integration is architectural rather than dependency-level: downstream rig-compose agents can depend on MetadataCatalog<S> to prune files by caller-owned sketch type. The future Ballista/Iceberg implementation is expected to plug into the same trait without changing agent-facing code.
Because rig-ballista currently has no rig-core or rig-compose dependency, there is no pinned Rig version to report.
Usage
The catalog behavior is covered by unit tests in src/catalog.rs, including partition filtering, direct lookup, and empty/not-found behavior.
use ;
# async
Validation
Canonical validation is just check.
That recipe runs formatter checks, cargo clippy --all-targets --all-features -- -D warnings, cargo test --all-targets --all-features, and rustdoc with all features and -D warnings -D rustdoc::broken_intra_doc_links.
Gotchas
- The crate is intentionally not the real Ballista/Iceberg integration yet. Do not add those dependencies until the verification crate confirms the combination compiles on stable.
- The eventual integration may raise MSRV to whatever the upstream query stack requires, shipped as a breaking feat!: change.
- MetadataCatalog<S> is generic over caller-owned sketches; the crate does not prescribe HLL, variance, histograms, or any other sketch shape.
- StorageError::Backend preserves the original source error chain; use StorageError::backend(err) instead of stringifying backend failures.
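The last point, about preserving the source chain, can be illustrated with a hand-written std-only error type. This is a sketch under assumptions rather than the crate's definition: the `Display` messages, the `u128`-backed `FileId`, and the `chain_messages` and `example_backend_failure` helpers are hypothetical, and the `source()` implementation is written out by hand instead of relying on a `#[source]` attribute.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical stand-in; the real FileId is described as UUID-backed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileId(u128);

#[derive(Debug)]
enum StorageError {
    NotFound(FileId),
    // The boxed error is kept whole instead of being turned into a String.
    Backend(Box<dyn Error + Send + Sync>),
}

impl StorageError {
    // Wrap any backend error while keeping it reachable via `source()`.
    fn backend<E>(err: E) -> Self
    where
        E: Error + Send + Sync + 'static,
    {
        StorageError::Backend(Box::new(err))
    }
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::NotFound(id) => write!(f, "file {:?} not found", id),
            StorageError::Backend(_) => write!(f, "backend failure"),
        }
    }
}

impl Error for StorageError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            StorageError::NotFound(_) => None,
            StorageError::Backend(err) => Some(&**err),
        }
    }
}

// Collect the Display message of an error and each of its sources, outermost first.
fn chain_messages(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}

// Build a backend failure from an I/O error, as a storage layer might.
fn example_backend_failure() -> StorageError {
    StorageError::backend(io::Error::new(io::ErrorKind::Other, "disk offline"))
}
```

Walking `chain_messages(&example_backend_failure())` yields the wrapper's message followed by the underlying I/O message. Had the backend error been stringified at the boundary, the second entry and the ability to downcast the original error would both be lost.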
Ecosystem
These companion crates are maintained as separate repositories. Together they form a small stack around the upstream Rig project: rig-compose provides the kernel surface, rig-resources contributes reusable skills and tools, rig-mcp moves tools across MCP, rig-memvid connects Rig agents to persistent .mv2 memory, and rig-ballista reserves the metadata-catalog seam for future query-engine integration.
flowchart TD
rig["rig / rig-core"]
compose["rig-compose 0.1.x"]
resources["rig-resources 0.1.x"]
mcp["rig-mcp 0.1.x"]
memvid["rig-memvid 0.1.x"]
ballista["rig-ballista 0.1.x"]
compose -. "Rig-shaped kernel; no direct rig-core dep" .-> rig
resources -- "rig-compose = 0.1; features: security, graph, full" --> compose
mcp -- "rig-compose = 0.1; rmcp stdio bridge" --> compose
memvid -- "rig-core = 0.36.0; features: lex, vec, api_embed, temporal, encryption" --> rig
ballista -. "planned rig-compose catalog integration; no direct dep today" .-> compose
Pinned Rig-facing dependencies from the current manifests:
| Crate | Direct Rig-facing dependency | Notes |
|---|---|---|
| rig-compose | none | Defines a Rig-shaped kernel surface without depending on rig-core. |
| rig-resources | rig-compose = 0.1 | Uses a sibling path during local workspace development. |
| rig-mcp | rig-compose = 0.1 | Uses a sibling path during local workspace development. |
| rig-memvid | rig-core = 0.36.0 | Implements Rig vector-store and prompt-hook flows over Memvid. |
| rig-ballista | none today | Ballista/Iceberg/DataFusion dependencies remain planned and commented out. |
The concrete multi-crate workflow tested today is the MCP loopback path: a rig_compose::ToolRegistry is exposed through rig_mcp::LoopbackTransport, remote schemas are wrapped as rig_mcp::McpTool, and the wrapped tools are registered back into another ToolRegistry. That proves a local rig-compose tool and an MCP-adapted tool are indistinguishable to callers. The backing test is mcp_tool_indistinguishable_from_local in rig-mcp/src/transport.rs.
License
Licensed under either Apache-2.0 or MIT, at your option.