Build native, single-binary DuckDB extensions in Rust — no C++, no linking against DuckDB.
vgi is the Rust SDK for writing VGI (Vector Gateway Interface) workers:
the worker side of Query Farm’s DuckDB
“Hyperfederation” extension. A worker is an ordinary Rust binary that
DuckDB launches and talks to over Apache Arrow IPC. It exposes scalar /
table / aggregate functions and whole catalogs (schemas, tables, views) that
behave like native DuckDB objects — with no compiled C++ extension and no
version coupling to a specific DuckDB build.
Workers built with this crate are byte-for-byte wire-compatible with the
canonical Python implementation, so a Rust worker drops in behind the
same ATTACH … (TYPE vgi) statement. The crate is built on the
vgi-rpc crate (wire protocol, RPC server, transports), uses stock
arrow-rs 58.x, and has an MSRV (minimum supported Rust version) of 1.86.
§Your first worker
A worker is a main() that registers functions on a Worker and calls
Worker::run. This one exposes upper_case(varchar) -> varchar:
use std::sync::Arc;
use arrow_array::{cast::AsArray, ArrayRef, RecordBatch, StringArray};
use arrow_schema::DataType;
use vgi::{ArgSpec, FunctionMetadata, ProcessParams, ScalarFunction, Worker};
use vgi_rpc::{Result, RpcError};
/// `upper_case(s)` — uppercase a string column.
struct UpperCase;
impl ScalarFunction for UpperCase {
    fn name(&self) -> &str {
        "upper_case"
    }

    fn metadata(&self) -> FunctionMetadata {
        FunctionMetadata {
            description: "Convert string values to uppercase".into(),
            return_type: Some(DataType::Utf8),
            ..Default::default()
        }
    }

    fn argument_specs(&self) -> Vec<ArgSpec> {
        vec![ArgSpec::column("value", 0, "varchar", "String to uppercase")]
    }

    fn process(&self, params: &ProcessParams, batch: &RecordBatch) -> Result<RecordBatch> {
        let col = batch.column(0).as_string::<i32>();
        let upper: StringArray = col.iter().map(|v| v.map(str::to_uppercase)).collect();
        let out: ArrayRef = Arc::new(upper);
        RecordBatch::try_new(params.output_schema.clone(), vec![out])
            .map_err(|e| RpcError::runtime_error(e.to_string()))
    }
}
fn main() {
    let mut worker = Worker::new();
    worker.register_scalar(UpperCase);
    worker.run(); // serves stdio (default), --unix <path>, or --http
}

Build it (cargo build --release), then call it from a DuckDB engine that
has the vgi extension — Query Farm’s Haybarn distribution ships it and
starts with uvx haybarn-cli:
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');
SELECT demo.main.upper_case(name) FROM (VALUES ('alice'), ('bob')) t(name);
-- ALICE
-- BOB

§The function model
Implement one trait per function kind and register it on the Worker:
| Kind | Trait | Use case |
|---|---|---|
| Scalar | ScalarFunction | Per-row transforms (1 row in → 1 row out) |
| Table | table_function::TableFunction | Generate / scan rows (no row input) |
| Table-in-out | table_in_out::TableInOutFunction | Streaming row transforms (N in → M out) |
| Buffering | buffering::TableBufferingFunction | Sink → combine → source (aggregate-emit) |
| Aggregate | aggregate::AggregateFunction | Grouped / window / streaming aggregates |
Every trait shares the same bind/process vocabulary: ArgSpec declares the
arguments, FunctionMetadata declares optimizer-facing properties,
BindParams / BindResponse resolve the output schema at bind time, and
ProcessParams carries per-call context (settings, secrets, pushdown
hints) into the work method.
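
The aggregate module describes its model as update / combine / finalize. As a rough illustration of that shape only, the sketch below keeps a partial state for an average using nothing but the standard library. `AvgState` and `average_in_chunks` are hypothetical names invented for this example; they are not the crate's `AggregateFunction` trait, whose actual signatures are documented in the aggregate module.

```rust
/// Hypothetical partial state for an average.
/// This is an illustration, not the crate's AggregateFunction API.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct AvgState {
    sum: f64,
    count: u64,
}

impl AvgState {
    /// update: fold one input value into the state; NULLs (None) are skipped.
    fn update(&mut self, value: Option<f64>) {
        if let Some(v) = value {
            self.sum += v;
            self.count += 1;
        }
    }

    /// combine: merge a partial state computed elsewhere into this one.
    fn combine(&mut self, other: &AvgState) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// finalize: turn the state into a result; None when no non-null rows were seen.
    fn finalize(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Aggregate each chunk into its own partial state, then merge the partials.
fn average_in_chunks(chunks: &[&[Option<f64>]]) -> Option<f64> {
    let mut total = AvgState::default();
    for chunk in chunks {
        let mut partial = AvgState::default();
        for &value in chunk.iter() {
            partial.update(value);
        }
        total.combine(&partial);
    }
    total.finalize()
}
```

Keeping `combine` separate from `update` is what lets partial states built from different batches be merged before a single `finalize`.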
§Beyond functions
Worker::set_catalog exposes a full catalog — schemas, function-backed
tables, views, and macros — with constraints, column statistics, time travel
(AT), and secondary catalogs attachable by name (see catalog).
Projection and filter pushdown, ORDER BY / TABLESAMPLE hints, custom
settings, secrets, and bearer auth are handled for you.
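
To make filter pushdown concrete, here is a conceptual sketch of the evaluate-then-apply idea on a single nullable integer column, using only the standard library. The `Filter` enum and both functions are hypothetical stand-ins. The crate's real filter representation and wire format live in the pushdown module and are not reproduced here.

```rust
/// Hypothetical single-column filter; the crate's real filter types differ.
enum Filter {
    GreaterThan(i64),
    Equal(i64),
}

/// Evaluate the filter against a column, producing one keep/drop flag per row.
/// A NULL (None) never satisfies a comparison, as in SQL.
fn evaluate(filter: &Filter, column: &[Option<i64>]) -> Vec<bool> {
    column
        .iter()
        .map(|value| match (filter, value) {
            (Filter::GreaterThan(bound), Some(v)) => v > bound,
            (Filter::Equal(target), Some(v)) => v == target,
            (_, None) => false,
        })
        .collect()
}

/// Apply a mask to a column, keeping only the rows flagged true.
fn apply(mask: &[bool], column: &[Option<i64>]) -> Vec<Option<i64>> {
    column
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| *value)
        .collect()
}
```

Splitting evaluation from application means one mask can be reused to filter every column of the same batch.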
§Transports
Worker::run selects a transport from argv: stdio (default),
Unix socket (--unix <path>, the launcher contract), or HTTP
(--http, Arrow-IPC over HTTP with AEAD-sealed stateless stream tokens and
optional bearer auth). You rarely pass these yourself — DuckDB supplies the
right flags when it launches your worker.
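
The flag handling described above can be pictured with a small argv parser. This is a minimal sketch under stated assumptions, not the crate's implementation: the `Transport` enum and `select_transport` function are hypothetical names, only the three flags mentioned in the text are recognized, and any other argument is treated as an error for simplicity.

```rust
/// Hypothetical transport choice; the names are illustrative, not the crate's types.
#[derive(Debug, PartialEq)]
enum Transport {
    Stdio,
    Unix(String),
    Http,
}

/// Pick a transport from the arguments that follow the program name.
fn select_transport<I>(args: I) -> Result<Transport, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let mut chosen = Transport::Stdio; // stdio is the default when no flag is given
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--unix" => {
                // --unix must be followed by the socket path
                let path = args
                    .next()
                    .ok_or_else(|| "--unix requires a path".to_string())?;
                chosen = Transport::Unix(path);
            }
            "--http" => chosen = Transport::Http,
            other => return Err(format!("unrecognized argument: {other}")),
        }
    }
    Ok(chosen)
}
```

A binary would typically call it as `select_transport(std::env::args().skip(1))`, skipping the program name.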
Re-exports§
pub use function::ArgSpec;
pub use function::BindParams;
pub use function::BindResponse;
pub use function::FunctionExample;
pub use function::FunctionMetadata;
pub use function::ProcessParams;
pub use function::ScalarFunction;
pub use worker::Worker;
Modules§
- aggregate
- Aggregate function model (update / combine / finalize).
- arguments
- Parsing of the arguments wire blob.
- buffering
- Table buffering (sink + source) function model.
- catalog
- Default read-only catalog: auto-generates SchemaInfo + FunctionInfo from the worker’s registered functions.
- dispatch
- The VGI dispatcher: owns the function registries + catalog identity and implements every RPC handler (bind, init, and the catalog discovery methods).
- function
- Core function model shared by all VGI function kinds.
- ipc
- Arrow IPC stream helpers for the binary-valued wire fields.
- numeric
- Numeric scalar helpers (port of Python _promote_for_addition + NumericDispatch).
- overload
- Function overload resolution.
- partition
- Partition-column support: mark schema fields as partition columns and compute the per-batch vgi_partition_values#b64 metadata (base64 of a 2-row min/max IPC batch) that the C++ extension reads to plan partitioned aggregates. Mirrors the canonical Python partition_field.
- protocol
- VGI wire protocol: DTOs, enum payloads, and RPC method registration.
- pushdown
- Filter pushdown: deserialize the pushdown_filters blob, evaluate it against a batch, and apply it.
- secrets
- Secret access for functions.
- settings
- DuckDB session settings passed to functions.
- statistics
- Column-statistics serialization: the sparse-union IPC batch DuckDB’s VGI extension reads to seed the optimizer. Mirrors the canonical Python serialize_column_statistics.
- storage
- Cross-process state storage for VGI workers.
- table_function
- Table (producer) function model: generate output batches without input.
- table_in_out
- Table-in-out function model: transform input batches to output batches.
- transport
- Worker transport selection: stdio (default), AF_UNIX (launcher), HTTP.
- wire
- Flat wire serialization for VGI protocol DTOs.
- worker
- The VGI worker: owns function registries (via Dispatcher), builds the RPC server, and drives transport selection from argv.