Crate vgi

Build native, single-binary DuckDB extensions in Rust — no C++, no linking against DuckDB.

vgi is the Rust SDK for writing VGI (Vector Gateway Interface) workers: the worker side of Query Farm’s DuckDB “Hyperfederation” extension. A worker is an ordinary Rust binary that DuckDB launches and talks to over Apache Arrow IPC. It exposes scalar / table / aggregate functions and whole catalogs (schemas, tables, views) that behave like native DuckDB objects — with no compiled C++ extension and no version coupling to a specific DuckDB build.

Workers built with this crate are byte-for-byte wire-compatible with the canonical Python implementation, so a Rust worker drops in behind the same ATTACH … (TYPE vgi). It is built on the vgi-rpc crate (wire protocol, RPC server, transports), uses stock arrow-rs 58.x, and has an MSRV of 1.86.

§Your first worker

A worker is a main() that registers functions on a Worker and calls Worker::run. This one exposes upper_case(varchar) -> varchar:

use std::sync::Arc;

use arrow_array::{cast::AsArray, ArrayRef, RecordBatch, StringArray};
use arrow_schema::DataType;
use vgi::{ArgSpec, FunctionMetadata, ProcessParams, ScalarFunction, Worker};
use vgi_rpc::{Result, RpcError};

/// `upper_case(s)` — uppercase a string column.
struct UpperCase;

impl ScalarFunction for UpperCase {
    fn name(&self) -> &str {
        "upper_case"
    }

    fn metadata(&self) -> FunctionMetadata {
        FunctionMetadata {
            description: "Convert string values to uppercase".into(),
            return_type: Some(DataType::Utf8),
            ..Default::default()
        }
    }

    fn argument_specs(&self) -> Vec<ArgSpec> {
        vec![ArgSpec::column("value", 0, "varchar", "String to uppercase")]
    }

    fn process(&self, params: &ProcessParams, batch: &RecordBatch) -> Result<RecordBatch> {
        let col = batch.column(0).as_string::<i32>();
        let upper: StringArray = col.iter().map(|v| v.map(str::to_uppercase)).collect();
        let out: ArrayRef = Arc::new(upper);
        RecordBatch::try_new(params.output_schema.clone(), vec![out])
            .map_err(|e| RpcError::runtime_error(e.to_string()))
    }
}

fn main() {
    let mut worker = Worker::new();
    worker.register_scalar(UpperCase);
    worker.run(); // serves stdio (default), --unix <path>, or --http
}

Build it with cargo build --release, then call it from a DuckDB engine that has the vgi extension loaded. Query Farm’s Haybarn distribution ships the extension and can be started with uvx haybarn-cli. The LOCATION below assumes the crate's binary is named my-worker:

ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');
SELECT demo.main.upper_case(name) FROM (VALUES ('alice'), ('bob')) t(name);
-- ALICE
-- BOB
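The per-value logic inside process above is null-preserving: a NULL input stays NULL rather than becoming an empty string. The following is a minimal, standard-library-only sketch of that same mapping, with Option<&str> standing in for a nullable Arrow string slot. The helper name upper_case_values is hypothetical and is not part of vgi or arrow-rs.

```rust
/// Uppercase every non-null value and keep nulls as nulls.
/// `Option<&str>` is a stand-in for one nullable slot of a string column;
/// this helper is illustrative only and not part of the vgi crate.
fn upper_case_values(values: &[Option<&str>]) -> Vec<Option<String>> {
    values
        .iter()
        .copied() // Option<&str> is Copy, so this yields owned Options
        .map(|v| v.map(str::to_uppercase)) // None passes through untouched
        .collect()
}
```

Because the output has exactly one slot per input slot, the sketch also reflects the scalar contract of one row in, one row out.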

§The function model

Implement one trait per function kind and register it on the Worker:

| Kind | Trait | Use case |
|---|---|---|
| Scalar | ScalarFunction | Per-row transforms (1 row in → 1 row out) |
| Table | table_function::TableFunction | Generate / scan rows (no row input) |
| Table-in-out | table_in_out::TableInOutFunction | Streaming row transforms (N in → M out) |
| Buffering | buffering::TableBufferingFunction | Sink → combine → source (aggregate-emit) |
| Aggregate | aggregate::AggregateFunction | Grouped / window / streaming aggregates |

Every trait shares the same bind/process vocabulary: ArgSpec declares the arguments, FunctionMetadata declares optimizer-facing properties, BindParams / BindResponse resolve the output schema at bind time, and ProcessParams carries per-call context (settings, secrets, pushdown hints) into the work method.
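The aggregate kind is described in this crate as an update / combine / finalize model. As a rough illustration of what those three phases mean, here is a standard-library-only sketch of a mean aggregate. MeanState and its method signatures are assumptions made for this example; they are not the signatures of aggregate::AggregateFunction, which should be taken from that module's own documentation.

```rust
/// Partial state for a mean aggregate: running sum and non-null row count.
/// Hypothetical type for illustration; not part of the vgi crate.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct MeanState {
    sum: f64,
    count: u64,
}

impl MeanState {
    /// update: fold one input batch into the state, skipping nulls.
    fn update(&mut self, batch: &[Option<f64>]) {
        for v in batch.iter().flatten() {
            self.sum += *v;
            self.count += 1;
        }
    }

    /// combine: merge a partial state computed elsewhere (for example,
    /// on another thread or partition) into this one.
    fn combine(&mut self, other: &MeanState) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// finalize: produce the result, or None (NULL) if no non-null rows
    /// were seen.
    fn finalize(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

Keeping the state as (sum, count) rather than a running mean is what makes combine a simple addition, so partial states can be merged in any order.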

§Beyond functions

Worker::set_catalog exposes a full catalog — schemas, function-backed tables, views, and macros — with constraints, column statistics, time travel (AT), and secondary catalogs attachable by name (see catalog). Projection and filter pushdown, ORDER BY / TABLESAMPLE hints, custom settings, secrets, and bearer auth are handled for you.

§Transports

Worker::run selects a transport from argv: stdio (default), Unix socket (--unix <path>, the launcher contract), or HTTP (--http, Arrow-IPC over HTTP with AEAD-sealed stateless stream tokens and optional bearer auth). You rarely pass these yourself — DuckDB supplies the right flags when it launches your worker.
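To make the argv contract concrete, the sketch below shows one way a program could map those flags to a transport choice using only the standard library. It is not the implementation inside Worker::run: the Transport enum, the select_transport function, and the choice to ignore unrecognized arguments are all assumptions made for illustration.

```rust
use std::path::PathBuf;

/// Hypothetical transport choice; not a type from the vgi crate.
#[derive(Debug, PartialEq)]
enum Transport {
    Stdio,
    Unix(PathBuf),
    Http,
}

/// Pick a transport from command-line arguments (program name excluded).
/// Defaults to stdio when neither flag is present.
fn select_transport<I>(args: I) -> Result<Transport, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let mut transport = Transport::Stdio;
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--unix" => {
                // The flag takes the socket path as its next argument.
                let path = args
                    .next()
                    .ok_or_else(|| "--unix requires a path".to_string())?;
                transport = Transport::Unix(PathBuf::from(path));
            }
            "--http" => transport = Transport::Http,
            // Assumption: unrelated arguments are ignored in this sketch.
            _ => {}
        }
    }
    Ok(transport)
}
```

In a real binary this would be called as select_transport(std::env::args().skip(1)). With vgi itself none of this is needed, since Worker::run performs the selection.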

§Re-exports

pub use function::ArgSpec;
pub use function::BindParams;
pub use function::BindResponse;
pub use function::FunctionExample;
pub use function::FunctionMetadata;
pub use function::ProcessParams;
pub use function::ScalarFunction;
pub use worker::Worker;

§Modules

aggregate
Aggregate function model (update / combine / finalize).
arguments
Parsing of the arguments wire blob.
buffering
Table buffering (sink + source) function model.
catalog
Default read-only catalog: auto-generates SchemaInfo + FunctionInfo from the worker’s registered functions.
dispatch
The VGI dispatcher: owns the function registries + catalog identity and implements every RPC handler (bind, init, and the catalog discovery methods).
function
Core function model shared by all VGI function kinds.
ipc
Arrow IPC stream helpers for the binary-valued wire fields.
numeric
Numeric scalar helpers (port of Python _promote_for_addition + NumericDispatch).
overload
Function overload resolution.
partition
Partition-column support: mark schema fields as partition columns and compute the per-batch vgi_partition_values#b64 metadata (base64 of a 2-row min/max IPC batch) that the C++ extension reads to plan partitioned aggregates. Mirrors the canonical Python partition_field.
protocol
VGI wire protocol: DTOs, enum payloads, and RPC method registration.
pushdown
Filter pushdown: deserialize the pushdown_filters blob, evaluate it against a batch, and apply it.
secrets
Secret access for functions.
settings
DuckDB session settings passed to functions.
statistics
Column-statistics serialization: the sparse-union IPC batch DuckDB’s VGI extension reads to seed the optimizer. Mirrors the canonical Python serialize_column_statistics.
storage
Cross-process state storage for VGI workers.
table_function
Table (producer) function model: generate output batches without input.
table_in_out
Table-in-out function model: transform input batches to output batches.
transport
Worker transport selection: stdio (default), AF_UNIX (launcher), HTTP.
wire
Flat wire serialization for VGI protocol DTOs.
worker
The VGI worker: owns function registries (via Dispatcher), builds the RPC server, and drives transport selection from argv.