vgi
A VGI worker is a small Rust program that DuckDB talks to over Apache Arrow IPC. It can expose scalar / table / aggregate functions and whole catalogs (schemas, tables, views) that behave like native DuckDB objects. DuckDB launches your worker for you when a query needs it — you never run a server by hand.
vgi is the Rust SDK for building those workers. It is byte-for-byte
wire-compatible with the canonical
Python SDK, so a Rust worker
drops in behind the same ATTACH ... (TYPE vgi). Built on
vgi-rpc; stock arrow-rs 58.x, MSRV 1.86.
Why a worker instead of a C++ extension?
| Traditional DuckDB extension | VGI worker |
|---|---|
| Written in C/C++, compiled and linked against DuckDB | Written in Rust, one standalone binary |
| Must be rebuilt for each DuckDB version | Version independent |
| Complex build / signing / release cycle | cargo build, ship the binary |
| Runs in-process | Process isolation |
Reach for it when you want to: call REST APIs from SQL, run ML inference, expose an external database / API / filesystem as a queryable catalog, or ship domain-specific functions to your team as a single binary.
Your first worker
1. Create a project and add the dependencies (these are exactly what the example below needs):
# Cargo.toml
[]
= "0.1"
= "0.2"
= "58"
= "58"
2. Write a function and serve it:
// src/main.rs
use Arc;
use ;
use DataType;
use ;
use ;
/// `upper_case(s)` — uppercase a string column.
;
3. Build it:
4. Call it from a DuckDB engine that has the vgi extension. The vgi
extension currently ships with Query Farm's
Haybarn DuckDB distribution,
which starts with no install via uvx haybarn-cli. From your project directory:
-- Haybarn ships the `vgi` extension. DuckDB LAUNCHES the worker for you;
-- LOCATION is the command it runs, and the alias 'demo' is what you
-- qualify functions with in SQL.
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');
SELECT demo.main.upper_case(name) FROM (VALUES ('alice'), ('bob')) t(name);
-- ALICE
-- BOB
-- Or drop the prefix:
USE demo;
SELECT main.upper_case('hello'); -- HELLO
LOCATIONgotcha: the path is resolved relative to the DuckDB process's working directory, not your project. If the worker isn't found, use an absolute path (e.g.LOCATION '/abs/path/to/target/release/my-worker').
That's it — a native-speed SQL function, shipped as one static binary, with no extension to compile.
Iterating
Change your Rust, rebuild, and re-attach. DuckDB pools the worker process per
attachment, so the reliable way to pick up a new build is to re-ATTACH (or start
a fresh session):
DETACH demo;
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');
Troubleshooting
ATTACHcan't find the worker —LOCATIONis resolved relative to DuckDB's working directory, not your project. Use an absolute path.Catalog Error: ... upper_case does not exist— qualify with the attach alias (demo.main.upper_case) or runUSE demo;first.- A runtime error in your function — anything you return as
RpcError(or any panic) surfaces in DuckDB's error message; return descriptive errors fromprocessto make debugging easy. - Type mismatch at the call site —
argument_specsis validated at bind time, so a wrong-typed column fails fast with a clear message before any rows flow.
Function types
Register any mix of these via the typed traits in vgi:
| Type | Trait | SQL pattern | Use case |
|---|---|---|---|
| Scalar | ScalarFunction |
SELECT f(col) FROM t |
Per-row transforms (1:1) |
| Table | TableFunction |
SELECT * FROM f(args) |
Generate / scan data |
| Table-In-Out | TableInOutFunction |
SELECT * FROM f((SELECT …)) |
Streaming transforms |
| Table-Buffering | TableBufferingFunction |
SELECT * FROM f((SELECT …)) |
Aggregate-then-emit (sink → combine → source) |
| Aggregate | AggregateFunction |
SELECT f(col) … GROUP BY … |
Grouped / window / streaming aggregates |
Each trait is small: name, metadata, argument_specs, an on_bind to resolve
the output schema, and process (or the buffering / aggregate lifecycle methods).
Projection & filter pushdown, ORDER BY / TABLESAMPLE hints, settings, secrets
(two-phase bind), bearer auth, and a cross-process state store are handled for you.
Beyond functions: full catalogs
Worker::set_catalog exposes a complete catalog — schemas, function-backed
tables, views, and macros — with constraints, column statistics, time
travel (AT), and secondary catalogs attachable by name:
ATTACH 'external_db' (TYPE vgi, LOCATION './my-catalog-worker');
SELECT * FROM external_db.main.users; -- a function-backed table
SELECT * FROM external_db.analytics.daily_view; -- a view
SELECT external_db.main.transform(col) FROM t; -- a function
A worker can act as a bridge — databases, APIs, filesystems — presented to DuckDB as native catalogs.
Transports
Worker::run picks the transport from argv:
- stdio (default) — DuckDB spawns the worker per query. Nothing to configure.
- Unix socket (
--unix <path>) — one long-lived worker (the launcher contract). - HTTP (
--http) — Arrow-IPC over HTTP with AEAD-sealed stateless stream tokens and optional bearer auth.
Where to go next
- API docs (docs.rs) — every trait and type.
- Example worker — a fixture worker exercising every function kind and full catalogs.
License
Query Farm Source-Available License v1.0 — see LICENSE. Copyright © 2025, 2026 Query Farm LLC.