// Copyright 2025, 2026 Query Farm LLC - https://query.farm
//! Build native, single-binary DuckDB extensions in Rust — no C++, no linking
//! against DuckDB.
//!
//! `vgi` is the Rust SDK for writing **VGI (Vector Gateway Interface) workers**:
//! the worker side of [Query Farm](https://query.farm)'s DuckDB
//! "Hyperfederation" extension. A *worker* is an ordinary Rust binary that
//! DuckDB launches and talks to over Apache Arrow IPC. It exposes scalar /
//! table / aggregate functions and whole catalogs (schemas, tables, views) that
//! behave like native DuckDB objects — with no compiled C++ extension and no
//! version coupling to a specific DuckDB build.
//!
//! Workers built with this crate are byte-for-byte wire-compatible with the
//! canonical Python implementation, so a Rust worker drops in behind the
//! same `ATTACH … (TYPE vgi)` statement. The SDK is built on the
//! [`vgi-rpc`](https://docs.rs/vgi-rpc) crate (wire protocol, RPC server,
//! transports), uses stock `arrow-rs` 58.x, and has an MSRV of 1.86.
//!
//! # Your first worker
//!
//! A worker is a `main()` that registers functions on a [`Worker`] and calls
//! [`Worker::run`]. This one exposes `upper_case(varchar) -> varchar`:
//!
//! ```no_run
//! # #![allow(clippy::needless_doctest_main)]
//! use std::sync::Arc;
//!
//! use arrow_array::{cast::AsArray, ArrayRef, RecordBatch, StringArray};
//! use arrow_schema::DataType;
//! use vgi::{ArgSpec, FunctionMetadata, ProcessParams, ScalarFunction, Worker};
//! use vgi_rpc::{Result, RpcError};
//!
//! /// `upper_case(s)` — uppercase a string column.
//! struct UpperCase;
//!
//! impl ScalarFunction for UpperCase {
//!     fn name(&self) -> &str {
//!         "upper_case"
//!     }
//!
//!     fn metadata(&self) -> FunctionMetadata {
//!         FunctionMetadata {
//!             description: "Convert string values to uppercase".into(),
//!             return_type: Some(DataType::Utf8),
//!             ..Default::default()
//!         }
//!     }
//!
//!     fn argument_specs(&self) -> Vec<ArgSpec> {
//!         vec![ArgSpec::column("value", 0, "varchar", "String to uppercase")]
//!     }
//!
//!     fn process(&self, params: &ProcessParams, batch: &RecordBatch) -> Result<RecordBatch> {
//!         let col = batch.column(0).as_string::<i32>();
//!         let upper: StringArray = col.iter().map(|v| v.map(str::to_uppercase)).collect();
//!         let out: ArrayRef = Arc::new(upper);
//!         RecordBatch::try_new(params.output_schema.clone(), vec![out])
//!             .map_err(|e| RpcError::runtime_error(e.to_string()))
//!     }
//! }
//!
//! fn main() {
//!     let mut worker = Worker::new();
//!     worker.register_scalar(UpperCase);
//!     worker.run(); // serves stdio (default), --unix <path>, or --http
//! }
//! ```
//!
//! Build it (`cargo build --release`), then call it from a DuckDB engine that
//! has the `vgi` extension — Query Farm's [Haybarn] distribution ships it and
//! starts with `uvx haybarn-cli`:
//!
//! ```sql
//! ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');
//! SELECT demo.main.upper_case(name) FROM (VALUES ('alice'), ('bob')) t(name);
//! -- ALICE
//! -- BOB
//! ```
//!
//! [Haybarn]: https://github.com/Query-farm-haybarn/haybarn
//!
//! # The function model
//!
//! Implement one trait per function kind and register it on the [`Worker`]:
//!
//! | Kind | Trait | Use case |
//! |--------------|-------------------------------------------|-------------------------------------------|
//! | Scalar | [`ScalarFunction`] | Per-row transforms (1 row in → 1 row out) |
//! | Table | [`table_function::TableFunction`] | Generate / scan rows (no row input) |
//! | Table-in-out | [`table_in_out::TableInOutFunction`] | Streaming row transforms (N in → M out) |
//! | Buffering | [`buffering::TableBufferingFunction`] | Sink → combine → source (aggregate-emit) |
//! | Aggregate | [`aggregate::AggregateFunction`] | Grouped / window / streaming aggregates |
//!
//! Every trait shares the same bind/process vocabulary: [`ArgSpec`] declares the
//! arguments, [`FunctionMetadata`] declares optimizer-facing properties,
//! [`BindParams`] / [`BindResponse`] resolve the output schema at bind time, and
//! [`ProcessParams`] carries per-call context (settings, secrets, pushdown
//! hints) into the work method.
//!
//! # Beyond functions
//!
//! [`Worker::set_catalog`] exposes a full catalog — schemas, function-backed
//! tables, views, and macros — with constraints, column statistics, time travel
//! (`AT`), and secondary catalogs attachable by name (see [`catalog`]).
//! Projection and filter pushdown, `ORDER BY` / `TABLESAMPLE` hints, custom
//! settings, secrets, and bearer auth are handled for you.
//!
//! # Transports
//!
//! [`Worker::run`] selects a transport from `argv`: **stdio** (default),
//! **Unix socket** (`--unix <path>`, the launcher contract), or **HTTP**
//! (`--http`, Arrow-IPC over HTTP with AEAD-sealed stateless stream tokens and
//! optional bearer auth). You rarely pass these yourself — DuckDB supplies the
//! right flags when it launches your worker.
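//!
//! To make the flag contract above concrete, here is a minimal, std-only
//! sketch of flag-based transport selection. It is an illustration, not this
//! crate's actual parsing code: the `Transport` enum and `select_transport`
//! function are hypothetical names, and the error handling for a missing
//! path or an unknown flag is an assumption.
//!
//! ```rust
//! /// Hypothetical transport choice; not a type exported by this crate.
//! #[derive(Debug, PartialEq)]
//! enum Transport {
//!     Stdio,
//!     Unix(String),
//!     Http,
//! }
//!
//! /// Pick a transport from the arguments that follow the program name.
//! /// Returns `Err` when `--unix` has no path or the flag is unrecognized
//! /// (assumed behavior, chosen for the sketch).
//! fn select_transport(args: &[&str]) -> Result<Transport, String> {
//!     match args {
//!         // No flags at all: fall back to the default, stdio.
//!         [] => Ok(Transport::Stdio),
//!         // `--unix <path>`: the next argument is the socket path.
//!         ["--unix", path, ..] => Ok(Transport::Unix(path.to_string())),
//!         ["--unix"] => Err("--unix requires a path".to_string()),
//!         // `--http`: Arrow IPC over HTTP.
//!         ["--http", ..] => Ok(Transport::Http),
//!         [other, ..] => Err(format!("unrecognized flag: {other}")),
//!     }
//! }
//! ```
//!
//! With this sketch, an empty argument list selects `Transport::Stdio`, and
//! `["--unix", "/tmp/w.sock"]` selects `Transport::Unix` carrying that path.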
pub use ;
pub use Worker;