
Crate quiver

§quiver

A zero-copy, strongly typed interface to Apache Arrow columns and record batches, built on Rust’s arrow-rs.

§What

arrow-rs is to a large extent dynamically typed. For instance, you cannot know until runtime if an arrow::ListArray will contain strings or numbers, and whether or not the values in it can be null.

quiver provides strongly typed (and zero-copy) wrappers around these arrays. The guarantees are expressed in the Rust type and validated only once, when the column is constructed. For instance, quiver::Column<quiver::List<Utf8>> is a ListArray that is guaranteed to contain strings, with no nulls.

Additionally, quiver provides a proc-macro for easily converting a struct of many arrays to and from arrow RecordBatches (requires the derive feature, which is enabled by default).

A struct marked with #[derive(Quiver)] can contain either dynamically typed arrow arrays (ArrayRef, ListArray, …) or strongly typed quiver types (or a mix of both!).

§Example

For a complete, compiling example, see example.rs.

use std::collections::BTreeMap;

use quiver::arrow::array::ArrayRef;
use quiver::{Column, DynColumn, List, Quiver, Utf8};

/// Important thing
#[derive(Quiver)]
struct Thing {
    /// Optional
    #[quiver(metadata)]
    pub metadata: BTreeMap<String, String>,

    /// Strongly typed: guaranteed to be Utf8, with no nulls
    pub name: Column<Utf8>,

    /// Strongly typed: a List<Utf8> where the items may be null
    pub tags: Column<List<Option<Utf8>>>,

    /// The column name defaults to the field name;
    /// override it when it isn't a valid Rust identifier:
    #[quiver(name = "special:kind")]
    pub kind: Column<Utf8>,

    /// Strongly typed values; the whole *column* may be missing
    pub dob: Option<Column<i64>>,

    /// A raw arrow array: any datatype, any nullability — dynamically typed
    pub comment: ArrayRef,

    /// Optional: other, dynamic columns
    #[quiver(extra_columns)]
    pub other_columns: Vec<DynColumn>,
}

// Proc-macro generates:
// * `impl TryFrom<RecordBatch> for Thing` (and `&RecordBatch`) - validates the schema,
//   then downcasts (zero-copy)
// * `impl TryFrom<Thing> for RecordBatch` - fails on column length mismatch
// * `fn from_record_batch()` and `fn into_record_batch()` - discoverable aliases for the above
// * `COLUMN_*` descriptor constants - single-column access without hard-coding names
// * `fn min_schema()`/`fn max_schema()` - when all columns are statically typed
// * `fn empty_record_batch()` - when, additionally, all columns are required (min == max)

Building columns from values is infallible:

use quiver::{Column, List, Utf8};

let names: Column<Utf8> = vec!["Alice", "Bob"].into();
let scores = Column::<List<i64>>::from_values([vec![1, 2], vec![3]]);
let maybe: Column<Option<f64>> = [Some(1.5), None].into_iter().collect();

Single columns can be extracted without parsing the whole batch — the derive generates a COLUMN_* descriptor per column, so no names are hard-coded:

use quiver::{Column, Quiver, Utf8};

#[derive(Quiver)]
struct Reading {
    sensor: Column<Utf8>,
}

let batch = Reading {
    sensor: vec!["kitchen".to_owned()].into(),
}
.into_record_batch()
.unwrap();

// Single-column extraction, fully typed:
let sensors = Reading::COLUMN_SENSOR.extract(&batch).unwrap();
assert_eq!(sensors.to_vec(), ["kitchen"]); // `to_vec()` returns owned values
assert_eq!(Reading::COLUMN_SENSOR.name, "sensor");

// Static schema + infallible empty batches
// (when all columns are statically typed and required):
let empty = Reading::empty_record_batch(); // all declared columns, zero rows
assert_eq!(empty.num_rows(), 0);

quiver::Column is also usable standalone, without the derive:

use std::sync::Arc;

use quiver::arrow::array::{ArrayRef, ListArray};
use quiver::arrow::datatypes::Int32Type;
use quiver::{Column, List, Utf8};

let dynamic_arrow_array: ArrayRef = Arc::new(ListArray::from_iter_primitive::<Int32Type, _, _>(
    vec![Some(vec![Some(1), Some(2)]), Some(vec![Some(3)])],
));

let column = Column::<List<Option<i32>>>::try_from(dynamic_arrow_array).unwrap();
for list in &column {
    for number in list {
        // `number` is an `Option<i32>`; validation already happened, up front
    }
}

§Quiver types vs. arrow types

A #[derive(Quiver)] field can hold its column either as a raw arrow array (e.g. StringArray, ListArray, ArrayRef) or as a strongly-typed quiver::Column<L>, where L is a logical type like List<Option<Utf8>>. Use quiver types for compile-time guarantees; use arrow types when you want things to be dynamic.

All column matching is done by name — column order never matters: parsing accepts any input column order, and encoding emits the columns in struct declaration order (with any extra_columns appended at the end), regardless of the order they had when parsed. The column name defaults to the field name; #[quiver(name = "special:kind")] overrides it, e.g. for column names that aren’t valid Rust identifiers.
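
The name-based matching described above can be modeled in a few lines. This is only a sketch using the standard library: `reorder_by_name` and the `(name, values)` tuples are hypothetical stand-ins for illustration, not the code the derive generates.

```rust
/// Looks up each declared column by name (input order is irrelevant) and
/// returns the columns in declaration order, with unknown columns appended.
/// Returns `Err(name)` for the first declared column that is missing.
fn reorder_by_name<'a>(
    declared: &[&'a str],
    input: &[(&'a str, Vec<i64>)],
) -> Result<Vec<(&'a str, Vec<i64>)>, &'a str> {
    let mut out = Vec::with_capacity(input.len());
    for &name in declared {
        let column = input.iter().find(|(n, _)| *n == name).ok_or(name)?;
        out.push(column.clone());
    }
    // Extra columns keep their relative input order, after the declared ones.
    for column in input {
        if !declared.contains(&column.0) {
            out.push(column.clone());
        }
    }
    Ok(out)
}
```

Calling it with the declared names `["a", "b"]` and an input ordered `extra, b, a` yields `a, b, extra`: the declared columns in declaration order, followed by the extra column.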

What is checked when parsing a RecordBatch:

| | Raw arrow array | quiver::Column<L> |
|---|---|---|
| Datatype | Exact for flat arrays; parameterized arrays (ListArray, …) are downcast only (any inner types are accepted) | Structural match, recursively (List<Utf8> ≠ List<i64>); inner field names, nullability flags, and metadata are not compared, since actual nulls are what matters |
| Nullability | Not checked | Non-Option levels must be null-free, at every nesting depth |
| Timestamps | Unit checked; the timezone must be None (TimestampNanosecondArray) | Unit and timezone (Timestamp<Nanosecond, Utc>) |
| Element access | The arrow APIs; manual downcasts for nested data | Typed, infallible, and zero-copy (&str, i64, item iterators) |
| Cost | None | One eager validation at the parse boundary; cheap (see below) |

All validation happens once, when the record batch enters: after that, a Column<L> cannot be invalid (its fields are private and immutable), so element access never returns a Result.
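
The "validate once, then read infallibly" pattern can be illustrated with a standard-library model. `NonNullInts` is a hypothetical type invented for this sketch. Unlike quiver, it copies the values for simplicity, whereas the text above describes a zero-copy wrapper.

```rust
/// A column of integers that is proven null-free at construction time.
/// (Hypothetical stand-in for a validated column type; std only.)
pub struct NonNullInts {
    values: Vec<i64>, // private: cannot be changed after validation
}

impl NonNullInts {
    /// The single fallible step: reject any null up front.
    /// On failure, returns the row index of the first null.
    pub fn try_new(raw: Vec<Option<i64>>) -> Result<Self, usize> {
        let mut values = Vec::with_capacity(raw.len());
        for (row, item) in raw.into_iter().enumerate() {
            values.push(item.ok_or(row)?);
        }
        Ok(Self { values })
    }

    /// Infallible access: no `Option`, no `Result`.
    pub fn iter(&self) -> impl Iterator<Item = i64> + '_ {
        self.values.iter().copied()
    }
}
```

Because the field is private and there is no mutating method, every `NonNullInts` that exists has already passed the check, which is why `iter` does not need to return a `Result`.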

The validation is cheap — the values themselves are never read. It compares datatypes (proportional to schema depth, not row count) and checks null counts, which arrow caches, so the cost is O(1) per nesting level. The one exception: when a non-Option nesting level (e.g. the items of a List<Utf8>) sits on an inner array that carries a null buffer, quiver counts only the nulls reachable through valid rows, which scans that validity bitmap — still independent of the value bytes.
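
The "nulls reachable through valid rows" rule can be sketched as follows. This is a simplified model under stated assumptions: validity is held in plain `bool` slices rather than Arrow bitmaps, and `reachable_nulls` is a hypothetical helper, not a quiver function.

```rust
/// Counts the null items of a list column that are reachable through valid rows.
///
/// * `offsets`: row `i` spans `child_valid[offsets[i]..offsets[i + 1]]`
/// * `row_valid`: `false` marks a null row, whose items are ignored
/// * `child_valid`: `false` marks a null item
fn reachable_nulls(offsets: &[usize], row_valid: &[bool], child_valid: &[bool]) -> usize {
    offsets
        .windows(2)
        .zip(row_valid)
        .filter(|(_, valid)| **valid)
        .map(|(span, _)| {
            child_valid[span[0]..span[1]]
                .iter()
                .filter(|&&item_valid| !item_valid)
                .count()
        })
        .sum()
}
```

Note that only the validity information is touched; the value bytes never appear in the function, which matches the cost argument made above.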

Structs whose columns all have a statically-known datatype also get generated fn min_schema() (the required columns) and fn max_schema() (all declared columns, including optional ones). When additionally every column is required (min_schema() == max_schema()), an infallible fn empty_record_batch() is generated too — zero rows, every column present. Structs with optional (Option<…>) columns don’t get it: there would be no single obvious empty batch, and a round-trip would silently turn None into Some(empty).
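
The relationship between the two schemas can be modeled with the standard library alone. In this sketch `FieldSpec` and the free functions are hypothetical; they only mirror the rule stated above (min = required columns, max = all declared columns, an empty batch only when the two coincide).

```rust
/// One declared column: its name and whether it must be present.
#[derive(Clone, Debug, PartialEq)]
struct FieldSpec {
    name: &'static str,
    required: bool,
}

/// The required columns only.
fn min_schema(declared: &[FieldSpec]) -> Vec<FieldSpec> {
    declared.iter().filter(|f| f.required).cloned().collect()
}

/// All declared columns, including optional ones.
fn max_schema(declared: &[FieldSpec]) -> Vec<FieldSpec> {
    declared.to_vec()
}

/// An unambiguous empty batch exists only when min == max.
fn has_unique_empty_batch(declared: &[FieldSpec]) -> bool {
    min_schema(declared) == max_schema(declared)
}
```

With a single optional column, the two schemas differ, so there are two candidate empty batches (with and without that column) and no obvious one to pick.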

More of the Column API:

  • Construction is infallible: from_values, From<Vec<T>>, FromIterator, from_nullable_values (e.g. Option<&str> → Option<String>), and Default (empty). The exceptions: building a Dictionary (key overflow) or Run (run-end overflow) column can fail, so those use try_from_values instead
  • Reading: value/get, iter() (borrowed), value_owned/iter_owned/to_vec (owned)
  • Bulk zero-copy reads: as_slice() returns &[f32], &[[u8; 16]], … (for primitive and fixed-size binary non-nullable columns)
  • Per-column metadata: metadata()/with_metadata(), stored on the arrow Field when converting to/from a record batch. Statically known metadata can be declared: #[quiver(metadata("sorted" = "true"))] — stamped on encode (instance metadata wins on key conflicts), included in min_schema()/max_schema(), never validated on parse
  • Domain newtypes: newtype_datatype!(SensorName, Utf8) makes Column<SensorName> work, with all of the above; for foreign types (orphan rule), use the As adapter: Column<As<Ipv4Addr, u32>>
  • Interop: as_arrow()/into_arrow(), and quiver errors convert into arrow::error::ArrowError (as ExternalError), so ? works in functions returning arrow results
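
To see why building a Dictionary column is one of the fallible exceptions listed above, consider a standard-library sketch of dictionary encoding with `i8` keys. `dictionary_encode_i8` is a hypothetical helper written for illustration and says nothing about how quiver implements try_from_values.

```rust
use std::collections::HashMap;

/// Dictionary-encodes `values` with `i8` keys (at most 128 distinct values).
/// Returns `None` when the number of distinct values overflows the key type.
fn dictionary_encode_i8<'a>(values: &[&'a str]) -> Option<(Vec<i8>, Vec<&'a str>)> {
    let mut index: HashMap<&'a str, i8> = HashMap::new();
    let mut dictionary: Vec<&'a str> = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    for &value in values {
        let key = match index.get(value).copied() {
            Some(key) => key,
            None => {
                // The fallible step: the next key must fit in an `i8`.
                let key = i8::try_from(dictionary.len()).ok()?;
                index.insert(value, key);
                dictionary.push(value);
                key
            }
        };
        keys.push(key);
    }
    Some((keys, dictionary))
}
```

Whether construction succeeds depends on the data (the number of distinct values), not on the types, so it cannot be offered as an infallible constructor.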

The supported logical types:

| Logical type L | Arrow datatype | Element value |
|---|---|---|
| bool, i8…i64, u8…u64, f16…f64 | The same | By value |
| Utf8, LargeUtf8, Utf8View | The same | &str |
| AnyUtf8 | any UTF-8 encoding above | &str |
| FixedSizeBinary<N> | FixedSizeBinary(N) | &[u8; N] |
| Binary, LargeBinary, BinaryView | The same | &[u8] |
| AnyBinary | any binary encoding (incl. FixedSizeBinary) | &[u8] |
| Date32, Date64 | Date32, Date64 | i32 days / i64 ms |
| Time32Second…Time64Nanosecond | Time32(…), Time64(…) | i32 / i64 |
| TimestampNanosecond<Utc> | Timestamp(Nanosecond, UTC) | i64 |
| DurationMillisecond | Duration(Millisecond) | i64 |
| Dictionary<i32, Utf8> | Dictionary(Int32, Utf8) | Transparent: &str |
| Run<i32, Utf8> | RunEndEncoded(Int32, Utf8) | Transparent: &str |
| List<L>, LargeList<L> | List(…)/LargeList(…), recursively | An iterator over the items |
| ListView<L>, LargeListView<L> | ListView(…)/LargeListView(…), recursively | An iterator over the items |
| FixedSizeList<f32, 3> | FixedSizeList(Float32, 3) | An iterator over the items |
| AnyList<L> | any list encoding above | An iterator over the items |
| Map<K, V> | Map(…), recursively | An iterator over (key, value) pairs |
| Option<L> | Nullable at this level | Option<…> |
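
"Transparent" in the Dictionary and Run rows means that reading an element yields the decoded value rather than a key or a run index. For run-end encoding, a lookup of that kind can be sketched as below. It assumes the Arrow layout in which each run stores its exclusive end row; `run_value` is a hypothetical helper, not part of quiver.

```rust
/// Looks up logical row `row` in a run-end-encoded column.
/// `run_ends[i]` is the exclusive end row of run `i`, in increasing order,
/// and `values[i]` is the value shared by every row of that run.
fn run_value<'a>(run_ends: &[i32], values: &[&'a str], row: i32) -> Option<&'a str> {
    debug_assert_eq!(run_ends.len(), values.len());
    if row < 0 {
        return None;
    }
    // Binary search for the first run whose end is strictly greater than `row`.
    let run = run_ends.partition_point(|&end| end <= row);
    values.get(run).copied()
}
```

With `run_ends = [2, 5]` and `values = ["a", "b"]`, rows 0 and 1 read as "a", rows 2 to 4 read as "b", and row 5 is out of range.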

§Semi-dynamic logical types

Arrow has five physically different ways to store the same logical thing — a column of lists of L: List, LargeList, ListView, LargeListView, and FixedSizeList. AnyList<L> is a quiver-only logical type (no single arrow datatype of its own) that accepts whichever of those a column happens to use and reads them all uniformly — handy when the encoding is decided at runtime (e.g. data from an external source).

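use std::sync::Arc;
use quiver::arrow::array::{ArrayRef, ListArray};
use quiver::arrow::datatypes::Int64Type;
use quiver::{AnyList, Column};

// `array` may be a List / LargeList / ListView / LargeListView / FixedSizeList;
// here it happens to be a plain List of i64 with no nulls:
let array: ArrayRef = Arc::new(ListArray::from_iter_primitive::<Int64Type, _, _>(
    vec![Some(vec![Some(1), Some(2)]), Some(vec![Some(3)])],
));
let column = Column::<AnyList<i64>>::try_from(array).unwrap();
for list in &column {
    for _item in list { /* i64 */ }
}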

Because it has no single arrow datatype, AnyList is parse-only: it implements LogicalType (so try_from/reading work) but not ConcreteType, so it has no datatype(), from_values, Default, or schema generation. To build a column, pick a concrete encoding such as Column<List<L>>.

AnyBinary is the same idea for byte strings: it accepts any of Binary, LargeBinary, BinaryView, or FixedSizeBinary (any size) and reads them all as &[u8]. AnyUtf8 likewise accepts any of Utf8, LargeUtf8, or Utf8View and reads them as &str. Both are also parse-only.
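
The list encodings differ mainly in how the item range of a row is located, which is what makes a uniform reader possible. The following is a simplified, standard-library model: it ignores nulls and collapses the 32-bit and 64-bit variants into `usize`. `ListLayout` and `row_items` are hypothetical names for this sketch, not quiver types.

```rust
/// How a list column locates the items of each row.
enum ListLayout {
    /// List / LargeList: `offsets.len() == rows + 1`, rows are contiguous.
    Offsets(Vec<usize>),
    /// ListView / LargeListView: an independent (offset, size) per row.
    Views(Vec<(usize, usize)>),
    /// FixedSizeList: every row holds exactly this many items.
    Fixed(usize),
}

impl ListLayout {
    /// The item range of `row`, whatever the encoding.
    fn row_range(&self, row: usize) -> std::ops::Range<usize> {
        match self {
            ListLayout::Offsets(offsets) => offsets[row]..offsets[row + 1],
            ListLayout::Views(views) => {
                let (offset, size) = views[row];
                offset..offset + size
            }
            ListLayout::Fixed(size) => row * size..(row + 1) * size,
        }
    }
}

/// Reads one row uniformly, regardless of layout.
fn row_items<'a>(layout: &ListLayout, items: &'a [i64], row: usize) -> &'a [i64] {
    &items[layout.row_range(row)]
}
```

The caller of `row_items` never needs to know which encoding is in use, which is the same convenience AnyList<L> is described as providing at the column level.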

§What is not supported

These datatypes have no logical type yet, so there is no Column<L> for them:

  • Struct — but usable as a raw, downcast-only arrow field (StructArray). (Parked; investigated 2026-06-04 — moderate effort: a new derive generating per-row view/owned/typed mirror structs; the LogicalType trait needs no changes. The one subtle part is hierarchical null masking: when a struct row is null, arrow leaves the child values undefined, so child null-validation must be masked by the parent validity, on both parse and build.)
  • Decimal (Decimal32/Decimal64/Decimal128/Decimal256)
  • Interval (IntervalDayTime/IntervalMonthDayNano/IntervalYearMonth)
  • Union

Everything except Struct is rejected with a clear compile error even as a raw arrow field; Struct is the one that still works as a raw downcast-only field.

Timezones are matched as exact strings: Timestamp<Nanosecond, Utc> (“UTC”) will not accept an array with the equivalent timezone "+00:00".

§Pros & cons

Pros:

  • Zero-copy: columns stay as reference-counted Arrow arrays (structure-of-arrays), never transposed into Vec<RowStruct>
  • Parse, don’t validate: column names, datatypes, and nullability are all checked once, eagerly, at the TryFrom<RecordBatch> boundary
  • Strong typing on demand: quiver::Column<L> validates exact datatypes (including the inner types of nested arrays) and nullability, then gives infallible typed access; raw arrow types remain available when you want dynamic
  • Struct literal = builder: plain pub fields; no builder machinery, free pattern matching
  • Nothing is hidden: record batch metadata and unknown columns are explicit fields, declared in the struct
  • Thin: the derive expands to plain arrow-rs calls; no runtime machinery

Cons:

  • Invalid states are representable: a column length mismatch is only caught when converting to a RecordBatch, possibly far from the mistake site
  • Fields stay mutable: a parsed struct can be modified into invalidity after validation (quiver::Column itself stays valid — it is immutable after construction)
  • Raw arrow fields are unchecked by design: nullability and the inner types of nested arrays are only validated for quiver::Column<L> fields
  • Column order is not preserved: matching is by name; re-encoding emits struct declaration order, with extra_columns appended at the end — not the input order
  • No per-row view: data is accessed column-wise (that’s the point), but there is no generated row iterator
  • Rust only: no IDL, no cross-language codegen (so far)
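
The first con above (length mismatches surface only at conversion time) comes down to a check like the one sketched here. `common_len` is a hypothetical helper over `(name, length)` pairs, written with the standard library only; it is not the check quiver or arrow-rs performs internally.

```rust
/// Checks that all columns have the same number of rows, as a record batch requires.
/// Returns the common length, or the name of the first column that disagrees.
/// (In this sketch, an empty set of columns counts as zero rows.)
fn common_len<'a>(columns: &[(&'a str, usize)]) -> Result<usize, &'a str> {
    let mut expected = None;
    for &(name, len) in columns {
        match expected {
            None => expected = Some(len),
            Some(e) if e != len => return Err(name),
            Some(_) => {}
        }
    }
    Ok(expected.unwrap_or(0))
}
```

Since struct fields can be assigned independently, nothing runs this check until all columns are brought together, which is why the error may appear far from the assignment that caused it.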

§Prior art (researched 2026-06-04)

§Rust
| Crate | Status | What it does | Zero-copy SoA? |
|---|---|---|---|
| typed-arrow | Active (tonbo-io) | #[derive(Record)] on logical types → builders, schema, lazy row views | Yes (views feature) |
| arrow_convert | Active | serde-style derive, Rust types ↔ Arrow arrays | No: transposes and copies into Vec<T> |
| serde_arrow | Very active | Vec<Struct> ↔ RecordBatch via serde | No: the serde data model forces owned values |

typed-arrow is the closest match, but it does not fit the mold:

  1. Positional column matching (index + datatype), not name-based. No optional columns, no other_columns.
  2. Nullability validated lazily per-row, not eagerly at the parse boundary.
  3. No metadata schema validation at all.
  4. Schema declared as Rust logical types (String, i64); generates builder machinery we don’t need. Our derive goes directly on array types (StringArray) — simpler, inherently zero-copy.

§Crates

  • quiver — the runtime crate: Column<L>, DynColumn, Error, and the arrow re-export
  • quiver_derive — the #[derive(Quiver)] proc-macro

§License

Dual-licensed under MIT and Apache 2.0.

§Status

Ready for production.

⚠️ Most of the code in this repository was generated by an LLM (under human direction and review). Read it with the appropriate skepticism.

§Future work

§#[quiver(flatten)]

Struct composition (parked; evaluated 2026-06-04: feasible, no stable-Rust blockers, ~2–3 sessions; the biggest derive feature so far).

Spec highlights:

  • a doc-hidden QuiverRecord trait (COLUMN_NAMES, partial_from_record_batch, push_columns) that the existing generated fns become wrappers over
  • flattened columns at the flatten field’s position
  • outer owns strictness
  • const-assert that the inner has no extra_columns/metadata field
  • compile-time name-collision detection via const eval

One spec amendment needed: min_fields/max_fields must live in a separate trait implemented only for statically-typed structs (a mandatory method would force a lying impl or runtime panic when flattening a dynamic inner).

First step when picked up: the QuiverRecord refactor, which is independently valuable.
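
Compile-time name-collision detection via const evaluation, as mentioned above, is possible on stable Rust. The following is one way it could look, not the planned implementation: `str_eq`, `has_collision`, and the `COLUMN_NAMES` value shown here are hypothetical and use only the standard library.

```rust
/// Byte-wise string equality usable in const context.
const fn str_eq(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// True if any two names in `names` are equal.
const fn has_collision(names: &[&str]) -> bool {
    let mut i = 0;
    while i < names.len() {
        let mut j = i + 1;
        while j < names.len() {
            if str_eq(names[i], names[j]) {
                return true;
            }
            j += 1;
        }
        i += 1;
    }
    false
}

// Hypothetical column names of an outer struct plus a flattened inner one.
const COLUMN_NAMES: &[&str] = &["name", "kind", "sensor"];

// Fails to compile (instead of failing at runtime) if two names collide.
const _: () = assert!(!has_collision(COLUMN_NAMES), "duplicate column name");
```

The `while` loops are used because iterators and `for` loops are not available in `const fn` on stable Rust; the quadratic comparison is evaluated entirely by the compiler.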

§Feature flags

  • derive (enabled by default) — Enable the #[derive(Quiver)] proc-macro.
  • document-features — Embed a list of all feature flags in the crate documentation, using the document-features crate.

Modules§

arrow
A complete, safe, native Rust implementation of Apache Arrow, a cross-language development platform for in-memory data.
half
A crate that provides support for half-precision 16-bit floating point types.

Macros§

newtype_datatype
Implements LogicalType for a domain newtype, making Column<MyType> work — including nesting (List<MyType>), the convenience constructors, and the derive.

Structs§

AnyBinary
Marker for a binary column in any of arrow’s byte-string encodings.
AnyList
Marker for a list column in any of arrow’s list encodings.
AnyUtf8
Marker for a UTF-8 column in any of arrow’s string encodings.
As
Adapter for using a foreign type (one you don’t own, so newtype_datatype! is off-limits by the orphan rule) as a logical column type, stored as Repr:
Binary
Marker for an arrow Binary column: variable-length byte strings.
BinaryView
Marker for an arrow BinaryView column: like Binary, in the newer “view” encoding (arrow::array::BinaryViewArray), optimized for comparisons and out-of-order writes.
Column
A strongly-typed, validated, zero-copy view of one record batch column.
ColumnDesc
Identifies a strongly-typed column by name.
ColumnIntoIter
By-value iterator over the owned values of a Column.
ColumnIter
Iterator over the values of a Column.
Date32
Days since the Unix epoch, as an i32.
Date64
Milliseconds since the Unix epoch, as an i64 (expected to be a multiple of a day; not validated).
Dictionary
Marker for an arrow Dictionary column, e.g. Dictionary<i32, Utf8>.
Duration
Marker for an arrow Duration column, e.g. Duration<Nanosecond>.
DynColumn
A single dynamically-typed column of a record batch: the field description plus the actual data.
DynColumnDesc
Identifies a dynamically-typed column by name.
Error
An error from converting between a record batch and a #[derive(Quiver)] struct.
FixedSizeBinary
Marker for an arrow FixedSizeBinary(N) column, e.g. FixedSizeBinary<16> for UUIDs.
FixedSizeList
Marker for an arrow FixedSizeList column: each element holds exactly N items of logical type L, e.g. FixedSizeList<f32, 3> for 3D positions.
LargeBinary
Marker for an arrow LargeBinary column: like Binary, with 64-bit offsets.
LargeList
Marker for an arrow LargeList column with items of logical type L: like List, with 64-bit offsets.
LargeListView
Marker for an arrow LargeListView column with items of logical type L: like ListView, with 64-bit offsets and sizes.
LargeUtf8
Like Utf8, with 64-bit offsets (for single columns holding more than 2 GiB of text in total).
List
Marker for an arrow List column with items of logical type L.
ListValue
One list element of a list column (List, LargeList, FixedSizeList, …): a zero-copy, random-access view of that row’s typed items.
ListView
Marker for an arrow ListView column with items of logical type L: like List, in the list-view layout (per-row offset + size).
Map
Marker for an arrow Map column from keys K to values V, e.g. Map<Utf8, i64>.
MapValue
One map element of a Column<Map<K, V>>: an iterator over the typed (key, value) pairs.
Microsecond
Millisecond
Nanosecond
NoTimezone
Timezone-naive timestamps.
Run
Marker for an arrow run-end-encoded column, e.g. Run<i32, Utf8>.
Second
Time32Millisecond
Milliseconds since midnight, as an i32.
Time32Second
Seconds since midnight, as an i32.
Time64Microsecond
Microseconds since midnight, as an i64.
Time64Nanosecond
Nanoseconds since midnight, as an i64.
Timestamp
Marker for an arrow Timestamp column, e.g. Timestamp<Nanosecond, Utc>.
TypedDictionary
The validated representation of a Dictionary column: the dictionary array plus its downcast values.
TypedFixedSizeList
The validated representation of a FixedSizeList column: the list array plus its downcast values.
TypedLargeList
The validated representation of a LargeList column: the list array plus its downcast values.
TypedLargeListView
The validated representation of a LargeListView column: the list-view array plus its downcast values.
TypedList
The validated representation of a List column: the list array plus its downcast values.
TypedListView
The validated representation of a ListView column: the list-view array plus its downcast values.
TypedMap
The validated representation of a Map column: the map array plus its downcast keys and values.
TypedRun
The validated representation of a Run column: the run array plus its downcast values.
Utc
The “UTC” timezone.
Utf8
UTF-8 text: an arrow DataType::Utf8 column with String values.
Utf8View
Like Utf8, in the newer “view” encoding (arrow::array::StringViewArray), optimized for comparisons and out-of-order writes.

Enums§

AnyTypedBinary
The validated representation of an AnyBinary column: one of the per-encoding binary arrays.
AnyTypedList
The validated representation of an AnyList column: one of the per-encoding typed representations.
AnyTypedUtf8
The validated representation of an AnyUtf8 column: one of the per-encoding string arrays.
ColumnError
What can go wrong when constructing a Column.
ErrorKind
What went wrong when converting between a record batch and a #[derive(Quiver)] struct.

Traits§

ConcreteType
A LogicalType that corresponds to a single concrete arrow datatype, and can therefore be built and used to generate schemas.
DictionaryKey
A logical type usable as a Dictionary key: i8…i64, u8…u64.
InfallibleBuild
Marker for logical types whose ConcreteType::build can never fail, making the convenient Column::from_values (and From<Vec<T>>, FromIterator) available.
LogicalType
A logical column type, e.g. Utf8, Option<i64>, or List<Utf8>.
PrimitiveType
Logical types whose values are stored in one contiguous buffer of primitive values, enabling the zero-copy Column::as_slice.
RefType
Logical types whose element values can be borrowed as plain references (&str, &i64, …), enabling column[index] (see Column’s Index impl).
RunEndType
A logical type usable as a Run end-index: i16, i32, or i64.
TimeUnitSpec
A Timestamp/Duration time unit: Second, Millisecond, Microsecond, or Nanosecond.
TimezoneSpec
The timezone of a Timestamp: NoTimezone, Utc, or your own marker type.

Type Aliases§

DurationMicrosecond
DurationMillisecond
DurationNanosecond
DurationSecond
TimestampMicrosecond
TimestampMillisecond
TimestampNanosecond
TimestampSecond

Derive Macros§

Quiver
Derives conversions between a struct of typed Arrow arrays and a RecordBatch.