typed-arrow-dyn 0.0.6

Dynamic Arrow facade for typed-arrow (runtime schema/builders).

typed-arrow-dyn

typed-arrow-dyn is the runtime half of the typed-arrow story. Where the main crate gives you a fully compile-time schema, this crate builds Arrow arrays and RecordBatches from schemas that are only known at runtime.

What It Provides

  • DynSchema: a thin Arc<Schema> wrapper that feeds the unified SchemaLike trait.
  • DynBuilders: one builder per field, created directly from the runtime schema and monomorphized per Arrow logical type.
  • DynRow and DynCell: ergonomics for appending rows where every cell is either a value or None.
  • DynRowViews, DynRowView, DynCellRef, and friends (DynStructView, DynListView, …): zero-copy views over RecordBatch data with the same logical model as the owned builders.
  • DynProjection: reusable column/field projections that work for row iteration and hand the same Parquet projection mask to readers.
  • DynError / DynViewError: structured diagnostics for arity, type mismatches, builder failures, view errors, and deferred nullability violations.
  • validate_nullability: a post-build walk that enforces field and item nullability, returning precise paths such as person.address.street[].

Everything is designed to mirror the infallible typed path: builder allocation happens up front, appends stream through with minimal branching, and nullability is validated once via try_finish_into_batch.

Quick Start

use std::sync::Arc;

use arrow_schema::{DataType, Field, Schema, TimeUnit};
use typed_arrow_dyn::{DynBuilders, DynCell, DynError, DynRow};

fn build_batch() -> Result<arrow_array::RecordBatch, DynError> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
        Field::new(
            "events",
            DataType::List(Arc::new(Field::new(
                "item",
                DataType::Struct(vec![
                    Arc::new(Field::new("ts", DataType::Timestamp(TimeUnit::Millisecond, None), false)),
                    Arc::new(Field::new("payload", DataType::Utf8, true)),
                ].into()),
                true,
            ))),
            true,
        ),
    ]));

    let mut builders = DynBuilders::new(Arc::clone(&schema), 0);
    // row0: id=1, name="alice", events=[{ts: 10, payload: null}]
    builders.append_option_row(Some(DynRow(vec![
        Some(DynCell::I64(1)),
        Some(DynCell::Str("alice".into())),
        Some(DynCell::List(vec![Some(DynCell::Struct(vec![
            Some(DynCell::I64(10)),
            None,
        ]))])),
    ])))?;

    // row1: id=2, name=null, events=null
    builders.append_option_row(Some(DynRow(vec![
        Some(DynCell::I64(2)),
        None,
        None,
    ])))?;

    // Prefer the fallible finish: it validates nullability paths and surfaces Arrow errors.
    builders.try_finish_into_batch()
}

For ad hoc debugging you can still call finish_into_batch(), which will panic if Arrow rejects the arrays. Production code should stick with try_finish_into_batch() to get DynError::Nullability or DynError::Builder with context.

Zero-Copy Views & Projection

Sometimes you just need to read a RecordBatch with a runtime schema. The view module exposes zero-copy row iterators that give you borrowed DynCellRef<'_> handles and rich nested views:

use typed_arrow_dyn::{iter_batch_views, DynProjection, DynSchema, DynViewError};

fn inspect(batch: &arrow_array::RecordBatch) -> Result<(), DynViewError> {
    // RecordBatch::schema() already returns an owned SchemaRef (Arc<Schema>).
    let dyn_schema = DynSchema::new(batch.schema());
    let projection = DynProjection::from_indices(batch.schema().as_ref(), [0, 2])?;

    for row in iter_batch_views(&dyn_schema, batch)? {
        let row = row?;                      // DynRowView<'_>
        let projected = row.project(&projection)?;
        if let Some(cell) = projected.get(0)? { // borrow without cloning
            if let Some(id) = cell.as_i64() {
                println!("id={id}");
            }
        }
        if let Some(list) = projected.get(1)?.and_then(|c| c.as_list()) {
            for idx in 0..list.len() {
                let _entry = list.get(idx)?.and_then(|c| c.as_struct());
                // drill into DynStructView / DynMapView / DynUnionView as needed
            }
        }
    }
    Ok(())
}

Views cover all Arrow logical types via the same wrappers the owned path understands (DynStructView, DynListView, DynFixedSizeListView, DynMapView, DynUnionView). Accessors are fallible: on failure they return a DynViewError carrying the exact dotted/indexed path (for example, orders[3].items[0].sku), so you can bubble rich diagnostics to callers.

DynProjection lets you describe projected schemas once and reuse them across readers:

  • DynProjection::from_schema(source, projection) matches by field name (including nested children) and records the selection paths.
  • DynProjection::from_indices(schema, indices) is a quick column slice.
  • DynRowView::project(&projection) lazily remaps columns without copying buffers.
  • DynProjection::project_row_view(&schema, batch, row) and iter_batch_views(...).project(projection) give you the same projected iterator.
  • DynProjection::to_parquet_mask() returns the parquet::arrow::ProjectionMask so Parquet readers only decode the needed leaf columns.

Borrowed values can be converted to owned DynCell instances via DynCellRef::to_owned() or DynCellRef::into_owned(), which makes it easy to bridge inspected data back into the dynamic builders if you need to rewrite batches.
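
The lazy remapping idea behind index-based projection can be illustrated with a std-only toy model. This is a minimal sketch, not the crate's implementation: Projection, ProjectedRow, OutOfBounds, and project are hypothetical names invented for the example. It only shows that a projected view can be a pair of borrows (the source row plus the index list), so no cell is copied.

```rust
// Toy model only: hypothetical stand-ins, not typed-arrow-dyn's API.

/// Returned when a requested column index is outside the source width.
#[derive(Debug, PartialEq)]
struct OutOfBounds {
    index: usize,
    width: usize,
}

/// A reusable list of source column indices, validated once.
#[derive(Debug, Clone, PartialEq)]
struct Projection {
    indices: Vec<usize>,
}

impl Projection {
    fn from_indices(
        width: usize,
        indices: impl IntoIterator<Item = usize>,
    ) -> Result<Self, OutOfBounds> {
        let indices: Vec<usize> = indices.into_iter().collect();
        if let Some(&index) = indices.iter().find(|&&i| i >= width) {
            return Err(OutOfBounds { index, width });
        }
        Ok(Self { indices })
    }
}

/// A projected row borrows both the source row and the projection.
struct ProjectedRow<'a, T> {
    row: &'a [T],
    projection: &'a Projection,
}

impl<'a, T> ProjectedRow<'a, T> {
    fn len(&self) -> usize {
        self.projection.indices.len()
    }

    /// Maps the projected column position back to the source column.
    fn get(&self, column: usize) -> Option<&'a T> {
        let source = *self.projection.indices.get(column)?;
        self.row.get(source)
    }
}

fn project<'a, T>(row: &'a [T], projection: &'a Projection) -> ProjectedRow<'a, T> {
    ProjectedRow { row, projection }
}
```

In this sketch the bounds check happens once in from_indices, so the same Projection can be applied to every row without revalidation.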

Rows & Cells

  • DynRow(Vec<Option<DynCell>>) lines up with the schema width: one entry per top-level field. A None entry at the top level appends a null slot to the corresponding column.
  • DynCell enumerates every value shape the factory understands: booleans, signed/unsigned integers, floating point, UTF-8 strings, binary blobs, dictionary payloads, and the nested variants:
    • Struct(Vec<Option<DynCell>>): one entry per child field.
    • List(Vec<Option<DynCell>>): reused for List and LargeList.
    • FixedSizeList(Vec<Option<DynCell>>): length must match the field’s declared width.
    • Map(Vec<(DynCell, Option<DynCell>)>): each entry is a (key, value) pair; keys must be non-null and values obey the schema’s nullability.
    • Union { type_id, value }: selects a variant by Arrow tag; helpers DynCell::union_value(tag, cell) and DynCell::union_null(tag) keep construction tidy.
  • Dictionary columns accept the payload type (Str, Bin, or primitive variants); the key handling stays inside the builder.
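
To illustrate what "key handling stays inside the builder" can mean for dictionary columns, here is a minimal std-only sketch. DictStringBuilder is a hypothetical type written for this example and is not the crate's builder; it assumes u32 keys and string payloads. The caller only hands over payload values, and the builder interns them and records keys internally.

```rust
use std::collections::HashMap;

// Toy model only: a hypothetical builder, not typed-arrow-dyn's implementation.
#[derive(Debug, Default)]
struct DictStringBuilder {
    /// Maps each distinct payload to its dictionary key.
    lookup: HashMap<String, u32>,
    /// Distinct payloads in first-seen order (the dictionary values).
    values: Vec<String>,
    /// One key per appended slot; None marks a null slot.
    keys: Vec<Option<u32>>,
}

impl DictStringBuilder {
    /// Callers pass the payload; the key is assigned here.
    fn append(&mut self, payload: Option<&str>) {
        let key = match payload {
            None => None,
            Some(s) => {
                let existing = self.lookup.get(s).copied();
                Some(match existing {
                    Some(k) => k,
                    None => {
                        // First time this payload is seen: assign the next key.
                        let k = self.values.len() as u32;
                        self.values.push(s.to_string());
                        self.lookup.insert(s.to_string(), k);
                        k
                    }
                })
            }
        };
        self.keys.push(key);
    }
}
```

Repeated payloads reuse the key assigned on first sight, so the values vector holds each distinct string once.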

DynRow::append_into_with_fields performs a lightweight type check before mutating builders, so arity/type mistakes fail fast without leaving partially-written columns.
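
The check-then-mutate pattern described above can be sketched with std-only types. Everything here (Cell, Kind, Column, AppendError, append_row) is a hypothetical stand-in rather than the crate's API; the point is only that validating the whole row before the first push means a rejected row leaves every column untouched.

```rust
// Toy model only: hypothetical stand-ins, not typed-arrow-dyn's API.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    I64(i64),
    Str(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    I64,
    Str,
}

#[derive(Debug, PartialEq)]
enum AppendError {
    Arity { expected: usize, got: usize },
    TypeMismatch { col: usize, expected: Kind },
}

/// One column of optional values plus its declared kind.
struct Column {
    kind: Kind,
    values: Vec<Option<Cell>>,
}

fn cell_matches(kind: Kind, cell: &Cell) -> bool {
    matches!(
        (kind, cell),
        (Kind::I64, Cell::I64(_)) | (Kind::Str, Cell::Str(_))
    )
}

/// Phase 1 checks the whole row; phase 2 mutates only if phase 1 passed.
fn append_row(columns: &mut [Column], row: Vec<Option<Cell>>) -> Result<(), AppendError> {
    if row.len() != columns.len() {
        return Err(AppendError::Arity {
            expected: columns.len(),
            got: row.len(),
        });
    }
    for (col, (column, cell)) in columns.iter().zip(row.iter()).enumerate() {
        if let Some(cell) = cell {
            if !cell_matches(column.kind, cell) {
                return Err(AppendError::TypeMismatch {
                    col,
                    expected: column.kind,
                });
            }
        }
    }
    // No error is possible past this point, so columns stay the same length.
    for (column, cell) in columns.iter_mut().zip(row) {
        column.values.push(cell);
    }
    Ok(())
}
```

Nullability is deliberately not checked in this sketch, mirroring the deferred validation the README describes for the finish step.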

Dynamic Builders

DynBuilders::new(schema, capacity) constructs one concrete builder per field by calling new_dyn_builder with the logical type. The factory is the only place that matches on arrow_schema::DataType; every builder is stored behind the DynColumnBuilder trait object with methods:

trait DynColumnBuilder {
    fn data_type(&self) -> &DataType;
    fn append_null(&mut self);
    fn append_dyn(&mut self, value: DynCell) -> Result<(), DynError>;
    fn finish(&mut self) -> ArrayRef;
    fn try_finish(&mut self) -> Result<ArrayRef, DynError>;
}

High-level users rarely call the trait directly; the unified facade hands out DynBuilders and keeps the append API aligned with the typed path (append_option_row, append_rows, etc.).
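
The "single factory, trait objects everywhere else" design can be sketched without Arrow. The names below (LogicalType, Value, Finished, ColumnBuilder, new_builder) are hypothetical and cover only two types; the sketch assumes nothing about the crate's internals beyond what the trait listing above shows.

```rust
// Toy model only: hypothetical stand-ins, not typed-arrow-dyn's API.
#[derive(Debug, Clone, PartialEq)]
enum LogicalType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    I64(i64),
    Str(String),
}

#[derive(Debug, PartialEq)]
enum Finished {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

trait ColumnBuilder {
    fn logical_type(&self) -> LogicalType;
    fn append_null(&mut self);
    fn append_value(&mut self, value: Value) -> Result<(), String>;
    fn finish(&mut self) -> Finished;
}

struct Int64Builder {
    values: Vec<Option<i64>>,
}

struct Utf8Builder {
    values: Vec<Option<String>>,
}

impl ColumnBuilder for Int64Builder {
    fn logical_type(&self) -> LogicalType {
        LogicalType::Int64
    }
    fn append_null(&mut self) {
        self.values.push(None);
    }
    fn append_value(&mut self, value: Value) -> Result<(), String> {
        match value {
            Value::I64(v) => {
                self.values.push(Some(v));
                Ok(())
            }
            other => Err(format!("expected Int64, got {:?}", other)),
        }
    }
    fn finish(&mut self) -> Finished {
        // Hands the buffer out and leaves the builder empty for reuse.
        Finished::Int64(std::mem::take(&mut self.values))
    }
}

impl ColumnBuilder for Utf8Builder {
    fn logical_type(&self) -> LogicalType {
        LogicalType::Utf8
    }
    fn append_null(&mut self) {
        self.values.push(None);
    }
    fn append_value(&mut self, value: Value) -> Result<(), String> {
        match value {
            Value::Str(v) => {
                self.values.push(Some(v));
                Ok(())
            }
            other => Err(format!("expected Utf8, got {:?}", other)),
        }
    }
    fn finish(&mut self) -> Finished {
        Finished::Utf8(std::mem::take(&mut self.values))
    }
}

/// The only place in this sketch that matches on the logical type.
fn new_builder(ty: &LogicalType, capacity: usize) -> Box<dyn ColumnBuilder> {
    match ty {
        LogicalType::Int64 => Box::new(Int64Builder { values: Vec::with_capacity(capacity) }),
        LogicalType::Utf8 => Box::new(Utf8Builder { values: Vec::with_capacity(capacity) }),
    }
}
```

Once the factory has run, callers hold Box<dyn ColumnBuilder> values and never branch on the type again; each concrete builder rejects mismatched values on its own.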

Error Model

DynError keeps the dynamic path predictable without drowning you in variants. Appends return Result<(), DynError>; an Err reports that the row shape did not match the schema, that a value did not fit the Arrow type, or that the builder rejected the insert. Finishing with try_finish_into_batch returns Result<RecordBatch, DynError> and adds nullability validation, so callers know exactly why Arrow construction failed: no panics, just structured context you can surface to users or logs.

Nullability Enforcement

Dynamic builders defer nullability checks until the batch is sealed. validate_nullability(schema, arrays, union_null_rows) walks the resulting arrays—using the provided union row metadata—and enforces:

  • Non-nullable columns have no null slots.
  • Struct children obey their own nullability only where the parent is valid.
  • List, LargeList, and FixedSizeList items respect child nullability.
  • Map columns reject null keys and enforce the value field’s nullability.
  • Dense and sparse union variants enforce their field nullability with precise row context.

Violations bubble up as DynError::Nullability with col, path, and index for precise diagnostics, allowing the unified facade to report user-friendly messages instead of panicking.
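
A deferred nullability walk of this kind can be sketched over a toy schema and value tree. Field, Kind, Value, first_violation, and validate_column are hypothetical names for this example, and the sketch covers only structs and lists. It assumes the path convention from the README: child names joined with dots and [] marking list items. Note that descent stops at a null parent, which models the rule that struct children are checked only where the parent is valid.

```rust
// Toy model only: hypothetical stand-ins, not typed-arrow-dyn's API.
#[derive(Debug)]
enum Kind {
    Leaf,
    Struct(Vec<Field>),
    List(Box<Field>),
}

#[derive(Debug)]
struct Field {
    name: &'static str,
    nullable: bool,
    kind: Kind,
}

#[derive(Debug)]
enum Value {
    Null,
    Leaf,
    Struct(Vec<Value>),
    List(Vec<Value>),
}

/// Returns the path of the first null found where the field forbids it.
fn first_violation(field: &Field, value: &Value, path: &str) -> Option<String> {
    match (value, &field.kind) {
        // A null slot is judged against this field only; children are skipped.
        (Value::Null, _) => {
            if field.nullable {
                None
            } else {
                Some(path.to_string())
            }
        }
        (Value::Struct(children), Kind::Struct(fields)) => fields
            .iter()
            .zip(children)
            .find_map(|(f, v)| first_violation(f, v, &format!("{}.{}", path, f.name))),
        (Value::List(items), Kind::List(item_field)) => {
            let item_path = format!("{}[]", path);
            items
                .iter()
                .find_map(|v| first_violation(item_field, v, &item_path))
        }
        _ => None,
    }
}

/// Checks every row of one column and reports (row index, path) on failure.
fn validate_column(field: &Field, rows: &[Value]) -> Result<(), (usize, String)> {
    for (index, value) in rows.iter().enumerate() {
        if let Some(path) = first_violation(field, value, field.name) {
            return Err((index, path));
        }
    }
    Ok(())
}
```

With a non-nullable person struct containing a nullable address struct whose street list has non-nullable items, a null inside the list is reported with the path person.address.street[] and the offending row index, while a row whose address is null passes without inspecting anything below it.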

Integration Points

  • DynSchema satisfies typed_arrow_unified::SchemaLike for runtime cases, so you can switch between typed and dynamic implementations behind a single API.
  • DynBuilders implements the unified BuildersLike contract; typed builders use a zero-cost NoError, while dynamic builders return Result.
  • Lower-level consumers can call new_dyn_builder(data_type) to embed dynamic columns into custom pipelines without adopting the whole facade.

Supported Data Types

The factory builds the following Arrow logical types (arrow-rs v56):

  • Null, Boolean
  • Int8/16/32/64, UInt8/16/32/64, Float32/64
  • Date32/64, Timestamp (all units, optional timezone), Duration (all units), Time32 (Second/Millisecond), Time64 (Microsecond/Nanosecond)
  • Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary
  • Dictionary with the above strings/binary types or primitive values
  • Struct, List, LargeList, FixedSizeList (including nested combinations)
  • Map/OrderedMap (keys non-null, value nullability configurable)
  • Union (dense and sparse)

Unsupported types currently fall back to a NullBuilder. Extend new_dyn_builder as Arrow gains new logical types.

Examples & Tests

  • cargo run -p typed-arrow-dyn --example nested_struct_list shows nested structs and lists.
  • cargo test -p typed-arrow-dyn exercises dictionaries, deep nesting, and nullability validation.

License

This crate shares the repository license; see LICENSE.