typed-arrow-dyn 0.0.6

Dynamic Arrow facade for typed-arrow (runtime schema/builders).
Documentation
# typed-arrow-dyn

`typed-arrow-dyn` is the runtime half of the typed-arrow story. Where the main crate gives you a fully compile-time schema, this crate builds Arrow arrays and `RecordBatch`es from schemas that are only known at runtime.

## What It Provides
- `DynSchema`: a thin `Arc<Schema>` wrapper that feeds the unified `SchemaLike` trait.
- `DynBuilders`: one builder per field, created directly from the runtime schema and monomorphized per Arrow logical type.
- `DynRow` and `DynCell`: ergonomics for appending rows where every cell is either a value or `None`.
- `DynRowViews`, `DynRowView`, `DynCellRef`, and friends (`DynStructView`, `DynListView`, …): zero-copy views over `RecordBatch` data with the same logical model as the owned builders.
- `DynProjection`: reusable column/field projections that work for row iteration *and* hand the same Parquet projection mask to readers.
- `DynError` / `DynViewError`: structured diagnostics for arity, type mismatches, builder failures, view errors, and deferred nullability violations.
- `validate_nullability`: a post-build walk that enforces field and item nullability, returning precise paths such as `person.address.street[]`.

Everything is designed to mirror the infallible typed path: builder allocation happens up front, appends stream through with minimal branching, and nullability is validated once via `try_finish_into_batch`.

## Quick Start

```rust
use std::sync::Arc;

use arrow_schema::{DataType, Field, Schema, TimeUnit};
use typed_arrow_dyn::{DynBuilders, DynCell, DynRow, DynSchema, DynError};

fn build_batch() -> Result<arrow_array::RecordBatch, DynError> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
        Field::new(
            "events",
            DataType::List(Arc::new(Field::new(
                "item",
                DataType::Struct(vec![
                    Arc::new(Field::new("ts", DataType::Timestamp(TimeUnit::Millisecond, None), false)),
                    Arc::new(Field::new("payload", DataType::Utf8, true)),
                ].into()),
                true,
            ))),
            true,
        ),
    ]));

    let mut builders = DynBuilders::new(Arc::clone(&schema), 0);
    // row0: id=1, name="alice", events=[{ts: 10, payload: null}]
    builders.append_option_row(Some(DynRow(vec![
        Some(DynCell::I64(1)),
        Some(DynCell::Str("alice".into())),
        Some(DynCell::List(vec![Some(DynCell::Struct(vec![
            Some(DynCell::I64(10)),
            None,
        ]))])),
    ])))?;

    // row1: id=2, name=null, events=null
    builders.append_option_row(Some(DynRow(vec![
        Some(DynCell::I64(2)),
        None,
        None,
    ])))?;

    // Prefer the fallible finish: it validates nullability paths and surfaces Arrow errors.
    builders.try_finish_into_batch()
}
```

For ad hoc debugging you can still call `finish_into_batch()`, which will panic if Arrow rejects the arrays. Production code should stick with `try_finish_into_batch()` to get `DynError::Nullability` or `DynError::Builder` with context.

## Zero-Copy Views & Projection

Sometimes you just need to *read* a `RecordBatch` with a runtime schema. The `view` module exposes zero-copy row iterators that give you borrowed `DynCellRef<'_>` handles and rich nested views:

```rust
use std::sync::Arc;

use arrow_schema::{DataType, Field, Schema};
use typed_arrow_dyn::{iter_batch_views, DynProjection, DynSchema, DynViewError};

fn inspect(batch: &arrow_array::RecordBatch) -> Result<(), DynViewError> {
    let dyn_schema = DynSchema::new(Arc::clone(batch.schema()));
    let projection = DynProjection::from_indices(batch.schema().as_ref(), [0, 2])?;

    for row in iter_batch_views(&dyn_schema, batch)? {
        let row = row?;                      // DynRowView<'_>
        let projected = row.project(&projection)?;
        if let Some(cell) = projected.get(0)? { // borrow without cloning
            if let Some(id) = cell.as_i64() {
                println!("id={id}");
            }
        }
        if let Some(list) = projected.get(1)?.and_then(|c| c.as_list()) {
            for idx in 0..list.len() {
                let entry = list.get(idx)?.and_then(|c| c.as_struct());
                // drill into DynStructView / DynMapView / DynUnionView as needed
            }
        }
    }
    Ok(())
}
```

Views cover all Arrow logical types via the same wrappers the owned path understands (`DynStructView`, `DynListView`, `DynFixedSizeListView`, `DynMapView`, `DynUnionView`). Every accessor returns `DynViewError` with the exact dotted/indexed path (`orders[3].items[0].sku`) so you can bubble rich diagnostics to callers.

`DynProjection` lets you describe projected schemas once and reuse them across readers:

- `DynProjection::from_schema(source, projection)` matches by field name (including nested children) and records the selection paths.
- `DynProjection::from_indices(schema, indices)` is a quick column slice.
- `DynRowView::project(&projection)` lazily remaps columns without copying buffers.
- `DynProjection::project_row_view(&schema, batch, row)` and `iter_batch_views(...).project(projection)` give you the same projected iterator.
- `DynProjection::to_parquet_mask()` returns the `parquet::arrow::ProjectionMask` so Parquet readers only decode the needed leaf columns.

Borrowed values can be converted to owned `DynCell` instances via `DynCellRef::to_owned()` or `DynCellRef::into_owned()`, which makes it easy to bridge inspected data back into the dynamic builders if you need to rewrite batches.

## Rows & Cells
- `DynRow(Vec<Option<DynCell>>)` lines up with the schema width. Passing `None` at the top level appends a null to the entire column.
- `DynCell` enumerates every value shape the factory understands: booleans, signed/unsigned integers, floating point, UTF-8 strings, binary blobs, dictionary payloads, and the nested variants:
  - `Struct(Vec<Option<DynCell>>)`—one entry per child field.
  - `List(Vec<Option<DynCell>>)`, reused for `List` and `LargeList`.
  - `FixedSizeList(Vec<Option<DynCell>>)`—length must match the field’s declared width.
  - `Map(Vec<(DynCell, Option<DynCell>)>)`—each entry is a `(key, value)` pair; keys must be non-null and values obey the schema’s nullability.
  - `Union { type_id, value }`—selects a variant by Arrow tag; helpers `DynCell::union_value(tag, cell)` and `DynCell::union_null(tag)` keep construction tidy.
- Dictionary columns accept the payload type (`Str`, `Bin`, or primitive variants); the key handling stays inside the builder.

`DynRow::append_into_with_fields` performs a lightweight type check before mutating builders, so arity/type mistakes fail fast without leaving partially-written columns.

## Dynamic Builders
`DynBuilders::new(schema, capacity)` constructs one concrete builder per field by calling [`new_dyn_builder`](src/factory.rs) with the logical type. The factory is the only place that matches on `arrow_schema::DataType`; every builder is stored behind the `DynColumnBuilder` trait object with methods:

```rust
trait DynColumnBuilder {
    fn data_type(&self) -> &DataType;
    fn append_null(&mut self);
    fn append_dyn(&mut self, value: DynCell) -> Result<(), DynError>;
    fn finish(&mut self) -> ArrayRef;
    fn try_finish(&mut self) -> Result<ArrayRef, DynError>;
}
```

High-level users rarely call the trait directly—the unified facade hands out `DynBuilders` and keeps the append API aligned with the typed path (`append_option_row`, `append_rows`, etc.).

## Error Model

`DynError` keeps the dynamic path predictable without drowning you in variants. Appends return `Result<(), DynError>`, capturing whether the row shape matches the schema, the value fits the Arrow type, or the builder rejected the insert. Finishing returns `Result<RecordBatch, DynError>` and adds nullability validation so callers know exactly why Arrow construction failed—no panics, just structured context you can surface to users or logs.

## Nullability Enforcement

Dynamic builders defer nullability checks until the batch is sealed. `validate_nullability(schema, arrays, union_null_rows)` walks the resulting arrays—using the provided union row metadata—and enforces:

- Non-nullable columns have no null slots.
- Struct children obey their own nullability only where the parent is valid.
- List, LargeList, and FixedSizeList items respect child nullability.
- Map columns reject null keys and enforce the value field’s nullability.
- Dense and sparse union variants enforce their field nullability with precise row context.

Violations bubble up as `DynError::Nullability` with `col`, `path`, and `index` for precise diagnostics, allowing the unified facade to report user-friendly messages instead of panicking.

## Integration Points

- `DynSchema` satisfies `typed_arrow_unified::SchemaLike` for runtime cases, so you can switch between typed and dynamic implementations behind a single API.
- `DynBuilders` implements the unified `BuildersLike` contract; typed builders use a zero-cost `NoError`, while dynamic builders return `Result`.
- Lower-level consumers can call `new_dyn_builder(data_type)` to embed dynamic columns into custom pipelines without adopting the whole facade.

## Supported Data Types

The factory builds the following Arrow logical types (Arrow RS v56):

- Null, Boolean
- Int8/16/32/64, UInt8/16/32/64, Float32/64
- Date32/64, Timestamp (all units, optional timezone), Duration (all units), Time32 (Second/Millisecond), Time64 (Microsecond/Nanosecond)
- Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary
- Dictionary with the above strings/binary types or primitive values
- Struct, List, LargeList, FixedSizeList (including nested combinations)
- Map/OrderedMap (keys non-null, value nullability configurable)
- Union (dense and sparse)

Unsupported types currently fall back to a `NullBuilder`. Extend `new_dyn_builder` as Arrow gains new logical types.

## Examples & Tests

- `cargo run -p typed-arrow-dyn --example nested_struct_list` shows nested structs and lists.
- `cargo test -p typed-arrow-dyn` exercises dictionaries, deep nesting, and nullability validation.

## License

This crate shares the repository license; see [`LICENSE`](../LICENSE).