nxs/arrow_project.rs
//! Arrow C Data Interface projection helpers for columnar `.nxb`.
//!
//! ## Intended purpose
//!
//! This module is intended to export NXS columnar fields as **Arrow C Data Interface**
//! buffers, giving Arrow-aware runtimes (DataFusion, Polars, DuckDB, etc.) zero-copy
//! access to the underlying values data.
//!
//! ## What is NOT implemented
//!
//! - **Offset widening**: NXS variable-length columns use `(N+1) × u32` little-endian
//!   offsets; Arrow `LargeUtf8` / `LargeBinary` require `(N+1) × i64` offsets.
//!   The widening step (copying and zero-extending each `u32` to `i64`) is not
//!   implemented here; callers currently receive only the raw `u32` offset bytes.
//! - **Arrow C Data Interface structs**: the `ArrowSchema` and `ArrowArray` C structs
//!   (as defined in the Arrow ABI specification) are not emitted; no FFI boundary is
//!   crossed. This module provides only the Rust-side buffer views.
//! - **PAX layout support**: only contiguous columnar (`FLAG_COLUMNAR`) files are
//!   considered; PAX page-scattered columns are out of scope for this stub.
//!
//! ## Status
//!
//! **Intentionally left as a stub** pending the extensions phase. Full Arrow C Data
//! Interface export (including `ArrowSchema` / `ArrowArray` lifetime management and
//! `u32 → i64` offset widening) is planned for the commercial extensions tier and
//! will not be implemented in this MIT-licensed driver module.

use crate::error::Result;
use crate::layout::col_var_parts;

/// Zero-copy view of a variable-length column sector inside a mapped `.nxb` buffer.
pub struct VarColumnView<'a> {
    /// Null bitmap bytes, as split out of the sector by `col_var_parts`.
    pub null_bitmap: &'a [u8],
    /// `(record_count + 1) × 4` bytes, u32 little-endian (NXS; not Arrow i64).
    pub offsets: &'a [u8],
    /// Raw value bytes, as split out of the sector by `col_var_parts`.
    pub values: &'a [u8],
    /// Number of records in the column.
    pub record_count: usize,
}

impl<'a> VarColumnView<'a> {
    /// Splits `sector` into its null bitmap, offsets, and values parts via
    /// `col_var_parts`, borrowing from the input without copying.
    ///
    /// Any error returned by `col_var_parts` is propagated unchanged.
    pub fn from_sector(sector: &'a [u8], record_count: usize) -> Result<Self> {
        let (bm, offsets, values) = col_var_parts(sector, record_count)?;
        Ok(Self {
            null_bitmap: bm,
            offsets,
            values,
            record_count,
        })
    }

    /// Number of u32 offset slots (`record_count + 1`).
    pub fn offset_count(&self) -> usize {
        self.record_count + 1
    }
}
55}