ax-normalize: any corpus → one RecordSet
The article’s normalization promise: “given any corpus of information
regardless of its format, we’ll normalize it.” This crate maps recognized
text formats (CSV, TSV, NDJSON, JSON) onto the engine-independent
RecordSet from ax-core. Binary columnar formats (Parquet, Arrow IPC)
land behind this same boundary in the Polars-backed slice — detectors never
see the difference.
Normalization is deterministic: column order is stable (header order for
tabular input, sorted key-union for JSON), and absence is explicit — a key
missing from one JSON row becomes ax_core::Value::Null, never a guess.
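The sorted key-union rule for JSON input can be sketched as follows. This is a minimal illustration using only the standard library, not the crate's implementation: the Value enum here is a hypothetical stand-in for ax_core::Value (whose real variants are not shown on this page), and normalize_rows is an invented helper name.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for ax_core::Value; the real enum may differ.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Text(String),
}

/// Returns the sorted union of all keys, plus one row per input object
/// with Value::Null filled in wherever a key is absent.
/// (Illustrative helper, not part of the crate's API.)
fn normalize_rows(rows: &[BTreeMap<String, Value>]) -> (Vec<String>, Vec<Vec<Value>>) {
    // BTreeSet iterates in sorted order, so the column order does not
    // depend on the order in which rows or keys were encountered.
    let columns: Vec<String> = rows
        .iter()
        .flat_map(|row| row.keys().cloned())
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect();

    // Every row gets a cell for every column; absence becomes an explicit Null.
    let table = rows
        .iter()
        .map(|row| {
            columns
                .iter()
                .map(|col| row.get(col).cloned().unwrap_or(Value::Null))
                .collect()
        })
        .collect();

    (columns, table)
}
```

Under these assumptions, two rows with keys {name, id} and {id, email} yield the columns email, id, name, and each row carries a Null in the column it lacks.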
Re-exports
pub use format::Format;
Modules
- binary
  Binary columnar formats via the Polars/Arrow backbone.
- format
  Format identification: by extension when we have a path, by content sniff
  for stdin. Detection is conservative: an unrecognized stream is an
  AxError::UnknownFormat, never a silent guess.
- infer
  Scalar type inference. Turns raw textual cells and JSON scalars into the
  closed Value set, the same way regardless of which format they came from,
  so a 1 in CSV and a 1 in JSON normalize identically.
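To make the idea behind the infer module concrete, here is a rough sketch of format-independent scalar inference. The Value variants, the infer_scalar name, and the specific rules (trimming, empty cell as Null, lowercase true/false as booleans, integers before floats) are all assumptions made for illustration; the crate's actual rules may differ.

```rust
// Hypothetical stand-in for ax_core::Value; the real closed set may differ.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Text(String),
}

/// Infers a typed value from a raw textual scalar.
/// (Illustrative helper with assumed rules, not the crate's API.)
fn infer_scalar(raw: &str) -> Value {
    let s = raw.trim();
    if s.is_empty() {
        return Value::Null;
    }
    match s {
        "true" => return Value::Bool(true),
        "false" => return Value::Bool(false),
        _ => {}
    }
    // Try the narrowest numeric type first so "1" becomes Int, not Float.
    if let Ok(i) = s.parse::<i64>() {
        return Value::Int(i);
    }
    // Rust's f64 parser also accepts "nan" and "inf"; keep those as text here.
    if let Ok(f) = s.parse::<f64>() {
        if f.is_finite() {
            return Value::Float(f);
        }
    }
    Value::Text(s.to_string())
}
```

Because a single function sees only the scalar's text, a cell read from CSV and a scalar read from JSON pass through the same rules and produce the same Value.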
Functions
- normalize
  Normalizes bytes from logical source into a RecordSet, resolving the
  format by extension, then by content sniff.
- normalize_as
  Normalizes with an explicit format (skips detection).
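The "extension first, then content sniff, otherwise an error" resolution order can be sketched like this. Everything below is a hypothetical illustration: the Format variants, the UnknownFormat struct, the function names, and the sniffing heuristics are assumptions, and the real signatures of normalize and normalize_as are not shown on this page.

```rust
use std::path::Path;

// Hypothetical stand-in for format::Format; real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format {
    Csv,
    Tsv,
    Ndjson,
    Json,
}

// Hypothetical stand-in for AxError::UnknownFormat.
#[derive(Debug, PartialEq)]
struct UnknownFormat;

/// Maps a file extension to a format, if the source has a recognized one.
fn from_extension(source: &str) -> Option<Format> {
    let ext = Path::new(source).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(Format::Csv),
        "tsv" => Some(Format::Tsv),
        "ndjson" => Some(Format::Ndjson),
        "json" => Some(Format::Json),
        _ => None,
    }
}

/// Conservative content sniff on the first non-empty line (assumed heuristics).
fn sniff(bytes: &[u8]) -> Option<Format> {
    let text = std::str::from_utf8(bytes).ok()?;
    let mut lines = text.lines().map(str::trim).filter(|l| !l.is_empty());
    let first = lines.next()?;
    if first.starts_with('[') {
        return Some(Format::Json);
    }
    if first.starts_with('{') {
        // A complete object on line one followed by another object suggests NDJSON.
        let next_is_object = lines.next().map_or(false, |l| l.starts_with('{'));
        return if first.ends_with('}') && next_is_object {
            Some(Format::Ndjson)
        } else {
            Some(Format::Json)
        };
    }
    if first.contains('\t') {
        return Some(Format::Tsv);
    }
    if first.contains(',') {
        return Some(Format::Csv);
    }
    None
}

/// Extension wins when present; otherwise sniff; otherwise fail loudly.
fn detect(source: &str, bytes: &[u8]) -> Result<Format, UnknownFormat> {
    from_extension(source)
        .or_else(|| sniff(bytes))
        .ok_or(UnknownFormat)
}
```

In this sketch a source named "-" (as one might use for stdin) has no extension, so detection falls through to the sniff, and input that matches none of the heuristics returns an error instead of a default format.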