Expand description
§ax-normalize — any corpus → one RecordSet
The article’s normalization promise: “given any corpus of information
regardless of its format, we’ll normalize it.” This crate maps every
recognized format onto the engine-independent RecordSet from ax-core,
so detectors never see the difference between a CSV and a Parquet file.
Formats are plugins: each is an independent FormatParser (one file
under parsers), resolved by a ParserRegistry via file extension then
content sniff. Adding a format is a new file plus one registration line —
see parsers::default_registry. Binary columnar formats (Parquet, Arrow
IPC) live behind the default-on polars feature.
Normalization is deterministic: column order is stable (header order for
tabular input, sorted key-union for JSON), and absence is explicit — a key
missing from one JSON row becomes ax_core::Value::Null, never a guess.
Re-exports§
pub use parser::Confidence;pub use parser::FormatParser;pub use parser::ParserRegistry;
Modules§
- infer
- Scalar type inference. Turns raw textual cells and JSON scalars into the
closed
Valueset, the same way regardless of which format they came from — so a1in CSV and a1in JSON normalize identically. - parser
- The format-parser plugin contract.
- parsers
- Built-in format parsers — one module per format. Adding a format is two
steps: add a module here, and add one
register(...)line indefault_registry. Nothing else in the crate changes. - table
- Shared row-oriented table builder for the JSON-family parsers.
Functions§
- normalize
- Normalizes
bytesfrom logicalsourceinto aRecordSet, resolving the format by extension then content sniff against the default parser registry. - normalize_
with - Normalizes with an explicitly chosen format
id(skips detection).