Skip to main content

Crate anomalyx_normalize

Crate anomalyx_normalize 

Source
Expand description

§ax-normalize — any corpus → one RecordSet

The article’s normalization promise: “given any corpus of information regardless of its format, we’ll normalize it.” This crate maps recognized text formats (CSV, TSV, NDJSON, JSON) onto the engine-independent RecordSet from ax-core. Binary columnar formats (Parquet, Arrow IPC) land behind this same boundary in the Polars-backed slice — detectors never see the difference.

Normalization is deterministic: column order is stable (header order for tabular input, sorted key-union for JSON), and absence is explicit — a key missing from one JSON row becomes ax_core::Value::Null, never a guess.

Re-exports§

pub use format::Format;

Modules§

binary
Binary columnar formats via the Polars/Arrow backbone.
format
Format identification: by extension when we have a path, by content sniff for stdin. Detection is conservative — an unrecognized stream is an AxError::UnknownFormat, never a silent guess.
infer
Scalar type inference. Turns raw textual cells and JSON scalars into the closed Value set, the same way regardless of which format they came from — so a 1 in CSV and a 1 in JSON normalize identically.

Functions§

normalize
Normalizes bytes from logical source into a RecordSet, resolving the format by extension then content sniff.
normalize_as
Normalizes with an explicit format (skips detection).