Skip to main content

Crate anomalyx_normalize

Crate anomalyx_normalize 

Source
Expand description

§ax-normalize — any corpus → one RecordSet

The article’s normalization promise: “given any corpus of information regardless of its format, we’ll normalize it.” This crate maps every recognized format onto the engine-independent RecordSet from ax-core, so detectors never see the difference between a CSV and a Parquet file.

Formats are plugins: each is an independent FormatParser (one file under parsers), resolved by a ParserRegistry via file extension then content sniff. Adding a format is a new file plus one registration line — see parsers::default_registry. Binary columnar formats (Parquet, Arrow IPC) live behind the default-on polars feature.

Normalization is deterministic: column order is stable (header order for tabular input, sorted key-union for JSON), and absence is explicit — a key missing from one JSON row becomes ax_core::Value::Null, never a guess.

Re-exports§

pub use parser::Confidence;
pub use parser::FormatParser;
pub use parser::ParserRegistry;

Modules§

infer
Scalar type inference. Turns raw textual cells and JSON scalars into the closed Value set, the same way regardless of which format they came from — so a 1 in CSV and a 1 in JSON normalize identically.
parser
The format-parser plugin contract.
parsers
Built-in format parsers — one module per format. Adding a format is two steps: add a module here, and add one register(...) line in default_registry. Nothing else in the crate changes.
table
Shared row-oriented table builder for the JSON-family parsers.

Functions§

normalize
Normalizes bytes from logical source into a RecordSet, resolving the format by extension then content sniff against the default parser registry.
normalize_with
Normalizes with an explicitly chosen format id (skips detection).