Crate zer_schema

Schema registry and model persistence for zer.

This crate provides three cooperating components:

  1. SchemaInferrer: automatic FieldKind detection from column names and value patterns. It produces a Schema without requiring the caller to know the dataset structure up front.

  2. SchemaFingerprint: a compact identity for a schema and its data distribution, comprising a SHA-256 hash of the field names and kinds, per-field null rates, and per-field cardinalities.

  3. SchemaRegistry: a sled-backed persistent store for ModelArtifacts (trained Fellegi-Sunter parameters). On startup, the pipeline calls SchemaRegistry::lookup_startup_mode to decide whether to load the stored parameters directly (exact match), warm-start EM (similar schema), or run full EM from priors (new or incompatible schema).
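The inference step in item 1 can be illustrated with a minimal sketch that tries a column-name heuristic first and falls back to value patterns. This is not the SchemaInferrer API: the FieldKind variants, the substring rules standing in for NameHeuristics, the regex-free predicates standing in for ValuePatterns, the "name first, values second" priority, and the 90% match threshold are all hypothetical.

```rust
// Hypothetical FieldKind variants; the real zer_schema FieldKind may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldKind {
    Email,
    Phone,
    Date,
    Name,
    Text,
}

// Hypothetical: share of non-empty values a pattern must match.
const MIN_SHARE: f64 = 0.9;

/// Name heuristic (hypothetical stand-in for config::NameHeuristics):
/// match substrings of the lower-cased column name.
fn kind_from_name(column: &str) -> Option<FieldKind> {
    let c = column.to_ascii_lowercase();
    if c.contains("email") {
        Some(FieldKind::Email)
    } else if c.contains("phone") || c.contains("mobile") {
        Some(FieldKind::Phone)
    } else if c.contains("date") || c.contains("dob") {
        Some(FieldKind::Date)
    } else if c.contains("name") {
        Some(FieldKind::Name)
    } else {
        None
    }
}

// Value predicates (hypothetical stand-ins for config::ValuePatterns).
fn looks_like_email(v: &str) -> bool {
    match v.split_once('@') {
        Some((local, domain)) => !local.is_empty() && domain.contains('.'),
        None => false,
    }
}

fn looks_like_iso_date(v: &str) -> bool {
    // YYYY-MM-DD: ten bytes, dashes at positions 4 and 7, digits elsewhere.
    let b = v.as_bytes();
    b.len() == 10
        && b[4] == b'-'
        && b[7] == b'-'
        && b.iter().enumerate().all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit())
}

fn looks_like_phone(v: &str) -> bool {
    let digits = v.chars().filter(|c| c.is_ascii_digit()).count();
    digits >= 7 && v.chars().all(|c| c.is_ascii_digit() || " +-().".contains(c))
}

/// Pick the first kind whose pattern matches at least MIN_SHARE of the
/// non-empty sample values; otherwise fall back to Text.
fn kind_from_values(values: &[&str]) -> FieldKind {
    let non_empty: Vec<&str> = values
        .iter()
        .copied()
        .map(str::trim)
        .filter(|v| !v.is_empty())
        .collect();
    if non_empty.is_empty() {
        return FieldKind::Text;
    }
    // Fraction of non-empty values accepted by `pred`.
    let share = |pred: fn(&str) -> bool| -> f64 {
        let hits = non_empty.iter().filter(|&&v| pred(v)).count();
        hits as f64 / non_empty.len() as f64
    };
    // Dates are checked before phones because "2024-01-31" also passes
    // the loose phone check.
    if share(looks_like_email) >= MIN_SHARE {
        FieldKind::Email
    } else if share(looks_like_iso_date) >= MIN_SHARE {
        FieldKind::Date
    } else if share(looks_like_phone) >= MIN_SHARE {
        FieldKind::Phone
    } else {
        FieldKind::Text
    }
}

/// Name heuristics take priority; value patterns are the fallback.
fn infer_kind(column: &str, sample: &[&str]) -> FieldKind {
    kind_from_name(column).unwrap_or_else(|| kind_from_values(sample))
}

fn main() {
    let sample = ["a@b.com", "x@y.org", ""];
    println!("{:?}", infer_kind("col_7", &sample)); // prints "Email"
}
```

Checking names before values keeps inference cheap when columns are well labelled, while opaque headers such as `col_7` still resolve from their contents.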

Re-exports

pub use artifact::ModelArtifact;
pub use config::NameHeuristics;
pub use config::ValuePatterns;
pub use fingerprint::FieldStats;
pub use fingerprint::SchemaFingerprint;
pub use infer::SchemaInferrer;
pub use registry::SchemaRegistry;
pub use registry::StartupMode;
pub use similarity::fingerprint_distance;
pub use similarity::EXACT_MATCH_THRESHOLD;
pub use similarity::WARM_START_THRESHOLD;
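The re-exported fingerprint, distance, and threshold items fit together in the three-way startup decision. The sketch below shows one way that combination could work. Everything in it is an assumption rather than the crate's code. The distance formula is a Jaccard distance over (name, kind) pairs plus averaged null-rate and cardinality drift, weighted 0.7/0.3. The threshold values and the StartupMode variant names are also invented. `std`'s DefaultHasher stands in for the SHA-256 digest the crate uses, so the example needs no external crates. Unlike SchemaRegistry::lookup_startup_mode, it compares against a single stored fingerprint instead of querying a sled store.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical values; the crate's real thresholds may differ.
const EXACT_MATCH_THRESHOLD: f64 = 0.01;
const WARM_START_THRESHOLD: f64 = 0.3;

#[derive(Debug, Clone, PartialEq)]
struct FieldStats {
    null_rate: f64,   // fraction of nulls in the column, 0.0..=1.0
    cardinality: u64, // number of distinct non-null values
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    kind: String, // kind label such as "Email"; a plain string for brevity
    stats: FieldStats,
}

#[derive(Debug, Clone)]
struct Fingerprint {
    structure_hash: u64, // stand-in for the SHA-256 digest of names/kinds
    fields: Vec<Field>,  // sorted by name
}

// Hypothetical variant names for the three outcomes described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StartupMode {
    LoadParams, // exact match: reuse stored Fellegi-Sunter parameters
    WarmStart,  // similar schema: initialise EM from stored parameters
    FullEm,     // new or incompatible schema: run EM from priors
}

fn field(name: &str, kind: &str, null_rate: f64, cardinality: u64) -> Field {
    Field { name: name.to_string(), kind: kind.to_string(), stats: FieldStats { null_rate, cardinality } }
}

fn fingerprint(mut fields: Vec<Field>) -> Fingerprint {
    // Sort so the structural hash does not depend on column order.
    fields.sort_by(|a, b| a.name.cmp(&b.name));
    let mut hasher = DefaultHasher::new();
    for f in &fields {
        f.name.hash(&mut hasher);
        f.kind.hash(&mut hasher);
    }
    Fingerprint { structure_hash: hasher.finish(), fields }
}

/// 0.0 = identical, 1.0 = nothing in common.
fn fingerprint_distance(a: &Fingerprint, b: &Fingerprint) -> f64 {
    let mut shared = 0usize;
    let mut drift = 0.0_f64;
    for fa in &a.fields {
        if let Some(fb) = b.fields.iter().find(|fb| fb.name == fa.name && fb.kind == fa.kind) {
            shared += 1;
            let (sa, sb) = (&fa.stats, &fb.stats);
            let max_card = sa.cardinality.max(sb.cardinality).max(1) as f64;
            let card_diff = (sa.cardinality as f64 - sb.cardinality as f64).abs() / max_card;
            drift += ((sa.null_rate - sb.null_rate).abs() + card_diff).min(1.0);
        }
    }
    if shared == 0 {
        return 1.0; // no field in common: treat as unrelated
    }
    let union = a.fields.len() + b.fields.len() - shared;
    let structural = 1.0 - shared as f64 / union as f64;
    0.7 * structural + 0.3 * (drift / shared as f64)
}

fn startup_mode(stored: &Fingerprint, current: &Fingerprint) -> StartupMode {
    let d = fingerprint_distance(stored, current);
    if stored.structure_hash == current.structure_hash && d <= EXACT_MATCH_THRESHOLD {
        StartupMode::LoadParams
    } else if d <= WARM_START_THRESHOLD {
        StartupMode::WarmStart
    } else {
        StartupMode::FullEm
    }
}

fn main() {
    let stored = fingerprint(vec![field("name", "Name", 0.01, 1000), field("email", "Email", 0.05, 900)]);
    let drifted = fingerprint(vec![field("email", "Email", 0.15, 900), field("name", "Name", 0.01, 1000)]);
    println!("{:?}", startup_mode(&stored, &drifted)); // prints "WarmStart"
}
```

DefaultHasher's algorithm is unspecified and may change between Rust releases, so it is only suitable for this in-process illustration. A fingerprint that is persisted across runs needs a fixed digest such as SHA-256.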

Modules

artifact
config
fingerprint
infer
registry
similarity