Module prelude


Re-exports§

pub use crate::Backend;
pub use crate::Comparator;
pub use crate::Scorer;

Structs§

AddressInitialKey
Blocks on (first token of address field) + “:” + (first char of first-name field). Handles surname transpositions: two records at the same address with the same first initial land in the same bucket even if the surname differs.
AddressTokenOverlap
Jaccard similarity on normalized token sets.
AliasPhoneticKey
Emits a "phonetic_dob:CODE:YEAR" key for each name stored in a pipe-delimited alias field (e.g. SIS II alias_namen).
BlockerFactory
CameraTimeWindowKey
Groups passages by camera identifier and a fixed-width time window.
ClusterConfig
Parameters controlling cluster shape after graph construction.
ComparisonBatch
Field-major SoA batch of comparison results for many pairs.
ComparisonVector
Comparison result for a single candidate pair.
CompositeBlocker
Composite blocker that applies multiple blocking keys.
ConnectedComponentsClusterer
Connected-components clusterer with weak-edge removal and star pruning.
DateFragmentKey
Blocking key that extracts the leading date fragment at a given granularity.
DocumentDigitSuffixKey
Variant of DocumentSuffixKey that strips all non-digit characters before taking the suffix.
DocumentSuffixKey
Blocking key that strips non-alphanumeric characters from a document number and emits the last suffix_len characters as a key.
Entity
A resolved entity grouping one or more records.
EntityMember
A record’s membership in an entity, with its resolution score and method.
ExactFieldKey
FellegiSunterScorer
Fellegi-Sunter scorer.
FieldComparator
Pairwise field comparator that applies similarity functions to produce a field-major ComparisonBatch.
FuzzyYearKey
Phonetic blocking key that emits year-range variants for records with an estimated date of birth (stored as YYYY-01-01 by the January 1 convention), so estimated DOBs that differ by up to fuzzy_range years still share a blocking key.
GeoGridKey
Groups records by rounding geographic coordinates to a fixed grid cell.
InvertedIndex
Inverted index mapping blocking keys to record IDs.
JaroWinklerSimilarity
LevelThresholds
Configurable per-field thresholds for mapping a float similarity score to a ComparisonLevel.
LicensePlateNormKey
Normalizes a license plate (strips hyphens/spaces, uppercases) and emits the result as a single exact blocking key.
ModelArtifact
Everything that must be persisted after a successful EM training run.
ModelParams
Learned Fellegi-Sunter m/u parameters and classification thresholds for one schema.
PhoneticEqualitySimilarity
PhoneticNameDobKey
Blocking key that encodes the surname phonetically combined with the birth year.
PlateOCRFuzzyKey
Emits the normalized plate plus a deletion-neighbourhood key for each character position.
Record
A single data record with a unique ID and a map of field values.
RecordPool
Column-major record store: columns[field_idx][record_idx].
Schema
Ordered list of field definitions for a dataset.
SchemaBuilder
Fluent builder for constructing a Schema.
SchemaFingerprint
Fingerprint that identifies a schema structure plus its data distribution.
SchemaInferrer
Automatic schema detector.
SchemaRegistry
Persistent store for trained ModelArtifacts.
ScoredPair
A candidate pair annotated with its match weight, probability, and band.
StreetNumberEditDistance
Levenshtein edit distance on the leading street number.
SuffixKey
Blocking key that extracts the last N digits from a field value.
TokenOverlapSimilarity
TransliteratedPhoneticKey
Phonetic blocking key that first transliterates non-Latin script (Arabic, Cyrillic, Greek, etc.) to ASCII via any_ascii, then applies NFKD diacritic stripping and DoubleMetaphone encoding, combined with the DOB year.
VecRecordStore
Default in-memory RecordStore backed by a Vec; requires no configuration.
ZalEntityStore
SQLite-backed entity store persisted as a single .zes file.
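
The descriptions of LicensePlateNormKey, PlateOCRFuzzyKey and InvertedIndex suggest a simple pipeline: normalize a plate, emit the plate plus its deletion neighbourhood, and map every key to the records that produced it. The following is a minimal standalone sketch of that idea, not this crate's API. The function names `normalize_plate`, `plate_fuzzy_keys` and `build_index`, the `u64` record IDs, and the unprefixed key format are all assumptions made for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Normalize a plate: drop hyphens and spaces, uppercase the rest.
fn normalize_plate(raw: &str) -> String {
    raw.chars()
        .filter(|c| *c != '-' && *c != ' ')
        .flat_map(|c| c.to_uppercase())
        .collect()
}

/// Hypothetical key set: the normalized plate plus one key per deleted
/// character position (the "deletion neighbourhood").
fn plate_fuzzy_keys(raw: &str) -> BTreeSet<String> {
    let norm: Vec<char> = normalize_plate(raw).chars().collect();
    let mut keys = BTreeSet::new();
    if norm.is_empty() {
        return keys;
    }
    keys.insert(norm.iter().collect::<String>());
    for skip in 0..norm.len() {
        let key: String = norm
            .iter()
            .enumerate()
            .filter(|(i, _)| *i != skip)
            .map(|(_, c)| *c)
            .collect();
        // A one-character plate would yield an empty key; skip it.
        if !key.is_empty() {
            keys.insert(key);
        }
    }
    keys
}

/// Toy inverted index: blocking key -> IDs of the records that emitted it.
fn build_index(records: &[(u64, &str)]) -> HashMap<String, Vec<u64>> {
    let mut index: HashMap<String, Vec<u64>> = HashMap::new();
    for (id, plate) in records {
        for key in plate_fuzzy_keys(plate) {
            index.entry(key).or_default().push(*id);
        }
    }
    index
}
```

With this scheme, two readings that differ by one substituted character (for example `AB-123` and an OCR misread `A8 123`) share the key obtained by deleting the differing position, so both record IDs appear under the same index entry. The price is roughly one extra key per character of the plate.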

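FellegiSunterScorer, ModelParams and ScoredPair refer to m/u parameters, match weights and match probabilities. As a hedged illustration of the standard Fellegi-Sunter arithmetic, the sketch below sums log2 likelihood ratios over fields and converts the total to a posterior probability. It assumes binary agree/disagree outcomes and conditional independence between fields, whereas the crate works with ComparisonLevel values. `FieldParams`, `match_weight`, `match_probability` and the `prior` argument are hypothetical names, not items from this prelude.

```rust
/// Per-field parameters (hypothetical struct):
/// m = P(field agrees | records match), u = P(field agrees | records do not match).
struct FieldParams {
    m: f64,
    u: f64,
}

/// Total match weight: sum of log2 likelihood ratios, one term per field.
/// Assumes binary agreement and conditionally independent fields.
fn match_weight(params: &[FieldParams], agreements: &[bool]) -> f64 {
    params
        .iter()
        .zip(agreements)
        .map(|(p, &agrees)| {
            if agrees {
                (p.m / p.u).log2()
            } else {
                ((1.0 - p.m) / (1.0 - p.u)).log2()
            }
        })
        .sum()
}

/// Posterior match probability from a weight and a prior match rate:
/// posterior odds = prior odds * 2^weight.
fn match_probability(weight: f64, prior: f64) -> f64 {
    let odds = (prior / (1.0 - prior)) * weight.exp2();
    odds / (1.0 + odds)
}
```

A weight of zero leaves the prior unchanged. Agreement on a field with a high m and a low u pushes the probability up, and disagreement on the same field pulls it down by a comparable amount.
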
Enums§

ComparisonLevel
DateGranularity
Controls how much of an ISO 8601 date is used as a blocking key.
FieldKind
FieldValue
Typed value stored in a record field.
JudgeVerdict
MatchBand
Coarse classification of a scored pair based on match probability.
PhoneticAlgo
Phonetic encoding algorithm.
ResolutionMethod
How an entity member was resolved.
SchemaCategory
High-level domain category for a dataset.
StartupMode
Decides how the pipeline should initialize when a new dataset arrives.
ZerError
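
DateGranularity is described as controlling how much of an ISO 8601 date is used as a blocking key. One way to picture this, assuming dates in `YYYY-MM-DD` form, is as a prefix cut. The enum `Granularity`, its variants and the function `date_fragment` below are hypothetical, since the crate's actual variants are not listed on this page.

```rust
/// Hypothetical granularity levels for an ISO 8601 calendar date.
#[derive(Clone, Copy)]
enum Granularity {
    Year,
    Month,
    Day,
}

/// Return the leading fragment of `iso_date` at the requested granularity,
/// or `None` if the value is too short to contain it.
/// This only slices the string; it does not validate the date.
fn date_fragment(iso_date: &str, granularity: Granularity) -> Option<&str> {
    let len = match granularity {
        Granularity::Year => 4,   // "YYYY"
        Granularity::Month => 7,  // "YYYY-MM"
        Granularity::Day => 10,   // "YYYY-MM-DD"
    };
    iso_date.get(..len)
}
```

Coarser granularity puts more records in each block, which raises recall at the cost of more candidate pairs to compare.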

Traits§

BlockIndex
Opaque blocking index.
Blocker
Extracts blocking keys from records and looks up candidates in an index.
Clusterer
Groups scored pairs into entity clusters.
ComparatorTrait
EntityStore
Persistent store for resolved entities.
Judge
Neural re-ranker that adjudicates borderline record pairs.
RecordStore
Backing store for records used during ingestion and batch runs.
ScorerTrait
SimilarityFn
Returns a similarity in [0.0, 1.0]. 0.0 = completely different, 1.0 = identical.
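
To illustrate the [0.0, 1.0] contract stated for SimilarityFn, here is a minimal sketch of a similarity function in the spirit of AddressTokenOverlap (Jaccard similarity on normalized token sets). The trait `Similarity` and the struct `TokenJaccard` are hypothetical stand-ins, not the signatures of this crate. Lowercasing plus whitespace splitting is an assumed normalization, and treating two empty values as identical is an assumed convention.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a similarity trait: returns a value in [0.0, 1.0].
trait Similarity {
    fn similarity(&self, a: &str, b: &str) -> f64;
}

/// Jaccard similarity on lowercased, whitespace-separated tokens.
struct TokenJaccard;

impl Similarity for TokenJaccard {
    fn similarity(&self, a: &str, b: &str) -> f64 {
        let ta: HashSet<String> = a.split_whitespace().map(|t| t.to_lowercase()).collect();
        let tb: HashSet<String> = b.split_whitespace().map(|t| t.to_lowercase()).collect();
        if ta.is_empty() && tb.is_empty() {
            // Assumption: two empty values count as identical.
            return 1.0;
        }
        let shared = ta.intersection(&tb).count() as f64;
        let total = ta.union(&tb).count() as f64;
        shared / total
    }
}
```

Because the intersection can never be larger than the union, the result stays within the documented range: 0.0 when no tokens are shared and 1.0 when the token sets are equal.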

Type Aliases§

EntityId
RecordId