Crate worker_matcher

§Worker matcher

A Rust library for matching worker records in healthcare information exchanges. The crate implements both deterministic and probabilistic matching algorithms grounded in peer-reviewed research on worker identification (see spec.md §5).

The library is deterministic, stateless, panic-free in library code, and Send + Sync so it can be used freely across threads.

§What it does

Given two Worker records — typically drawn from different source systems — the MatchingEngine decides whether they refer to the same human being. The output is either a hard boolean (deterministic) or a scored MatchResult with a per-field matcher::MatchBreakdown so a clinician or downstream system can audit the decision.

§Crate layout

| Module | Purpose |
|---|---|
| models | Data structures: Worker, WorkerBuilder, Address, Gender. |
| identifiers | National healthcare identifier parsers: UK NHS, FR NIR, ES TSI, IE IHI, UK H&C. |
| normalizer | Text normalisation: names, postcodes, phone numbers, phonetic codes. |
| scorer | String-similarity primitives: Jaro-Winkler, Levenshtein, exact, combined. |
| matcher | Orchestration: MatchingEngine, MatchConfig, MatchResult. |
| error | Error enum MatchingError and Result alias. |

See AGENTS/architecture.md for the layering rules.

§Quick start — probabilistic match

use worker_matcher::{Gender, MatchingEngine, MatchConfig, Worker};
use chrono::NaiveDate;

let alice = Worker::builder()
    .given_name("Alice")
    .family_name("Williams")
    .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
    .gender(Gender::Female)
    .build();

let alyce = Worker::builder()
    .given_name("Alyce")   // alternate spelling
    .family_name("Williams")
    .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
    .gender(Gender::Female)
    .build();

let engine = MatchingEngine::new(MatchConfig::default());
let result = engine.match_workers(&alice, &alyce);

assert!(result.is_match, "Alice and Alyce should be a fuzzy match");
assert!(result.score > 0.85);
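
To give a feel for why "Alice" and "Alyce" count as a close pair, here is a minimal, standalone sketch of Jaro-Winkler similarity, one of the primitives the scorer module is described as providing. It is not the crate's implementation: the function names `jaro` and `jaro_winkler`, the case-sensitive comparison, the 0.1 prefix scale, and the 4-character prefix cap are assumptions taken from the textbook definition. The engine's overall score also combines several fields, so it will not equal this single-field value.

```rust
/// Jaro similarity in [0.0, 1.0]; 1.0 means the strings are identical.
/// Hypothetical helper, written from the textbook definition.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Two characters match only if equal and at most `window` positions apart.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Walk both match sequences in order and count positions that disagree.
    let mut out_of_order = 0usize;
    let mut j = 0usize;
    for i in 0..a.len() {
        if !a_matched[i] {
            continue;
        }
        while !b_matched[j] {
            j += 1;
        }
        if a[i] != b[j] {
            out_of_order += 1;
        }
        j += 1;
    }
    let m = matches as f64;
    let t = (out_of_order / 2) as f64; // transpositions
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 chars.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a
        .chars()
        .zip(b.chars())
        .take(4)
        .take_while(|(x, y)| x == y)
        .count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}
```

Under these assumptions, `jaro_winkler("Alice", "Alyce")` comes out at roughly 0.893: four of five characters match in order, and the shared "Al" prefix adds a small boost.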

§Quick start — deterministic match

use worker_matcher::{MatchingEngine, Worker};

// NHS-format numbers in two textual layouts.
let a = Worker::builder().uk_nhs_number("943 476 5919").build();
let b = Worker::builder().uk_nhs_number("9434765919").build();

let engine = MatchingEngine::default_config();
assert!(engine.deterministic_match(&a, &b),
    "same NHS number with different formatting must match deterministically");

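The formatting-insensitive comparison above can be sketched independently of the crate. The following is an illustration only, assuming the publicly documented NHS number format (ten digits, with a modulus-11 check digit over the first nine); the helper names `normalise_nhs`, `is_valid_nhs`, and `same_nhs_number` are hypothetical and are not the crate's identifiers API.

```rust
/// Strip spaces and hyphens; return the digits only if exactly ten remain.
/// Hypothetical helper, not part of worker_matcher.
fn normalise_nhs(raw: &str) -> Option<[u8; 10]> {
    let mut digits = [0u8; 10];
    let mut n = 0usize;
    for c in raw.chars() {
        if c == ' ' || c == '-' {
            continue;
        }
        // Any other non-digit character makes the input unparseable.
        let d = c.to_digit(10)? as u8;
        if n == 10 {
            return None; // too many digits
        }
        digits[n] = d;
        n += 1;
    }
    if n == 10 {
        Some(digits)
    } else {
        None
    }
}

/// Modulus-11 check: weight the first nine digits 10 down to 2, then compare
/// 11 - (sum mod 11) with the tenth digit (11 maps to 0, 10 is never valid).
fn is_valid_nhs(digits: &[u8; 10]) -> bool {
    let sum: u32 = digits[..9]
        .iter()
        .enumerate()
        .map(|(i, &d)| u32::from(d) * (10 - i as u32))
        .sum();
    let expected = match 11 - (sum % 11) {
        11 => 0,
        10 => return false,
        c => c,
    };
    expected == u32::from(digits[9])
}

/// Two raw strings agree if they normalise to the same valid number.
fn same_nhs_number(a: &str, b: &str) -> bool {
    match (normalise_nhs(a), normalise_nhs(b)) {
        (Some(x), Some(y)) => x == y && is_valid_nhs(&x),
        _ => false,
    }
}
```

With these helpers, "943 476 5919" and "9434765919" normalise to the same ten digits and pass the check, while an unparseable input yields `None` instead of a panic.
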
§Inspecting the per-field breakdown

Every probabilistic match returns a per-field score so the decision is auditable end-to-end. Missing or unparseable fields score None rather than zero, so they do not drag down the overall match score.

use worker_matcher::{MatchingEngine, Worker};
use chrono::NaiveDate;

let p1 = Worker::builder()
    .given_name("John")
    .family_name("Smith")
    .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
    .build();
let p2 = p1.clone();

let result = MatchingEngine::default_config().match_workers(&p1, &p2);

assert_eq!(result.breakdown.date_of_birth_score, Some(1.0));
assert!(result.breakdown.given_name_score.unwrap() > 0.99);
assert!(result.breakdown.family_name_score.unwrap() > 0.99);
// NHS number was missing on both — score is `None`, not `0.0`.
assert_eq!(result.breakdown.uk_nhs_number_score, None);
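
The difference between `None` and `0.0` matters once per-field scores are combined. The sketch below shows one way this could work, assuming a weighted mean that skips missing fields. The source does not state the engine's actual combination rule, and the function name `combine` and the weights in the usage note are hypothetical.

```rust
/// Weighted mean over the fields that produced a score.
/// Each entry is (score, weight); a `None` score is skipped entirely,
/// so its weight is excluded from the denominator as well.
/// Hypothetical helper: the crate's real combination rule may differ.
fn combine(fields: &[(Option<f64>, f64)]) -> Option<f64> {
    let mut weighted = 0.0;
    let mut total_weight = 0.0;
    for &(score, weight) in fields {
        if let Some(s) = score {
            weighted += s * weight;
            total_weight += weight;
        }
    }
    if total_weight > 0.0 {
        Some(weighted / total_weight)
    } else {
        None // nothing comparable on either record
    }
}
```

For example, `combine(&[(Some(1.0), 2.0), (None, 5.0), (Some(0.5), 2.0)])` gives `Some(0.75)`. Treating the missing field as `0.0` instead would pull the result down to one third, even though no evidence against the match exists.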

§Configuration presets

Three configurations cover most use cases: MatchConfig::strict, MatchConfig::default, and MatchConfig::lenient. Use MatchConfig::strict when a clinician must rely on the answer; use MatchConfig::lenient to triage large candidate sets, where false negatives are worse than false positives; otherwise start with the default.

use worker_matcher::{MatchConfig, MatchingEngine};

let strict   = MatchingEngine::new(MatchConfig::strict());
let default  = MatchingEngine::default_config();
let lenient  = MatchingEngine::new(MatchConfig::lenient());

// All three engines share the same scoring pipeline; only the
// threshold and a couple of weights differ.

§Determinism and safety

  • Deterministic. Same inputs ⇒ same outputs. No clocks, no RNGs, no environment variables.
  • No unsafe. This is a clinical-adjacent library.
  • No IO. The library does not log, read files, or open sockets.
  • No panics in library code paths; every fallible input returns None from a scorer or a MatchingError.
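
The Send + Sync and determinism guarantees can be checked mechanically in a caller's own test suite. The sketch below uses a stand-in type in place of MatchingEngine so that it compiles on its own; `StandInEngine`, its `threshold` field, and `decide_in_parallel` are hypothetical names, and the same `assert_send_sync` pattern would apply to the real engine type.

```rust
use std::thread;

/// Stand-in for an immutable, stateless engine (hypothetical type).
struct StandInEngine {
    threshold: f64,
}

impl StandInEngine {
    fn is_match(&self, score: f64) -> bool {
        score >= self.threshold
    }
}

/// Compiles only if `T` is both Send and Sync; it does nothing at runtime.
fn assert_send_sync<T: Send + Sync>() {}

/// Share one engine by reference across scoped threads, one per score.
fn decide_in_parallel(engine: &StandInEngine, scores: &[f64]) -> Vec<bool> {
    assert_send_sync::<StandInEngine>();
    thread::scope(|s| {
        let handles: Vec<_> = scores
            .iter()
            .map(|&score| s.spawn(move || engine.is_match(score)))
            .collect();
        // Joining in spawn order keeps the output order stable.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because the engine holds no mutable state, the threaded result is identical to a sequential pass over the same scores.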

§Re-exports

pub use error::MatchingError;
pub use error::Result;
pub use matcher::Confidence;
pub use matcher::MatchBreakdown;
pub use matcher::MatchConfig;
pub use matcher::MatchResult;
pub use matcher::MatchingEngine;
pub use models::Address;
pub use models::BloodType;
pub use models::Gender;
pub use models::PassportBook;
pub use models::Worker;
pub use models::WorkerBuilder;
pub use nicknames::NicknameTable;
pub use normalizer::Normalizer;
pub use normalizer::ParsedAddressLine;
pub use scorer::Scorer;
pub use scorer::SimilarityAlgorithm;

§Modules

error
Error types for worker-matcher operations.
identifiers
National healthcare identifier parsing and validation.
matcher
Worker matcher engine: deterministic and probabilistic algorithms.
models
Data models for worker demographics and identifiers.
nicknames
Nickname equivalence tables for given-name matching.
normalizer
Text normalisation for worker demographic data.
scorer
Scoring algorithms for string similarity and field comparison.