//! # Worker matcher
//!
//! A Rust library for matching worker records in healthcare information
//! exchanges. The crate implements both **deterministic** and **probabilistic**
//! matching algorithms grounded in peer-reviewed research on worker
//! identification (see [`spec.md`](https://github.com/sixarm/worker-matcher/blob/main/spec.md) §5).
//!
//! The library's output is **deterministic** (the same inputs always produce
//! the same result, in either matching mode). It is also **stateless**,
//! **panic-free** in library code, and **`Send + Sync`**, so it can be used
//! freely across threads.
//!
//! ## What it does
//!
//! Given two [`Worker`] records — typically drawn from different source
//! systems — the [`MatchingEngine`] decides whether they refer to the same
//! human being. The output is either a hard boolean (deterministic) or a
//! scored [`MatchResult`] with a per-field [`matcher::MatchBreakdown`] so a
//! clinician or downstream system can audit the decision.
//!
//! ## Crate layout
//!
//! | Module | Purpose |
//! |---|---|
//! | [`models`]       | Data structures: [`Worker`], [`WorkerBuilder`], [`Address`], [`Gender`]. |
//! | [`identifiers`]  | National healthcare identifier parsers: UK NHS, FR NIR, ES TSI, IE IHI, UK H&C. |
//! | [`normalizer`]   | Text normalisation: names, postcodes, phone numbers, phonetic codes. |
//! | [`nicknames`]    | Nickname table: [`NicknameTable`]. |
//! | [`scorer`]       | String-similarity primitives: Jaro-Winkler, Levenshtein, exact, combined. |
//! | [`matcher`]      | Orchestration: [`MatchingEngine`], [`MatchConfig`], [`MatchResult`]. |
//! | [`error`]        | Error enum [`MatchingError`] and [`Result`] alias. |
//!
//! See [`AGENTS/architecture.md`](https://github.com/sixarm/worker-matcher/blob/main/AGENTS/architecture.md)
//! for the layering rules.
//!
//! ## Quick start — probabilistic match
//!
//! ```
//! use worker_matcher::{Gender, MatchingEngine, MatchConfig, Worker};
//! use chrono::NaiveDate;
//!
//! let alice = Worker::builder()
//!     .given_name("Alice")
//!     .family_name("Williams")
//!     .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
//!     .gender(Gender::Female)
//!     .build();
//!
//! let alyce = Worker::builder()
//!     .given_name("Alyce")   // alternate spelling
//!     .family_name("Williams")
//!     .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
//!     .gender(Gender::Female)
//!     .build();
//!
//! let engine = MatchingEngine::new(MatchConfig::default());
//! let result = engine.match_workers(&alice, &alyce);
//!
//! assert!(result.is_match, "Alice and Alyce should be a fuzzy match");
//! assert!(result.score > 0.85);
//! ```
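//!
//! To show why an alternate spelling can still score highly, here is a
//! minimal, self-contained sketch of the Jaro-Winkler similarity listed for
//! the [`scorer`] module. It is an illustration only: the function names
//! `jaro` and `jaro_winkler`, the 0.1 prefix scale, and the 4-character
//! prefix cap are assumptions taken from the textbook definition, not from
//! this crate's implementation, which may normalise or weight differently.
//!
//! ```rust
//! /// Textbook Jaro similarity in `0.0..=1.0` (hypothetical helper).
//! fn jaro(a: &str, b: &str) -> f64 {
//!     let a: Vec<char> = a.chars().collect();
//!     let b: Vec<char> = b.chars().collect();
//!     if a.is_empty() || b.is_empty() {
//!         return if a.is_empty() && b.is_empty() { 1.0 } else { 0.0 };
//!     }
//!     // Characters match only if they are equal and close enough in position.
//!     let window = (a.len().max(b.len()) / 2).saturating_sub(1);
//!     let mut a_matched = vec![false; a.len()];
//!     let mut b_matched = vec![false; b.len()];
//!     let mut matches = 0usize;
//!     for i in 0..a.len() {
//!         let lo = i.saturating_sub(window);
//!         let hi = (i + window + 1).min(b.len());
//!         for j in lo..hi {
//!             if !b_matched[j] && a[i] == b[j] {
//!                 a_matched[i] = true;
//!                 b_matched[j] = true;
//!                 matches += 1;
//!                 break;
//!             }
//!         }
//!     }
//!     if matches == 0 {
//!         return 0.0;
//!     }
//!     // Count matched characters that appear in a different order.
//!     let mut k = 0;
//!     let mut out_of_order = 0usize;
//!     for i in 0..a.len() {
//!         if a_matched[i] {
//!             while !b_matched[k] {
//!                 k += 1;
//!             }
//!             if a[i] != b[k] {
//!                 out_of_order += 1;
//!             }
//!             k += 1;
//!         }
//!     }
//!     let m = matches as f64;
//!     let t = (out_of_order / 2) as f64;
//!     (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
//! }
//!
//! /// Jaro similarity boosted for a shared prefix of up to four characters.
//! fn jaro_winkler(a: &str, b: &str) -> f64 {
//!     let j = jaro(a, b);
//!     let prefix = a
//!         .chars()
//!         .zip(b.chars())
//!         .take(4)
//!         .take_while(|(x, y)| x == y)
//!         .count();
//!     j + prefix as f64 * 0.1 * (1.0 - j)
//! }
//! ```
//!
//! Under these assumptions `jaro_winkler("Alice", "Alyce")` comes out at
//! roughly 0.89: four of the five characters match in order, and the shared
//! prefix `Al` adds a small boost.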
//!
//! ## Quick start — deterministic match
//!
//! ```
//! use worker_matcher::{MatchingEngine, Worker};
//!
//! // NHS-format numbers in two textual layouts.
//! let a = Worker::builder().uk_nhs_number("943 476 5919").build();
//! let b = Worker::builder().uk_nhs_number("9434765919").build();
//!
//! let engine = MatchingEngine::default_config();
//! assert!(engine.deterministic_match(&a, &b),
//!     "same NHS number with different formatting must match deterministically");
//! ```
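//!
//! The two layouts can only compare equal if the identifier is reduced to a
//! canonical form first. As a hedged illustration, the sketch below strips
//! spaces and hyphens and applies the published NHS Modulus 11 check digit.
//! The function name `normalize_nhs_number` is hypothetical; the crate's own
//! [`identifiers`] parser may accept other separators or report failures
//! differently.
//!
//! ```rust
//! /// Hypothetical helper: canonicalise an NHS number, or return `None` if
//! /// it is not ten digits with a valid Modulus 11 check digit.
//! fn normalize_nhs_number(raw: &str) -> Option<String> {
//!     // Drop the separators commonly used when the number is written out.
//!     let digits: String = raw.chars().filter(|c| !matches!(c, ' ' | '-')).collect();
//!     if digits.len() != 10 || !digits.bytes().all(|b| b.is_ascii_digit()) {
//!         return None;
//!     }
//!     let values: Vec<u32> = digits.bytes().map(|b| u32::from(b - b'0')).collect();
//!     // Weight the first nine digits 10, 9, ..., 2 and sum the products.
//!     let sum: u32 = values[..9]
//!         .iter()
//!         .zip((2..=10u32).rev())
//!         .map(|(d, w)| d * w)
//!         .sum();
//!     let check = match 11 - sum % 11 {
//!         11 => 0,
//!         10 => return None, // a check value of 10 is never valid
//!         n => n,
//!     };
//!     (check == values[9]).then_some(digits)
//! }
//! ```
//!
//! With this sketch both `"943 476 5919"` and `"9434765919"` normalise to
//! `Some("9434765919")`, while a number with a wrong check digit yields
//! `None` instead of panicking.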
//!
//! ## Inspecting the per-field breakdown
//!
//! Every probabilistic match returns a per-field score so the decision is
//! auditable end-to-end. Missing or unparseable fields score `None` rather
//! than zero, so an absent field does not count against the match.
//!
//! ```
//! use worker_matcher::{MatchingEngine, Worker};
//! use chrono::NaiveDate;
//!
//! let p1 = Worker::builder()
//!     .given_name("John")
//!     .family_name("Smith")
//!     .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
//!     .build();
//! let p2 = p1.clone();
//!
//! let result = MatchingEngine::default_config().match_workers(&p1, &p2);
//!
//! assert_eq!(result.breakdown.date_of_birth_score, Some(1.0));
//! assert!(result.breakdown.given_name_score.unwrap() > 0.99);
//! assert!(result.breakdown.family_name_score.unwrap() > 0.99);
//! // NHS number was missing on both — score is `None`, not `0.0`.
//! assert_eq!(result.breakdown.uk_nhs_number_score, None);
//! ```
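//!
//! One plausible way to keep `None` fields from dragging the total down is a
//! weighted mean taken only over the fields that produced a score. The sketch
//! below assumes that approach; `combine_scores` and the weights passed to it
//! are hypothetical, and the aggregation actually used by [`MatchingEngine`]
//! is defined by the crate and its specification, not by this example.
//!
//! ```rust
//! /// Hypothetical aggregation: each entry is `(field score, field weight)`.
//! /// Fields scoring `None` contribute neither to the sum nor to the weight.
//! fn combine_scores(fields: &[(Option<f64>, f64)]) -> Option<f64> {
//!     let mut weighted_sum = 0.0;
//!     let mut total_weight = 0.0;
//!     for (score, weight) in fields {
//!         if let Some(s) = score {
//!             weighted_sum += s * weight;
//!             total_weight += weight;
//!         }
//!     }
//!     // With no comparable field at all there is nothing to average.
//!     if total_weight > 0.0 {
//!         Some(weighted_sum / total_weight)
//!     } else {
//!         None
//!     }
//! }
//! ```
//!
//! For example, `combine_scores(&[(Some(1.0), 2.0), (None, 5.0), (Some(0.5), 2.0)])`
//! gives `Some(0.75)`. Treating the missing field as `0.0` would instead give
//! about 0.33, which is the penalty the `None` convention avoids.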
//!
//! ## Configuration presets
//!
//! Three presets cover most use cases: [`MatchConfig::strict`],
//! [`MatchConfig::default`], and [`MatchConfig::lenient`]. Use `strict`
//! when a clinician must rely on the answer; use `lenient` to triage large
//! candidate sets where false negatives are worse than false positives.
//!
//! ```
//! use worker_matcher::{MatchConfig, MatchingEngine};
//!
//! let strict   = MatchingEngine::new(MatchConfig::strict());
//! let default  = MatchingEngine::default_config();
//! let lenient  = MatchingEngine::new(MatchConfig::lenient());
//!
//! // All three engines share the same scoring pipeline; only the
//! // threshold and a couple of weights differ.
//! # let _ = (strict, default, lenient);
//! ```
//!
//! ## Determinism and safety
//!
//! - **Deterministic.** Same inputs ⇒ same outputs. No clocks, no RNGs, no
//!   environment variables.
//! - **No `unsafe`.** This is a clinical-adjacent library, so the crate root
//!   sets `#![forbid(unsafe_code)]`.
//! - **No IO.** The library does not log, read files, or open sockets.
//! - **No panics** in library code paths; every fallible input yields either
//!   `None` from a scorer or a [`MatchingError`].
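//!
//! The no-panic rule can be pictured with the two small functions below. This
//! is a sketch under assumptions: `SketchError`, `checked_threshold`, and
//! `exact_score` are hypothetical names standing in for the crate's own
//! [`MatchingError`] and scorer functions, and threshold validation is only
//! an example of a fallible input.
//!
//! ```rust
//! /// Hypothetical error type standing in for the crate's `MatchingError`.
//! #[derive(Debug, PartialEq)]
//! enum SketchError {
//!     /// The threshold was NaN or outside `0.0..=1.0`.
//!     InvalidThreshold,
//! }
//!
//! /// Validates a match threshold and reports bad input as an error value.
//! fn checked_threshold(value: f64) -> Result<f64, SketchError> {
//!     if (0.0..=1.0).contains(&value) {
//!         Ok(value)
//!     } else {
//!         Err(SketchError::InvalidThreshold)
//!     }
//! }
//!
//! /// Exact comparison that yields `None` when either side is missing or
//! /// blank, so the caller can tell "no evidence" apart from "mismatch".
//! fn exact_score(a: Option<&str>, b: Option<&str>) -> Option<f64> {
//!     let (a, b) = (a?.trim(), b?.trim());
//!     if a.is_empty() || b.is_empty() {
//!         return None;
//!     }
//!     Some(if a == b { 1.0 } else { 0.0 })
//! }
//! ```
//!
//! Neither function calls `unwrap` or indexes out of bounds, so malformed
//! input surfaces as `None` or `Err` and never as a panic.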
//!
//! ## Further reading
//!
//! - [`spec.md`](https://github.com/sixarm/worker-matcher/blob/main/spec.md) — the living specification.
//! - [`AGENTS/matching-algorithm.md`](https://github.com/sixarm/worker-matcher/blob/main/AGENTS/matching-algorithm.md) — practitioner's view of the algorithm.
//! - [`AGENTS/normalization.md`](https://github.com/sixarm/worker-matcher/blob/main/AGENTS/normalization.md) — text normalisation rules.

#![forbid(unsafe_code)]
#![deny(missing_docs)]

pub mod error;
pub mod identifiers;
pub mod matcher;
pub mod models;
pub mod nicknames;
pub mod normalizer;
pub mod scorer;

pub use error::{MatchingError, Result};
pub use matcher::{Confidence, MatchBreakdown, MatchConfig, MatchResult, MatchingEngine};
pub use models::{Address, BloodType, Gender, PassportBook, Worker, WorkerBuilder};
pub use nicknames::NicknameTable;
pub use normalizer::{Normalizer, ParsedAddressLine};
pub use scorer::{Scorer, SimilarityAlgorithm};