Struct MatchingEngine 

pub struct MatchingEngine { /* private fields */ }

Worker matcher engine.

The engine is immutable after construction and cheap to clone (it owns only a MatchConfig). Construct one and call its methods from any thread.

use worker_matcher::{MatchConfig, MatchingEngine};

let engine_a = MatchingEngine::default_config();
let engine_b = MatchingEngine::new(MatchConfig::strict());

Implementations§

impl MatchingEngine

pub fn new(config: MatchConfig) -> Self

Construct an engine with the given configuration.

use worker_matcher::{MatchConfig, MatchingEngine};
let engine = MatchingEngine::new(MatchConfig::lenient());

pub fn default_config() -> Self

Construct an engine with MatchConfig::default.

use worker_matcher::MatchingEngine;
let engine = MatchingEngine::default_config();

pub fn match_workers(&self, worker1: &Worker, worker2: &Worker) -> MatchResult

Compare two workers probabilistically and return a MatchResult.

The score is the weighted sum of the component scores, renormalised over the components that could be scored on both records. A field that is missing from either record is skipped, not penalised.
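
The renormalisation can be sketched in a few lines. This is a minimal illustration under assumptions, not the crate's implementation: the Component struct, its field names, the renormalised_score function, and the weights used are all hypothetical.

```rust
/// Hypothetical per-field comparison outcome (not a type from this crate).
struct Component {
    /// Relative weight of the field (illustrative values only).
    weight: f64,
    /// Similarity in [0.0, 1.0], or `None` when the field is missing on
    /// either record and therefore could not be scored.
    similarity: Option<f64>,
}

/// Weighted mean over the components that were scored. Components with
/// `similarity == None` contribute to neither the numerator nor the
/// denominator, so a missing field is skipped rather than penalised.
/// Returns `None` when nothing could be scored.
fn renormalised_score(components: &[Component]) -> Option<f64> {
    let mut weighted_sum = 0.0;
    let mut weight_total = 0.0;
    for c in components {
        if let Some(s) = c.similarity {
            weighted_sum += c.weight * s;
            weight_total += c.weight;
        }
    }
    if weight_total > 0.0 {
        Some(weighted_sum / weight_total)
    } else {
        None
    }
}
```

With weights 0.5, 0.25 and 0.25, similarities of 1.0 and 0.5 on the first two fields, and the third field missing, this gives (0.5 + 0.125) / 0.75, roughly 0.83. Treating the missing field as a zero would instead give 0.625.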

use worker_matcher::{MatchingEngine, Worker};
use chrono::NaiveDate;

let p = Worker::builder()
    .given_name("Carys")
    .family_name("Pritchard")
    .date_of_birth(NaiveDate::from_ymd_opt(1985, 1, 1).unwrap())
    .build();

let result = MatchingEngine::default_config().match_workers(&p, &p);
assert!(result.is_match);
assert!(result.score > 0.99);

pub fn match_one_to_many( &self, query: &Worker, candidates: &[Worker], ) -> Vec<MatchResult>

Score a single query against many candidates in a parallel-friendly fashion. Returns one MatchResult per candidate, in the same order as the input slice.

This is the building block for a Master Worker Index workflow: given a new record and the existing population, produce a fully audited score for each potential link. The engine is immutable and Send + Sync, so call sites that want parallel evaluation can call it from rayon's par_iter() (or a similar parallel iterator) without further changes to this crate.
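
As a rough sketch of order-preserving parallel evaluation, the following uses std::thread::scope rather than rayon so that it stays dependency-free. Scorer, its score method, and score_parallel are hypothetical stand-ins for an immutable Sync engine, and the toy scoring rule is invented for the example.

```rust
use std::thread;

/// Stand-in for an immutable, `Sync` scorer (hypothetical; the real engine
/// and record types are not reproduced here).
struct Scorer {
    case_sensitive: bool,
}

impl Scorer {
    /// Toy score: 1.0 when the strings are equal, 0.0 otherwise.
    fn score(&self, query: &str, candidate: &str) -> f64 {
        let equal = if self.case_sensitive {
            query == candidate
        } else {
            query.eq_ignore_ascii_case(candidate)
        };
        if equal { 1.0 } else { 0.0 }
    }
}

/// Score `candidates` on up to `workers` scoped threads and return the
/// scores in the same order as the input slice.
fn score_parallel(
    scorer: &Scorer,
    query: &str,
    candidates: &[String],
    workers: usize,
) -> Vec<f64> {
    if candidates.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `candidates` is non-empty.
    let chunk_len = (candidates.len() + workers - 1) / workers;
    thread::scope(|s| {
        // Spawn one thread per contiguous chunk before joining any of them.
        let handles: Vec<_> = candidates
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|c| scorer.score(query, c))
                        .collect::<Vec<f64>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the results aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("scoring thread panicked"))
            .collect()
    })
}
```

Because the scorer is only ever borrowed immutably, no locking is needed; each thread reads the same shared reference.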

For large or sparse populations, consider blocking: pre-filter candidates with a cheap predicate (e.g. matching family-name Soundex or postcode prefix) before passing the survivors to this function. Blocking is a consumer concern and is intentionally not baked into the API; the crate stays a pure scoring library.
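
A consumer-side blocking step might look like the sketch below. Record, blocking_key, and block are hypothetical names, and the key (the first two non-space characters of the postcode, uppercased) is only one crude choice of cheap predicate.

```rust
/// Hypothetical minimal record (not the crate's `Worker` type).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    family_name: String,
    postcode: String,
}

/// Cheap blocking key: the first two non-whitespace characters of the
/// postcode, uppercased. Deliberately coarse so that true matches are
/// unlikely to be filtered out.
fn blocking_key(r: &Record) -> String {
    r.postcode
        .chars()
        .filter(|c| !c.is_whitespace())
        .take(2)
        .flat_map(|c| c.to_uppercase())
        .collect()
}

/// Keep only the candidates that share the query's blocking key, paired
/// with their index in the original slice so results can be mapped back.
fn block<'a>(query: &Record, candidates: &'a [Record]) -> Vec<(usize, &'a Record)> {
    let key = blocking_key(query);
    candidates
        .iter()
        .enumerate()
        .filter(|(_, c)| blocking_key(c) == key)
        .collect()
}
```

Since match_one_to_many takes a contiguous &[Worker], a caller following this pattern would clone the survivors into a new Vec before scoring and use the retained indices to map results back to the full population.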

§Examples
use worker_matcher::{MatchingEngine, Worker};

let query = Worker::builder()
    .given_name("Ada")
    .family_name("Lovelace")
    .build();
let candidates = vec![
    Worker::builder().given_name("Ada").family_name("Lovelace").build(),
    Worker::builder().given_name("Alan").family_name("Turing").build(),
    Worker::builder().given_name("Grace").family_name("Hopper").build(),
];

let engine = MatchingEngine::default_config();
let results = engine.match_one_to_many(&query, &candidates);

assert_eq!(results.len(), 3);
assert!(results[0].is_match);
assert!(!results[1].is_match);

Empty candidates yield an empty result:

use worker_matcher::{MatchingEngine, Worker};

let q = Worker::builder().given_name("Solo").build();
let r = MatchingEngine::default_config().match_one_to_many(&q, &[]);
assert!(r.is_empty());

pub fn rank_one_to_many( &self, query: &Worker, candidates: &[Worker], ) -> Vec<(usize, MatchResult)>

Score and rank: return (original_index, MatchResult) tuples sorted by descending score. Ties are broken by ascending original index, so the result is deterministic.

Convenience wrapper around MatchingEngine::match_one_to_many for the common “give me the best matches first” workflow. Consumers that need a filtered view (e.g. only is_match == true) can filter the ranked list, whose best-scoring entries come first, while consumers that need to pair results with external metadata can use the preserved original index.
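
The ordering rule (descending score, ties broken by ascending original index) can be sketched on bare scores. rank_by_score is a hypothetical helper working on f64 values rather than MatchResult, and the use of f64::total_cmp is an assumption about how scores might be compared, not a statement about the crate's internals.

```rust
/// Pair each score with its original index, then sort by descending score
/// and, for equal scores, by ascending original index.
fn rank_by_score(scores: &[f64]) -> Vec<(usize, f64)> {
    let mut ranked: Vec<(usize, f64)> = scores.iter().copied().enumerate().collect();
    ranked.sort_by(|a, b| {
        // `b` before `a` gives descending order on the score; `total_cmp`
        // provides a total order on f64, so the comparator never panics.
        b.1.total_cmp(&a.1)
            // Explicit tie-break keeps the output deterministic.
            .then_with(|| a.0.cmp(&b.0))
    });
    ranked
}
```

For the scores [0.2, 0.9, 0.2, 0.9, 0.5] the resulting index order is 1, 3, 4, 0, 2: the two 0.9 entries come first in index order, then 0.5, then the two 0.2 entries in index order.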

§Examples
use worker_matcher::{MatchingEngine, Worker};

let query = Worker::builder().given_name("Ada").family_name("Lovelace").build();
let candidates = vec![
    Worker::builder().given_name("Grace").family_name("Hopper").build(),   // index 0
    Worker::builder().given_name("Ada").family_name("Lovelace").build(),   // index 1 — best match
    Worker::builder().given_name("Alan").family_name("Turing").build(),    // index 2
];

let ranked = MatchingEngine::default_config().rank_one_to_many(&query, &candidates);
assert_eq!(ranked.len(), 3);
assert_eq!(ranked[0].0, 1);                  // best match's original index
assert!(ranked[0].1.score >= ranked[1].1.score);
assert!(ranked[1].1.score >= ranked[2].1.score);

pub fn deterministic_match(&self, worker1: &Worker, worker2: &Worker) -> bool

Compare two workers deterministically and return a single boolean.

Returns true iff any of the following hold:

  • Both UK NHS Numbers parse and are equal.
  • Both France NIRs parse and are equal.
  • Both Spain TSIs parse and are equal.
  • Both Ireland IHIs parse and are equal.
  • Both UK Northern Ireland H&C Numbers parse and are equal.
  • Both US Social Security Numbers parse and are equal.
  • Both Australia IHIs parse and are equal.
  • Both Germany KVNRs parse and are equal.
  • Both Italy Codice Fiscale values parse and are equal.
  • Both Netherlands BSNs parse and are equal.
  • Both Sweden Personnummer values parse and are equal.
  • Both UK Scotland CHI Numbers parse and are equal.
  • The workers share at least one (country, number) passport-book pair after canonicalisation (see crate::PassportBook).
  • All of the following hold together: the normalised given names match, the normalised family names match, the dates of birth match, and the genders match (or gender is missing on at least one side).
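
The final, demographic rule in the list above can be sketched as follows. Everything here is hypothetical: the Demographics and Gender types, the trim-and-lowercase normalisation, and the choice to treat a missing name or date of birth as a non-match are assumptions made for illustration, since the crate's actual normalisation rules are not described on this page.

```rust
/// Hypothetical, simplified gender field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Gender {
    Female,
    Male,
}

/// Hypothetical, simplified demographics (not the crate's `Worker` type).
#[derive(Debug, Clone)]
struct Demographics {
    given_name: Option<String>,
    family_name: Option<String>,
    /// (year, month, day); a plain tuple keeps the sketch dependency-free.
    date_of_birth: Option<(i32, u32, u32)>,
    gender: Option<Gender>,
}

/// Assumed normalisation: trim and lowercase.
fn normalise(name: &str) -> String {
    name.trim().to_lowercase()
}

/// True only when both values are present and equal under `key`.
fn both_equal<T, K: PartialEq>(a: &Option<T>, b: &Option<T>, key: impl Fn(&T) -> K) -> bool {
    match (a, b) {
        (Some(x), Some(y)) => key(x) == key(y),
        _ => false,
    }
}

fn demographic_match(a: &Demographics, b: &Demographics) -> bool {
    let names = both_equal(&a.given_name, &b.given_name, |n| normalise(n))
        && both_equal(&a.family_name, &b.family_name, |n| normalise(n));
    let dob = both_equal(&a.date_of_birth, &b.date_of_birth, |d| *d);
    // Gender only vetoes the match when present on both sides and different.
    let gender = match (a.gender, b.gender) {
        (Some(x), Some(y)) => x == y,
        _ => true,
    };
    names && dob && gender
}
```

Note the asymmetry: gender is the only field whose absence is tolerated, so it can veto a match but is never required for one.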

National identifiers from different schemes never cross-match: an NHS Number is only ever compared against another NHS Number, never against an H&C Number that happens to share the same 10 digits.
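
One way to get this no-cross-match behaviour is to tag every parsed identifier with its scheme and compare the tag along with the digits. The sketch below assumes exactly that; Scheme, Identifier, and any_identifier_matches are hypothetical names, only two schemes are modelled, and the parser checks length and digits only.

```rust
/// Hypothetical subset of identifier schemes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scheme {
    UkNhsNumber,
    UkHcNumber,
}

/// A canonicalised identifier that remembers which scheme it belongs to.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Identifier {
    scheme: Scheme,
    digits: String,
}

impl Identifier {
    /// Canonicalise by dropping whitespace and hyphens. Returns `None`
    /// unless exactly 10 ASCII digits remain. A real parser would also
    /// validate the scheme's check digit.
    fn parse(scheme: Scheme, raw: &str) -> Option<Self> {
        let digits: String = raw
            .chars()
            .filter(|c| !c.is_whitespace() && *c != '-')
            .collect();
        if digits.len() == 10 && digits.chars().all(|c| c.is_ascii_digit()) {
            Some(Self { scheme, digits })
        } else {
            None
        }
    }
}

/// True when the two records share an identifier of the same scheme.
/// Equal digits under different schemes are never treated as a match.
fn any_identifier_matches(a: &[Identifier], b: &[Identifier]) -> bool {
    a.iter()
        .any(|x| b.iter().any(|y| x.scheme == y.scheme && x.digits == y.digits))
}
```

Carrying the scheme in the type means the comparison cannot accidentally degrade to a plain string equality check.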

use worker_matcher::{MatchingEngine, Worker};

// Same NHS number, different formatting → match.
let a = Worker::builder().uk_nhs_number("943 476 5919").build();
let b = Worker::builder().uk_nhs_number("9434765919").build();
assert!(MatchingEngine::default_config().deterministic_match(&a, &b));
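
For a sense of what "parse" can involve, here is a sketch of NHS Number canonicalisation using the published Modulus 11 check digit. parse_nhs_number is a hypothetical function, and whether the crate's own parser applies exactly these steps is an assumption.

```rust
/// Strip whitespace, require exactly 10 decimal digits, and verify the
/// Modulus 11 check digit. Returns the canonical 10-digit string.
fn parse_nhs_number(raw: &str) -> Option<String> {
    // Any non-digit character makes the whole parse fail.
    let digits: Vec<u32> = raw
        .chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_digit(10))
        .collect::<Option<Vec<u32>>>()?;
    if digits.len() != 10 {
        return None;
    }
    // Weights 10 down to 2 are applied to the first nine digits.
    let sum: u32 = digits[..9]
        .iter()
        .zip((2..=10).rev())
        .map(|(d, w)| d * w)
        .sum();
    let check = match 11 - (sum % 11) {
        11 => 0,
        10 => return None, // a computed check value of 10 is never valid
        c => c,
    };
    if check == digits[9] {
        Some(
            digits
                .iter()
                .map(|d| char::from_digit(*d, 10).unwrap())
                .collect(),
        )
    } else {
        None
    }
}
```

Both "943 476 5919" and "9434765919" canonicalise to the same string here, which is why the differently formatted inputs in the example above compare equal.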

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.