//! # Thing matcher
//!
//! A Rust library for matching records that describe `schema.org/Thing`
//! entities. The crate implements both **deterministic** (boolean) and
//! **probabilistic** (scored) matching algorithms.
//!
//! In either mode the library is **reproducible** (identical inputs always
//! give identical outputs), **stateless**, **panic-free** in library code,
//! and **`Send + Sync`**, so it can be used freely across threads.
//!
//! ## What it does
//!
//! Given two [`Thing`] records, typically drawn from different source
//! systems, the [`MatchingEngine`] decides whether they refer to the
//! same item. The output is either a hard boolean (deterministic) or a
//! scored [`MatchResult`] with a per-field [`matcher::MatchBreakdown`] so
//! an auditor or downstream system can inspect the decision.
//!
//! The data model follows `schema.org/Thing`, the root type used to
//! describe any kind of item on the web. The crate compares the
//! identity-bearing properties of that vocabulary: `name`,
//! `alternateName`, `description`, `disambiguatingDescription`,
//! `identifier`, `url`, `image`, `sameAs`, `mainEntityOfPage`,
//! `additionalType`, `subjectOf`, and `owner`.
//!
//! ## Crate layout
//!
//! | Module | Purpose |
//! |---|---|
//! | [`models`]       | Data types: [`Thing`], [`ThingBuilder`], [`Identifier`]. |
//! | [`normalizer`]   | Text normalisation: names, free text, URLs, phonetic codes. |
//! | [`scorer`]       | String-similarity and set-similarity primitives. |
//! | [`matcher`]      | Orchestration: [`MatchingEngine`], [`MatchConfig`], [`MatchResult`]. |
//! | [`error`]        | Error enum [`MatchingError`] and [`Result`] alias. |
//!
//! ## Quick start: probabilistic match
//!
//! ```
//! use thing_matcher::{MatchingEngine, MatchConfig, Thing};
//!
//! // Two records for the same landmark: the names differ, the URLs agree.
//! let a = Thing::builder()
//!     .name("Eiffel Tower")
//!     .add_alternate_name("La Tour Eiffel")
//!     .url("https://www.toureiffel.paris/")
//!     .build();
//!
//! let b = Thing::builder()
//!     .name("Tour Eiffel")
//!     .url("https://www.toureiffel.paris/")
//!     .build();
//!
//! // Score the pair with the default configuration.
//! let engine = MatchingEngine::new(MatchConfig::default());
//! let result = engine.match_things(&a, &b);
//!
//! assert!(result.is_match);
//! ```
//!
//! ## Inspecting the per-field breakdown
//!
//! Every probabilistic match returns a per-field score so the decision is
//! auditable end-to-end. Missing or unparseable fields score `None`
//! rather than zero, so they do not penalise the pair's overall score.
//!
//! ```
//! use thing_matcher::{MatchingEngine, Thing};
//!
//! let p = Thing::builder()
//!     .name("Big Ben")
//!     .url("https://en.wikipedia.org/wiki/Big_Ben")
//!     .build();
//! // Compare the record with an exact copy of itself.
//! let q = p.clone();
//!
//! let result = MatchingEngine::default_config().match_things(&p, &q);
//! // Both populated fields checked here should score at or near 1.0.
//! assert!(result.breakdown.name_score.unwrap() > 0.99);
//! assert_eq!(result.breakdown.url_score, Some(1.0));
//! ```
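//!
//! As a minimal sketch of why `None` differs from zero, the block below
//! combines per-field scores with a weighted mean that skips fields with no
//! score. The `combine` function and the weights are hypothetical and are
//! not part of this crate's API; the engine's real aggregation may differ.
//!
//! ```rust
//! /// Hypothetical aggregation: weighted mean over the fields that scored.
//! /// Each entry is `(score, weight)`; a `None` score is left out entirely.
//! fn combine(scores: &[(Option<f64>, f64)]) -> Option<f64> {
//!     let mut weighted_sum = 0.0;
//!     let mut weight_total = 0.0;
//!     for &(score, weight) in scores {
//!         if let Some(s) = score {
//!             weighted_sum += s * weight;
//!             weight_total += weight;
//!         }
//!     }
//!     // With no comparable field at all there is nothing to report.
//!     if weight_total > 0.0 {
//!         Some(weighted_sum / weight_total)
//!     } else {
//!         None
//!     }
//! }
//!
//! fn main() {
//!     // Assumed weights: name 2.0, url 3.0, identifier 1.0. The url is missing.
//!     let fields = [(Some(0.9), 2.0), (None, 3.0), (Some(1.0), 1.0)];
//!     let overall = combine(&fields).unwrap();
//!     // Only the name and identifier weights (2.0 + 1.0) enter the mean.
//!     assert!((overall - 2.8 / 3.0).abs() < 1e-12);
//!     // A pair with no comparable fields yields no score.
//!     assert_eq!(combine(&[(None, 1.0)]), None);
//! }
//! ```
//!
//! Under this assumption a missing URL leaves the overall score at about
//! 0.93; scoring it as zero would pull the same pair down to about 0.47.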
//!
//! ## Configuration presets
//!
//! Three presets cover most use cases: [`MatchConfig::strict`], the
//! default (`MatchConfig::default()`), and [`MatchConfig::lenient`]. Use
//! `strict` when callers must be able to rely on a positive answer, so
//! false positives are the costlier error; use `lenient` to triage large
//! candidate sets where false negatives are worse than false positives.
//!
//! ```
//! use thing_matcher::{MatchConfig, MatchingEngine};
//!
//! let strict = MatchingEngine::new(MatchConfig::strict());
//! let default = MatchingEngine::default_config();
//! let lenient = MatchingEngine::new(MatchConfig::lenient());
//!
//! // All three engines share the same scoring pipeline; only the
//! // threshold and a couple of weights differ.
//! # let _ = (strict, default, lenient);
//! ```
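//!
//! To make the effect of the threshold concrete, here is a small
//! stand-alone sketch. `PresetSketch` and its threshold values are invented
//! for illustration only; they are not the values used by [`MatchConfig`],
//! and the real presets also adjust some weights.
//!
//! ```rust
//! /// Hypothetical stand-in for a preset: only a decision threshold.
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! struct PresetSketch {
//!     threshold: f64,
//! }
//!
//! impl PresetSketch {
//!     // Illustrative thresholds, NOT the crate's actual numbers.
//!     fn strict() -> Self {
//!         Self { threshold: 0.90 }
//!     }
//!     fn default_preset() -> Self {
//!         Self { threshold: 0.75 }
//!     }
//!     fn lenient() -> Self {
//!         Self { threshold: 0.60 }
//!     }
//!
//!     /// A pair matches when its overall score reaches the threshold.
//!     fn is_match(&self, score: f64) -> bool {
//!         score >= self.threshold
//!     }
//! }
//!
//! fn main() {
//!     // The same borderline score is rejected by the strict preset only.
//!     let score = 0.8;
//!     assert!(!PresetSketch::strict().is_match(score));
//!     assert!(PresetSketch::default_preset().is_match(score));
//!     assert!(PresetSketch::lenient().is_match(score));
//! }
//! ```
//!
//! The scoring pipeline is unchanged between presets in this sketch; only
//! the cut-off applied to the final score moves.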
//!
//! ## Determinism and safety
//!
//! - **Deterministic.** Same inputs and configuration => same outputs. No
//!   clocks, no RNGs, no environment variables.
//! - **No `unsafe`.** This crate forbids `unsafe` code.
//! - **No IO.** The library does not log, read files, or open sockets.
//! - **No panics** in library code paths; every fallible operation either
//!   yields `None` from a scorer or returns a [`MatchingError`].
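//!
//! The `Send + Sync` claim made at the top of these docs can be checked at
//! compile time with a generic bound, and a stateless engine can then be
//! shared behind an `Arc`. The sketch below uses a hypothetical
//! `EngineSketch` stand-in rather than the real [`MatchingEngine`], and the
//! helper `assert_send_sync` is likewise an assumption, not a crate item.
//!
//! ```rust
//! use std::sync::Arc;
//! use std::thread;
//!
//! /// Compiles only if `T` is both `Send` and `Sync`.
//! fn assert_send_sync<T: Send + Sync>() {}
//!
//! /// Hypothetical stand-in for an engine: immutable configuration only.
//! struct EngineSketch {
//!     threshold: f64,
//! }
//!
//! fn main() {
//!     // Compile-time check; this call does nothing at runtime.
//!     assert_send_sync::<EngineSketch>();
//!
//!     // Share one engine across four threads without locking.
//!     let engine = Arc::new(EngineSketch { threshold: 0.75 });
//!     let handles: Vec<_> = (0..4)
//!         .map(|i| {
//!             let engine = Arc::clone(&engine);
//!             // Each thread compares its own score (i / 4) to the threshold.
//!             thread::spawn(move || (i as f64) / 4.0 >= engine.threshold)
//!         })
//!         .collect();
//!
//!     // Joining in spawn order keeps the result order fixed.
//!     let results: Vec<bool> = handles.into_iter().map(|h| h.join().unwrap()).collect();
//!     assert_eq!(results, vec![false, false, false, true]);
//! }
//! ```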

#![forbid(unsafe_code)]
#![deny(missing_docs)]

pub mod error;
pub mod matcher;
pub mod models;
pub mod normalizer;
pub mod scorer;

pub use error::{MatchingError, Result};
pub use matcher::{Confidence, MatchBreakdown, MatchConfig, MatchResult, MatchingEngine};
pub use models::{Identifier, Thing, ThingBuilder};
pub use normalizer::Normalizer;
pub use scorer::{Scorer, SimilarityAlgorithm};