//! # Thing matcher
//!
//! A Rust library for matching records that describe `schema.org/Thing`
//! entities. The crate implements both **deterministic** and
//! **probabilistic** matching algorithms.
//!
//! Whichever algorithm is used, the library itself is **deterministic**
//! (same inputs, same outputs), **stateless**, **panic-free** in library
//! code, and **`Send + Sync`**, so it can be shared freely across threads.
//!
//! ## What it does
//!
//! Given two [`Thing`] records — typically drawn from different source
//! systems — the [`MatchingEngine`] decides whether they refer to the
//! same item. The output is either a hard boolean (deterministic) or a
//! scored [`MatchResult`] with a per-field [`matcher::MatchBreakdown`] so
//! an auditor or downstream system can inspect the decision.
//!
//! The data model follows `schema.org/Thing` — the root type used to
//! describe any kind of item on the web. The crate compares the
//! identity-bearing properties of that vocabulary: `name`,
//! `alternateName`, `description`, `disambiguatingDescription`,
//! `identifier`, `url`, `image`, `sameAs`, `mainEntityOfPage`,
//! `additionalType`, `subjectOf`, and `owner`.
//!
//! ## Crate layout
//!
//! | Module | Purpose |
//! |---|---|
//! | [`models`] | Data types: [`Thing`], [`ThingBuilder`], [`Identifier`]. |
//! | [`normalizer`] | Text normalisation: names, free text, URLs, phonetic codes. |
//! | [`scorer`] | String-similarity and set-similarity primitives. |
//! | [`matcher`] | Orchestration: [`MatchingEngine`], [`MatchConfig`], [`MatchResult`]. |
//! | [`error`] | Error enum [`MatchingError`] and [`Result`] alias. |
//!
//! ## Quick start — probabilistic match
//!
//! ```
//! use thing_matcher::{MatchingEngine, MatchConfig, Thing};
//!
//! let a = Thing::builder()
//!     .name("Eiffel Tower")
//!     .add_alternate_name("La Tour Eiffel")
//!     .url("https://www.toureiffel.paris/")
//!     .build();
//!
//! let b = Thing::builder()
//!     .name("Tour Eiffel")
//!     .url("https://www.toureiffel.paris/")
//!     .build();
//!
//! let engine = MatchingEngine::new(MatchConfig::default());
//! let result = engine.match_things(&a, &b);
//!
//! assert!(result.is_match);
//! ```
//!
//! ## Inspecting the per-field breakdown
//!
//! Every probabilistic match returns a per-field score so the decision is
//! auditable end-to-end. Missing or unparseable fields score `None`
//! rather than zero, so they do not penalise the pair being compared.
//!
//! ```
//! use thing_matcher::{MatchingEngine, Thing};
//!
//! let p = Thing::builder()
//!     .name("Big Ben")
//!     .url("https://en.wikipedia.org/wiki/Big_Ben")
//!     .build();
//! let q = p.clone();
//!
//! let result = MatchingEngine::default_config().match_things(&p, &q);
//! assert!(result.breakdown.name_score.unwrap() > 0.99);
//! assert_eq!(result.breakdown.url_score, Some(1.0));
//! ```
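//!
//! To illustrate why a `None` field does not penalise a pair, the sketch
//! below combines per-field scores with a weighted mean that skips missing
//! fields. It is a minimal, self-contained illustration only: the function
//! name `combine` and the weights are hypothetical and are not taken from
//! this crate's actual scoring code.
//!
//! ```rust
//! /// Hypothetical aggregation (an assumption, not this crate's API):
//! /// weighted mean over the fields that actually produced a score.
//! /// Each entry is `(score, weight)`; `None` scores are skipped entirely.
//! fn combine(scores: &[(Option<f64>, f64)]) -> Option<f64> {
//!     let mut weighted_sum = 0.0;
//!     let mut weight_total = 0.0;
//!     for (score, weight) in scores {
//!         if let Some(s) = score {
//!             weighted_sum += s * weight;
//!             weight_total += weight;
//!         }
//!     }
//!     // With no comparable field at all there is nothing to average.
//!     if weight_total > 0.0 {
//!         Some(weighted_sum / weight_total)
//!     } else {
//!         None
//!     }
//! }
//! ```
//!
//! Under this scheme a missing field with a large weight leaves the result
//! unchanged, whereas scoring it as `0.0` would drag the average down.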
//!
//! ## Configuration presets
//!
//! Three configurations cover most use cases. Use [`MatchConfig::strict`]
//! when callers must rely on the answer; use [`MatchConfig::lenient`] to
//! triage large candidate sets where false negatives are worse than false
//! positives.
//!
//! ```
//! use thing_matcher::{MatchConfig, MatchingEngine};
//!
//! let strict = MatchingEngine::new(MatchConfig::strict());
//! let default = MatchingEngine::default_config();
//! let lenient = MatchingEngine::new(MatchConfig::lenient());
//!
//! // All three engines share the same scoring pipeline; only the
//! // threshold and a couple of weights differ.
//! # let _ = (strict, default, lenient);
//! ```
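//!
//! The effect of a preset's threshold can be sketched independently of the
//! crate. The `Preset` type and the threshold values below are purely
//! illustrative assumptions; the real presets' thresholds and weights are
//! defined in [`MatchConfig`] and may differ.
//!
//! ```rust
//! /// Hypothetical stand-in for a configuration preset. The field name
//! /// and the numeric values are assumptions made for illustration.
//! #[derive(Debug, Clone, Copy)]
//! struct Preset {
//!     threshold: f64,
//! }
//!
//! impl Preset {
//!     fn strict() -> Self {
//!         Preset { threshold: 0.90 }
//!     }
//!
//!     fn lenient() -> Self {
//!         Preset { threshold: 0.60 }
//!     }
//!
//!     /// The same score can be a match under one preset and not another.
//!     fn is_match(&self, score: f64) -> bool {
//!         score >= self.threshold
//!     }
//! }
//! ```
//!
//! With these assumed values, a pair scoring `0.75` is rejected by the
//! strict preset and accepted by the lenient one, which is the trade-off
//! between false positives and false negatives described above.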
//!
//! ## Determinism and safety
//!
//! - **Deterministic.** Same inputs => same outputs. No clocks, no RNGs,
//!   no environment variables.
//! - **No `unsafe`.** This crate forbids `unsafe` code.
//! - **No IO.** The library does not log, read files, or open sockets.
//! - **No panics** in library code paths; every fallible input returns
//!   `None` from a scorer or a [`MatchingError`].
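//!
//! The "return `None` instead of panicking" convention can be sketched with
//! a set-similarity primitive. The `jaccard` function below is a
//! hypothetical example written for this documentation, not the signature
//! of anything in [`scorer`]: it computes the Jaccard index of two token
//! sets and returns `None` when both are empty, rather than dividing by
//! zero.
//!
//! ```rust
//! use std::collections::HashSet;
//!
//! /// Hypothetical scorer (an assumption, not this crate's API):
//! /// |A ∩ B| / |A ∪ B|, or `None` if there is nothing to compare.
//! fn jaccard(a: &[&str], b: &[&str]) -> Option<f64> {
//!     let a: HashSet<&str> = a.iter().copied().collect();
//!     let b: HashSet<&str> = b.iter().copied().collect();
//!     let union = a.union(&b).count();
//!     if union == 0 {
//!         // Both inputs empty: undefined similarity, so no score.
//!         return None;
//!     }
//!     let intersection = a.intersection(&b).count();
//!     Some(intersection as f64 / union as f64)
//! }
//! ```
//!
//! The function reads no clock, random source, or environment state, so it
//! also satisfies the determinism property listed above.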
pub use ;
pub use ;
pub use ;
pub use Normalizer;
pub use ;