Patient Matching Rust Crate
A comprehensive Rust library for matching patient records in healthcare information exchanges, developed for NHS Wales.
Overview
This crate implements both deterministic and probabilistic patient matching algorithms based on research from:
- Patient Matching within a Health Information Exchange
- Patient Identification Techniques – Approaches, Implications, and Findings
Features
- ✅ Deterministic Matching: Exact matches on NHS numbers and key demographics
- ✅ Probabilistic Matching: Fuzzy matching with configurable scoring thresholds
- ✅ String Similarity Algorithms: Jaro-Winkler and Levenshtein distance
- ✅ UK NHS Number Support: Validation and normalization
- ✅ Phonetic Matching: Soundex-like algorithm for names (handles "Stephen" vs "Steven")
- ✅ Welsh Language Support: Handles diacritics (Siân → Sian)
- ✅ Address Normalization: Postcode and street address comparison
- ✅ Phone Number Normalization: UK format handling (+44, 0044, 07xxx)
- ✅ Configurable Weights: Customize importance of each field
- ✅ Serialization Support: JSON import/export via serde
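To give a feel for the string similarity feature, here is a minimal standalone sketch of Jaro-Winkler similarity. It is not this crate's implementation: the function names jaro and jaro_winkler are hypothetical, and the 0.1 prefix scaling factor and 4-character prefix cap are the conventional textbook values, assumed here for illustration.

```rust
/// Jaro similarity in [0.0, 1.0]. Illustrative sketch, not the crate's code.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }

    // Characters only count as matching if they are within this distance.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;

    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }

    // Matched characters that appear in a different order are transpositions.
    let a_seq: Vec<char> = (0..a.len()).filter(|&i| a_matched[i]).map(|i| a[i]).collect();
    let b_seq: Vec<char> = (0..b.len()).filter(|&j| b_matched[j]).map(|j| b[j]).collect();
    let out_of_order = a_seq.iter().zip(&b_seq).filter(|(x, y)| x != y).count();

    let m = matches as f64;
    let t = (out_of_order / 2) as f64;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 chars.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a
        .chars()
        .zip(b.chars())
        .take(4)
        .take_while(|(x, y)| x == y)
        .count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}
```

With this sketch, the classic pair "MARTHA" and "MARHTA" scores about 0.961. Inputs are compared as given, so callers would lowercase and normalize names first.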
Installation
Add to your Cargo.toml:
[dependencies]
= "0.1.0"
Usage
Basic Example
use ;
use NaiveDate;
Configurable Matching
use ;
// Strict matching (exact matches required)
let strict_engine = new;
// Lenient matching (more forgiving for typos)
let lenient_engine = new;
// Custom configuration
let custom_config = MatchConfig ;
let engine = new;
Deterministic Matching
// Check for exact matches only
let is_deterministic_match = engine.deterministic_match;
if is_deterministic_match
Detailed Match Breakdown
let result = engine.match_patients;
println!;
println!;
println!;
println!;
println!;
println!;
println!;
Patient Data Model
The Patient struct supports:
- NHS Number: UK national health identifier
- Name Fields: First, middle, and family names
- Date of Birth: Birth date for age verification
- Gender: Male, Female, Other, Unknown
- Address: Multi-line address with postcode
- Contact: Phone, mobile, email
- Local ID: Hospital/practice-specific identifier
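The NHS Number field lends itself to a structural check before any matching happens. The sketch below applies the standard Modulus 11 check-digit rule for 10-digit NHS numbers. The function names normalize_nhs_number and is_valid_nhs_number are hypothetical and not taken from this crate, whose own validation may differ.

```rust
/// Keep only ASCII digits, dropping spaces, hyphens and other formatting.
/// Hypothetical helper, not part of the crate's documented API.
fn normalize_nhs_number(input: &str) -> String {
    input.chars().filter(|c| c.is_ascii_digit()).collect()
}

/// Modulus 11 check: weight the first nine digits 10 down to 2, sum them,
/// and compare (11 - sum mod 11) with the tenth digit.
fn is_valid_nhs_number(input: &str) -> bool {
    let digits: Vec<u32> = normalize_nhs_number(input)
        .chars()
        .filter_map(|c| c.to_digit(10))
        .collect();
    if digits.len() != 10 {
        return false;
    }

    let sum: u32 = digits[..9]
        .iter()
        .zip((2u32..=10).rev())
        .map(|(&d, w)| d * w)
        .sum();

    let check = match 11 - (sum % 11) {
        11 => 0,           // a result of 11 is written as check digit 0
        10 => return false, // a result of 10 can never be a valid number
        c => c,
    };
    check == digits[9]
}
```

Because normalization silently drops non-digit characters, a stricter variant might reject input containing letters rather than ignoring them.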
Matching Algorithm
The matching engine uses a weighted scoring system:
| Field | Default Weight | Purpose |
|---|---|---|
| NHS Number | 30% | Strongest identifier when available |
| Family Name | 20% | Critical demographic |
| Date of Birth | 20% | Age verification |
| Given Name | 15% | Important but subject to nicknames |
| Address | 5% | Supporting evidence |
| Gender | 5% | Supporting evidence |
| Phone | 5% | Supporting evidence |
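As a rough illustration of how the default weights in the table combine, the sketch below computes a weighted sum of per-field similarities, each in the range 0.0 to 1.0. The PerField struct, DEFAULT_WEIGHTS constant and weighted_score function are hypothetical names, not the crate's API, and the sketch assumes every field is present (it does not model missing-field handling or the phonetic bonus).

```rust
/// One value per compared field. Used here for both weights and similarities.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
struct PerField {
    nhs_number: f64,
    family_name: f64,
    date_of_birth: f64,
    given_name: f64,
    address: f64,
    gender: f64,
    phone: f64,
}

/// Default weights copied from the table above (they sum to 1.0).
const DEFAULT_WEIGHTS: PerField = PerField {
    nhs_number: 0.30,
    family_name: 0.20,
    date_of_birth: 0.20,
    given_name: 0.15,
    address: 0.05,
    gender: 0.05,
    phone: 0.05,
};

/// Weighted sum of per-field similarities, each expected in [0.0, 1.0].
fn weighted_score(sim: &PerField, w: &PerField) -> f64 {
    sim.nhs_number * w.nhs_number
        + sim.family_name * w.family_name
        + sim.date_of_birth * w.date_of_birth
        + sim.given_name * w.given_name
        + sim.address * w.address
        + sim.gender * w.gender
        + sim.phone * w.phone
}
```

For example, two records that agree on NHS number, family name, date of birth and gender, have a given-name similarity of 0.9, and differ on address and phone would score 0.885 under these weights.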
Phonetic Matching provides bonus points when names sound similar (e.g., "Stephen" vs "Steven").
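To show why "Stephen" and "Steven" count as sounding alike, here is a sketch of classic American Soundex. The crate describes its algorithm only as Soundex-like, so this is an assumed stand-in, and the function names soundex_code and soundex are hypothetical. It expects names already reduced to ASCII letters; other characters are skipped.

```rust
/// Soundex digit for a lowercase consonant; None for vowels and h, w, y.
fn soundex_code(c: char) -> Option<char> {
    match c {
        'b' | 'f' | 'p' | 'v' => Some('1'),
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => Some('2'),
        'd' | 't' => Some('3'),
        'l' => Some('4'),
        'm' | 'n' => Some('5'),
        'r' => Some('6'),
        _ => None,
    }
}

/// Classic four-character Soundex code, or None if the name has no letters.
fn soundex(name: &str) -> Option<String> {
    let mut letters = name
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase());

    let first = letters.next()?;
    let mut out = String::with_capacity(4);
    out.push(first.to_ascii_uppercase());

    // Code of the previous letter, used to collapse adjacent duplicates.
    let mut last = soundex_code(first);
    for c in letters {
        if out.len() == 4 {
            break;
        }
        match c {
            // h and w are skipped and do not separate equal codes.
            'h' | 'w' => continue,
            _ => {
                let code = soundex_code(c);
                if let Some(d) = code {
                    if code != last {
                        out.push(d);
                    }
                }
                // Vowels reset `last`, so a repeated code after a vowel counts.
                last = code;
            }
        }
    }
    while out.len() < 4 {
        out.push('0');
    }
    Some(out)
}
```

Both "Stephen" and "Steven" map to S315 in this scheme, so a matcher could award its phonetic bonus when the two codes are equal.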
Research Basis
Key Findings Applied
- No 100% Accuracy: Research shows even the best algorithms achieve 90-98% accuracy. This crate aims for transparency with confidence scores.
- Standardization Critical: All inputs are normalized:
  - Names: lowercase, remove diacritics, trim spaces
  - Postcodes: uppercase, remove spaces
  - Phone numbers: remove formatting, handle country codes
  - NHS numbers: digits only
- Multi-Factor Approach: Following research recommendations, matching uses multiple demographic fields rather than relying on a single identifier.
- Weighted Probabilistic Matching: Combines multiple weak identifiers into a strong match signal, following best practices from health information exchanges.
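The normalization rules listed above can be sketched with the standard library alone. All function names here are hypothetical, not the crate's API. The diacritic table is a small hand-written assumption covering common precomposed Welsh vowels; a fuller implementation would more likely rely on Unicode normalization.

```rust
/// Map a precomposed accented letter to its base letter (partial table).
fn strip_diacritic(c: char) -> char {
    match c {
        'â' | 'á' | 'à' | 'ä' => 'a',
        'ê' | 'é' | 'è' | 'ë' => 'e',
        'î' | 'í' | 'ì' | 'ï' => 'i',
        'ô' | 'ó' | 'ò' | 'ö' => 'o',
        'û' | 'ú' | 'ù' | 'ü' => 'u',
        'ŵ' => 'w',
        'ŷ' => 'y',
        other => other,
    }
}

/// Names: trim, lowercase, remove diacritics.
fn normalize_name(name: &str) -> String {
    name.trim().to_lowercase().chars().map(strip_diacritic).collect()
}

/// Postcodes: remove whitespace, uppercase.
fn normalize_postcode(postcode: &str) -> String {
    postcode
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect::<String>()
        .to_uppercase()
}

/// Phone numbers: keep digits and rewrite +44 / 0044 prefixes to a leading 0.
fn normalize_uk_phone(phone: &str) -> String {
    let trimmed = phone.trim();
    let digits: String = trimmed.chars().filter(|c| c.is_ascii_digit()).collect();

    // Detect an international prefix: "+44..." or "0044...".
    let rest = if trimmed.starts_with('+') {
        digits.strip_prefix("44")
    } else {
        digits.strip_prefix("0044")
    };

    // Also handles the "+44 (0)7..." style by dropping the extra zero.
    let national = rest.map(|r| format!("0{}", r.trim_start_matches('0')));
    national.unwrap_or(digits)
}

/// NHS numbers: digits only.
fn digits_only(input: &str) -> String {
    input.chars().filter(|c| c.is_ascii_digit()).collect()
}
```

Under these assumptions, "+44 7700 900123", "0044 7700 900123" and "07700 900123" all normalize to the same string, so they compare as equal phone numbers.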
Testing
Run the test suite:
# Unit tests
# Integration tests
# Run with output
# Run specific test
Test Coverage
- ✅ Perfect matches (100% score)
- ✅ Fuzzy name matching (typos, alternate spellings)
- ✅ Welsh names with diacritics
- ✅ Phonetic name matching
- ✅ UK phone number normalization
- ✅ Address comparison
- ✅ NHS number validation
- ✅ Deterministic matching
- ✅ Strict vs lenient modes
- ✅ Missing field handling
- ✅ Serialization/deserialization
Example: Running the Demo
This runs example scenarios including:
- Perfect match
- Fuzzy name match (Stephen vs Steven)
- Welsh names with diacritics (Siân vs Sian)
- Address matching
- Complete mismatch
- Strict vs lenient comparison
Performance Considerations
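- Time Complexity: Deterministic matching compares a fixed set of normalized fields, so the cost per pair is effectively constant. String similarity is O(m·n) in the lengths of the two strings for Levenshtein distance and, in the worst case, for Jaro-Winkler.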
- Memory: Minimal allocation, uses borrowed references where possible
- Concurrency: Thread-safe, since matching operations do not mutate their inputs
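To make the cost of string similarity concrete, here is a sketch of Levenshtein distance using two rolling rows. It runs in O(m·n) time for strings of length m and n while allocating only two rows of n + 1 counters. The function names levenshtein and levenshtein_similarity are hypothetical, and the crate's own implementation may be organized differently.

```rust
/// Edit distance (insertions, deletions, substitutions) between two strings.
/// Illustrative sketch, not the crate's code.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] holds the distance between the processed prefix of a and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitute or keep
                .min(prev[j + 1] + 1) // delete from a
                .min(curr[j] + 1); // insert into a
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Distance rescaled to a similarity in [0.0, 1.0]; 1.0 means identical.
fn levenshtein_similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}
```

Names and addresses are short, so the quadratic cost per field is small in practice; it matters mainly when many candidate pairs are compared.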
Limitations
- No Machine Learning: This is a rule-based system, not ML/AI
- Single Country Focus: Optimized for UK/NHS data formats
- No Persistent Storage: In-memory matching only
- No Batch Processing: Compares one pair of patients at a time
Future Enhancements
- Support for other national identifiers (SSN, etc.)
- Batch matching API for large datasets
- Machine learning integration
- Performance benchmarks
- More sophisticated address parsing
- International phone number support
License
MIT OR Apache-2.0
Contributing
Contributions welcome! Please ensure:
- All tests pass (cargo test)
- Code is formatted (cargo fmt)
- No clippy warnings (cargo clippy)
References
- Grannis SJ, et al. "Patient Matching within a Health Information Exchange." AMIA Annu Symp Proc. 2014.
- Reisman M. "Patient Identification Techniques – Approaches, Implications, and Findings." NCVHS. 2020.
Contact
For NHS Wales-specific queries, contact the Digital Health and Care Wales (DHCW) team.