§AMT — Articulatory Moment Transform
Language-agnostic phonetic name matching via spectral fingerprinting of universal sonority class sequences.
§Quick start
use amt::{encode_token, matches, similarity};
// Encode a single name
let code = encode_token("Khaled");
// Test match across transliterations and scripts
assert!(matches("Khaled", "Khalid"));
assert!(matches("Khaled", "خالد"));
assert!(matches("Gamal", "Jamal"));
assert!(!matches("Khaled", "Robert"));
// Graded similarity in [0, 1]
let s = similarity("Khaled Sameer", "khaled samir");
assert!(s > 0.9);
§Indexed fuzzy search
use amt::{encode_token, BKTree};
let mut tree: BKTree<String> = BKTree::new();
for name in ["Khaled", "Khalid", "Ahmed", "Robert"] {
    let code = encode_token(name);
    for &sp in &code.spectrals {
        tree.add(sp, name.to_string());
    }
}
let query = encode_token("Khaleed");
let hits = tree.query(query.spectrals[0], 4);
§Algorithm
Each name is mapped to a sequence of 8 sonority classes. That sequence is projected onto the first 4 Chebyshev polynomials, and the resulting coefficients are Gray-quantized and packed into a 32-bit spectral key. A parallel 64-bit Bloom signature over skip-bigrams of the same sequence captures edit-tolerant co-occurrence patterns. Two names match if their codes share at least one spectral key.
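The pipeline described above can be illustrated with a minimal sketch. It is not the crate's implementation: the function names (`chebyshev`, `gray`, `spectral_key`, `skip_bigram_signature`), the 8-bits-per-coefficient layout, the midpoint sampling of positions, the reading of "8 sonority classes" as class indices 0..=7, and the direct pair-to-bit mapping in the signature are all assumptions made for illustration. The whitepaper is the authority on the actual parameters.

```rust
/// Chebyshev polynomial of the first kind, T_k(x), for k in 0..=3.
fn chebyshev(k: usize, x: f64) -> f64 {
    match k {
        0 => 1.0,
        1 => x,
        2 => 2.0 * x * x - 1.0,
        3 => 4.0 * x * x * x - 3.0 * x,
        _ => panic!("this sketch only uses T0..T3"),
    }
}

/// Binary-reflected Gray code: adjacent integers differ in exactly one bit.
fn gray(b: u8) -> u8 {
    b ^ (b >> 1)
}

/// Hypothetical spectral key: project class indices (assumed 0..=7) onto
/// T0..T3, quantize each coefficient to 8 bits, Gray-code it, pack into a u32.
fn spectral_key(classes: &[u8]) -> u32 {
    let n = classes.len();
    if n == 0 {
        return 0;
    }
    let mut key = 0u32;
    for k in 0..4 {
        let mut acc = 0.0;
        for (i, &c) in classes.iter().enumerate() {
            // Sample position i at the midpoint of its cell in (-1, 1).
            let x = (2.0 * i as f64 + 1.0) / n as f64 - 1.0;
            // Scale the class index to [0, 1] (assumed normalization).
            let v = f64::from(c.min(7)) / 7.0;
            acc += v * chebyshev(k, x);
        }
        // |T_k(x)| <= 1 on [-1, 1] and v is in [0, 1], so coeff is in [-1, 1].
        let coeff = acc / n as f64;
        let q = ((coeff + 1.0) / 2.0 * 255.0).round() as u8;
        key |= u32::from(gray(q)) << (8 * k);
    }
    key
}

/// Hypothetical 64-bit signature over skip-bigrams: one bit per ordered pair
/// of classes found at distance 1..=max_skip + 1. With 8 classes there are
/// exactly 64 ordered pairs, so this sketch maps pairs to bits directly
/// instead of hashing them as a general Bloom filter would.
fn skip_bigram_signature(classes: &[u8], max_skip: usize) -> u64 {
    let mut sig = 0u64;
    for i in 0..classes.len() {
        for gap in 1..=max_skip + 1 {
            if let Some(&b) = classes.get(i + gap) {
                let a = classes[i];
                let bit = u32::from(a & 7) * 8 + u32::from(b & 7);
                sig |= 1u64 << bit;
            }
        }
    }
    sig
}
```

In this sketch, dropping one element from a sequence leaves its remaining skip-bigram bits set, which is one way a signature of this kind can tolerate single-character edits. Gray coding means a coefficient that shifts by one quantization step changes a single bit of the key.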
See the whitepaper in the repository for full details, benchmarks against Soundex / Metaphone / Double Metaphone / NYSIIS / Beider-Morse, and theoretical justifications.
Re-exports§
pub use self::core::encode;
pub use self::core::encode_batch;
pub use self::core::encode_token;
pub use self::core::preprocess;
pub use self::core::Code;
pub use self::indexing::BKTree;
pub use self::similarity::matches;
pub use self::similarity::similarity;
pub use self::similarity::token_distance;
pub use self::sonority::class_of;
pub use self::sonority::Class;
Modules§
- core: Core encoding pipeline.
- indexing: Indexed retrieval.
- similarity: Distance and similarity over AMT codes.
- sonority: Universal sonority alphabet.