Crate amt

§AMT — Articulatory Moment Transform

Language-agnostic phonetic name matching via spectral fingerprinting of universal sonority class sequences.

§Quick start

use amt::{encode_token, matches, similarity};

// Encode a single name
let code = encode_token("Khaled");

// Test match across transliterations and scripts
assert!(matches("Khaled", "Khalid"));
assert!(matches("Khaled", "خالد"));
assert!(matches("Gamal", "Jamal"));
assert!(!matches("Khaled", "Robert"));

// Graded similarity in [0, 1]
let s = similarity("Khaled Sameer", "khaled samir");
assert!(s > 0.9);

// Index names in a BK-tree, one entry per spectral key
use amt::{encode_token, BKTree};

let mut tree: BKTree<String> = BKTree::new();
for name in ["Khaled", "Khalid", "Ahmed", "Robert"] {
    let code = encode_token(name);
    for &sp in &code.spectrals {
        tree.add(sp, name.to_string());
    }
}

let query = encode_token("Khaleed");
let hits = tree.query(query.spectrals[0], 4);
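To illustrate what a lookup such as `tree.query(key, 4)` involves, here is a minimal BK-tree sketch over 32-bit keys. It is not the crate's `BKTree`: the type name `SketchTree`, the choice of Hamming distance as the metric, and the reading of the second `query` argument as a search radius are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hamming distance between two 32-bit keys (assumed metric).
fn hamming(a: u32, b: u32) -> u32 {
    (a ^ b).count_ones()
}

/// One tree node: a key, the values stored under it, and children
/// indexed by their distance to this node's key.
struct Node<T> {
    key: u32,
    values: Vec<T>,
    children: HashMap<u32, Node<T>>,
}

impl<T> Node<T> {
    fn new(key: u32) -> Self {
        Node { key, values: Vec::new(), children: HashMap::new() }
    }
}

/// Hypothetical stand-in for the crate's `BKTree`.
struct SketchTree<T> {
    root: Option<Node<T>>,
}

impl<T> SketchTree<T> {
    fn new() -> Self {
        SketchTree { root: None }
    }

    /// Walk down the edge labelled with the distance to each node's key
    /// until a node with an identical key is found or created.
    fn add(&mut self, key: u32, value: T) {
        let mut node = self.root.get_or_insert_with(|| Node::new(key));
        loop {
            let d = hamming(node.key, key);
            if d == 0 {
                node.values.push(value);
                return;
            }
            node = node.children.entry(d).or_insert_with(|| Node::new(key));
        }
    }

    /// Collect every value whose key lies within `radius` of `key`.
    fn query(&self, key: u32, radius: u32) -> Vec<&T> {
        let mut hits = Vec::new();
        let mut stack: Vec<&Node<T>> = self.root.iter().collect();
        while let Some(node) = stack.pop() {
            let d = hamming(node.key, key);
            if d <= radius {
                hits.extend(node.values.iter());
            }
            // Triangle inequality: only subtrees whose edge label lies in
            // [d - radius, d + radius] can contain a match.
            let lo = d.saturating_sub(radius);
            let hi = d + radius;
            for (&edge, child) in &node.children {
                if edge >= lo && edge <= hi {
                    stack.push(child);
                }
            }
        }
        hits
    }
}
```

Because children live in a `HashMap`, the order of the returned hits is unspecified; sort them if a stable order is needed. The pruning step is what makes a BK-tree cheaper than a linear scan for small radii.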

§Algorithm

Each name is mapped to a sequence of 8 sonority classes, projected onto the first 4 Chebyshev polynomials, Gray-quantized, and packed into a 32-bit spectral key. A parallel 64-bit Bloom signature over skip-bigrams of the same sequence captures edit-tolerant co-occurrence patterns. Two names match if they share any spectral key.
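The projection and packing steps can be sketched as follows. This is only an illustration of the general technique, not the crate's implementation: the sketch assumes class values in 0..=7, sample positions spread evenly over [-1, 1], a mean-based normalization, and 8 bits per coefficient (4 × 8 = 32). The function names are hypothetical, and the sketch produces a single key, whereas a `Code` carries several `spectrals`.

```rust
/// Chebyshev polynomial of the first kind T_k(x), computed with the
/// recurrence T_0 = 1, T_1 = x, T_{k+1} = 2x * T_k - T_{k-1}.
fn chebyshev(k: usize, x: f64) -> f64 {
    let (mut prev, mut cur) = (1.0, x);
    if k == 0 {
        return prev;
    }
    for _ in 1..k {
        let next = 2.0 * x * cur - prev;
        prev = cur;
        cur = next;
    }
    cur
}

/// Binary-reflected Gray code: adjacent integers differ in exactly one bit.
fn gray(n: u8) -> u8 {
    n ^ (n >> 1)
}

/// Hypothetical packing of four Gray-quantized Chebyshev moments into 32 bits.
fn spectral_key(classes: &[u8]) -> u32 {
    let n = classes.len();
    let mut key = 0u32;
    for k in 0..4 {
        // Moment k: mean of (class scaled to [0, 1]) * T_k(position).
        let mut m = 0.0;
        for (i, &c) in classes.iter().enumerate() {
            // Map index i to a position in [-1, 1] (assumed sampling scheme).
            let x = if n > 1 {
                2.0 * i as f64 / (n - 1) as f64 - 1.0
            } else {
                0.0
            };
            m += (c as f64 / 7.0) * chebyshev(k, x);
        }
        if n > 0 {
            m /= n as f64;
        }
        // |T_k| <= 1 on [-1, 1], so m lies in [-1, 1]; quantize to 8 bits.
        let q = (((m + 1.0) / 2.0) * 255.0).round().clamp(0.0, 255.0) as u8;
        key = (key << 8) | gray(q) as u32;
    }
    key
}
```

Gray coding means that a coefficient landing one quantization step away changes exactly one bit of its byte, so near-identical sequences yield keys that are close in Hamming distance.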

See the whitepaper in the repository for full details, benchmarks against Soundex / Metaphone / Double Metaphone / NYSIIS / Beider-Morse, and theoretical justifications.

Re-exports§

pub use self::core::encode;
pub use self::core::encode_batch;
pub use self::core::encode_token;
pub use self::core::preprocess;
pub use self::core::Code;
pub use self::indexing::BKTree;
pub use self::similarity::matches;
pub use self::similarity::similarity;
pub use self::similarity::token_distance;
pub use self::sonority::class_of;
pub use self::sonority::Class;

Modules§

core: Core encoding pipeline.
indexing: Indexed retrieval.
similarity: Distance and similarity over AMT codes.
sonority: Universal sonority alphabet.