pub struct RomanizationMap(/* private fields */);
A Thai-word → RTGS-romanization lookup table.
Built from tab-separated data via RomanizationMap::from_tsv.
Lookup is O(log n) via BTreeMap.
Implementations
impl RomanizationMap
pub fn from_tsv(data: &str) -> Self
Parse a tab-separated romanization table.
Format: thai_word\trtgs_romanization — one entry per line.
Lines beginning with # and blank lines are skipped.
For duplicate keys, the last entry wins.
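The parsing rules above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: `parse_tsv` is a hypothetical name, and ignoring lines that lack a tab is an assumption the description does not confirm.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the parsing step described above.
fn parse_tsv(data: &str) -> BTreeMap<String, String> {
    let mut table = BTreeMap::new();
    for line in data.lines() {
        // Skip blank lines and lines beginning with '#'.
        if line.trim().is_empty() || line.starts_with('#') {
            continue;
        }
        // Split at the first tab. Lines without a tab are ignored here,
        // which is an assumption of this sketch.
        if let Some((thai, rtgs)) = line.split_once('\t') {
            // `insert` overwrites an existing key, so the last entry wins.
            table.insert(thai.to_string(), rtgs.trim_end().to_string());
        }
    }
    table
}
```

Because `BTreeMap::insert` replaces the value of an existing key, a later duplicate silently overrides an earlier one, which matches the "last entry wins" rule.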
pub fn romanize(&self, word: &str) -> Option<&str>
Look up the RTGS romanization for a pre-segmented Thai word.
Performs a table lookup only: returns Some with the romanization if the
word is in the hand-curated table, otherwise None. The rule engine is not
consulted, so None is returned both for out-of-vocabulary (OOV) Thai words
and for non-Thai input (e.g. pure Latin or numbers).
The returned &str borrows from the map. For a rule-engine fallback on OOV
Thai words, use romanize_owned, which returns an owned String. To get the
input back unchanged on a miss, use romanize_or_raw.
Example
use kham_core::romanizer::RomanizationMap;
let map = RomanizationMap::builtin();
// Table hit
assert_eq!(map.romanize("กิน"), Some("kin"));
// OOV word — not in table; use romanize_owned() for rule-engine fallback
assert_eq!(map.romanize("เปปซี่"), None);
// Non-Thai input
assert_eq!(map.romanize("xyz"), None);
pub fn romanize_owned(&self, word: &str) -> Option<String>
Romanize word to an owned String, using the table first, then the
rule engine for out-of-vocabulary Thai words.
Returns None only when the word contains no Thai characters.
Example
use kham_core::romanizer::RomanizationMap;
let map = RomanizationMap::builtin();
assert_eq!(map.romanize_owned("กิน").as_deref(), Some("kin"));
// OOV word gets rule-based approximation
assert!(map.romanize_owned("เปปซี่").is_some());
// Non-Thai returns None
assert_eq!(map.romanize_owned("hello"), None);
pub fn romanize_or_raw<'a>(&'a self, word: &'a str) -> &'a str
Return the RTGS romanization for word, or word unchanged if not in
the table. Only performs table lookup — no rule engine.
For OOV Thai words that should fall back to the rule engine, use
romanize_or_rule instead.
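The signature ties both `self` and `word` to the same lifetime `'a` because the result may borrow from either one. A minimal sketch of that pattern follows; `MiniMap` and `or_raw` are hypothetical names, and the `BTreeMap<String, String>` field is an assumption, since the real type's fields are private.

```rust
use std::collections::BTreeMap;

/// Hypothetical miniature of the lookup table.
struct MiniMap(BTreeMap<String, String>);

impl MiniMap {
    /// Table lookup with raw passthrough. The returned slice borrows from
    /// the map on a hit and from `word` on a miss, so both inputs must
    /// outlive the result.
    fn or_raw<'a>(&'a self, word: &'a str) -> &'a str {
        self.0.get(word).map(String::as_str).unwrap_or(word)
    }
}
```

No allocation happens on either path, which is the practical reason to prefer this form over an owned `String` when a miss should simply echo the input.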
Example
use kham_core::romanizer::RomanizationMap;
let map = RomanizationMap::from_tsv("กิน\tkin\n");
assert_eq!(map.romanize_or_raw("กิน"), "kin");
assert_eq!(map.romanize_or_raw("xyz"), "xyz");
// OOV Thai is returned unchanged (raw passthrough)
assert_eq!(map.romanize_or_raw("เปปซี่"), "เปปซี่");
pub fn romanize_or_rule(&self, word: &str) -> String
Return the RTGS romanization for word.
Checks the table first; for OOV Thai words the built-in rule engine is
applied. Non-Thai input is returned unchanged. Always returns an owned
String.
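The three-step fallback described above (table, then rule engine, then passthrough) might look roughly like the following. This is a sketch under stated assumptions: `or_rule` and `contains_thai` are hypothetical helpers, the rule engine is represented by a caller-supplied closure rather than the crate's real engine, and "Thai input" is taken to mean any character in the Unicode Thai block U+0E00..=U+0E7F.

```rust
use std::collections::BTreeMap;

/// True if `word` has at least one code point in the Thai block.
fn contains_thai(word: &str) -> bool {
    word.chars().any(|c| ('\u{0E00}'..='\u{0E7F}').contains(&c))
}

/// Hypothetical fallback chain; `rule` stands in for a rule engine.
fn or_rule(
    table: &BTreeMap<String, String>,
    word: &str,
    rule: impl Fn(&str) -> String,
) -> String {
    // 1. A table hit takes priority.
    if let Some(hit) = table.get(word) {
        return hit.clone();
    }
    // 2. OOV Thai words go to the rule engine.
    if contains_thai(word) {
        return rule(word);
    }
    // 3. Non-Thai input passes through unchanged.
    word.to_string()
}
```

Every branch produces an owned `String`, which is why a function of this shape cannot return a borrowed `&str` the way a pure table lookup can.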
Example
use kham_core::romanizer::RomanizationMap;
let map = RomanizationMap::builtin();
// Table hit
assert_eq!(map.romanize_or_rule("กิน"), "kin");
// Non-Thai passes through
assert_eq!(map.romanize_or_rule("hello"), "hello");
// OOV Thai gets rule-based approximation
let oov = map.romanize_or_rule("เปปซี่");
assert!(!oov.is_empty());
assert!(!oov.chars().any(|c| ('\u{0E00}'..='\u{0E7F}').contains(&c)));
pub fn romanize_tokens(&self, tokens: &[&str]) -> Vec<String>
Romanize a slice of pre-segmented token strings.
Returns a Vec<String> aligned 1:1 with the input slice. Tokens not
found in the table are returned unchanged (same behaviour as
romanize_or_raw).
Example
use kham_core::romanizer::RomanizationMap;
let map = RomanizationMap::from_tsv("กิน\tkin\nปลา\tpla\n");
let out = map.romanize_tokens(&["กิน", "ปลา"]);
assert_eq!(out, vec!["kin", "pla"]);
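To illustrate the passthrough case for tokens that are missing from the table, here is a self-contained sketch of the same 1:1 mapping. `romanize_all` is a hypothetical free function over a plain `BTreeMap`, not the crate's method.

```rust
use std::collections::BTreeMap;

/// Hypothetical free-function version of the per-token lookup.
fn romanize_all(table: &BTreeMap<String, String>, tokens: &[&str]) -> Vec<String> {
    tokens
        .iter()
        // Use the table entry when present, otherwise copy the token as-is.
        .map(|tok| table.get(*tok).cloned().unwrap_or_else(|| (*tok).to_string()))
        .collect()
}
```

Given a table containing กิน → kin and ปลา → pla, the input `["กิน", "xyz", "ปลา"]` would map to `["kin", "xyz", "pla"]`, keeping the output aligned with the input by position.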