
Struct RomanizationMap 

pub struct RomanizationMap(/* private fields */);

A lookup table from Thai words to their RTGS (Royal Thai General System of Transcription) romanizations.

Build one from tab-separated data with RomanizationMap::from_tsv, or load the built-in table with RomanizationMap::builtin. Table lookups are O(log n) because entries are stored in a BTreeMap.
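The private fields are not documented, but the BTreeMap note suggests a layout like the one below. This is a minimal sketch under that assumption: ToyRomanizationMap and lookup are hypothetical names, not kham_core API, and the real type may store its data differently.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for RomanizationMap. The real fields are private;
// wrapping a BTreeMap<String, String> is an assumption for illustration.
struct ToyRomanizationMap(BTreeMap<String, String>);

impl ToyRomanizationMap {
    // Table-only lookup. BTreeMap::get does O(log n) key comparisons, and
    // the returned &str borrows from the stored value (no allocation).
    fn lookup(&self, word: &str) -> Option<&str> {
        self.0.get(word).map(String::as_str)
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert("กิน".to_string(), "kin".to_string());
    table.insert("ปลา".to_string(), "pla".to_string());
    let map = ToyRomanizationMap(table);

    println!("{:?}", map.lookup("กิน")); // Some("kin")
    println!("{:?}", map.lookup("xyz")); // None
    println!("{}", map.len()); // 2
}
```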

Implementations

impl RomanizationMap

pub fn builtin() -> Self

Load the built-in RTGS romanization table.


pub fn from_tsv(data: &str) -> Self

Parse a tab-separated romanization table.

Format: one thai_word\trtgs_romanization entry per line. Lines beginning with # and blank lines are skipped. For duplicate keys, the last entry wins.
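A rough sketch of these parsing rules, written against a plain BTreeMap so it runs without kham_core. parse_tsv is a hypothetical name, and three details are assumptions not stated above: whitespace-only lines count as blank, each line is split at its first tab, and lines with no tab are ignored.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of the documented from_tsv rules, not kham_core's code.
// Assumptions: whitespace-only lines count as blank, each line is split at
// its first tab, and lines with no tab are ignored.
fn parse_tsv(data: &str) -> BTreeMap<String, String> {
    let mut map = BTreeMap::new();
    for line in data.lines() {
        // Skip blank lines and lines beginning with '#'.
        if line.trim().is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((thai, rtgs)) = line.split_once('\t') {
            // insert() overwrites an existing key, so the last duplicate wins.
            map.insert(thai.to_string(), rtgs.to_string());
        }
    }
    map
}

fn main() {
    let data = "# comment\nกิน\tkin\n\nปลา\tpla\nกิน\tgin\n";
    let map = parse_tsv(data);
    println!("{:?}", map.get("กิน")); // Some("gin")
    println!("{}", map.len()); // 2
}
```

Because BTreeMap::insert replaces the value for an existing key, "last entry wins" falls out of the data structure without any extra bookkeeping.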


pub fn romanize(&self, word: &str) -> Option<&str>

Look up the RTGS romanization for a pre-segmented Thai word.

Returns the entry from the hand-curated table, or None if the word is not in the table. This method does not apply the rule engine, so both out-of-vocabulary (OOV) Thai words and non-Thai input (e.g. pure Latin or numbers) yield None.

The returned &str borrows from the map. For a rule-engine fallback on OOV Thai words, use romanize_owned (returns Option<String>) or romanize_or_rule (returns String). For a borrowed &str that falls back to the input word itself, use romanize_or_raw.

Example
```rust
use kham_core::romanizer::RomanizationMap;

let map = RomanizationMap::builtin();
// Table hit
assert_eq!(map.romanize("กิน"), Some("kin"));
// OOV word: not in the table, so None (romanize_owned() applies the rule engine)
assert_eq!(map.romanize("เปปซี่"), None);
// Non-Thai input
assert_eq!(map.romanize("xyz"), None);
```

pub fn romanize_owned(&self, word: &str) -> Option<String>

Romanize word to an owned String, using the table first, then the rule engine for out-of-vocabulary Thai words.

Returns None only when the word contains no Thai characters.

Example
```rust
use kham_core::romanizer::RomanizationMap;

let map = RomanizationMap::builtin();
assert_eq!(map.romanize_owned("กิน").as_deref(), Some("kin"));
// OOV word gets rule-based approximation
assert!(map.romanize_owned("เปปซี่").is_some());
// Non-Thai returns None
assert_eq!(map.romanize_owned("hello"), None);
```
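The dispatch order described here (table first, then the rule engine for Thai input, otherwise None) can be sketched as follows. The Thai check uses the Unicode Thai block U+0E00..=U+0E7F. The rule engine itself is not documented on this page, so placeholder_rule_engine is a hypothetical stand-in that emits one ? per Thai character; it does not produce RTGS output, and romanize_owned here is a free function, not the crate's method.

```rust
use std::collections::BTreeMap;

// True for characters in the Unicode Thai block (U+0E00..=U+0E7F).
fn is_thai_char(c: char) -> bool {
    ('\u{0E00}'..='\u{0E7F}').contains(&c)
}

fn contains_thai(word: &str) -> bool {
    word.chars().any(is_thai_char)
}

// Hypothetical placeholder for the undocumented rule engine: it emits one
// '?' per Thai character. It does NOT produce real RTGS output.
fn placeholder_rule_engine(word: &str) -> String {
    word.chars().filter(|&c| is_thai_char(c)).map(|_| '?').collect()
}

// Assumed dispatch: table hit, else rule engine for Thai input, else None.
fn romanize_owned(table: &BTreeMap<String, String>, word: &str) -> Option<String> {
    if let Some(hit) = table.get(word) {
        return Some(hit.clone());
    }
    if contains_thai(word) {
        Some(placeholder_rule_engine(word))
    } else {
        None
    }
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert("กิน".to_string(), "kin".to_string());

    println!("{:?}", romanize_owned(&table, "กิน")); // Some("kin")
    println!("{:?}", romanize_owned(&table, "เปปซี่"));
    println!("{:?}", romanize_owned(&table, "hello")); // None
}
```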

pub fn romanize_or_raw<'a>(&'a self, word: &'a str) -> &'a str

Return the RTGS romanization for word, or word unchanged if it is not in the table. Only the table is consulted; the rule engine is not applied.

For OOV Thai words that should fall back to the rule engine, use romanize_or_rule instead.

Example
```rust
use kham_core::romanizer::RomanizationMap;

let map = RomanizationMap::from_tsv("กิน\tkin\n");
assert_eq!(map.romanize_or_raw("กิน"), "kin");
assert_eq!(map.romanize_or_raw("xyz"), "xyz");
// OOV Thai is returned unchanged (raw passthrough)
assert_eq!(map.romanize_or_raw("เปปซี่"), "เปปซี่");
```
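The signature ties the map and the input to the same lifetime 'a, which lets the result borrow from either one without allocating. A minimal equivalent over a plain BTreeMap looks like this; romanize_or_raw here is a hypothetical free function, not the crate's method, and the BTreeMap layout is an assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical equivalent of romanize_or_raw over a plain BTreeMap.
// The table and the input share lifetime 'a, so the returned &str can
// point into either one: the stored value on a hit, the input on a miss.
fn romanize_or_raw<'a>(table: &'a BTreeMap<String, String>, word: &'a str) -> &'a str {
    table.get(word).map(String::as_str).unwrap_or(word)
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert("กิน".to_string(), "kin".to_string());

    // Table hit, non-Thai miss, and OOV Thai miss.
    for w in ["กิน", "xyz", "เปปซี่"] {
        println!("{} -> {}", w, romanize_or_raw(&table, w));
    }
}
```

The trade-off is the one the description states: no String is allocated, but OOV Thai words come back in Thai script.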

pub fn romanize_or_rule(&self, word: &str) -> String

Return the RTGS romanization for word.

Checks the table first; for OOV Thai words the built-in rule engine is applied. Non-Thai input is returned unchanged. Always returns an owned String.

Example
```rust
use kham_core::romanizer::RomanizationMap;

let map = RomanizationMap::builtin();
// Table hit
assert_eq!(map.romanize_or_rule("กิน"), "kin");
// Non-Thai passes through
assert_eq!(map.romanize_or_rule("hello"), "hello");
// OOV Thai gets rule-based approximation
let oov = map.romanize_or_rule("เปปซี่");
assert!(!oov.is_empty());
assert!(!oov.chars().any(|c| ('\u{0E00}'..='\u{0E7F}').contains(&c)));
```
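Going by the two descriptions, romanize_or_rule appears to behave like romanize_owned with the input returned unchanged wherever romanize_owned would give None. The sketch below expresses that assumed relationship; or_rule_from_owned is hypothetical and accepts any function with romanize_owned's shape, since the crate's internals are not shown here.

```rust
// Hypothetical: given any function following romanize_owned's documented
// contract (None only for input without Thai characters), this reproduces
// romanize_or_rule's documented behavior by passing such input through.
fn or_rule_from_owned<F>(romanize_owned: F, word: &str) -> String
where
    F: Fn(&str) -> Option<String>,
{
    romanize_owned(word).unwrap_or_else(|| word.to_owned())
}

fn main() {
    // Stub with romanize_owned's shape: one table entry, a fake rule result
    // for other Thai input, and None for non-Thai input.
    let stub = |w: &str| -> Option<String> {
        if w == "กิน" {
            Some("kin".to_string())
        } else if w.chars().any(|c| ('\u{0E00}'..='\u{0E7F}').contains(&c)) {
            Some("<rule>".to_string())
        } else {
            None
        }
    };

    println!("{}", or_rule_from_owned(stub, "กิน")); // kin
    println!("{}", or_rule_from_owned(stub, "hello")); // hello
}
```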

pub fn romanize_tokens(&self, tokens: &[&str]) -> Vec<String>

Romanize a slice of pre-segmented token strings.

Returns a Vec<String> aligned 1:1 with the input slice. Tokens not found in the table are returned unchanged (same behaviour as romanize_or_raw).

Example
```rust
use kham_core::romanizer::RomanizationMap;

let map = RomanizationMap::from_tsv("กิน\tkin\nปลา\tpla\n");
let out = map.romanize_tokens(&["กิน", "ปลา"]);
assert_eq!(out, vec!["kin", "pla"]);
```
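Per token, this amounts to a romanize_or_raw lookup followed by an owned copy. A hypothetical free-function sketch over a plain BTreeMap (the map layout is an assumption, not kham_core's private representation):

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of romanize_tokens: table lookup per token with
// passthrough on a miss (romanize_or_raw semantics), copied into owned
// Strings so the output is aligned 1:1 with the input slice.
fn romanize_tokens(table: &BTreeMap<String, String>, tokens: &[&str]) -> Vec<String> {
    tokens
        .iter()
        .map(|&t| table.get(t).map(String::as_str).unwrap_or(t).to_owned())
        .collect()
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert("กิน".to_string(), "kin".to_string());
    table.insert("ปลา".to_string(), "pla".to_string());

    let out = romanize_tokens(&table, &["กิน", "ปลา", "เปปซี่", "42"]);
    println!("{:?}", out); // ["kin", "pla", "เปปซี่", "42"]
}
```

Because misses pass through untouched, OOV Thai tokens stay in Thai script. Callers who want the rule-engine fallback for every token could instead map romanize_or_rule over the slice.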

pub fn len(&self) -> usize

Number of entries in the map.


pub fn is_empty(&self) -> bool

Return true if the map has no entries.

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.