pub struct UnicodeLanguageDetector { /* private fields */ }
Detect language from Unicode character ranges.
Supports CJK disambiguation (JA vs ZH) by checking for kana presence. Latin characters are mapped to a configurable default language.
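The kana check described above can be sketched as a small helper. This is only an illustration: `contains_kana` is a hypothetical name, not part of this crate's API, and it assumes that "kana presence" means any character in the Unicode Hiragana (U+3040 to U+309F) or Katakana (U+30A0 to U+30FF) blocks. The detector's real check may cover additional ranges such as half-width katakana.

```rust
/// Hypothetical helper (not part of the crate): returns true if `text`
/// contains at least one character from the Hiragana or Katakana blocks.
fn contains_kana(text: &str) -> bool {
    text.chars().any(|ch| {
        matches!(
            ch,
            '\u{3040}'..='\u{309F}'   // Hiragana block
            | '\u{30A0}'..='\u{30FF}' // Katakana block
        )
    })
}
```

A caller could compute this once per sentence or segment and reuse the result when classifying each ideograph in it.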
Implementations§
impl UnicodeLanguageDetector
pub fn new(languages: &[String], default_latin_language: &str) -> Self
Create a new detector for the given set of languages.
default_latin_language controls which language Latin-script
characters (A-Z, a-z, accented Latin) are assigned to.
pub fn detect_char(&self, ch: char, context_has_kana: bool) -> Option<&str>
Detect language for a single character.
context_has_kana is used for CJK ideograph disambiguation: if the
surrounding text contains kana, CJK ideographs are classified as
Japanese rather than Chinese.
Returns None for neutral characters (whitespace, digits,
ASCII punctuation, etc.).
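To make the documented behavior concrete, here is a minimal stand-in that mimics the shape of `detect_char`. It is not the crate's implementation: `SketchDetector` is a hypothetical type, the language codes "ja" and "zh" are assumed, only the main CJK Unified Ideographs block is handled, and the `languages` argument of `new` is omitted because the source does not describe how it filters results.

```rust
/// Hypothetical stand-in for illustration only; NOT the crate's detector.
struct SketchDetector {
    default_latin_language: String,
}

impl SketchDetector {
    fn new(default_latin_language: &str) -> Self {
        Self {
            default_latin_language: default_latin_language.to_string(),
        }
    }

    /// Classifies one character; returns None for neutral characters.
    fn detect_char(&self, ch: char, context_has_kana: bool) -> Option<&str> {
        match ch {
            // Hiragana and Katakana are treated as unambiguously Japanese.
            '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}' => Some("ja"),
            // CJK Unified Ideographs: the surrounding kana decides JA vs ZH.
            '\u{4E00}'..='\u{9FFF}' => {
                Some(if context_has_kana { "ja" } else { "zh" })
            }
            // ASCII letters and accented Latin letters (Latin-1 Supplement
            // through Latin Extended-B) map to the configured default.
            c if c.is_ascii_alphabetic()
                || (('\u{00C0}'..='\u{024F}').contains(&c) && c.is_alphabetic()) =>
            {
                Some(self.default_latin_language.as_str())
            }
            // Whitespace, digits, punctuation, and anything else are neutral.
            _ => None,
        }
    }
}
```

With a detector built as `SketchDetector::new("en")`, the ideograph '漢' would be classified as "ja" when `context_has_kana` is true and as "zh" when it is false, while a space or a digit yields `None`.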
Auto Trait Implementations§
impl Freeze for UnicodeLanguageDetector
impl RefUnwindSafe for UnicodeLanguageDetector
impl Send for UnicodeLanguageDetector
impl Sync for UnicodeLanguageDetector
impl Unpin for UnicodeLanguageDetector
impl UnsafeUnpin for UnicodeLanguageDetector
impl UnwindSafe for UnicodeLanguageDetector
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.