§Natural language detection library
§308 ScriptLanguages (187 models + 121 single-language scripts)
A language can be written in more than one script, and each language + script combination is detected as a separate `ScriptLanguage`.
ISO 639-3 (via `Language`) and ISO 15924 (via `Script`) are implemented, and the two are combined in `ScriptLanguage`.
§Example
```rust
use langram::*;

let models_storage = ModelsStorage::default();
let detector = DetectorBuilder::new(&models_storage).build();

// Preload models for faster detection.
detector.preload_models();

// Single-threaded detection.
let text = "text";
let result = detector.detect_top_one(text, 0.0);

// Or detect in parallel (for example, with rayon).
use rayon::iter::IntoParallelRefIterator;
use rayon::iter::ParallelIterator;

let texts = &["text1", "text2"];
let results: Vec<_> = texts
    .par_iter()
    .map(|text| detector.detect_top_one(text, 0.0))
    .collect();
```
`Detector` also has other methods.
Structs§
- Detector
- DetectorBuilder
- Fraction
- ModelsStorage - With all models preloaded, uses around 4.1 GB of RAM.
Enums§
- Language - The integer representation is unstable and may change at any time. The code representation (const `into_code`/`from_code`) and the string representation (const `into_str`/`from_str`) are more stable.
- NgramSize
- Script - The integer representation is unstable and may change at any time. The code representation (const `into_code`/`from_code`) and the string representation (const `into_str`/`from_str`) are more stable.
- ScriptLanguage - Language + script. Variant names do not always reflect the script actually used, so the “default” script for a language may change. The integer representation is unstable and may change at any time. The parts representation (const `into_parts`/`from_parts`), the code representation (const `into_code`/`from_code`) and the string representation (const `into_str`/`from_str`) are more stable.
- UcdScript - The integer representation is unstable and may change at any time. The code representation (const `into_code`/`from_code`) and the string representation (const `into_str`/`from_str`) are more stable.
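The stability notes above suggest persisting enum values by code or string rather than by integer. The sketch below illustrates why, using a hypothetical `DemoLanguage` enum that is not part of langram. Its `into_code`/`from_code` methods only borrow the naming used above, and langram's actual signatures may differ. For simplicity, `from_code` here is an ordinary (non-const) function.

```rust
/// Hypothetical enum illustrating the pattern; not langram's `Language`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DemoLanguage {
    English,
    German,
    French,
}

impl DemoLanguage {
    /// Stable ISO 639-3 code; unaffected by variant order.
    const fn into_code(self) -> &'static str {
        match self {
            DemoLanguage::English => "eng",
            DemoLanguage::German => "deu",
            DemoLanguage::French => "fra",
        }
    }

    /// Parses a stored code back into a variant, or `None` if the code is unknown.
    fn from_code(code: &str) -> Option<Self> {
        match code {
            "eng" => Some(DemoLanguage::English),
            "deu" => Some(DemoLanguage::German),
            "fra" => Some(DemoLanguage::French),
            _ => None,
        }
    }
}

fn main() {
    let lang = DemoLanguage::German;
    // Fragile: the implicit discriminant (here 1) shifts if a variant
    // is added before `German` or the variants are reordered.
    let as_int = lang as u8;
    // Robust: the code string survives reordering and new variants.
    let stored = lang.into_code();
    assert_eq!(DemoLanguage::from_code(stored), Some(lang));
    println!("int = {as_int}, code = {stored}");
}
```

Storing the code (or the string form) in files or databases keeps saved results readable even after the enum gains or reorders variants.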