§Natural language detection library
§314 ScriptLanguages (187 models + 127 single-language scripts)
A language can be written in more than one script, so each language-script pair is detected as a separate ScriptLanguage (language + script).
ISO 639-3 (via Language) and ISO 15924 (via Script) are implemented, and the two are combined in ScriptLanguage.
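The following is a minimal, self-contained sketch of the language + script idea. Language, Script, ScriptLanguage, into_code, into_parts and tag here are hypothetical stand-ins, not langram's actual definitions. They only illustrate how one ISO 639-3 language (Serbian, "srp") paired with two ISO 15924 scripts ("Cyrl", "Latn") yields two distinct values. The "_" separator in tag is also an assumption.

```rust
// Hypothetical stand-ins, NOT langram's real types: they only model how one
// ISO 639-3 language paired with different ISO 15924 scripts gives distinct
// (language, script) values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Language {
    English,
    Serbian,
}

impl Language {
    /// ISO 639-3 code.
    const fn into_code(self) -> &'static str {
        match self {
            Language::English => "eng",
            Language::Serbian => "srp",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Script {
    Cyrillic,
    Latin,
}

impl Script {
    /// ISO 15924 four-letter code.
    const fn into_code(self) -> &'static str {
        match self {
            Script::Cyrillic => "Cyrl",
            Script::Latin => "Latn",
        }
    }
}

/// Language + script pair; each pair is its own detection target.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ScriptLanguage {
    language: Language,
    script: Script,
}

impl ScriptLanguage {
    const fn into_parts(self) -> (Language, Script) {
        (self.language, self.script)
    }

    /// Combined tag like "srp_Cyrl"; the "_" separator is an assumption.
    fn tag(self) -> String {
        format!("{}_{}", self.language.into_code(), self.script.into_code())
    }
}

fn main() {
    let serbian_cyrl = ScriptLanguage { language: Language::Serbian, script: Script::Cyrillic };
    let serbian_latn = ScriptLanguage { language: Language::Serbian, script: Script::Latin };
    let english_latn = ScriptLanguage { language: Language::English, script: Script::Latin };

    // Same language, different scripts: two distinct values.
    assert_ne!(serbian_cyrl, serbian_latn);
    assert_eq!(serbian_cyrl.into_parts().0, serbian_latn.into_parts().0);

    for sl in [serbian_cyrl, serbian_latn, english_latn] {
        println!("{}", sl.tag());
    }
}
```

Running it prints srp_Cyrl, srp_Latn and eng_Latn on separate lines. The two Serbian entries share a language but compare unequal, which mirrors why the detector reports them as different ScriptLanguage results.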
§Example
```rust
use langram::{DetectorBuilder, ModelsStorage};

let models_storage = ModelsStorage::default();
let detector = DetectorBuilder::new(&models_storage).build();
// preload models for faster detection
detector.preload_models();

// single thread
let text = "text";
let result = detector.detect_top_one_reordered(text);

// or multithreaded (rayon for example)
use rayon::iter::IntoParallelRefIterator;
use rayon::iter::ParallelIterator;
let texts = &["text1", "text2"];
let results: Vec<_> = texts
    .par_iter()
    .map(|text| detector.detect_top_one_reordered(text))
    .collect();
```

detector also has other methods.
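rayon is only one way to parallelize. The example works because the detector is shared by reference across worker threads. The sketch below shows the same sharing pattern with the standard library's std::thread::scope. ToyDetector and detect_all are hypothetical stand-ins, not langram API. The "detection" here is a toy Cyrillic-vs-Latin letter count, not langram's n-gram models, and it is used only so the sketch runs without the crate.

```rust
use std::thread;

/// Hypothetical stand-in for a shared detector (NOT langram's Detector):
/// it just compares Cyrillic vs. ASCII Latin letter counts.
struct ToyDetector;

impl ToyDetector {
    fn detect(&self, text: &str) -> Option<&'static str> {
        let cyrillic = text
            .chars()
            .filter(|c| ('\u{0400}'..='\u{04FF}').contains(c))
            .count();
        let latin = text.chars().filter(|c| c.is_ascii_alphabetic()).count();
        match (cyrillic, latin) {
            (0, 0) => None,
            (c, l) if c >= l => Some("Cyrl"),
            _ => Some("Latn"),
        }
    }
}

/// Runs detection on each text in its own scoped thread, all borrowing the
/// same detector; results are returned in input order.
fn detect_all(detector: &ToyDetector, texts: &[&str]) -> Vec<Option<&'static str>> {
    thread::scope(|s| {
        let handles: Vec<_> = texts
            .iter()
            .map(|text| s.spawn(move || detector.detect(text)))
            .collect();
        // Joining in spawn order keeps results aligned with `texts`.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let detector = ToyDetector;
    let texts = ["Здраво свете", "Hello world", "123"];
    let results = detect_all(&detector, &texts);
    println!("{:?}", results); // [Some("Cyrl"), Some("Latn"), None]
}
```

Spawning one thread per text keeps the sketch short. For many texts, a thread pool such as rayon's, as in the example above, is the more practical choice.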
Macros§
Structs§
- Detector
- DetectorBuilder
- Fraction
- ModelsStorage - With all models preloaded, it uses around 4.1 GB of RAM (2.4 GB with max_trigrams).
Enums§
- Language - The integer representation is unstable and can change at any time. The code representation (const into_code/from_code) and string representation (const into_str/from_str) are more stable.
- NgramSize
- Script - Has aliases compared to UcdScript. The integer representation is unstable and can change at any time. The code representation (const into_code/from_code) and string representation (const into_str/from_str) are more stable.
- ScriptLanguage - Language + script, ordered by total speakers. Variant names do not always indicate the script used, so the “default” script for a language can change. The integer representation is unstable and can change at any time. The parts representation (const into_parts/from_parts), code representation (const into_code/from_code) and string representation (const into_str/from_str) are more stable.
- UcdScript - The integer representation is unstable and can change at any time. The code representation (const into_code/from_code) and string representation (const into_str/from_str) are more stable.
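The stability notes above imply that persisted or transmitted values should use codes rather than `as` casts. The sketch below illustrates this on a hypothetical two-variant enum. The variants, codes and signatures are assumptions, not langram's definitions. In particular, from_code is a plain fn taking &str here, because matching on strings is not allowed in a const fn. langram's own from_code may have a different signature.

```rust
/// Hypothetical enum mirroring the documented pattern (NOT langram's
/// Language): store the code, not the integer discriminant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    English,
    Serbian,
}

impl Language {
    /// ISO 639-3 code; stays the same if variants are reordered.
    const fn into_code(self) -> &'static str {
        match self {
            Language::English => "eng",
            Language::Serbian => "srp",
        }
    }

    /// Inverse of into_code; unknown codes yield None.
    fn from_code(code: &str) -> Option<Self> {
        match code {
            "eng" => Some(Language::English),
            "srp" => Some(Language::Serbian),
            _ => None,
        }
    }
}

fn main() {
    let lang = Language::Serbian;

    // Fragile: the discriminant (1 here) shifts if variants are added or reordered.
    let fragile = lang as u8;
    println!("discriminant today: {fragile}");

    // Stable: round-trip through the ISO 639-3 code.
    let stored = lang.into_code();
    assert_eq!(Language::from_code(stored), Some(Language::Serbian));
}
```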