Crate langram

§Natural language detection library

§314 ScriptLanguages (187 models + 127 single-language scripts)

A single language can be written in more than one script. Each language and script combination is detected as a distinct ScriptLanguage (language + script).

ISO 639-3 is implemented through Language and ISO 15924 through Script; ScriptLanguage combines the two.
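To make the language + script idea concrete, here is a minimal, self-contained sketch. The types `Lang`, `Scr`, and `ScriptLang` and their methods are hypothetical stand-ins written for this illustration; they are not langram's actual definitions, and the real enums cover far more variants. The ISO codes shown (`srp`, `eng`, `Cyrl`, `Latn`) are the standard ones.

```rust
// Hypothetical stand-ins for illustration only; NOT langram's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Serbian,
    English,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scr {
    Cyrillic,
    Latin,
}

/// One variant per (language, script) combination.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScriptLang {
    SerbianCyrillic,
    SerbianLatin,
    English,
}

impl Lang {
    /// ISO 639-3 three-letter language code.
    const fn code(self) -> &'static str {
        match self {
            Lang::Serbian => "srp",
            Lang::English => "eng",
        }
    }
}

impl Scr {
    /// ISO 15924 four-letter script code.
    const fn code(self) -> &'static str {
        match self {
            Scr::Cyrillic => "Cyrl",
            Scr::Latin => "Latn",
        }
    }
}

impl ScriptLang {
    /// Splits the combined value into its language and script parts.
    const fn into_parts(self) -> (Lang, Scr) {
        match self {
            ScriptLang::SerbianCyrillic => (Lang::Serbian, Scr::Cyrillic),
            ScriptLang::SerbianLatin => (Lang::Serbian, Scr::Latin),
            ScriptLang::English => (Lang::English, Scr::Latin),
        }
    }
}

fn main() {
    // Serbian appears twice: once per script it is written in.
    for sl in [
        ScriptLang::SerbianCyrillic,
        ScriptLang::SerbianLatin,
        ScriptLang::English,
    ] {
        let (lang, script) = sl.into_parts();
        println!("{:?} = {}-{}", sl, lang.code(), script.code());
    }
}
```

In this sketch, the two Serbian variants share a language part but differ in script, which is why a detector working at this granularity reports them as different results.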

§Example

use langram::{DetectorBuilder, ModelsStorage};

let models_storage = ModelsStorage::default();
let detector = DetectorBuilder::new(&models_storage).build();
// preload models for faster detection
detector.preload_models();

// single thread
let text = "text";
let result = detector.detect_top_one_reordered(text);

// or multithreaded (rayon for example)
use rayon::iter::IntoParallelRefIterator;
use rayon::iter::ParallelIterator;

let texts = &["text1", "text2"];
let results: Vec<_> = texts
    .par_iter()
    .map(|text| detector.detect_top_one_reordered(text))
    .collect();

The detector also provides other methods; see Detector.

Macros§

ahashset

Structs§

Detector
DetectorBuilder
Fraction
ModelsStorage
With all models preloaded, uses around 4.1 GB of RAM (2.4 GB when using max_trigrams).

Enums§

Language
The integer representation is unstable and may change at any time. The code representation (const into_code/from_code) and the string representation (const into_str/from_str) are more stable.
NgramSize
Script
Includes aliases, unlike UcdScript. The integer representation is unstable and may change at any time. The code representation (const into_code/from_code) and the string representation (const into_str/from_str) are more stable.
ScriptLanguage
Language + script. Ordered by total speakers. Variant names do not always indicate the script used, so the “default” script may change. The integer representation is unstable and may change at any time. The parts representation (const into_parts/from_parts), the code representation (const into_code/from_code), and the string representation (const into_str/from_str) are more stable.
UcdScript
The integer representation is unstable and may change at any time. The code representation (const into_code/from_code) and the string representation (const into_str/from_str) are more stable.
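The stability notes on the enums above can be illustrated with a small sketch. The `Lang` enum below is hypothetical and only mirrors the into_code/from_code naming; the real signatures in langram may differ (for example, `from_code` here is a plain `fn`, which is an assumption of this sketch). The point is that an integer cast depends on variant order, while a code stays the same if variants are added or reordered.

```rust
// Hypothetical enum for illustration only; NOT langram's actual definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    English,
    French,
}

impl Lang {
    /// Returns the ISO 639-3 code, which does not depend on variant order.
    const fn into_code(self) -> &'static str {
        match self {
            Lang::English => "eng",
            Lang::French => "fra",
        }
    }

    /// Parses an ISO 639-3 code; unknown codes yield `None`.
    fn from_code(code: &str) -> Option<Lang> {
        match code {
            "eng" => Some(Lang::English),
            "fra" => Some(Lang::French),
            _ => None,
        }
    }
}

fn main() {
    // Fragile: this value changes if a variant is inserted before `French`.
    let as_int = Lang::French as u8;

    // More robust: persist the code and parse it back later.
    let as_code = Lang::French.into_code();
    println!("int = {}, code = {}", as_int, as_code);

    assert_eq!(Lang::from_code(as_code), Some(Lang::French));
}
```

When persisting detection results (in a database or a file, for instance), storing the code or string form rather than the integer avoids silent mismatches after a crate upgrade.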

Type Aliases§

FileModel