Whatlang is a Rust library for detecting (recognizing) natural languages. In addition, the library recognizes scripts (writing systems). Each supported language and script is represented by a variant of the Lang and Script enums.
§Examples
Using the detect function:
use whatlang::{detect, Lang, Script};
let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";
let info = detect(text).unwrap();
assert_eq!(info.lang(), Lang::Epo);
assert_eq!(info.script(), Script::Latin);
// Confidence is in the range from 0 to 1.
assert_eq!(info.confidence(), 1.0);
assert!(info.is_reliable());
Using Detector with a specified denylist or allowlist:
use whatlang::{Detector, Lang};
let allowlist = vec![Lang::Eng, Lang::Rus];
// You can also create a detector with the with_denylist function
let detector = Detector::with_allowlist(allowlist);
let lang = detector.detect_lang("There is no reason not to learn Esperanto.");
assert_eq!(lang, Some(Lang::Eng));
§Features
Feature | Description |
---|---|
enum-map | Lang and Script implement Enum trait from enum-map |
arbitrary | Support Arbitrary |
serde | Implements Serialize and Deserialize for Lang and Script |
dev | Enables the whatlang::dev module, which provides some internal API. It exists for profiling purposes, and normal users are discouraged from relying on this API. |
Structs§
- Detector
- Configurable structure that holds detection options and provides functions to detect language and script.
- Info
- Represents a full outcome of language detection.
Enums§
- Lang
- Represents a language following ISO 639-3 standard.
- Script
- Represents a writing system (Latin, Cyrillic, Arabic, etc).
Functions§
- detect
- Detects the language and script of a given text.
- detect_lang
- Detects only the language of a given text.
- detect_script
- Detects only the script of a given text. Works much faster than a complete detection with detect.