DEPRECATED in favor of whatlang, which is native Rust and smaller. If you have a compelling use-case for this code, please open an issue.
Detect the language of a string using the cld2 library from the Chromium project.
use cld2::{detect_language, Format, Reliable, Lang};
let text = "It is an ancient Mariner,
And he stoppeth one of three.
'By thy long grey beard and glittering eye,
Now wherefore stopp'st thou me?";
assert_eq!((Some(Lang("en")), Reliable),
detect_language(text, Format::Text));
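The example above shows that the result is a pair: an optional language code and a reliability flag. As a minimal sketch of how a caller might act on such a pair, the snippet below uses local stand-in types that only mirror those shapes. `Lang` and `Reliability` here are redefined for illustration and are not the crate's own definitions, and `language_or_default` is a hypothetical helper, not part of this library.

```rust
// Stand-in types mirroring the shapes used in the example above.
// These are local assumptions for illustration, not the crate's definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Lang(pub &'static str);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reliability {
    Reliable,
    Unreliable,
}

/// Hypothetical helper: return the detected language code only when the
/// detector flagged its guess as reliable; otherwise return `default`.
pub fn language_or_default(
    result: (Option<Lang>, Reliability),
    default: &'static str,
) -> &'static str {
    match result {
        // A language was found and the detector trusts it.
        (Some(Lang(code)), Reliability::Reliable) => code,
        // No language, or an unreliable guess: use the fallback.
        _ => default,
    }
}
```

With the real crate, the same `match` could in principle be applied directly to the value returned by `detect_language`, since the assertion above compares it against a tuple of this shape.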
This library wraps the cld2-sys library, which provides a raw interface to cld2. The only major feature which isn’t yet wrapped is the ResultChunk interface, because it tends to give fairly imprecise answers; it wouldn’t make a very good multi-lingual spellchecker component, for example. As always, pull requests are eagerly welcome!
WARNING: We assume that nobody tries to change the loaded cld2 data tables or calls the C++ function CLD2::DetectLanguageVersion behind our backs. These configuration and debugging APIs in cld2 are not thread safe.
For more information, see the GitHub project for this library.
Re-exports§
pub use self::Reliability::Reliable;
pub use self::Reliability::Unreliable;
Structs§
- DetectionResult - Detailed language detection results.
- Hints - Hints to the decoder, which it will use to make better guesses.
- Lang - A language code, normally two letters for common languages.
- LanguageScore - Detailed information about how well the input text matched a specific language.
Enums§
- Format - Possible data formats.
- Reliability - Is the output of the language decoder reliable?
Functions§
- detect_language - Detect the language of the input text.
- detect_language_ext - Detect the language of the input text, using optional hints, and return detailed statistics.
- detector_version - Get the version of cld2 and its embedded data files as a string.