Struct lingua::LanguageDetector

pub struct LanguageDetector { /* private fields */ }

This struct detects the language of a given input text.

Implementations

impl LanguageDetector

pub fn unload_language_models(&self)

Clears all language models loaded by this LanguageDetector instance and frees the memory they consumed.

pub fn detect_language_of<T: Into<String>>(&self, text: T) -> Option<Language>

Detects the language of the given input text. Returns None if the language cannot be reliably detected.

use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();

let detected_language = detector.detect_language_of("languages are awesome");

assert_eq!(detected_language, Some(English));
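Because the return type is an Option, callers have to handle the None case explicitly. The following is a minimal sketch of that caller-side handling. To stay self-contained it uses a hypothetical `Lang` enum and a hypothetical `describe` helper instead of lingua's own `Language` type, so both names are assumptions made for illustration.

```rust
// Hypothetical stand-in for lingua::Language, so the sketch compiles
// without the crate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    English,
    French,
}

// Hypothetical helper: turns a detection result into a readable message.
fn describe(detected: Option<Lang>) -> String {
    match detected {
        Some(lang) => format!("detected {:?}", lang),
        // None means the detector could not decide reliably.
        None => String::from("language could not be reliably detected"),
    }
}

fn main() {
    println!("{}", describe(Some(Lang::English)));
    println!("{}", describe(Some(Lang::French)));
    println!("{}", describe(None));
}
```

With the real crate, the same `match` would apply directly to the value returned by `detect_language_of`.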

pub fn detect_multiple_languages_of<T: Into<String>>(&self, text: T) -> Vec<DetectionResult>

Attempts to detect multiple languages in mixed-language text.

This feature is experimental and under continuous development.

Returns a list of DetectionResult values, one for each contiguous single-language text section identified by the library. Each entry consists of the identified language, a start index and an end index. The indices delimit the substring identified as that section, so the half-open range from the start index to the end index can be used to slice the input text.

use lingua::Language::{English, French, German};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[English, French, German]).build();
let sentence = "Parlez-vous français? \
    Ich spreche Französisch nur ein bisschen. \
    A little bit is better than nothing.";

let results = detector.detect_multiple_languages_of(sentence);

if let [first, second, third] = &results[..] {
    assert_eq!(first.language(), French);
    assert_eq!(
        &sentence[first.start_index()..first.end_index()],
        "Parlez-vous français? "
    );

    assert_eq!(second.language(), German);
    assert_eq!(
        &sentence[second.start_index()..second.end_index()],
        "Ich spreche Französisch nur ein bisschen. "
    );

    assert_eq!(third.language(), English);
    assert_eq!(
        &sentence[third.start_index()..third.end_index()],
        "A little bit is better than nothing."
    );
}
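The example slices `sentence` with the returned indices even though the text contains non-ASCII characters such as `ç` and `ö`. That only works if the indices are byte offsets on UTF-8 character boundaries, not character counts. The sketch below illustrates the difference using the standard library only. `Section` and `slice_section` are hypothetical stand-ins and not part of lingua.

```rust
// Hypothetical stand-in for the index pair carried by a DetectionResult.
struct Section {
    start: usize,
    end: usize,
}

// Slices the text with a half-open byte range. `str::get` returns None
// instead of panicking when an index is out of bounds or falls inside a
// multi-byte character.
fn slice_section<'a>(text: &'a str, section: &Section) -> Option<&'a str> {
    text.get(section.start..section.end)
}

fn main() {
    let sentence = "Parlez-vous français? Ich spreche Französisch.";
    let french = "Parlez-vous français? ";

    // 'ç' occupies two bytes in UTF-8, so the byte length is one more
    // than the number of characters.
    assert_eq!(french.len(), french.chars().count() + 1);

    // Byte length as the end index recovers the whole section.
    let first = Section { start: 0, end: french.len() };
    assert_eq!(slice_section(sentence, &first), Some(french));

    // Character count as the end index cuts the section short.
    let too_short = Section { start: 0, end: french.chars().count() };
    assert_ne!(slice_section(sentence, &too_short), Some(french));
}
```

Using `str::get` rather than direct indexing is a defensive choice for code that stores the indices and applies them later, possibly to a different string.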

pub fn compute_language_confidence_values<T: Into<String>>(&self, text: T) -> Vec<(Language, f64)>

Computes confidence values for each language supported by this detector for the given input text. Each value denotes how likely it is that the text was written in the corresponding language.

Returns a vector of two-element tuples, one for each language that the calling LanguageDetector instance was built from, pairing the language with its confidence value. The entries are sorted by confidence value in descending order. Each value is a probability between 0.0 and 1.0, and the probabilities of all languages sum to 1.0. If the rule engine identifies the language unambiguously, that language always receives the value 1.0 and all other languages receive 0.0.

use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();

let confidence_values = detector
    .compute_language_confidence_values("languages are awesome")
    .into_iter()
    .map(|(language, confidence)| (language, (confidence * 100.0).round() / 100.0))
    .collect::<Vec<_>>();

assert_eq!(
    confidence_values,
    vec![
        (English, 0.93),
        (French, 0.04),
        (German, 0.02),
        (Spanish, 0.01)
    ]
);
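Since the entries are sorted in descending order, the first two are enough to judge how clear-cut a result is. The sketch below works on a vector shaped like the one above. The `Lang` enum, the `top_margin` helper, and the idea of inspecting the margin at all are illustrative assumptions and not part of lingua's API.

```rust
// Hypothetical stand-in for lingua::Language.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    English,
    French,
    German,
    Spanish,
}

// Returns the most likely language and its lead over the runner-up.
// Assumes `values` is sorted by confidence in descending order.
fn top_margin(values: &[(Lang, f64)]) -> Option<(Lang, f64)> {
    let (best, best_confidence) = *values.first()?;
    // With a single entry there is no runner-up, so the lead is the
    // confidence itself.
    let runner_up = values.get(1).map_or(0.0, |&(_, c)| c);
    Some((best, best_confidence - runner_up))
}

fn main() {
    let values = vec![
        (Lang::English, 0.93),
        (Lang::French, 0.04),
        (Lang::German, 0.02),
        (Lang::Spanish, 0.01),
    ];

    // The probabilities of all languages sum to 1.0 (up to rounding).
    let total: f64 = values.iter().map(|&(_, c)| c).sum();
    assert!((total - 1.0).abs() < 1e-9);

    if let Some((lang, margin)) = top_margin(&values) {
        println!("{:?} leads by {:.2}", lang, margin);
    }
}
```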

pub fn compute_language_confidence<T: Into<String>>(&self, text: T, language: Language) -> f64

Computes the confidence value for the given language and input text. This value denotes how likely it is that the given text has been written in the given language.

The returned value is a number between 0.0 and 1.0. If the rule engine identifies the language unambiguously, the value 1.0 is always returned. If the given language is not supported by this detector instance, the value 0.0 is always returned.

use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();

let confidence = detector.compute_language_confidence("languages are awesome", French);
let rounded_confidence = (confidence * 100.0).round() / 100.0;

assert_eq!(rounded_confidence, 0.04);
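Conceptually, the single-language value matches that language's entry in the vector returned by compute_language_confidence_values, with 0.0 standing in for languages the detector was not built from. The sketch below shows this lookup on plain data. `Lang` and `confidence_for` are hypothetical names, and whether lingua computes the value this way internally is an assumption.

```rust
// Hypothetical stand-in for lingua::Language.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    English,
    French,
    German,
    Spanish,
    Italian,
}

// Looks up one language in a list of (language, confidence) pairs and
// falls back to 0.0 when the language is not present.
fn confidence_for(values: &[(Lang, f64)], language: Lang) -> f64 {
    values
        .iter()
        .find(|&&(lang, _)| lang == language)
        .map_or(0.0, |&(_, confidence)| confidence)
}

fn main() {
    let values = [
        (Lang::English, 0.93),
        (Lang::French, 0.04),
        (Lang::German, 0.02),
        (Lang::Spanish, 0.01),
    ];

    assert_eq!(confidence_for(&values, Lang::French), 0.04);
    // Italian is not among the configured languages, so the result is 0.0.
    assert_eq!(confidence_for(&values, Lang::Italian), 0.0);
}
```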