Struct lingua::LanguageDetector
pub struct LanguageDetector { /* private fields */ }
This struct detects the language of given input text.
Implementations
impl LanguageDetector
pub fn unload_language_models(&self)
Clears all language models loaded by this LanguageDetector instance and frees the memory previously consumed by the models.
pub fn detect_language_of<T: Into<String>>(&self, text: T) -> Option<Language>
Detects the language of the given input text. If the language cannot be reliably detected, None is returned.
This method operates in a single thread. To classify a very large set of texts, consider using detect_languages_in_parallel_of instead.
use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();
let detected_language = detector.detect_language_of("languages are awesome");
assert_eq!(detected_language, Some(English));
pub fn detect_languages_in_parallel_of<T: Into<String> + Clone + Send + Sync>(&self, texts: &[T]) -> Vec<Option<Language>>
Detects the languages of all given input texts. If the language cannot be reliably detected for a text, None is put into the result vector for that text.
This method is a good fit if you want to classify a very large set of texts. It potentially operates in multiple threads, depending on how many idle CPU cores are available and how many texts are passed to this method.
If you do not want or need parallel execution, use detect_language_of instead.
use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();
let detected_languages = detector.detect_languages_in_parallel_of(&[
    "languages are awesome",
    "Sprachen sind großartig",
    "des langues sont géniales",
    "los idiomas son geniales"
]);
assert_eq!(
    detected_languages,
    vec![
        Some(English),
        Some(German),
        Some(French),
        Some(Spanish)
    ]
);
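The description above says the method may spread work across threads depending on idle CPU cores and the number of texts. This page does not show how lingua does that internally, so the following is only a std-only sketch of the general idea: split the inputs into chunks, process each chunk on a scoped thread, and join in spawn order so results stay aligned with the inputs. The helper name `map_in_parallel` and its chunking strategy are assumptions made for illustration, not part of the lingua API.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper (not part of lingua): applies `classify` to every text,
/// spreading chunks over worker threads while keeping the input order.
pub fn map_in_parallel<T, R, F>(texts: &[T], classify: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if texts.is_empty() {
        return Vec::new();
    }
    // Never start more workers than there are texts or reported cores.
    let cores = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    let workers = cores.min(texts.len());
    // Ceiling division so that every text lands in exactly one chunk.
    let chunk_size = (texts.len() + workers - 1) / workers;
    // Share the classifier by reference; `&F` is `Copy` and `Send` because `F: Sync`.
    let classify = &classify;

    thread::scope(|scope| {
        // Spawn all workers first, then join them, so chunks run concurrently.
        let handles: Vec<_> = texts
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(classify).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order keeps results aligned with the inputs.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With a single text or a single core the sketch degenerates to one worker, which mirrors the "potentially" in the description: parallelism only pays off for large inputs.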
pub fn detect_multiple_languages_of<T: Into<String>>(&self, text: T) -> Vec<DetectionResult>
Attempts to detect multiple languages in mixed-language text. This feature is experimental and under continuous development.
A vector of DetectionResult is returned containing an entry for each contiguous single-language text section as identified by the library. Each entry consists of the identified language, a start index and an end index. The indices denote the substring that has been identified as a contiguous single-language text section.
This method operates in a single thread. To classify a very large set of texts, consider using detect_multiple_languages_in_parallel_of instead.
use lingua::Language::{English, French, German};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German
])
.build();
let sentence = "Parlez-vous français? \
    Ich spreche Französisch nur ein bisschen. \
    A little bit is better than nothing.";
let results = detector.detect_multiple_languages_of(sentence);
if let [first, second, third] = &results[..] {
    assert_eq!(first.language(), French);
    assert_eq!(
        &sentence[first.start_index()..first.end_index()],
        "Parlez-vous français? "
    );
    assert_eq!(second.language(), German);
    assert_eq!(
        &sentence[second.start_index()..second.end_index()],
        "Ich spreche Französisch nur ein bisschen. "
    );
    assert_eq!(third.language(), English);
    assert_eq!(
        &sentence[third.start_index()..third.end_index()],
        "A little bit is better than nothing."
    );
}
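The example slices the input with `start_index()..end_index()`, which in Rust means the indices are used as byte offsets into the UTF-8 string, and slicing panics if an offset falls inside a multi-byte character such as "ç". The following std-only sketch shows a non-panicking way to extract section texts with `str::get`. The `Section` struct is a hypothetical stand-in for a detection result and `section_texts` is an illustrative helper; neither is part of lingua.

```rust
/// Hypothetical stand-in for a detection result: a label plus byte offsets.
#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub label: &'static str,
    pub start_index: usize,
    pub end_index: usize,
}

/// Returns the text of each section, or `None` for a range that is out of
/// bounds or does not fall on a UTF-8 character boundary.
pub fn section_texts<'a>(text: &'a str, sections: &[Section]) -> Vec<Option<&'a str>> {
    sections
        .iter()
        // `str::get` checks bounds and char boundaries instead of panicking.
        .map(|section| text.get(section.start_index..section.end_index))
        .collect()
}
```

Offsets produced for the same string they are applied to should always be valid; the checked variant is mainly useful when offsets are stored and later applied to text that may have changed.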
pub fn detect_multiple_languages_in_parallel_of<T: Into<String> + Clone + Send + Sync>(&self, texts: &[T]) -> Vec<Vec<DetectionResult>>
Attempts to detect multiple languages in each of the given mixed-language texts. This feature is experimental and under continuous development.
A vector of DetectionResult is returned for each text, containing an entry for each contiguous single-language text section as identified by the library. Each entry consists of the identified language, a start index and an end index. The indices denote the substring that has been identified as a contiguous single-language text section.
This method is a good fit if you want to classify a very large set of texts. It potentially operates in multiple threads, depending on how many idle CPU cores are available and how many texts are passed to this method.
If you do not want or need parallel execution, use detect_multiple_languages_of instead.
pub fn compute_language_confidence_values<T: Into<String>>(&self, text: T) -> Vec<(Language, f64)>
Computes confidence values for each language supported by this detector for the given input text. These values denote how likely it is that the given text has been written in any of the languages supported by this detector.
A vector of two-element tuples is returned containing those languages which the calling instance of LanguageDetector has been built from, together with their confidence values. The entries are sorted by their confidence value in descending order. Each value is a probability between 0.0 and 1.0. The probabilities of all languages will sum to 1.0. If a language is unambiguously identified by the rule engine, the value 1.0 will always be returned for this language. The other languages will receive a value of 0.0.
This method operates in a single thread. To classify a very large set of texts, consider using compute_language_confidence_values_in_parallel instead.
use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();
let confidence_values = detector
    .compute_language_confidence_values("languages are awesome")
    .into_iter()
    .map(|(language, confidence)| (language, (confidence * 100.0).round() / 100.0))
    .collect::<Vec<_>>();
assert_eq!(
    confidence_values,
    vec![
        (English, 0.93),
        (French, 0.04),
        (German, 0.02),
        (Spanish, 0.01)
    ]
);
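The documented properties of the returned vector (each value within 0.0 and 1.0, entries sorted in descending order, values summing to 1.0) can be checked mechanically. The sketch below is a std-only illustration with a hypothetical helper `looks_like_confidence_values`; it is generic over the label type so it compiles without lingua, and the `tolerance` parameter is an assumption introduced to absorb floating-point and rounding error.

```rust
/// Hypothetical checker (not part of lingua) for the documented properties:
/// values in 0.0..=1.0, sorted in descending order, summing to 1.0.
/// An empty slice is rejected because its sum is 0.0.
pub fn looks_like_confidence_values<L>(values: &[(L, f64)], tolerance: f64) -> bool {
    // Every confidence must be a probability; NaN fails this check as well.
    let in_range = values.iter().all(|(_, c)| (0.0..=1.0).contains(c));
    // Neighbouring entries must not increase.
    let descending = values.windows(2).all(|pair| pair[0].1 >= pair[1].1);
    // The probabilities should add up to 1.0 within the given tolerance.
    let sum: f64 = values.iter().map(|(_, c)| c).sum();
    in_range && descending && (sum - 1.0).abs() <= tolerance
}
```

Values rounded to two decimals, as in the example above, may no longer sum to exactly 1.0, so a looser tolerance is appropriate when checking rounded output.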
pub fn compute_language_confidence_values_in_parallel<T: Into<String> + Clone + Send + Sync>(&self, texts: &[T]) -> Vec<Vec<(Language, f64)>>
Computes confidence values for each language supported by this detector for all the given input texts. The confidence values denote how likely it is that the given text has been written in any of the languages supported by this detector.
This method is a good fit if you want to classify a very large set of texts. It potentially operates in multiple threads, depending on how many idle CPU cores are available and how many texts are passed to this method.
If you do not want or need parallel execution, use compute_language_confidence_values instead.
use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();
let confidence_values = detector
    .compute_language_confidence_values_in_parallel(&[
        "languages are awesome",
        "Sprachen sind großartig"
    ])
    .into_iter()
    .map(|vector| {
        vector
            .into_iter()
            .map(|(language, confidence)| {
                (language, (confidence * 100.0).round() / 100.0)
            })
            .collect::<Vec<_>>()
    })
    .collect::<Vec<_>>();
assert_eq!(
    confidence_values,
    vec![
        vec![
            (English, 0.93),
            (French, 0.04),
            (German, 0.02),
            (Spanish, 0.01)
        ],
        vec![
            (German, 0.99),
            (Spanish, 0.01),
            (English, 0.0),
            (French, 0.0)
        ]
    ]
);
pub fn compute_language_confidence<T: Into<String>>(&self, text: T, language: Language) -> f64
Computes the confidence value for the given language and input text. This value denotes how likely it is that the given text has been written in the given language.
The value that this method computes is a number between 0.0 and 1.0. If the language is unambiguously identified by the rule engine, the value 1.0 will always be returned. If the given language is not supported by this detector instance, the value 0.0 will always be returned.
This method operates in a single thread. To classify a very large set of texts, consider using compute_language_confidence_in_parallel instead.
use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();
let confidence = detector.compute_language_confidence("languages are awesome", French);
let rounded_confidence = (confidence * 100.0).round() / 100.0;
assert_eq!(rounded_confidence, 0.04);
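The examples on this page round confidence values to two decimals before comparing them, since exact equality on raw f64 results would be brittle. That rounding step can be factored into a small helper. The function `round_to` below is a hypothetical convenience written for illustration, not something lingua provides.

```rust
/// Hypothetical helper (not part of lingua): rounds `value` to the given
/// number of decimal places, e.g. two decimals for display or comparison.
pub fn round_to(value: f64, decimals: i32) -> f64 {
    // Scale up, round to the nearest integer, then scale back down.
    let factor = 10f64.powi(decimals);
    (value * factor).round() / factor
}
```

Calling `round_to(confidence, 2)` performs the same arithmetic as the inline expression `(confidence * 100.0).round() / 100.0` used in the examples.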
pub fn compute_language_confidence_in_parallel<T: Into<String> + Clone + Send + Sync>(&self, texts: &[T], language: Language) -> Vec<f64>
Computes the confidence values of all input texts for the given language. A confidence value denotes how likely it is that a given text has been written in a given language.
The values that this method computes are numbers between 0.0 and 1.0. If the language is unambiguously identified by the rule engine, the value 1.0 will always be returned. If the given language is not supported by this detector instance, the value 0.0 will always be returned.
This method is a good fit if you want to classify a very large set of texts. It potentially operates in multiple threads, depending on how many idle CPU cores are available and how many texts are passed to this method.
If you do not want or need parallel execution, use compute_language_confidence instead.
use lingua::Language::{English, French, German, Spanish};
use lingua::LanguageDetectorBuilder;

let detector = LanguageDetectorBuilder::from_languages(&[
    English,
    French,
    German,
    Spanish
])
.build();
let confidence_values = detector.compute_language_confidence_in_parallel(
    &[
        "languages are awesome",
        "Sprachen sind großartig",
        "des langues sont géniales",
        "los idiomas son geniales"
    ],
    French
)
.into_iter()
.map(|confidence| (confidence * 100.0).round() / 100.0)
.collect::<Vec<_>>();
assert_eq!(
    confidence_values,
    vec![
        0.04,
        0.0,
        0.92,
        0.07
    ]
);
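In the example above the returned values line up with the input texts by position, so a common follow-up step is to keep only the texts whose confidence for the chosen language clears some threshold. The sketch below shows that pairing with std only; `texts_above_threshold` and the threshold value are assumptions made for illustration and are not part of lingua.

```rust
/// Hypothetical helper (not part of lingua): pairs each text with the
/// confidence at the same position and keeps the texts at or above `threshold`.
/// If the slices differ in length, the extra elements are ignored.
pub fn texts_above_threshold<'a>(
    texts: &[&'a str],
    confidences: &[f64],
    threshold: f64,
) -> Vec<&'a str> {
    texts
        .iter()
        .zip(confidences)
        .filter(|(_, confidence)| **confidence >= threshold)
        .map(|(text, _)| *text)
        .collect()
}
```

Applied to the four texts and rounded French confidences from the example, a threshold of 0.5 keeps only "des langues sont géniales".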