langdetect_rs/
lib.rs

//! # langdetect-rs
//!
//! A Rust port of the Python langdetect library (<https://github.com/Mimino666/langdetect>), which is itself a port of the Java language-detection library.
//!
//! This crate provides automatic language identification using n-gram based text categorization.
//! It supports 55 languages out of the box and allows loading custom language profiles.
//!
//! ## Features
//!
//! - **55 built-in languages** with prepared profiles (copied from the Python library)
//! - **High accuracy** for texts longer than 20-50 characters; the original presentation reports 99.8% precision across 49 languages: <https://www.slideshare.net/slideshow/language-detection-library-for-java/6014274>
//! - **Non-deterministic algorithm** with optional seeding for reproducibility
//! - **Extensible** - add custom language profiles
//!
//! ## Quick Start
//!
//! ```rust
//! use langdetect_rs::detector_factory::DetectorFactory;
//!
//! let factory = DetectorFactory::default().build();
//! match factory.detect("Hello world! My name is Dima and I am a developer", None) {
//!     Ok(lang) => println!("Detected language: {}", lang),
//!     Err(e) => println!("Detection error: {:?}", e),
//! }
//! ```
//!
//! ## Algorithm Overview
//!
//! The library uses a Bayesian approach based on frequency analysis of character n-grams (sequences of 1 to 3 characters).
//! It employs an iterative expectation-maximization algorithm to estimate language probabilities.
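//!
//! The sketch below is not this crate's implementation. It is a minimal, standard-library-only
//! illustration of the two ideas above: extracting character n-grams of length 1 to 3, and
//! applying one Bayesian update per n-gram. The names `char_ngrams` and `update`, the smoothing
//! floor, and the toy probabilities are all assumptions made for the example, and the update is
//! a plain naive Bayes step rather than the full iterative procedure.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Collect all character n-grams of length 1 to 3 from `text`.
//! fn char_ngrams(text: &str) -> Vec<String> {
//!     let chars: Vec<char> = text.chars().collect();
//!     let mut grams = Vec::new();
//!     for n in 1..=3 {
//!         // `windows(n)` yields nothing when the text is shorter than `n`.
//!         for window in chars.windows(n) {
//!             grams.push(window.iter().collect::<String>());
//!         }
//!     }
//!     grams
//! }
//!
//! /// One Bayesian update: multiply each prior by the n-gram's likelihood
//! /// (with a small smoothing floor), then renormalize so the scores sum to 1.
//! fn update(probs: &mut [f64], likelihoods: &[f64]) {
//!     const FLOOR: f64 = 1e-5; // hypothetical smoothing constant
//!     for (p, l) in probs.iter_mut().zip(likelihoods) {
//!         *p *= l.max(FLOOR);
//!     }
//!     let total: f64 = probs.iter().sum();
//!     for p in probs.iter_mut() {
//!         *p /= total;
//!     }
//! }
//!
//! fn main() {
//!     // Toy profiles: P(n-gram | language) for two made-up languages.
//!     let profiles: HashMap<&str, [f64; 2]> = HashMap::from([
//!         ("th", [0.05, 0.001]),
//!         ("e", [0.12, 0.10]),
//!     ]);
//!     // Start from a uniform prior over the two languages.
//!     let mut probs = [0.5, 0.5];
//!     for gram in char_ngrams("the") {
//!         // N-grams missing from the toy profiles are simply skipped.
//!         if let Some(l) = profiles.get(gram.as_str()) {
//!             update(&mut probs, l);
//!         }
//!     }
//!     // The first toy language explains "th" far better, so it ends up ahead.
//!     assert!(probs[0] > probs[1]);
//! }
//! ```
//!
//! Renormalizing after every step keeps the scores comparable as probabilities and avoids
//! numeric underflow on longer inputs.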
//!
//! ## Modules
//!
//! - [`detector_factory`] - Factory that holds language profiles and creates detectors
//! - [`detector`] - Core language detection logic
//! - [`language`] - Language probability data structure
//! - [`utils`] - Utility modules for profiles, n-grams, and Unicode handling
pub mod detector;
pub mod detector_factory;
pub mod language;
pub mod utils;