Whatlang
Natural language detection in Rust.
Features
- Supports 70 languages
- 100% written in Rust
- No external dependencies
- Super fast
- Recognizes not only the language but also the script (Latin, Cyrillic, etc.)
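To illustrate what recognizing a script can mean, here is a minimal sketch that classifies text by Unicode code point ranges and picks the script with the most characters. It is not the library's implementation: the `Script` enum, `script_of`, and `detect_script` are hypothetical names, and only a few basic ranges are covered.

```rust
/// Hypothetical script enum for illustration; not the library's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Cyrillic,
    Greek,
}

/// Classify a single character by a few basic Unicode ranges.
fn script_of(ch: char) -> Option<Script> {
    match ch {
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Some(Script::Latin),
        '\u{0400}'..='\u{04FF}' => Some(Script::Cyrillic),
        '\u{0370}'..='\u{03FF}' => Some(Script::Greek),
        _ => None, // digits, punctuation, whitespace, and uncovered scripts
    }
}

/// Return the script that most characters of `text` belong to,
/// or None if no character falls into a known range.
fn detect_script(text: &str) -> Option<Script> {
    let (mut latin, mut cyrillic, mut greek) = (0usize, 0usize, 0usize);
    for ch in text.chars() {
        match script_of(ch) {
            Some(Script::Latin) => latin += 1,
            Some(Script::Cyrillic) => cyrillic += 1,
            Some(Script::Greek) => greek += 1,
            None => {}
        }
    }
    let candidates = [
        (Script::Latin, latin),
        (Script::Cyrillic, cyrillic),
        (Script::Greek, greek),
    ];
    candidates
        .into_iter()
        .filter(|pair| pair.1 > 0)
        .max_by_key(|pair| pair.1)
        .map(|pair| pair.0)
}
```

Knowing the script first narrows the set of candidate languages, which is why a script-level result is useful on its own.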
Get started
The library is still in active development. Here is a short example of how to use it:
Add to your Cargo.toml:
[dependencies]
whatlang = "*"
In your program:
extern crate whatlang;
use ;
Blacklist
You can blacklist undesired languages by passing a vector. In the example below, English and Spanish will be ignored:
let list = vec![Lang::Eng, Lang::Spa];
let query = new.blacklist;
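Conceptually, a blacklist removes languages from the candidate set before the best match is chosen. The sketch below shows that idea only and does not use the library's API: the `Lang` enum is a hypothetical subset named after the Enum column of the table of supported languages, and `best_excluding` and the scores are invented for illustration.

```rust
/// Hypothetical subset of language codes; not the library's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Eng,
    Spa,
    Por,
    Ita,
}

/// Pick the highest-scoring candidate whose language is not blacklisted.
/// `candidates` holds (language, score) pairs; a higher score is a better match.
fn best_excluding(candidates: &[(Lang, u32)], blacklist: &[Lang]) -> Option<Lang> {
    candidates
        .iter()
        .filter(|pair| !blacklist.contains(&pair.0))
        .max_by_key(|pair| pair.1)
        .map(|pair| pair.0)
}
```

With the invented scores `[(Spa, 90), (Por, 80), (Ita, 70), (Eng, 40)]` and a blacklist of English and Spanish, the function returns Portuguese, even though Spanish scored higher.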
Whitelist
In a similar way, you can whitelist specific languages.
In this example, the library will recognize only Esperanto and Russian.
Note that if it detects a script other than Latin (Esperanto) or Cyrillic (Russian), e.g. Greek, it will return None.
let list = vec![Lang::Epo, Lang::Rus];
let query = new.whitelist;
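The note about returning None can be made concrete with a small sketch. It assumes that detection first determines the script and then considers only whitelisted languages written in that script. All names here (`Script`, `Lang`, `script_of`, `best_allowed`) and the scores are hypothetical; this is not the library's API.

```rust
/// Hypothetical script and language enums for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Cyrillic,
    Greek,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Epo,
    Eng,
    Rus,
    Ukr,
    Ell,
}

/// Script in which each language of this toy subset is written.
fn script_of(lang: Lang) -> Script {
    match lang {
        Lang::Epo | Lang::Eng => Script::Latin,
        Lang::Rus | Lang::Ukr => Script::Cyrillic,
        Lang::Ell => Script::Greek,
    }
}

/// Return the best-scoring whitelisted language written in `detected`,
/// or None when the whitelist has no language for that script.
fn best_allowed(detected: Script, candidates: &[(Lang, u32)], whitelist: &[Lang]) -> Option<Lang> {
    // If no whitelisted language uses the detected script, nothing can match.
    if !whitelist.iter().any(|&lang| script_of(lang) == detected) {
        return None;
    }
    candidates
        .iter()
        .filter(|pair| whitelist.contains(&pair.0) && script_of(pair.0) == detected)
        .max_by_key(|pair| pair.1)
        .map(|pair| pair.0)
}
```

With a whitelist of Esperanto and Russian, Greek input yields None because neither language is written in the Greek script, while Cyrillic input can only resolve to Russian.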
Roadmap
- Support 100 most popular languages
- Allow specifying a blacklist for Query
- Allow specifying a whitelist for Query
- Support new API
- Write doc for public structures and functions
- Improve README example
- Tune performance
- Create demo application
- Provide some metrics about reliability in the Result struct
Supported languages
Language | ISO 639-3 | Enum |
---|---|---|
Esperanto | epo | Lang::Epo |
English | eng | Lang::Eng |
Russian | rus | Lang::Rus |
Mandarin | cmn | Lang::Cmn |
Spanish | spa | Lang::Spa |
Portuguese | por | Lang::Por |
Italian | ita | Lang::Ita |
Bengali | ben | Lang::Ben |
French | fra | Lang::Fra |
German | deu | Lang::Deu |
Ukrainian | ukr | Lang::Ukr |
Georgian | kat | Lang::Kat |
Arabic | arb | Lang::Arb |
Hindi | hin | Lang::Hin |
Japanese | jpn | Lang::Jpn |
Hebrew | heb | Lang::Heb |
Yiddish | ydd | Lang::Ydd |
Polish | pol | Lang::Pol |
Amharic | amh | Lang::Amh |
Tigrinya | tir | Lang::Tir |
Javanese | jav | Lang::Jav |
Korean | kor | Lang::Kor |
Bokmal | nob | Lang::Nob |
Nynorsk | nno | Lang::Nno |
Danish | dan | Lang::Dan |
Swedish | swe | Lang::Swe |
Finnish | fin | Lang::Fin |
Turkish | tur | Lang::Tur |
Dutch | nld | Lang::Nld |
Hungarian | hun | Lang::Hun |
Czech | ces | Lang::Ces |
Greek | ell | Lang::Ell |
Bulgarian | bul | Lang::Bul |
Belarusian | bel | Lang::Bel |
Marathi | mar | Lang::Mar |
Kannada | kan | Lang::Kan |
Romanian | ron | Lang::Ron |
Slovene | slv | Lang::Slv |
Croatian | hrv | Lang::Hrv |
Serbian | srp | Lang::Srp |
Macedonian | mkd | Lang::Mkd |
Lithuanian | lit | Lang::Lit |
Latvian | lav | Lang::Lav |
Estonian | est | Lang::Est |
Tamil | tam | Lang::Tam |
Vietnamese | vie | Lang::Vie |
Urdu | urd | Lang::Urd |
Thai | tha | Lang::Tha |
Gujarati | guj | Lang::Guj |
Uzbek | uzb | Lang::Uzb |
Punjabi | pan | Lang::Pan |
Azerbaijani | azj | Lang::Azj |
Indonesian | ind | Lang::Ind |
Telugu | tel | Lang::Tel |
Persian | pes | Lang::Pes |
Malayalam | mal | Lang::Mal |
Hausa | hau | Lang::Hau |
Oriya | ori | Lang::Ori |
Burmese | mya | Lang::Mya |
Bhojpuri | bho | Lang::Bho |
Tagalog | tgl | Lang::Tgl |
Yoruba | yor | Lang::Yor |
Maithili | mai | Lang::Mai |
Oromo | orm | Lang::Orm |
Igbo | ibo | Lang::Ibo |
Cebuano | ceb | Lang::Ceb |
Kurdish | kur | Lang::Kur |
Malagasy | mlg | Lang::Mlg |
Saraiki | skr | Lang::Skr |
Missing languages
Languages for which I did not find trigrams:
License
MIT
Acknowledgments
- Thanks to Franc JS for the trigram dataset.