Langram - the most accurate language detection library

317 ScriptLanguages (187 models + 130 single language scripts)

A language that can be written in multiple scripts is detected as a separate ScriptLanguage (language + script) for each of its scripts.
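To make the language + script pairing concrete, here is a minimal Rust sketch. The Language, Script and ScriptLanguage types below are hypothetical illustrations, not langram's actual types. Serbian is used only because it is commonly written in both Latin and Cyrillic script.

```rust
// Hypothetical types for illustration only; langram's real ScriptLanguage
// type and its variants may be defined differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Language {
    English,
    Serbian,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Script {
    Latin,
    Cyrillic,
}

/// One detection target: a language written in one specific script.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ScriptLanguage {
    language: Language,
    script: Script,
}

impl ScriptLanguage {
    const fn new(language: Language, script: Script) -> Self {
        Self { language, script }
    }
}
```

Because the script is part of the identity, Serbian in Latin and Serbian in Cyrillic are two distinct targets that still share the same language.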

Uses alphabet_detector as a word separator and language prefilter.
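The idea of a script-based word separator and prefilter can be sketched with the standard library alone. This is not alphabet_detector's API: script_of, words and prefilter are hypothetical helpers, and the Unicode block ranges cover only Latin, Greek and Cyrillic, an assumption made for brevity. Words are split on non-alphabetic characters, and only candidates whose script occurs in the text are kept for further scoring.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Script {
    Latin,
    Greek,
    Cyrillic,
}

/// Rough script classification by Unicode block (hypothetical helper;
/// a real alphabet detector covers many more scripts and edge cases).
fn script_of(c: char) -> Option<Script> {
    if !c.is_alphabetic() {
        return None;
    }
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Some(Script::Latin),
        0x0370..=0x03FF => Some(Script::Greek),
        0x0400..=0x04FF => Some(Script::Cyrillic),
        _ => None,
    }
}

/// Splits text into words: maximal runs of alphabetic characters.
fn words(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Keeps only the candidates whose script actually occurs in the text.
fn prefilter<'a>(text: &str, candidates: &[(&'a str, Script)]) -> Vec<&'a str> {
    let present: HashSet<Script> = text.chars().filter_map(script_of).collect();
    candidates
        .iter()
        .filter(|(_, script)| present.contains(script))
        .map(|(name, _)| *name)
        .collect()
}
```

Prefiltering this way means a Cyrillic-only text never has to be scored against Latin or Greek models at all.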

Uses a modified n-gram language model algorithm based on character n-grams (1 to 5 characters) and 1-word n-grams.
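A minimal sketch of these features, reading "1-word n-grams" as word unigrams: each word contributes its character n-grams of length 1 to 5 plus the word itself, and a text is scored against a model as a sum of log-probabilities. This is a plain, unmodified scoring scheme; langram's modified algorithm is not reproduced here, and the Model struct and the UNSEEN fallback probability are assumptions.

```rust
use std::collections::HashMap;

/// Assumed fallback probability for n-grams missing from a model.
const UNSEEN: f64 = 1e-6;

/// Hypothetical per-language model: separate tables for char n-grams and words.
struct Model {
    char_ngrams: HashMap<String, f64>,
    words: HashMap<String, f64>,
}

/// All character n-grams of length `n` in `word`, counted in chars, not bytes.
fn char_ngrams(word: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars.windows(n).map(|w| w.iter().collect::<String>()).collect()
}

/// Natural-log probability of `key`, falling back to UNSEEN when absent.
fn log_prob(table: &HashMap<String, f64>, key: &str) -> f64 {
    table.get(key).copied().unwrap_or(UNSEEN).ln()
}

/// Sums the log-probabilities of every word's char 1..=5-grams and the word itself.
fn score(words: &[&str], model: &Model) -> f64 {
    words
        .iter()
        .map(|w| {
            let chars: f64 = (1..=5)
                .flat_map(|n| char_ngrams(w, n))
                .map(|g| log_prob(&model.char_ngrams, &g))
                .sum();
            chars + log_prob(&model.words, w)
        })
        .sum()
}
```

The language whose model yields the highest score would be the best guess. Counting n-grams in chars rather than bytes keeps non-Latin scripts correct.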

RAM requirements are low. Memory usage can grow up to the size of the provided models binary file, but that memory is shared and memory-mapped (mmap, virtual address space), so you do not need that much free RAM. However, if the whole models file cannot be cached in RAM, detection speed will suffer.

This library is a complete rewrite of Lingua: much faster, more accurate, with more languages, and more.

Accuracy report

Comparison with other language detectors

Setup

To use this library, you need a binary models file, which must either be placed next to the executable or have its path set in LANGRAM_MODELS_PATH.

It can be:

Downloaded from the langram_models releases;
Built with langram_models (recommended for big-endian targets). This is the more advanced option: it lets you remove model n-grams and recompile, so the models binary becomes lighter.
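The models-file lookup described above can be sketched as follows. The lookup order (LANGRAM_MODELS_PATH first, then the executable's directory) and treating the variable as a full file path are assumptions, and the models file name is a parameter because it is not specified here. candidate_paths and find_models_file are hypothetical helpers, not part of langram.

```rust
use std::env;
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Candidate locations in an assumed priority order:
/// 1. the value of LANGRAM_MODELS_PATH (treated here as a full file path),
/// 2. `file_name` inside the executable's directory.
fn candidate_paths(
    env_value: Option<OsString>,
    exe_dir: Option<&Path>,
    file_name: &str,
) -> Vec<PathBuf> {
    let mut out = Vec::new();
    if let Some(v) = env_value {
        out.push(PathBuf::from(v));
    }
    if let Some(dir) = exe_dir {
        out.push(dir.join(file_name));
    }
    out
}

/// Returns the first candidate that exists as a regular file.
fn find_models_file(file_name: &str) -> Option<PathBuf> {
    let exe = env::current_exe().ok();
    let exe_dir = exe.as_deref().and_then(|p| p.parent());
    candidate_paths(env::var_os("LANGRAM_MODELS_PATH"), exe_dir, file_name)
        .into_iter()
        .find(|p| p.is_file())
}
```

Keeping candidate_paths free of I/O makes the lookup order easy to test; only find_models_file touches the environment and the file system.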

langram 0.9.1
