nlpo3 1.3.0

Thai natural language processing library, with Python and Node bindings

nlpO3

Thai Natural Language Processing library in Rust, with Python and Node bindings. Formerly oxidized-thainlp.

Features

  • Thai word tokenizer
    • uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
      • 2x faster than a similar pure-Python implementation (PyThaiNLP's newmm)
    • supports custom dictionaries
    • includes a default dictionary (62,000 words, copied from PyThaiNLP)
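
To make the maximal-matching idea in the feature list concrete, here is a minimal Python sketch. It is not nlpO3's actual implementation. It assumes a tiny hand-written dictionary and uses dynamic programming to pick the segmentation with the fewest tokens. The function name `maximal_matching` and the dictionary entries are hypothetical, characters not covered by any dictionary word become single-character tokens, and Thai Character Cluster boundaries are ignored for brevity.

```python
def maximal_matching(text, dictionary):
    """Split text into the fewest tokens, preferring dictionary words.

    Simplified illustration only: ignores Thai Character Cluster boundaries.
    """
    n = len(text)
    max_len = max([1] + [len(word) for word in dictionary])
    # best[i] = (token count, start of last token) for the prefix text[:i]
    best = [(0, 0)] + [(n + 1, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            # A single unknown character is always allowed as a fallback token.
            if piece in dictionary or len(piece) == 1:
                count = best[start][0] + 1
                if count < best[end][0]:
                    best[end] = (count, start)
    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = best[end][1]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


toy_dictionary = {"ab", "abc", "cde"}  # hypothetical entries
print(maximal_matching("abcde", toy_dictionary))  # ['ab', 'cde']

thai_dictionary = {"ฉัน", "กิน", "ข้าว"}  # hypothetical three-word dictionary
print(maximal_matching("ฉันกินข้าว", thai_dictionary))
```

With this toy dictionary, greedy longest-first matching would split "abcde" into abc, d, e (three tokens), while minimizing the total token count yields ab, cde (two tokens).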

Usage

Command line interface

echo "ฉันกินข้าว" | nlpo3 segment

Bindings

# Python binding example
from nlpo3 import segment

# Tokenize a Thai string into words
segment("สวัสดีครับ")

As Rust library

In Cargo.toml:

[dependencies]
# ...
nlpo3 = "1.3.0"

Build

Requirements

Steps

Run the tests:

cargo test

Build the API documentation and open it in a browser to check it:

cargo doc --open

Build (remove --release to keep debug information):

cargo build --release

Check target/ for build artifacts.

Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues