rustling 0.5.0


Rustling


Rustling is a blazingly fast library for computational linguistics. It is written in Rust, with Python bindings.

Documentation: Python | Rust

Features

  • Language Models — N-gram language models with smoothing

    • MLE — Maximum Likelihood Estimation (no smoothing)
    • Lidstone — Lidstone (additive) smoothing
    • Laplace — Laplace (add-one) smoothing
  • Word Segmentation — Models for segmenting unsegmented text into words

    • LongestStringMatching — Greedy left-to-right longest match segmenter
    • RandomSegmenter — Random baseline segmenter
  • Part-of-speech Tagging

    • AveragedPerceptronTagger - Averaged perceptron tagger
  • CHAT Parsing — Parser for CHAT transcription files (CHILDES/TalkBank)

    • CHAT — Read and query CHAT data from directories, files, strings, or ZIP archives
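
To make two of the techniques listed above concrete, here is a minimal pure-Python sketch of Lidstone (additive) smoothing for unigram counts and of greedy left-to-right longest-match segmentation. It does not use Rustling and is not its API: the function names `lidstone_prob` and `longest_match_segment`, their parameters, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def lidstone_prob(counts: Counter, word: str, vocab_size: int, gamma: float) -> float:
    """Additive smoothing: (count + gamma) / (total + gamma * vocab_size).

    gamma = 0 gives the unsmoothed MLE estimate; gamma = 1 gives Laplace.
    Hypothetical helper, not part of Rustling.
    """
    total = sum(counts.values())
    return (counts[word] + gamma) / (total + gamma * vocab_size)


def longest_match_segment(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Greedy left-to-right segmentation, preferring the longest lexicon match.

    Falls back to a single character when nothing in the lexicon matches.
    Hypothetical helper, not part of Rustling.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


# Toy data: 5 tokens, 4 distinct word types.
counts = Counter("the cat saw the dog".split())
mle = lidstone_prob(counts, "the", vocab_size=4, gamma=0.0)      # 2/5
laplace = lidstone_prob(counts, "the", vocab_size=4, gamma=1.0)  # 3/9
segments = longest_match_segment("thisisatest", {"this", "is", "a", "test"}, max_len=4)
# segments == ["this", "is", "a", "test"]
```

Because the segmenter is greedy, it commits to the longest match at each position and never backtracks, so its output depends heavily on the lexicon: adding "at" to the toy lexicon above would make it emit "at" after "is" and then fall back to single characters for the remainder.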

Performance

Rustling is benchmarked against the pure Python implementations in NLTK, wordseg (v0.0.5), and pylangacq (v0.19.1). Each speedup below is relative to the library named in the last column. See benchmarks/ for full details and reproduction scripts.

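| Component         | Task                  | Speedup | vs.       |
|-------------------|-----------------------|---------|-----------|
| Language Models   | Fit                   | 10x     | NLTK      |
|                   | Score                 | 2x      | NLTK      |
|                   | Generate              | 80–112x | NLTK      |
| Word Segmentation | LongestStringMatching | 9x      | wordseg   |
|                   | RandomSegmenter       | 1.1x    | wordseg   |
| POS Tagging       | Training              | 5x      | NLTK      |
|                   | Tagging               | 7x      | NLTK      |
| CHAT Parsing      | from_dir              | 55x     | pylangacq |
|                   | from_zip              | 48x     | pylangacq |
|                   | from_files            | 63x     | pylangacq |
|                   | from_strs             | 116x    | pylangacq |
|                   | words()               | 3x      | pylangacq |
|                   | utterances()          | 15x     | pylangacq |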

Installation

Python

pip install rustling

Rust

cargo add rustling

License

MIT License