Rustling is a blazingly fast library for computational linguistics.
Features
- N-grams
- Language models
- Hidden Markov models
- Word segmentation
- Part-of-speech tagging
- CHAT parsing for TalkBank and CHILDES data
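
To make the word segmentation feature concrete, here is a minimal pure-Python sketch of greedy longest string matching, the segmentation approach named in the benchmarks. It is an illustration of the general technique only, not Rustling's implementation or API: the function name `longest_string_matching`, its parameters, and the toy vocabulary are all hypothetical.

```python
def longest_string_matching(text, vocabulary, max_word_length=None):
    """Greedy left-to-right segmentation (hypothetical helper, not Rustling's API).

    At each position, take the longest substring found in `vocabulary`.
    If nothing matches, emit a single character so the scan always advances.
    """
    if max_word_length is None:
        # Never let the window drop below 1, even for an empty vocabulary.
        max_word_length = max([len(word) for word in vocabulary] + [1])

    words = []
    i = 0
    while i < len(text):
        longest = min(max_word_length, len(text) - i)
        # Try the longest candidate first, then shrink the window.
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Toy vocabulary, invented for this example.
vocabulary = {"the", "there", "cat", "sat", "is"}
print(longest_string_matching("thecatsat", vocabulary))
```

The greedy choice is simple and fast, but it can pick a long word that prevents a better overall segmentation, which is why it is often used as a baseline.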
Performance
| Component | Task | Speedup | vs. |
|---|---|---|---|
| Language Models | Fit | 11x | NLTK |
| | Score | 2x | NLTK |
| | Generate | 86--107x | NLTK |
| Word Segmentation | LongestStringMatching | 9x | wordseg |
| POS Tagging | Training | 5x | NLTK |
| | Tagging | 17x | NLTK |
| HMM | Fit | 14x | hmmlearn |
| | Predict | 0.9x | hmmlearn |
| | Score | 5x | hmmlearn |
| CHAT Parsing | Reading from a ZIP archive | 30x | pylangacq |
| | Reading from strings | 35x | pylangacq |
| | Parsing utterances | 15x | pylangacq |
| | Parsing tokens | 8x | pylangacq |
See benchmarks/ for reproduction scripts.
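
As a rough guide to reading the Speedup column, the sketch below assumes that a speedup is the baseline library's wall-clock time divided by Rustling's time for the same task. Under that assumption, a value below 1x (such as 0.9x) would mean the task runs slightly slower than the baseline. The helper names and the two stand-in workloads are hypothetical and are not taken from the scripts in benchmarks/.

```python
import timeit


def speedup_ratio(baseline_seconds, candidate_seconds):
    """Speedup as baseline time over candidate time (assumed definition)."""
    return baseline_seconds / candidate_seconds


def measure_speedup(baseline, candidate, repeat=5, number=10):
    """Time two zero-argument callables and return the speedup of `candidate`.

    The minimum over several repeats is used to reduce noise from other
    processes running on the machine.
    """
    baseline_time = min(timeit.repeat(baseline, repeat=repeat, number=number))
    candidate_time = min(timeit.repeat(candidate, repeat=repeat, number=number))
    return speedup_ratio(baseline_time, candidate_time)


# Stand-in workloads, invented for this example.
def slow_sum():
    total = 0
    for i in range(100_000):
        total += i
    return total


def fast_sum():
    return sum(range(100_000))


print(f"{measure_speedup(slow_sum, fast_sum):.1f}x")
```

The printed number depends on the machine and Python version, so treat any single run as indicative rather than exact.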
Installation
Python
Using pip:
Using conda:
For Pyodide, pre-built WASM wheels are available from each GitHub release;
look for the .whl file with emscripten in the filename.
These wheels are built with multithreading disabled, because Pyodide does not support it.
Rust
License
MIT License