english
english is a blazing fast English morphology library written in Rust with zero external dependencies and a total code+data size under 1 MB. It provides extremely accurate verb conjugation and noun/adjective declension based on highly processed Wiktionary data, making it ideal for real-time procedural text generation.
⚡ Speed and Accuracy
Evaluation of the English inflector (extractor/main.rs/check_*) and performance benchmarking (examples/speedmark.rs) shows:
| Part of Speech | Correct / Total | Accuracy | Throughput (calls/sec) | Time per Call |
|---|---|---|---|---|
| Nouns | 238106 / 238549 | 99.81% | 3,929,101 | 254 ns |
| Verbs | 158056 / 161643 | 97.78% | 5,572,956 | 180 ns |
| Adjectives | 119200 / 119356 | 99.86% | 7,167,281 | 139 ns |
Note: Benchmarking was done under a worst-case scenario; typical real-world usage is 50-100 ns faster.
📦 Installation
Add to your Cargo.toml:
[dependencies]
english = "0.0.9"
Then in your code:
use english::*;
🔧 Crate Overview
english
The public API for verb conjugation and noun/adjective declension.
- Combines optimized data generated from extractor with inflection logic from english-core
- Pure Rust, no external dependencies
- Fast binary search over pre-sorted arrays: O(log n) lookup
- Code generation ensures no runtime penalty
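The lookup strategy described above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the table `IRREGULAR_PLURALS`, its four entries, and the functions `fallback_plural` and `plural` are hypothetical names chosen for the example. It shows a binary search over a sorted static array, falling back to a rule when the word is not listed.

```rust
/// Hypothetical irregular-plural table, sorted by singular form.
/// The real generated arrays are far larger and are not reproduced here.
static IRREGULAR_PLURALS: &[(&str, &str)] = &[
    ("child", "children"),
    ("foot", "feet"),
    ("mouse", "mice"),
    ("person", "people"),
];

/// Deliberately naive stand-in for the rule-based fallback.
fn fallback_plural(word: &str) -> String {
    format!("{word}s")
}

/// Look the word up in the sorted table (O(log n)); use the rule otherwise.
fn plural(word: &str) -> String {
    match IRREGULAR_PLURALS.binary_search_by(|entry| entry.0.cmp(word)) {
        Ok(index) => IRREGULAR_PLURALS[index].1.to_string(),
        Err(_) => fallback_plural(word),
    }
}

fn main() {
    println!("{}", plural("mouse"));
    println!("{}", plural("cat"));
}
```

Because the table is a `static` slice, it is baked into the binary at compile time and needs no parsing or allocation at startup.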
english-core
The core engine for English inflection — pure algorithmic logic.
- Implements the core rules for conjugation/declension
- Used to classify forms as regular or irregular for the extractor
- Has no data dependency — logic-only
- Can be used standalone for an even smaller footprint (at the cost of some accuracy)
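To give a feel for what logic-only inflection looks like, here is a small sketch of regular noun pluralization. It is an assumption-laden simplification: the function name `regular_plural` and the three rules it applies are illustrative and are not taken from english-core, whose rule set is more complete.

```rust
/// Simplified regular pluralization. Illustrative only; this is not
/// english-core's actual rule set.
fn regular_plural(noun: &str) -> String {
    let is_vowel = |c: char| matches!(c, 'a' | 'e' | 'i' | 'o' | 'u');
    let mut reversed = noun.chars().rev();
    let last = reversed.next();
    let second_last = reversed.next();

    // Sibilant endings take "-es": box -> boxes, church -> churches.
    let sibilant = ["s", "x", "z", "ch", "sh"]
        .iter()
        .any(|suffix| noun.ends_with(*suffix));

    if sibilant {
        format!("{noun}es")
    } else if last == Some('y') && second_last.map_or(false, |c| !is_vowel(c)) {
        // Consonant + "y" becomes "-ies": city -> cities.
        format!("{}ies", &noun[..noun.len() - 1])
    } else {
        format!("{noun}s")
    }
}

fn main() {
    for noun in ["cat", "box", "city", "day", "church"] {
        println!("{noun} -> {}", regular_plural(noun));
    }
}
```

Rules like these need no data at all, which is why a rules-only build can be smaller, and also why it gets words such as "mouse" wrong.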
extractor
A tool to process and refine Wiktionary data.
- Parses large English Wiktionary dumps
- Extracts all verb, noun, and adjective forms
- Uses english-core to filter out regular forms, preserving only irregulars
- Generates sorted static arrays for use in english
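The filtering and code generation steps above might look roughly like the sketch below. Everything in it is hypothetical: `regular_plural` is a naive stand-in for the rule engine, and `irregular_entries`, `render_array`, and the array name `NOUNS` are invented for the example. The real extractor reads Wiktextract JSONL and its output format is not reproduced here.

```rust
/// Naive stand-in for the rule engine (assumption: the real rules are richer).
fn regular_plural(noun: &str) -> String {
    format!("{noun}s")
}

/// Keep only the pairs the rule gets wrong, sorted for binary search.
fn irregular_entries<'a>(pairs: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    let mut irregular: Vec<_> = pairs
        .iter()
        .copied()
        .filter(|(singular, plural)| regular_plural(singular) != *plural)
        .collect();
    irregular.sort_unstable();
    irregular.dedup();
    irregular
}

/// Render the entries as Rust source for a static array.
fn render_array(name: &str, entries: &[(&str, &str)]) -> String {
    let mut out = format!("pub static {name}: &[(&str, &str)] = &[\n");
    for (singular, plural) in entries {
        out.push_str(&format!("    ({singular:?}, {plural:?}),\n"));
    }
    out.push_str("];\n");
    out
}

fn main() {
    let pairs = [("mouse", "mice"), ("cat", "cats"), ("child", "children")];
    let irregular = irregular_entries(&pairs);
    print!("{}", render_array("NOUNS", &irregular));
}
```

Dropping every form the rules already produce is what keeps the generated data small: only the exceptions need to be stored.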
📦 Obtaining Wiktionary Data & Running the Extractor
This project relies on raw data extracted from Wiktionary. The current version was built with data from 8/17/2025.
Steps
- Download the raw Wiktextract JSONL dump (~20 GB) from Kaikki.org.
- Place the file somewhere accessible (e.g. ../rawwiki.jsonl).
- From the extractor folder, run: cargo run --release ../rawwiki.jsonl
- Move the generated files adj_array.rs, noun_array.rs, and verb_array.rs into the /src of english.
Benchmarks
Performance benchmarks were run on my M2 MacBook.
Writing benchmarks and tests for a project like this is difficult and requires opinionated decisions. Many words have alternative inflections, and the data in Wiktionary is not perfect. Many words can be both countable and uncountable, and the tagging of words may be inconsistent. This library includes a few uncountable words in its dataset, but not all; uncountable words require special handling anyway. Take all benchmarks with a pound of salt, and write your own tests for your own use cases. Any suggestions to improve the benchmarking are highly appreciated.
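If you want to check accuracy and speed against your own word list, a small harness along these lines may be a starting point. It is a sketch under stated assumptions: `accuracy`, `nanos_per_call`, and `naive_plural` are hypothetical names, and `naive_plural` is only a placeholder for whichever inflection function you actually want to measure.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Placeholder inflector; swap in the function you want to evaluate.
fn naive_plural(word: &str) -> String {
    format!("{word}s")
}

/// Count how many (input, expected) cases the inflector reproduces exactly.
fn accuracy<F>(cases: &[(&str, &str)], inflect: F) -> (usize, usize)
where
    F: Fn(&str) -> String,
{
    let mut correct = 0;
    for &(input, expected) in cases {
        if inflect(input) == expected {
            correct += 1;
        }
    }
    (correct, cases.len())
}

/// Rough average wall-clock time per call, in nanoseconds.
fn nanos_per_call<F>(inputs: &[&str], inflect: F) -> f64
where
    F: Fn(&str) -> String,
{
    let start = Instant::now();
    for &input in inputs {
        // black_box keeps the optimizer from discarding the call.
        black_box(inflect(black_box(input)));
    }
    start.elapsed().as_nanos() as f64 / inputs.len().max(1) as f64
}

fn main() {
    let cases = [("cat", "cats"), ("mouse", "mice"), ("box", "boxes")];
    let (correct, total) = accuracy(&cases, naive_plural);
    println!("{correct} / {total} correct");

    let inputs: Vec<&str> = cases.iter().map(|&(input, _)| input).collect();
    println!("{:.0} ns per call", nanos_per_call(&inputs, naive_plural));
}
```

A single pass over a handful of words is too noisy to trust for timing; for real measurements, use a large input list and repeat the run several times.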
Disclaimer
Wiktionary data is often unstable and subject to weird changes. This means that the provided inflections may change unexpectedly. You can look at the diffs of *_array.rs files for a source of truth.
Inspirations
📄 License
- Code: Dual licensed under MIT and Apache © 2024 gold-silver-copper
- Data: Wiktionary content is dual-licensed under