lexrs — core library
The lexrs Rust crate implements two lexicon data structures — Trie and DAWG — with an optional Python extension module built via PyO3.
Crate layout
lexrs/
src/
lib.rs — public re-exports and Python module registration
trie.rs — Trie: arena-allocated prefix tree, wildcard & Levenshtein search
dawg.rs — DAWG: minimized Trie with suffix compression
node.rs — Node type shared by both structures
utils.rs — file I/O helpers and wildcard pattern normalization
error.rs — LexError enum (used for out-of-order DAWG insertion, etc.)
python.rs — PyO3 wrappers (compiled only with --features python)
bin/
genome_bench.rs — standalone benchmark binary
tests/
trie_tests.rs — Rust unit tests for Trie
dawg_tests.rs — Rust unit tests for DAWG
Data structures
Trie
Arena-allocated prefix tree (Vec<Node>). Words may be inserted in any order. Good for write-heavy workloads or when insertion order is unknown.
DAWG (Directed Acyclic Word Graph)
A minimized Trie that merges identical suffix subtrees, yielding a much smaller node count for large lexicons. Words must be inserted in lexicographic order. Call reduce() after individual add() calls to finalize minimization; add_all() handles this automatically.
API
Both types expose the same interface:
| Method | Description |
|---|---|
add(word, count) |
Insert a word with an optional frequency count |
add_all(words) |
Insert from any iterable (sorts before inserting into DAWG) |
add_from_file(path) |
Insert words from a newline-delimited file |
contains(word) |
Exact membership test |
contains_prefix(prefix) |
True if any stored word starts with this prefix |
search(pattern) |
Wildcard search (* = any chars, ? = exactly one) |
search_with_count(pattern) |
Like search, returns (word, count) pairs |
search_with_prefix(prefix) |
All words beginning with the prefix |
search_with_prefix_count(prefix) |
Like above, with counts |
search_within_distance(word, dist) |
Levenshtein fuzzy search |
search_within_distance_count(word, dist) |
Like above, with counts |
word_count() |
Number of words stored |
node_count() |
Number of nodes in the structure |
Consecutive wildcards are normalized: ** → *, ?* → *.
Install
Add to your Cargo.toml:
[]
= "0.1"
Python bindings
The same data structures are available as a Python package:
=
=
in # True
# wildcard
# fuzzy
=
in
Cargo features
| Feature | Effect |
|---|---|
python |
Enables PyO3 and compiles the lexrs Python extension module |