libdawg
A fast, memory-efficient DAWG (Directed Acyclic Word Graph) library for Rust.
A DAWG is a minimal acyclic finite-state automaton — essentially a trie with shared suffixes — providing compact dictionary storage and O(word length) lookups. Based on Daciuk et al. (2000).
Terminology
The API uses the terms "word" and "character" but is not limited to text. A
"character" is any type that implements DawgChar (Copy + Eq + Ord + Hash + Debug + Default) — for example char, u8, u16, or a custom enum. A "word"
is simply a sequence of such characters. This means the DAWG can store any set
of sorted sequences, not just strings.
Features
- Generic over character type — works with
char,u8,u16, or any type implementingDawgChar - Compact — suffix sharing minimizes memory usage
- Fast — O(word length) lookups with arena-allocated nodes
- Mutable —
OwnedDawgsupports adding and removing words after construction - Thread-safe —
DawgNodeuses only immutable arena references
Two APIs
libdawg provides two ways to build a DAWG:
OwnedDawg |
Arena-based (build_dawg) |
|
|---|---|---|
| Allocation | Internal | Caller-managed Arena |
| Mutation | add_word / remove_word |
Read-only after construction |
| Input order | Sorted (for initial build) | Sorted |
| Use when | You need to modify the DAWG after building, or want a simpler API | You want explicit lifetime control or only need a read-only DAWG |
Both produce a minimal DAWG with full suffix sharing. OwnedDawg preserves
minimality across mutations using clone-on-write path updates and a free-list
for arena slot reuse.
Usage
Add to your Cargo.toml:
[]
= "1"
The default arena feature re-exports typed_arena::Arena as libdawg::dawg::Arena.
If you only need OwnedDawg, you can disable it to drop the typed-arena dependency:
[]
= { = "1", = false }
Owned DAWG (no arena management)
The simplest API uses OwnedDawg, which manages allocation internally:
use build_owned_dawg;
let dawg = build_owned_dawg.unwrap;
let root = dawg.root;
let is_word = ;
assert!;
assert!;
Adding and removing words
OwnedDawg supports mutation after construction. Words can be added in any
order (unlike the initial sorted build), and the DAWG stays minimal:
use build_owned_dawg;
let mut dawg = build_owned_dawg.unwrap;
// Add words (returns true if the word was new)
assert!;
assert!; // already present
assert!;
assert!;
// Remove words (returns true if the word existed)
assert!;
assert!; // already removed
assert!;
assert!;
Removed nodes are placed on a free-list and reused by future add_word calls
before allocating from the arena.
Arena-based DAWG (read-only)
For explicit control over allocation, pass your own arena. Words must be added in lexicographic (sorted) order:
use build_dawg;
use Arena;
let arena = new;
let root = build_dawg.unwrap;
// Check word containment by traversing from the root
let is_word = ;
assert!;
assert!;
assert!;
Building from a file
use build_dawg_from_file;
use Arena;
let arena = new;
let root = build_dawg_from_file.unwrap;
let is_word = ;
println!;
The file should have one word per line, sorted alphabetically. Lines starting
with # are treated as comments.
Generic usage (non-char types)
The DAWG is generic over the edge label type — build_dawg accepts any
iterator of words where each word implements IntoWord<C>:
use build_dawg;
use Arena;
let arena = new;
let words: = vec!;
let root = build_dawg.unwrap;
let contains = ;
assert!;
assert!;
Incremental construction
For more control, use Builder directly:
use Builder;
use Arena;
let arena = new;
let mut builder = new;
// Words must be added in sorted order
builder.add_word.unwrap;
builder.add_word.unwrap;
builder.add_word.unwrap;
builder.add_word.unwrap;
let root = builder.build;
let is_word = ;
assert!;
Walking the graph
You can traverse the DAWG directly via DawgNode:
use build_dawg;
use Arena;
let arena = new;
let root = build_dawg.unwrap;
// Two starting letters: 'B' and 'C'
assert_eq!;
let b_node = root.get.unwrap;
let a_node = b_node.get.unwrap;
let k_node = a_node.get.unwrap;
let e_node = k_node.get.unwrap;
// 'BAKE' is a word, and has child 'D' for 'BAKED'
assert!;
assert_eq!;
How it works
The DAWG is constructed in a single pass over sorted input. As words are added, the builder identifies shared suffixes and deduplicates nodes, producing a minimal graph. All nodes are arena-allocated, so the entire graph is freed at once when the arena is dropped.
Mutation (OwnedDawg only): Because nodes are deduplicated by structure,
modifying a shared node in-place would corrupt other words. Instead, add_word
and remove_word clone the path from the affected node to the root and
re-canonicalize each clone against the register. Reference counting tracks when
nodes become unreachable; freed slots go to a free-list and are reused by future
allocations.
License
MIT