docs.rs failed to build libdictenstein-0.2.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

libdictenstein

A toolbox of high-performance dictionary data structures for Rust — tries, DAWGs, double-array tries, suffix automata, compact suffix graphs, and lock-free durable Adaptive Radix Tries — unified behind one small trait API, and backed by machine-checked proofs.

What is libdictenstein?

libdictenstein provides the container half of approximate string matching: efficient, traversable collections of terms (a dictionary), optionally mapping each term to a value. Every backend exposes a uniform Dictionary interface — contains, root, and node-by-node transition — so you can swap implementations without touching call sites.

It is the companion to liblevenshtein, which supplies the query half: a Levenshtein-automaton transducer that walks any Dictionary to find all terms within an edit distance. libdictenstein itself contains no fuzzy-matching code — it focuses on being the fastest, most correct set of dictionaries that transducer can traverse.

Terminology. A term is a string in the dictionary. A prefix is a leading substring ("ca" of "cat"); a substring occurs anywhere ("at" in "cat"). Σ (sigma) is the alphabet — the set of symbols an edge can carry (256 byte values, or all Unicode scalar values). A symbol is a unit; fanout is a node's child count; ∣x∣ is the length of x.

Highlights

20+ backend variants spanning the time/space/durability frontier — pick by need, not by lock-in.
Three alphabets, one code path — byte (u8), Unicode (char), and 64-bit token (u64) units via the CharUnit abstraction.
A lock-free, crash-durable persistent ARTrie family — disk-backed (mmap or io_uring), write-ahead-logged, with native byte, Unicode, vocabulary, and u64 sequence profiles.
Native persistent substring indexes — suffix automaton, suffix-tree-compatible, and SCDAWG variants for byte and Unicode corpora.
Set algebra over dictionaries — union / intersection / difference / prefix zippers compose any two backends lazily.
Formally verified core — 67 Rocq files (0 axioms, 0 admits), 52 TLA⁺ models, and a CI-gated unsafe contract inventory (see Formal verification).

Backend selector

Backend	Best for	Updates	Unicode	Lookup
DoubleArrayTrie / `…Char`	general use; fastest lookup, most compact	insert-only (rebuild)	`u8` / `char`	`O(∣q∣)`
DynamicDawg / `…Char` / `…U64`	runtime insert and remove; suffix sharing	insert + remove	`u8` / `char` / `u64`	`O(∣q∣)`
SuffixAutomaton / `…Char`	substring search (match anywhere)	insert + remove	`u8` / `char`	`O(∣q∣)`
Scdawg / `…Char`	static, compact bidirectional substring index	build-once	`u8` / `char`	`O(∣q∣)`
PathMapDictionary / `…Char` (feat. `pathmap-backend`)	shared-structure mutable trie	insert + remove	`u8` / `char`	`O(∣q∣)`
PersistentARTrie / `…Char` (feat. `persistent-artrie`)	disk-backed, crash-durable key→value	insert + remove	`u8` / `char`	`O(∣q∣)` + I/O
PersistentARTrieU64Compact (feat. `persistent-artrie`)	durable `u64` sequence/time-series key→value	insert + remove	`u64`	`O(∣q∣)` + I/O
PersistentSuffixAutomaton / `…Char` (feat. `persistent-artrie`)	disk-backed substring search	insert + remove	`u8` / `char`	`O(∣q∣)` + I/O
PersistentSuffixTree / `…Char` (feat. `persistent-artrie`)	disk-backed suffix-tree-compatible substring API	insert + remove	`u8` / `char`	`O(∣q∣)` + I/O
PersistentScdawg / `…Char` (feat. `persistent-artrie`)	disk-backed compact bidirectional substring index	insert + remove	`u8` / `char`	`O(∣q∣)` + I/O
PersistentVocabARTrie (feat. `persistent-artrie`)	durable term ↔ u64 vocabulary (bijection)	insert	`char`	`O(∣q∣)`

∣q∣ is the query length; lookup cost is independent of the number of stored terms n for every backend — the defining property of trie-shaped indexes. The factory (below) constructs all 11 in-memory backends from one call; disk-backed variants take a file path because creation, recovery, and checkpointing are part of the API.

Architecture at a glance

The design is a thin trait layer over interchangeable backend families. Backends implement only the traits they can honor (a read-only static trie implements Dictionary; a DAWG also implements MutableDictionary and CompactableDictionary). The unit abstraction (CharUnit for edge labels, KeyEncoding for persistent keys) lets shared implementations serve u8, char, and u64 alphabets. The factory and prelude are the ergonomic entry points.

Quick start

[dependencies]
libdictenstein = "0.1"

Build, query, traverse (any in-memory backend):

use libdictenstein::prelude::*;            // Dictionary, DictionaryNode, Mutable*, …
use libdictenstein::double_array_trie::DoubleArrayTrie;

let dict = DoubleArrayTrie::from_terms(vec!["hello", "help", "world"]);

assert!(dict.contains("hello"));
assert!(!dict.contains("hel"));            // "hel" is a prefix, not a term

// Walk the automaton edge-by-edge (this is what a fuzzy transducer does):
let root = dict.root();
if let Some(next) = root.transition(b'h') {
    assert!(next.transition(b'e').is_some());
}

Associate values with terms (a mapped dictionary):

use libdictenstein::prelude::*;
use libdictenstein::dynamic_dawg::DynamicDawg;

// A byte-level DAWG mapping each term to a u64 value (the value type defaults to `()`).
let counts: DynamicDawg<u64> = DynamicDawg::new();
counts.insert_with_value("apple", 3);
counts.insert_with_value("apricot", 1);
assert_eq!(counts.get_value("apple"), Some(3));

Persist to disk, durably (feature persistent-artrie):

use libdictenstein::persistent_artrie::char::PersistentARTrieChar;

// Create + populate; values are u64 here.
let trie = PersistentARTrieChar::<u64>::create("words.artc")?;
for i in 0..1000 {
    trie.upsert(&format!("term{i:05}"), i as u64)?;   // durably logged, then published
}
trie.checkpoint()?;                                    // fold the overlay into a dense on-disk image

// Reopen in a later process — state survives a crash up to the last durable write.
let reopened = PersistentARTrieChar::<u64>::open("words.artc")?;
assert_eq!(reopened.get("term00500"), Some(500));
# Ok::<(), Box<dyn std::error::Error>>(())

Persist native u64 sequence keys:

use libdictenstein::persistent_artrie::PersistentARTrieU64Compact;

// Prefix-4 CX profile: the default compact native-u64 representation.
let series = PersistentARTrieU64Compact::<u64>::create("series.ar64")?;
series.insert_sequence_with_value(
    &[0x1000_0000_0000_002a, 0x3000_0000_0000_0100, f64::to_bits(42.5)],
    f64::to_bits(42.5),
);
series.checkpoint()?;

let reopened = PersistentARTrieU64Compact::<u64>::open("series.ar64")?;
assert!(reopened.contains_sequence(
    &[0x1000_0000_0000_002a, 0x3000_0000_0000_0100, f64::to_bits(42.5)]
));
# Ok::<(), Box<dyn std::error::Error>>(())

Persist substring indexes:

use libdictenstein::persistent_artrie::PersistentScdawgChar;

let index = PersistentScdawgChar::<u64>::create("docs.pscdawg")?;
index.insert_with_value("the quick brown fox", 7);
index.checkpoint()?;

let reopened = PersistentScdawgChar::<u64>::open("docs.pscdawg")?;
assert!(reopened.contains_substring("quick"));
assert_eq!(reopened.locations("brown"), vec![("the quick brown fox".to_string(), 10)]);
# Ok::<(), Box<dyn std::error::Error>>(())

Core traits

The public surface is small and layered — implement only what a backend can support.

Trait	Adds	Honored by
`Dictionary`	`contains`, `root`, `len`, `is_empty`	every backend
`DictionaryNode`	`transition(unit)`, `edges`, `is_final`	every node type
`MappedDictionary`	`get_value(term) → Option<V>`	valued backends
`MutableDictionary`	`insert`, `remove`, `extend`	DAWG, PathMap, SuffixAutomaton, persistent
`CompactableDictionary`	`compact`, `minimize`	DAWG family
`MutableMappedDictionary`	`insert_with_value`, `union_with`, `update_or_insert`	valued + mutable
`BijectiveDictionary`	reverse lookup `value → term`	`BijectiveMap`, vocab tries

Two unit abstractions let one implementation serve every alphabet:

CharUnit — the edge-label type: u8 (bytes; ASCII/Latin-1, smallest), char (Unicode scalar values; correct character-level semantics), or u64 (token / time-series labels).
KeyEncoding — the persistent-trie key model: ByteKey (u8 spans), CharKey (u32 codepoint spans), or U64Key (native 64-bit sequence units).

Values must implement DictionaryValue (Clone + Send + Sync + 'static); the blanket impls cover (), the integer types, String, Vec<T>, HashSet<T>, and SmallVec. () makes a set (membership only); any other V makes a map.

Construct any in-memory backend uniformly with DictionaryFactory, or pull the common names from the prelude.

Algorithms (the interesting part)

Every backend answers a lookup in O(∣q∣) — but how they store terms, and what else they can do, differs sharply. Below, the load-bearing ideas in literate form. Deep dives live under docs/algorithms/ and docs/theory/.

Double-Array Trie (DAT) — cache-resident static lookup

A double-array trie packs a trie into two parallel integer arrays, BASE and CHECK. The child of state s on unit u is found arithmetically — no pointer chase:

t = BASE[s] + offset(u)
child exists  ⟺  CHECK[t] == s          # CHECK validates the parent that placed t

Because BASE[s] and the candidate slot sit adjacently, a transition is typically a single cache line touch — empirically ~3× faster than pointer-following structures, at ~8 bytes/state. The cost: it is insert-only (built from a sorted term list); use a DAWG when you need remove. — Aoe (1989); Yata et al. (2007).

DAWG minimization via signature hashing

A DAWG (Directed Acyclic Word Graph) is a trie that also merges identical suffixes: the "-tion" shared by a million English words is stored once.

Two nodes are mergeable iff they have the same right language

Rᵤ = { strings spelled out on paths from u to any final state }.

Comparing right-languages directly is expensive, so libdictenstein hashes them. Each node folds a 64-bit FxHash of its is_final flag and its sorted edges, bottom-up:

signature(u) = FxHash( u.is_final, sort[ (label, signature(child)) : edge (label→child) of u ] )

# during incremental construction, after building the freshly-inserted path:
for each new node u (deepest first):
    if ∃ live node v with signature(v) == signature(u):
        if structurally_equal(u, v):      # guard against a 64-bit hash collision
            redirect u's parent edge to v # merge: u is discarded

Sorting the edges makes the signature independent of insertion order; the structural re-check defeats the birthday-paradox collision of a 64-bit hash. This replaces a recursive Box<Signature> (≈ 3000 allocations for a 1000-node graph) with a single allocation-free u64 comparison. — Blumer et al. (1987); Daciuk (2000).

Suffix automaton — substring search in `O(∣q∣)`

To match a pattern anywhere in a text (not just at a word boundary), build the minimal DFA recognizing every substring. Its states are equivalence classes of substrings sharing the same set of end-positions (endpos). It is built online — one character at a time, amortized O(1) each:

extend(c):                              # append character c to the indexed text
    cur ← new state with len = last.len + 1
    p ← last
    while p ≠ ⊥ and p has no c-edge:    # walk the suffix-link chain, adding the new edge
        add edge  p --c--> cur ;  p ← suffix_link(p)
    if p = ⊥:
        suffix_link(cur) ← root
    else:
        q ← target of p's c-edge
        if len(q) = len(p) + 1:
            suffix_link(cur) ← q
        else:                            # endpos classes diverge — split q (the only costly step)
            clone q as q′ ; rewire the affected c-edges and suffix links to q′
    last ← cur

The automaton provably has ≤ 2·∣T∣ − 1 states and ≤ 3·∣T∣ − 4 transitions for text T, giving O(∣T∣) construction and O(∣q∣) substring queries. The Scdawg backend is a symmetric compact refinement that adds left-extension edges for bidirectional search at ~20–30% fewer states. — Blumer et al. (1985); Crochemore (1986); Inenaga et al. (2001, 2005).

Complexity & memory at a glance

Let ∣q∣ = query length, N = number of terms, n = total indexed size in units. Every backend answers membership in O(∣q∣), independent of N — they diverge on updates and footprint (bytes/state are approximate, order-of-magnitude):

Backend	Lookup	Insert	Remove	~Bytes/state	Construction
DoubleArrayTrie	`O(∣q∣)`	append-only	—	~8	`O(N log N)` (sort + pack)
DynamicDawg	`O(∣q∣)`	`O(∣q∣ log n)`	`O(∣q∣ log n)`	~25–32	incremental + minimize
SuffixAutomaton	`O(∣q∣)` (substring)	`O(1)` amortized	`O(n)` rebuild	~40–50	`O(n)` online
Scdawg	`O(∣q∣)` (substring)	build-once	—	~30–40	`O(n)`
PersistentARTrie	`O(∣q∣)` + I/O	`O(∣q∣)` + I/O	`O(∣q∣)` + I/O	~30–50 + disk	incremental
PersistentSuffix*	`O(∣q∣)` + I/O (substring)	rebuild+publish	rebuild+publish	graph + disk	copy-on-write native graph

The double-array trie's O(N log N) build buys the cheapest, most cache-resident lookup; the DAWG family trades a log n insert factor for runtime mutation and suffix sharing; the suffix automaton spends ~2× the memory to answer substring (not just prefix) queries; the persistent ARTrie family adds crash recovery and checkpointable disk images, while the persistent suffix family stores native substring graphs behind snapshot-style readers.

Persistent ARTrie — lock-free & durable

The flagship backend family is a disk-backed Adaptive Radix Trie (ARTrie): a radix tree whose nodes adapt their representation to their fanout, paired with a write-ahead log (WAL) for crash durability and a lock-free (atomic compare-and-swap, CAS) in-memory overlay for concurrency.

Adaptive nodes + SIMD

A radix-tree node's fanout ranges from 1 to ∣Σ∣. A fixed dense array wastes most of its slots on sparse nodes; a sorted list is slow on dense ones. The persistent overlay uses an immutable adaptive edge store: byte labels get the classic ART tiers, while char and u64 labels stay native and choose between inline, sorted, and sparse-indexed storage by fanout.

For byte keys, the hot path is the familiar Node4/16/48/256 ART ladder:

find_child(node, byte):
    match node.kind:
        Node4   →  linear scan keys[0..4]            # ≤4 children; fits one cache line
        Node16  →  m   = cmpeq( splat(byte), keys )  # SSE: 16 byte-compares in ONE instruction
                   bit = movemask(m) & ((1<<n) − 1)  #   → 16-bit match mask
                   return bit ≠ 0 ? children[tzcnt(bit)] : ∅   # trailing-zero count = slot
        Node48  →  slot = index[byte] ; children[slot]   # 256-byte index → 48 pointer slots
        Node256 →  children[byte]                         # direct array; dense nodes

Per-node lookup is bounded by the adaptive representation; the Node16 byte path turns ≤ 16 scalar comparisons into one SIMD (single-instruction, multiple-data) _mm_cmpeq_epi8 instruction. For non-byte labels, native u32/u64 labels avoid byte expansion and use sorted or indexed lookup once fanout outgrows inline storage. — Leis et al. (2013).

Path compression

Storing "metamorphosis" as 14 single-child nodes wastes space and, on disk, 14 page faults. ART collapses single-child chains, keeping a per-node partial prefix:

check_prefix(node, key, depth):
    for i in 0 .. min(node.partial_len, 8):          # "pessimistic": compare stored bytes inline
        if key[depth + i] ≠ node.partial[i]:
            return Mismatch(i)
    # partial_len > 8 ("optimistic"): the prefix was truncated; verify fully at the leaf

Tree height drops from O(∣key∣) to O(∣key∣ / s̄) for mean compressed span s̄, a ~2–4× reduction (hence ~2–4× fewer I/Os) on natural-language keys. — Morrison (1968).

Persistent variants

PersistentARTrie / PersistentARTrieChar use ByteKey / CharKey, shared overlay nodes, WAL records, and CX checkpoint images.
PersistentARTrieU64Compact is the default native u64 sequence-key profile. It stores one u64 edge per transition, writes shared WAL records, and checkpoints through the CX compressor with a prefix-4 budget. PersistentARTrieU64Prefix3Compat opens or benchmarks prefix-3 CX images explicitly.
PersistentVocabARTrie is a durable term -> u64 and u64 -> term vocabulary. Forward lookup walks the lock-free char overlay; reverse lookup uses an in-memory map rebuilt from checkpoint/WAL recovery.
PersistentSuffixAutomaton / PersistentSuffixTree / PersistentScdawg and their Char variants persist native substring graphs plus atomically published operation-segment WALs. Recovery also reads the historical monolithic WAL format. Reads traverse immutable snapshots; writes rebuild a candidate graph and publish the winning copy by CAS.

Durable writes: the Order-A protocol

The byte/char persistent ARTrie and vocabulary implementations are lock-free (readers and writers never block on a global mutation lock) yet crash-durable (an acknowledged write survives power loss). The reconciling invariant is acknowledged ⟹ durable, enforced by a strict, non-negotiable ordering (durable_write.rs):

durable_insert(term, value):                  # requires durability ∈ { Immediate, GroupCommit }
  ① lsn  ← WAL.append_durable( Insert{term, value} )   #  fsync FIRST — before any visibility
  ② publish (term, value) by CAS on the overlay root   #  the visibility point = linearization point
  ③ rk   ← WAL.append_durable( CommitRank{lsn} )        #  fixes the order of concurrent commits
     mark_committed(lsn) ; mark_committed(rk)           #  advance the committed-prefix watermark
  ④ ACK                                                 #  now durable AND visible

Why the order is sacred:

Log before publish (Order A). The opposite — publish-then-log (Order B) — can expose a write that is visible but not yet durable; a crash in that window loses an acknowledged write. Rejected.
One append per write. The single WAL record covers every CAS retry; re-appending on retry would burn log-sequence numbers (LSNs) and punch holes in the committed prefix.
The watermark, not the frontier. Under out-of-order lock-free commit, the only safe checkpoint LSN is the committed-prefix watermark (the largest L such that every LSN ≤ L is committed) — using the appended frontier instead would checkpoint an uncommitted write.

Freed nodes are reclaimed by epoch-based reclamation (EBR): memory is released only after every reader that could hold a pointer to it has departed its epoch — bounded-latency, lock-free, and free of use-after-free. The on-disk substrate is mmap by default, or io_uring + O_DIRECT (feature io-uring-backend, Linux ≥ 5.1) for batched async I/O. — Mohan et al. (1992) for the WAL/recovery discipline; Driscoll et al. (1989) for the copy-on-write structural sharing.

The native u64 profile follows the same log-before-publish rule with shared WAL records and lock-free root publication, but its compact CX snapshot format is intentionally not the old native bincode snapshot/WAL format. Historical bincode controls live only in git history.

The write path has no trie-wide mutation lock: publication is a CAS over immutable overlay nodes. Immediate crash durability still waits for WAL append/fsync, and the synchronous WAL writer serializes file writes; the lock-free claim is about overlay read/write publication, not about making storage latency disappear.

Feature flags

Feature	Effect	Notes
`default` = `["parking_lot"]`	`parking_lot::RwLock` for the dynamic backends	faster than `std`
`pathmap-backend`	the `PathMap*` backends	structural-sharing trie
`serialization`	`serde` + `bincode` + JSON (de)serialization
`compression`	gzip the serialized form (`flate2`)
`protobuf`	Protobuf (de)serialization (`prost`)
`persistent-artrie`	the disk-backed ARTrie, vocabulary, and native suffix graph families	`mmap` + WAL + CX/native snapshots
`io-uring-backend`	`io_uring` + `O_DIRECT` block storage	Linux ≥ 5.1
`parallel-merge`	multi-core merge via `rayon`
`group-commit`	batched WAL group commit	⚠️ experimental — measured ~1.5–2× regression on NVMe; intended only for slow storage. See `docs/persistence/group_commit_regression.md`
`lling-llang`	WFST semiring integration for the `Lattice` trait
`bench-internals`	expose internal APIs to benchmarks

Formal verification

The persistent ARTrie carries an unusually strong correctness budget. All figures below are checked into the repository and re-derivable from the sources under formal-verification/.

Tool	Scope	Status
Rocq (Coq)	functional correctness + refinement of the trie to an abstract map ADT	66 `.v` files, 1,283 propositions (theorem/lemma/corollary), 0 `Admitted`, 0 `Axiom`, 0 `Parameter` — fully constructive (every obligation closed by `Qed.`/`Defined.`)
TLA⁺ / TLC	concurrency & crash-recovery safety/liveness	52 specification modules — e.g. `LockFreeARTrieLinearizability`, `CrashRecovery`, `PersistentSuffixAutomaton`, `PublicDurabilityPolicy`; the composed model explores multi-million-state spaces (PART ≈ 4.2 M distinct states)
loom	exhaustive interleaving of the lock-free CAS paths	root-CAS, overlay value/index CAS, counter-merge, EBR
`unsafe` inventory	every `unsafe` site bound to a reviewed contract + coverage class	43 inventory rows / 31 contracts, CI-gated by `scripts/verify-unsafe-boundary-inventory.sh` (set-equality — no silent drift)

Headline properties proven or model-checked: linearizability of the lock-free root, no lost writes (the Order-A watermark discipline), crash-recovery completeness (durable-prefix replay), and no use-after-free under EBR. Out-of-scope boundaries (e.g. the upstream Levenshtein transducer, kernel io_uring internals, certified compilation) are stated explicitly in formal-verification/GAP_LEDGER.md.

Performance

Indicative figures for a 10,000-word English corpus (build once, then query) — relative behavior matters more than absolute times, which vary by CPU, allocator, and corpus. Run cargo bench for your platform; full result logs live under docs/benchmarks/.

                       DoubleArrayTrie     DynamicDawg
   construction              ~3.2 ms          ~7.2 ms
   exact-match lookup        ~6.6 µs          ~19.8 µs
   contains (cached)         ~0.22 µs         ~6.7 µs

The takeaway: a static double-array trie wins read-heavy in-memory workloads; a DAWG pays a constant factor for runtime insert/remove and suffix-sharing. Disk-backed throughput (mmap vs io_uring) is characterized in docs/io_uring_migration/benchmark_results.md.

Persistent u64 and native suffix-index benchmark ledgers live under docs/experiments/ and docs/benchmarks/. Notable dated performance and storage changes are recorded in CHANGELOG.md.

Documentation map

Topic	Where
Per-backend algorithm walkthroughs	`docs/algorithms/`
Theory: disk tries, ART, SCDAWG (with citations)	`docs/theory/`
Persistent-ARTrie mmap architecture	`docs/persistence/mmap-architecture.md`
Eviction design	`docs/eviction/`
Proof scope, results, gap ledger	`formal-verification/`
Diagram sources (PlantUML)	`docs/diagrams/`
Changelog	`CHANGELOG.md`

References

All DOIs below were verified against Crossref to resolve to the cited work.

Fredkin, E. (1960). Trie Memory. Communications of the ACM 3(9). 10.1145/367390.367400
Morrison, D. R. (1968). PATRICIA — Practical Algorithm To Retrieve Information Coded in Alphanumeric. Journal of the ACM 15(4). 10.1145/321479.321481
Blumer, A., et al. (1985). The Smallest Automaton Recognizing the Subwords of a Text. Theoretical Computer Science 40. 10.1016/0304-3975(85)90157-4
Crochemore, M. (1986). Transducers and Repetitions. Theoretical Computer Science 45(1). 10.1016/0304-3975(86)90041-1
Blumer, A., et al. (1987). Complete Inverted Files for Efficient Text Retrieval and Analysis. Journal of the ACM 34(3). 10.1145/28869.28873
Aoe, J. (1989). An Efficient Digital Search Algorithm by Using a Double-Array Structure. IEEE Transactions on Software Engineering 15(9). 10.1109/32.31365
Driscoll, J. R., et al. (1989). Making Data Structures Persistent. Journal of Computer and System Sciences 38(1). 10.1016/0022-0000(89)90034-2
Mohan, C., et al. (1992). ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems 17(1). 10.1145/128765.128770
Crochemore, M., & Vérin, R. (1997). Direct Construction of Compact Directed Acyclic Word Graphs. CPM, LNCS 1264. 10.1007/3-540-63220-4_55
Daciuk, J., et al. (2000). Incremental Construction of Minimal Acyclic Finite-State Automata. Computational Linguistics 26(1). 10.1162/089120100561601
Inenaga, S., et al. (2001). On-line Construction of Symmetric Compact Directed Acyclic Word Graphs. SPIRE. 10.1109/SPIRE.2001.989743
Schulz, K. U., & Mihov, S. (2002). Fast String Correction with Levenshtein Automata. International Journal on Document Analysis and Recognition 5(1). 10.1007/s10032-002-0082-8 — the basis of the companion liblevenshtein.
Inenaga, S., et al. (2005). On-line Construction of Compact Directed Acyclic Word Graphs. Discrete Applied Mathematics 146(2). 10.1016/j.dam.2004.04.012
Yata, S., et al. (2007). A Compact Static Double-Array Keeping Character Codes. Information Processing & Management 43(1). 10.1016/j.ipm.2006.04.004
Askitis, N., & Zobel, J. (2008). B-tries for Disk-based String Management. The VLDB Journal 18(1). 10.1007/s00778-008-0094-1
Leis, V., Kemper, A., & Neumann, T. (2013). The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. IEEE ICDE. 10.1109/ICDE.2013.6544812

Migration from liblevenshtein

The dictionary types moved out of liblevenshtein into this crate. Update imports:

// Old
use liblevenshtein::dictionary::double_array_trie::DoubleArrayTrie;
// New
use libdictenstein::double_array_trie::DoubleArrayTrie;
// …or just
use libdictenstein::prelude::*;

License

Licensed under Apache-2.0. Minimum supported Rust version: 1.70.

libdictenstein 0.2.0