Expand description
inputx-fsa — a compact, zero-dependency ordered map from byte keys to
u64, backed by a minimal acyclic finite-state automaton (MA-FSA /
DAWG) plus a parallel value table addressed by lexicographic rank.
This is a clean-room implementation written for the Inputx IME to replace
the fst crate on the shipped read path — same role (compressed,
mmap-friendly dictionary index with get + prefix/range scan), no
external dependencies. The algorithms (Revuz minimization, MA-FSA as a
minimal perfect hash via per-transition right-language counts) are
textbook; the code is original.
§Model
The automaton recognizes the set of keys (it carries no values inside it — that keeps it minimal regardless of the values). Each accepted key has a stable ordinal = its rank in sorted order, computed during the walk by summing, at every step, the number of keys that sort before the branch we take. Values live in a separate fixed-width array indexed by that ordinal. This is the “numbered MA-FSA / minimal perfect hash” construction.
§Format (v3, little-endian)
magic "IXFA" (4) · version u8(=3) · value_width u8 (1|2|4|8)
value_count u32 · root_off u32 · state_count u32 (18-byte header)
states: byte-offset addressed (no offset table), post-order so every
target precedes its source. Per state, a flags byte:
bit0 = final, bit1 = single-transition.
· single → [flags, label u8, delta uvarint] (count omitted)
· multi → [flags, n uvarint, (label u8, delta uvarint,
count uvarint) × n]
`delta` is a back-distance to the target state; `count` is the
target's right-language size (for the ordinal/rank walk).
values: [value_width bytes; value_count] (tail of the buffer)§Example
use inputx_fsa::{Builder, Fsa};
let mut b = Builder::new();
b.insert(b"apple", 1);
b.insert(b"apply", 2);
b.insert(b"banana", 3);
let fsa = Fsa::new(b.finish()).unwrap();
assert_eq!(fsa.get(b"apple"), Some(1));
assert_eq!(fsa.get(b"grape"), None);
// lazy, composable prefix scan
let app: Vec<_> = fsa.range(b"app").collect();
assert_eq!(app, vec![(b"apple".to_vec(), 1), (b"apply".to_vec(), 2)]);Structs§
- Builder
- Accumulates (key, value) pairs and serializes a minimal FSA.
- Dict
- Read-only two-level dictionary over a byte container.
- Dict
Builder - Accumulates
(code, item, value)triples and serializes aDict. - Fsa
- A read-only minimal-FSA map over a byte container.
- FsaIter
- Lazy, allocation-light iterator over an
Fsa’s (key, value) pairs in sorted order — seeFsa::iter/Fsa::range. Pre-order DFS via an explicit stack (depth = key length), so it composes with theIteratoradapters and stops early without walking the whole automaton. On a corrupt buffer it simply ends (consistent with the bounds-safe reader).
Enums§
- FsaError
- Error parsing an FSA buffer.