inputx-fsa
A compact, zero-dependency ordered map from byte keys to u64, backed by a
minimal acyclic finite-state automaton (MA-FSA / DAWG) plus a value table
addressed by lexicographic rank.
It's a clean-room implementation written for the Inputx
IME's dictionaries — a focused stand-in for the read + build path of the
excellent fst crate, with no dependencies
and an IME-tuned two-level Dict on top.
[]
= "1"
What you get
Fsa— an immutable&[u8] → u64map.get, prefix/range scan (eagerprefix(), zero-allocprefix_for_each(), or a lazyIteratorviaiter()/range()).Dict— a one-code-to-many index: each byte code maps to an ordered list of(item, value)pairs. Keeps the unique item bytes out of the automaton, so it's markedly smaller than flatteningcode\0item → valueinto one FSA — the structure an IME wants (a pinyin/wubi code → its ranked candidate words).- Build once, read forever —
Builder/DictBuilderserialize to aVec<u8>you embed (include_bytes!) ormmap; the reader works over anyAsRef<[u8]>with no decompression step.
Properties
- Zero dependencies. That is the entire point.
#![no_std]with--no-default-features(alloc only) — reader and builder work; the defaultstdfeature just uses a faster build register.#![forbid(unsafe_code)], and the reader never panics on a corrupt or untrusted buffer — out-of-range reads degrade to empty/None.- Small + fast. Measured against
fston the Inputx shipped pinyin set (216k entries): the two-levelDictis ~12% smaller, exactgetis ~2× faster, and prefix scan ~5.6× faster (a tiny code automaton + contiguous item blob beats walking acode\0itemgraph per entry).
Example
use ;
let mut b = new;
b.insert;
b.insert;
b.insert;
let bytes = b.finish; // serialize (embed / mmap this)
let fsa = new.unwrap; // read over any AsRef<[u8]>
assert_eq!;
assert_eq!;
// lazy, composable prefix scan
let app: = fsa.range.collect;
assert_eq!;
Two-level Dict (a code → many ranked items):
use ;
let mut b = new;
b.insert; // 我
b.insert; // 握
b.insert; // 我们
let dict = new.unwrap;
// items come back value-descending (top candidate first)
let mut got = Stringnew;
dict.get_for_each;
assert_eq!; // 我握
API stability
The 1.x line follows semver:
Fsa/Dict/Builder/DictBuilderpublic surface — no breaking changes within 1.x. New methods may be added as minor bumps.- On-disk format — header carries
magic = "IXFA"+version = 3. v3 is the only currently-emitted version; a future v4 reader will still accept v3 buffers (FsaError::BadVersionis reserved for the reverse — old reader, newer buffer). #![no_std]invariant — building with--no-default-featureswill continue to compile foralloc-only targets in the entire 1.x line.- Determinism —
Builder::finishis deterministic given the same set of(key, value)pairs (regardless of insert order).
Anything not on pub is internal — module layout, helper traits,
serialization internals — and may change in any minor release.
Runnable examples
License
MIT OR Apache-2.0.