Expand description
Hopcroft’s DFA minimization. Hopcroft’s DFA minimization algorithm.
Partition refinement yields the unique minimal DFA accepting the
same language. The output is canonical up to state-id relabeling,
so minimize(minimize(d)) == minimize(d) holds bytewise after both
runs relabel starting from the start state.
Per perf-audit L.2.2 (audits/perf-kimi-AUDIT.md finding 1): the
prior implementation was the textbook Hopcroft shape, but computed
the predecessor set x for every (splitter, byte) pair by scanning
all states — O(|Σ|·N²) per refinement step. For 10k literal
patterns that’s ~30 billion state inspections and 632 s assembly
time; regex-automata does the same work in <50 ms.
This implementation:
- Builds an inverse transition table
pred: Vec<Vec<(byte, state)>>once so the predecessor set for(splitter, byte)is O(|preds|), not O(N). - Keeps partitions as dense arrays indexed by class id rather
than
BTreeSet<u32>clones. Each state has aclass_of[state]entry; each class stores the backingVec<u32>of its members and alensnapshot used byrefine_in_place. - Uses a
VecDeque<u32>worklist of class ids, not aVecDeque<BTreeSet<u32>>of copied sets.
The asymptotic bound is now O(|Σ|·N·log N), which matches the published Hopcroft cost model.
Functions§
- dfa_
minimize - Minimize a DFA via partition refinement (Hopcroft’s algorithm).