Module dfa_minimize

Expand description

Hopcroft’s DFA minimization. Hopcroft’s DFA minimization algorithm.

Partition refinement yields the unique minimal DFA accepting the same language. The output is canonical up to state-id relabeling, so minimize(minimize(d)) == minimize(d) holds bytewise after both runs relabel starting from the start state.

Per perf-audit L.2.2 (audits/perf-kimi-AUDIT.md finding 1): the prior implementation was the textbook Hopcroft shape, but computed the predecessor set x for every (splitter, byte) pair by scanning all states — O(|Σ|·N²) per refinement step. For 10k literal patterns that’s ~30 billion state inspections and 632 s assembly time; regex-automata does the same work in <50 ms.

This implementation:

Builds an inverse transition table pred: Vec<Vec<(byte, state)>> once so the predecessor set for (splitter, byte) is O(|preds|), not O(N).
Keeps partitions as dense arrays indexed by class id rather than BTreeSet<u32> clones. Each state has a class_of[state] entry; each class stores the backing Vec<u32> of its members and a len snapshot used by refine_in_place.
Uses a VecDeque<u32> worklist of class ids, not a VecDeque<BTreeSet<u32>> of copied sets.

The asymptotic bound is now O(|Σ|·N·log N), which matches the published Hopcroft cost model.

Functions§

dfa_minimize: Minimize a DFA via partition refinement (Hopcroft’s algorithm).