CTC decoding: best-path (greedy) and prefix-beam search (Graves 2006; Hannun 2014).
Given per-frame log-probabilities [T, C] over a blank-augmented alphabet, a
CTC decoder produces the most probable label sequence after applying the
CTC collapse B (merge repeats, then drop blanks).
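The collapse B and best-path decoding can be sketched in a few lines. This is an illustrative sketch only: the names `collapse_sketch` and `greedy_sketch`, the `&[Vec<f64>]` input layout, and the explicit `blank` index parameter are assumptions made for the example, not the signatures of this module's functions.

```rust
/// CTC collapse B: merge runs of repeated symbols, then drop blanks.
/// `blank` is the index of the blank symbol (assumed to be passed explicitly here).
pub fn collapse_sketch(alignment: &[usize], blank: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev: Option<usize> = None;
    for &s in alignment {
        // Emit a symbol only when it starts a new run and is not the blank.
        if Some(s) != prev && s != blank {
            out.push(s);
        }
        prev = Some(s);
    }
    out
}

/// Best-path decoding: arg-max symbol per frame, then collapse.
/// `log_probs` holds T frames of C log-probabilities each (hypothetical layout).
pub fn greedy_sketch(log_probs: &[Vec<f64>], blank: usize) -> Vec<usize> {
    let path: Vec<usize> = log_probs
        .iter()
        .map(|frame| {
            frame
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.total_cmp(b.1))
                .map(|(i, _)| i)
                // An empty frame is treated as a blank in this sketch.
                .unwrap_or(blank)
        })
        .collect();
    collapse_sketch(&path, blank)
}
```

With blank index 0, the alignment `[1, 1, 0, 1, 2, 2, 0]` collapses to `[1, 1, 2]`: the blank between the two runs of `1` keeps them as separate labels, while the repeated `2` is merged.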
Two strategies are provided:
- ctc_greedy_decode: best-path decoding. Take the arg-max symbol at each frame, then collapse. Fast (O(T·C)), but the score of the single best path is only a lower bound on the probability of the decoded sequence, because it ignores the other alignments that collapse to the same labels.
- ctc_prefix_beam_search: prefix-beam search. Maintain a beam of label prefixes and track, for each prefix, the probability that it ends in a blank (p_b) versus a non-blank (p_nb). This sums the probabilities of distinct alignments that collapse to the same prefix, so it can recover sequences with higher total probability than greedy decoding.
All probabilities are accumulated in log-space.
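The p_b / p_nb bookkeeping and the log-space accumulation can be illustrated with a compact sketch. Everything named here (`prefix_beam_sketch`, `log_add`, the `(labels, log_prob)` return type, the `beam` width parameter) is hypothetical and chosen for the example; the module's actual function and its CtcHypothesis struct may differ in signature and in details such as pruning.

```rust
use std::collections::HashMap;

const NEG_INF: f64 = f64::NEG_INFINITY;

/// Numerically stable log(exp(a) + exp(b)).
fn log_add(a: f64, b: f64) -> f64 {
    if a == NEG_INF {
        return b;
    }
    if b == NEG_INF {
        return a;
    }
    let m = a.max(b);
    m + ((a - m).exp() + (b - m).exp()).ln()
}

/// Prefix beam search sketch. Returns (labels, log-probability), best first.
pub fn prefix_beam_sketch(
    log_probs: &[Vec<f64>],
    blank: usize,
    beam: usize,
) -> Vec<(Vec<usize>, f64)> {
    // Each entry is (prefix, (log p_b, log p_nb)). The empty prefix starts
    // with probability 1 of "ending in blank".
    let mut beams: Vec<(Vec<usize>, (f64, f64))> = vec![(Vec::new(), (0.0, NEG_INF))];
    for frame in log_probs {
        let mut next: HashMap<Vec<usize>, (f64, f64)> = HashMap::new();
        for (prefix, (p_b, p_nb)) in &beams {
            let total = log_add(*p_b, *p_nb);
            for (c, &lp) in frame.iter().enumerate() {
                if c == blank {
                    // A blank keeps the prefix; the path now ends in blank.
                    let e = next.entry(prefix.clone()).or_insert((NEG_INF, NEG_INF));
                    e.0 = log_add(e.0, total + lp);
                    continue;
                }
                let mut ext = prefix.clone();
                ext.push(c);
                if prefix.last() == Some(&c) {
                    // Repeating the last label without a blank merges into the same prefix.
                    let e = next.entry(prefix.clone()).or_insert((NEG_INF, NEG_INF));
                    e.1 = log_add(e.1, *p_nb + lp);
                    // The prefix grows only via paths that ended in blank.
                    let e = next.entry(ext).or_insert((NEG_INF, NEG_INF));
                    e.1 = log_add(e.1, *p_b + lp);
                } else {
                    let e = next.entry(ext).or_insert((NEG_INF, NEG_INF));
                    e.1 = log_add(e.1, total + lp);
                }
            }
        }
        // Keep the `beam` most probable prefixes by p_b + p_nb.
        let mut scored: Vec<(Vec<usize>, (f64, f64))> = next.into_iter().collect();
        scored.sort_by(|a, b| {
            let (sa, sb) = (log_add((a.1).0, (a.1).1), log_add((b.1).0, (b.1).1));
            sb.total_cmp(&sa)
        });
        scored.truncate(beam);
        beams = scored;
    }
    beams
        .into_iter()
        .map(|(p, (b, nb))| (p, log_add(b, nb)))
        .collect()
}
```

A two-frame example shows why summing alignments matters. With blank index 0 and per-frame probabilities 0.6 for the blank and 0.4 for label 1, the best single path is (blank, blank) with probability 0.36, so best-path decoding yields the empty sequence. The label sequence `[1]` is reached by three alignments (0.4·0.4 + 0.4·0.6 + 0.6·0.4 = 0.64), and the sketch ranks it first.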
Structs
- CtcHypothesis: A scored CTC decoding hypothesis returned by ctc_prefix_beam_search.
Functions
- ctc_greedy_decode: Best-path (greedy) CTC decode, i.e. arg-max per frame followed by CTC collapse.
- ctc_prefix_beam_search: Prefix-beam-search CTC decoding (Graves 2006; Hannun 2014).