Module ctc_decode


CTC decoding: best-path (greedy) and prefix-beam search (Graves 2006; Hannun 2014).

Given per-frame log-probabilities of shape [T, C] (T frames, C symbols including the blank), a CTC decoder searches for the most probable label sequence under the CTC collapse B, which merges adjacent repeated symbols and then drops blanks.
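The collapse B can be sketched in a few lines. This is an illustration only: the name ctc_collapse, the usize label type, and the explicit blank parameter are assumptions and are not taken from this module's API.

```rust
/// Hypothetical helper (not part of this module): apply the CTC collapse B
/// to a frame-level alignment.
fn ctc_collapse(alignment: &[usize], blank: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev: Option<usize> = None;
    for &s in alignment {
        // Emit a symbol only when it differs from the previous frame
        // (merge repeats) and is not the blank (drop blanks).
        if Some(s) != prev && s != blank {
            out.push(s);
        }
        prev = Some(s);
    }
    out
}
```

With blank = 0, the alignment [1, 1, 0, 1, 2, 2, 0] collapses to [1, 1, 2]: the blank between the two runs of 1 is what keeps the repeated label from being merged.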

Two strategies are provided:

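  • ctc_greedy_decode (best-path decoding): take the arg-max symbol at each frame and collapse the resulting path. Fast (O(T·C)), but the probability of the best path is only a lower bound on the probability of the label sequence it collapses to, because every other alignment that collapses to the same sequence is ignored. The result is therefore not guaranteed to be the most probable label sequence.

  • ctc_prefix_beam_search (prefix-beam search): maintain a beam of label prefixes and track, for each prefix, the probability that it ends in a blank (p_b) and the probability that it ends in a non-blank (p_nb). Keeping the two apart lets the search sum the probabilities of distinct alignments that collapse to the same prefix, so it can recover label sequences with a higher total probability than the greedy result.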
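The two strategies above can be sketched as follows. This is a minimal illustration under stated assumptions, not this module's implementation: the function names greedy_sketch and prefix_beam_search_sketch, their signatures, the Vec<Vec<f64>> layout for the [T, C] log-probabilities, and the beam_width parameter are all hypothetical.

```rust
use std::collections::HashMap;

const NEG_INF: f64 = f64::NEG_INFINITY;

/// log(exp(a) + exp(b)), safe when either argument is -inf.
fn log_add(a: f64, b: f64) -> f64 {
    if a == NEG_INF { return b; }
    if b == NEG_INF { return a; }
    let m = a.max(b);
    m + ((a - m).exp() + (b - m).exp()).ln()
}

/// Hypothetical best-path sketch: arg-max per frame, then collapse.
fn greedy_sketch(log_probs: &[Vec<f64>], blank: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev: Option<usize> = None;
    for frame in log_probs {
        let best = frame.iter().enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i);
        if let Some(s) = best {
            if best != prev && s != blank { out.push(s); }
        }
        prev = best;
    }
    out
}

/// Hypothetical prefix-beam-search sketch. Returns the best label sequence
/// and its log-probability summed over the alignments tracked by the beam.
fn prefix_beam_search_sketch(
    log_probs: &[Vec<f64>], blank: usize, beam_width: usize,
) -> (Vec<usize>, f64) {
    // Each entry: prefix -> (log p_b, log p_nb). Start with the empty prefix.
    let mut beam: Vec<(Vec<usize>, (f64, f64))> = vec![(Vec::new(), (0.0, NEG_INF))];
    for frame in log_probs {
        let mut next: HashMap<Vec<usize>, (f64, f64)> = HashMap::new();
        for (prefix, (p_b, p_nb)) in &beam {
            let total = log_add(*p_b, *p_nb);
            for (c, &lp) in frame.iter().enumerate() {
                if c == blank {
                    // Blank keeps the prefix; it now ends in a blank.
                    let e = next.entry(prefix.clone()).or_insert((NEG_INF, NEG_INF));
                    e.0 = log_add(e.0, total + lp);
                } else if prefix.last() == Some(&c) {
                    // Repeat with no blank in between: merged, prefix unchanged.
                    let e = next.entry(prefix.clone()).or_insert((NEG_INF, NEG_INF));
                    e.1 = log_add(e.1, *p_nb + lp);
                    // Repeat after a blank: a genuinely new label.
                    let mut ext = prefix.clone();
                    ext.push(c);
                    let e = next.entry(ext).or_insert((NEG_INF, NEG_INF));
                    e.1 = log_add(e.1, *p_b + lp);
                } else {
                    // A different label always extends the prefix.
                    let mut ext = prefix.clone();
                    ext.push(c);
                    let e = next.entry(ext).or_insert((NEG_INF, NEG_INF));
                    e.1 = log_add(e.1, total + lp);
                }
            }
        }
        // Keep the beam_width prefixes with the highest total probability.
        let score = |e: &(Vec<usize>, (f64, f64))| log_add((e.1).0, (e.1).1);
        let mut cands: Vec<(Vec<usize>, (f64, f64))> = next.into_iter().collect();
        cands.sort_by(|a, b| score(b).total_cmp(&score(a)));
        cands.truncate(beam_width);
        beam = cands;
    }
    match beam.into_iter().next() {
        Some((prefix, (p_b, p_nb))) => (prefix, log_add(p_b, p_nb)),
        None => (Vec::new(), NEG_INF), // only reachable when beam_width == 0
    }
}
```

A small case shows the difference. With blank = 0 and two frames that each assign probability 0.6 to the blank and 0.4 to label 1, the greedy sketch returns the empty sequence (path probability 0.36). The sequence [1] is reached by three alignments (1-blank, blank-1, 1-1) with total probability 0.24 + 0.24 + 0.16 = 0.64, which the beam-search sketch finds.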

All probabilities are accumulated in log-space.
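Working in log-space means products of probabilities become sums, and sums of probabilities need a log-sum-exp. A minimal sketch of such a helper follows; the name log_sum_exp is hypothetical and the module may structure this differently.

```rust
/// Hypothetical helper: log(sum_i exp(x_i)) computed stably by factoring
/// out the maximum. Returns -inf for an empty slice or all -inf inputs.
fn log_sum_exp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if m == f64::NEG_INFINITY {
        return f64::NEG_INFINITY;
    }
    m + xs.iter().map(|&x| (x - m).exp()).sum::<f64>().ln()
}
```

The reason for this form is underflow: exp(-1000.0) evaluates to 0.0 in f64, so adding two such probabilities directly would give 0, whereas log_sum_exp(&[-1000.0, -1000.0]) stays finite at -1000 + ln 2.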

Structs§

CtcHypothesis
A scored CTC decoding hypothesis returned by ctc_prefix_beam_search.

Functions§

ctc_greedy_decode
Best-path (greedy) CTC decode: arg-max per frame followed by CTC collapse.
ctc_prefix_beam_search
Prefix-beam-search CTC decoding (Graves 2006; Hannun 2014).