Typical (locally typical) decoding for autoregressive sequence generation.
Reference: Meister, C., Pimentel, T., Wiher, G. & Cotterell, R. (2022). Typical Decoding for Natural Language Generation. TACL 2023 (arXiv 2202.00666). https://arxiv.org/abs/2202.00666.
§Algorithm
Given logits z ∈ ℝᵛ, temperature T > 0, mass threshold τ ∈ (0, 1]
and a lower bound min_tokens ≥ 1:
1. p_i = softmax(z_i / T) (numerically-stable)
2. H = -Σ_i p_i log p_i (conditional entropy)
3. c_i = |-log p_i - H| (information-content gap)
4. sort indices by c_i ascending
5. accumulate p along this order until the cumulative mass is ≥ τ
6. enforce min_tokens; renormalise the kept set
7. sample

Locally-typical decoding selects tokens whose surprisal is closest to
the expected information content of the next-token distribution.
Setting τ = 1.0 retains every token (full softmax sampling);
peaked distributions select the argmax (its surprisal matches the
near-zero entropy); uniform distributions retain every token (every
surprisal equals the entropy log V).
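The numbered steps above can be sketched in a few lines of Rust. This is an illustrative sketch only, not the crate's implementation: the names `typical_filter_sketch` and `sample_with_uniform` are hypothetical, the floor value used before the log is assumed to be `f64::MIN_POSITIVE`, and the random draw is passed in as a uniform number `u` in [0, 1) so that the example needs no external RNG crate.

```rust
/// Hypothetical helper (not the crate's API): runs steps 1-6 and returns the
/// kept token ids, most typical first, with their renormalised probabilities.
fn typical_filter_sketch(
    logits: &[f64],
    temperature: f64,
    tau: f64,
    min_tokens: usize,
) -> (Vec<usize>, Vec<f64>) {
    assert!(!logits.is_empty() && temperature > 0.0);

    // 1. Numerically stable softmax of z / T: subtract the max before exp.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits
        .iter()
        .map(|&z| ((z - max) / temperature).exp())
        .collect();
    let sum: f64 = exps.iter().sum();
    let p: Vec<f64> = exps.iter().map(|&e| e / sum).collect();

    // Surprisal -ln(p_i); the floor (an assumed value) keeps it finite.
    let surprisal: Vec<f64> = p
        .iter()
        .map(|&pi| -pi.max(f64::MIN_POSITIVE).ln())
        .collect();

    // 2. Entropy H = sum_i p_i * (-ln p_i).
    let h: f64 = p.iter().zip(&surprisal).map(|(&pi, &s)| pi * s).sum();

    // 3-4. Sort indices by the gap |(-ln p_i) - H|, ascending (stable sort).
    let mut order: Vec<usize> = (0..p.len()).collect();
    order.sort_by(|&a, &b| {
        let ca = (surprisal[a] - h).abs();
        let cb = (surprisal[b] - h).abs();
        ca.total_cmp(&cb)
    });

    // 5. Accumulate probability along that order until the mass reaches tau.
    let mut keep: usize = 0;
    let mut mass: f64 = 0.0;
    for &i in &order {
        mass += p[i];
        keep += 1;
        if mass >= tau {
            break;
        }
    }

    // 6. Enforce min_tokens (capped at the vocabulary size) and renormalise.
    let keep = keep.max(min_tokens).min(p.len());
    let kept: Vec<usize> = order[..keep].to_vec();
    let kept_mass: f64 = kept.iter().map(|&i| p[i]).sum();
    let probs: Vec<f64> = kept.iter().map(|&i| p[i] / kept_mass).collect();
    (kept, probs)
}

/// 7. Inverse-CDF sampling from the kept set, given a uniform draw u in [0, 1).
fn sample_with_uniform(ids: &[usize], probs: &[f64], u: f64) -> usize {
    let mut acc: f64 = 0.0;
    for (&id, &p) in ids.iter().zip(probs) {
        acc += p;
        if u < acc {
            return id;
        }
    }
    // Rounding can leave the cumulative sum slightly below 1.0.
    *ids.last().expect("kept set is never empty")
}
```

Calling `typical_filter_sketch(&[10.0, 0.0, 0.0], 1.0, 0.9, 1)` keeps only token 0, which illustrates the peaked case: the dominant token's surprisal is the one closest to the near-zero entropy. Taking `u` as a parameter is a design choice made for this sketch so that the sampling step stays deterministic and testable.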
Probabilities below f64::MIN_POSITIVE are floored to a tiny epsilon
before taking the log, so the surprisal stays finite without distorting
the categorical sampler.
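To see why the floor matters: a probability of exactly zero has a log of negative infinity, and the entropy term `p * ln(p)` then evaluates to NaN. The sketch below is a minimal illustration; `floored_surprisal` is a hypothetical name, and using `f64::MIN_POSITIVE` as the floor value is an assumption, since the exact epsilon used by the crate is not stated here.

```rust
/// Hypothetical helper: surprisal -ln(p) with the probability floored first.
/// Assumption: the floor is f64::MIN_POSITIVE; the crate's epsilon may differ.
fn floored_surprisal(p: f64) -> f64 {
    -p.max(f64::MIN_POSITIVE).ln()
}
```

Without the floor, `0.0_f64.ln()` is negative infinity and `0.0 * 0.0_f64.ln()` is NaN, which would poison the entropy sum. With the floor, the surprisal of a zero-probability token is large but finite, and its entropy contribution `0.0 * floored_surprisal(0.0)` is exactly zero. The probabilities themselves are left untouched, so the sampling distribution is unchanged.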
Structs§
- TypicalConfig
  Configuration for typical_sample and typical_sample_batch.
Functions§
- entropy
  Compute the Shannon entropy H = -Σ p_i log p_i of a probability vector. Public for testing and downstream use.
- typical_sample
  Sample a single token id from logits using typical decoding.
- typical_sample_batch
  Batch variant of typical_sample for n independent rows of length vocab in a flat logits buffer.
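The flat-buffer layout used by the batch variant can be illustrated as follows. This sketch assumes a row-major layout, where row r occupies `logits[r * vocab..(r + 1) * vocab]`; that layout, the helper name `map_rows_sketch`, and the greedy `argmax` stand-in for a per-row sampler are all assumptions made for illustration and say nothing about the crate's actual signatures.

```rust
/// Hypothetical helper: applies `per_row` to each of the `n` rows of a
/// row-major flat buffer and collects one result per row.
fn map_rows_sketch<T>(
    logits: &[f64],
    n: usize,
    vocab: usize,
    per_row: impl Fn(&[f64]) -> T,
) -> Vec<T> {
    assert!(vocab > 0, "vocab must be positive");
    assert_eq!(logits.len(), n * vocab, "buffer must hold n * vocab logits");
    // chunks_exact yields non-overlapping slices of exactly `vocab` elements.
    logits.chunks_exact(vocab).map(|row| per_row(row)).collect()
}

/// Stand-in per-row function (greedy argmax), used only to keep the sketch
/// short; a real sampler would take its place.
fn argmax(row: &[f64]) -> usize {
    let mut best: usize = 0;
    for (i, &z) in row.iter().enumerate() {
        if z > row[best] {
            best = i;
        }
    }
    best
}
```

For a buffer holding two rows of three logits each, `map_rows_sketch(&buf, 2, 3, argmax)` returns one token id per row. Because the rows are independent, the per-row calls could also be run in parallel, although this sketch processes them sequentially.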