Module typical

Typical (locally typical) decoding for autoregressive sequence generation.

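Reference: Meister, C., Pimentel, T., Wiher, G. & Cotterell, R. (2023). Locally Typical Sampling. Transactions of the Association for Computational Linguistics, 11. First posted in 2022 as arXiv 2202.00666 under the title "Typical Decoding for Natural Language Generation". https://arxiv.org/abs/2202.00666.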

§Algorithm

Given logits z ∈ ℝᵛ, temperature T > 0, mass threshold τ ∈ (0, 1] and a lower bound min_tokens ≥ 1:

1. p_i  = softmax(z / T)_i                       (numerically stable)
2. H    = -Σ_i p_i log p_i                       (conditional entropy)
3. c_i  = |-log p_i - H|                         (information-content gap)
4. sort indices by c_i ascending
5. accumulate p along this order until the running mass is ≥ τ
6. enforce min_tokens; renormalise the kept set
7. sample from the renormalised kept set
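
The seven steps above can be sketched in a few dozen lines of plain Rust. This is an illustrative sketch, not this module's implementation: `typical_filter_sketch` and `sample_sketch` are hypothetical names, the `f64` element type and natural logarithm are assumptions, and the uniform draw `u` in [0, 1) is passed in by the caller so that the block needs no random-number crate. It also assumes a non-empty logits slice and T > 0.

```rust
/// Hypothetical helper (not this module's API). Returns the kept token ids,
/// most typical first, paired with their renormalised probabilities.
fn typical_filter_sketch(
    logits: &[f64],
    temperature: f64,
    tau: f64,
    min_tokens: usize,
) -> Vec<(usize, f64)> {
    // Step 1: numerically stable softmax of z / T (shift by the max logit).
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits
        .iter()
        .map(|&z| ((z - max) / temperature).exp())
        .collect();
    let sum: f64 = exps.iter().sum();
    let p: Vec<f64> = exps.iter().map(|&e| e / sum).collect();

    // Step 2: entropy H, treating 0 * log 0 as 0.
    let h: f64 = p
        .iter()
        .filter(|&&pi| pi > 0.0)
        .map(|&pi| -pi * pi.ln())
        .sum();

    // Step 3: gap between each token's surprisal and H.
    // Assumption: probabilities are floored at f64::MIN_POSITIVE before the log.
    let gap: Vec<f64> = p
        .iter()
        .map(|&pi| {
            let surprisal = -pi.max(f64::MIN_POSITIVE).ln();
            (surprisal - h).abs()
        })
        .collect();

    // Step 4: indices sorted by gap, ascending (stable, so ties keep id order).
    let mut order: Vec<usize> = (0..p.len()).collect();
    order.sort_by(|&a, &b| gap[a].total_cmp(&gap[b]));

    // Steps 5 and 6: accumulate mass until it reaches tau and at least
    // min_tokens are kept, then renormalise by the accumulated mass.
    let mut kept: Vec<usize> = Vec::new();
    let mut mass = 0.0;
    for &i in &order {
        if mass >= tau && kept.len() >= min_tokens {
            break;
        }
        kept.push(i);
        mass += p[i];
    }
    kept.into_iter().map(|i| (i, p[i] / mass)).collect()
}

/// Step 7: inverse-CDF sampling over the kept set, given a uniform draw u in [0, 1).
fn sample_sketch(kept: &[(usize, f64)], u: f64) -> usize {
    let mut acc = 0.0;
    for &(id, q) in kept {
        acc += q;
        if u < acc {
            return id;
        }
    }
    // Guard against rounding leaving acc slightly below 1.
    kept.last().expect("kept set must be non-empty").0
}
```

For logits `[2.0, 1.0, 0.0, -1.0]` with T = 1 and τ = 0.5, this sketch keeps tokens 1 and 0, in that order: the second most probable token has the surprisal closest to the entropy, so it ranks ahead of the argmax.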

Locally-typical decoding selects tokens whose surprisal is closest to the expected information content of the next-token distribution. Setting τ = 1.0 retains every token (full softmax sampling); peaked distributions select the argmax (its surprisal matches the near-zero entropy); uniform distributions retain every token (every surprisal equals the entropy log V).
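
The two limiting cases can be checked numerically by computing the gap c_i directly from a probability vector. The helper below is a hypothetical sketch (the name `surprisal_gaps_sketch` is not part of this module) and assumes strictly positive probabilities and the natural logarithm.

```rust
/// Hypothetical helper: c_i = |-ln p_i - H| for a strictly positive
/// probability vector p.
fn surprisal_gaps_sketch(p: &[f64]) -> Vec<f64> {
    // Entropy as the probability-weighted mean surprisal.
    let h: f64 = p.iter().map(|&pi| -pi * pi.ln()).sum();
    p.iter().map(|&pi| (-pi.ln() - h).abs()).collect()
}
```

On a uniform vector every gap is zero up to rounding, so the sort in step 4 is a tie. On a peaked vector such as `[0.97, 0.01, 0.01, 0.01]` the argmax has by far the smallest gap and is ranked first.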

Probabilities below f64::MIN_POSITIVE are floored to a tiny epsilon before taking the log, so the surprisal stays finite without distorting the categorical sampler.
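
A minimal sketch of such a floor follows. The function name is hypothetical, and using `f64::MIN_POSITIVE` itself as the floor value is an assumption; the text above only says "a tiny epsilon".

```rust
/// Hypothetical helper: surprisal -ln p with p floored so the result is finite.
/// Assumption: the floor value is f64::MIN_POSITIVE.
fn floored_surprisal_sketch(p: f64) -> f64 {
    let floored = if p < f64::MIN_POSITIVE {
        f64::MIN_POSITIVE
    } else {
        p
    };
    -floored.ln()
}
```

Without the floor, `-(0.0f64).ln()` is positive infinity, which would turn the gap |-log p_i - H| infinite for every zero-probability token. The floor is applied only inside the logarithm, so the probabilities handed to the sampler are unchanged.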

Structs§

TypicalConfig
Configuration for typical_sample and typical_sample_batch.

Functions§

entropy
Compute the Shannon entropy H = -Σ p_i log p_i of a probability vector. Public for testing and downstream use.
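
As an illustration of the formula only (the real `entropy` signature is not shown on this page), a stand-alone version might look like the sketch below. The name `entropy_sketch`, the `f64` slice argument and the natural logarithm are assumptions.

```rust
/// Hypothetical stand-in for illustration: H = -Σ p_i ln p_i in nats,
/// with zero-probability terms contributing 0.
fn entropy_sketch(p: &[f64]) -> f64 {
    p.iter()
        .filter(|&&pi| pi > 0.0)
        .map(|&pi| -pi * pi.ln())
        .sum()
}
```

Under these assumptions a uniform vector of length V gives ln V, and a one-hot vector gives 0.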
typical_sample
Sample a single token id from logits using typical decoding.
typical_sample_batch
Batch variant of typical_sample for n independent rows of length vocab in a flat logits buffer.
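
The flat-buffer layout can be sketched as follows, assuming row-major storage so that row r occupies `logits[r * vocab .. (r + 1) * vocab]`. The helper `map_rows_sketch` and its closure parameter are hypothetical and do not reflect the signature of `typical_sample_batch`; the sketch also assumes `vocab > 0`.

```rust
/// Hypothetical helper: applies a per-row function to each of the n rows of a
/// flat, row-major logits buffer and collects one token id per row.
fn map_rows_sketch<F>(logits: &[f64], n: usize, vocab: usize, per_row: F) -> Vec<usize>
where
    F: Fn(&[f64]) -> usize,
{
    // The buffer must hold exactly n rows of vocab entries each.
    assert_eq!(logits.len(), n * vocab, "buffer length must equal n * vocab");
    logits.chunks_exact(vocab).map(|row| per_row(row)).collect()
}
```

Because the rows are independent, the per-row function can be any single-row sampler; an argmax closure is enough to check that the rows are split where expected.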