Module sampling

Sampling algorithms kernel.

Matches sampling-algorithms-v1.yaml. Greedy, top-k, top-p, and temperature sampling for autoregressive generation.
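The following is a rough illustration of how temperature scaling and greedy selection fit together. The sketch runs temperature scaling, a max-subtracted (numerically stable) softmax, and an argmax over an `f32` slice. The names `apply_temperature`, `softmax_in_place`, `argmax`, and `sample_greedy`, and their signatures, are hypothetical stand-ins. They are not this module's API, whose signatures are not listed on this page.

```rust
/// Hypothetical stand-in for a temperature kernel: logits[i] /= temperature, in place.
/// Assumes a strictly positive temperature.
pub fn apply_temperature(logits: &mut [f32], temperature: f32) {
    assert!(temperature > 0.0, "temperature must be positive");
    for x in logits.iter_mut() {
        *x /= temperature;
    }
}

/// Softmax in place; subtracting the maximum first keeps exp() from overflowing.
pub fn softmax_in_place(xs: &mut [f32]) {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for x in xs.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in xs.iter_mut() {
        *x /= sum;
    }
}

/// Index of the largest value; returns None for an empty slice.
pub fn argmax(xs: &[f32]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &x) in xs.iter().enumerate() {
        // Strict `>` keeps the earliest index when values tie.
        if best.map_or(true, |b| x > xs[b]) {
            best = Some(i);
        }
    }
    best
}

/// Hypothetical pipeline: temperature, then softmax, then greedy pick.
pub fn sample_greedy(logits: &mut [f32], temperature: f32) -> Option<usize> {
    apply_temperature(logits, temperature);
    softmax_in_place(logits);
    argmax(logits)
}
```

Dividing by a positive temperature and applying softmax both preserve ordering. As a result, this greedy pipeline picks the same index as an argmax over the raw logits, barring floating-point ties. Temperature only changes the outcome once the final argmax is replaced by a random draw.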

Each function implements one of three backends:

  • fn {name}_scalar(...) – Pure Rust scalar reference (ground truth)
  • unsafe fn {name}_avx2(...) – AVX2 SIMD implementation
  • fn sampling_ptx() -> &'static str – PTX assembly source string
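The `_scalar`/`_avx2` naming suggests that callers choose a backend at run time. The sketch below shows one way to do that dispatch. It assumes a hypothetical `greedy_dispatch` wrapper, which this module does not document. It also uses hypothetical `greedy_ref` and `greedy_simd` functions as stand-ins for `greedy_scalar` and `greedy_avx2`, with an assumed `&[f32] -> usize` signature.

```rust
/// Hypothetical scalar reference: index of the maximum logit (first index wins on ties).
pub fn greedy_ref(logits: &[f32]) -> usize {
    assert!(!logits.is_empty(), "logits must not be empty");
    let mut best = 0;
    for (i, &x) in logits.iter().enumerate().skip(1) {
        if x > logits[best] {
            best = i;
        }
    }
    best
}

/// Hypothetical AVX2 entry point. Like `greedy_avx2` as documented, it only
/// delegates to the scalar reference; the attribute marks it AVX2-only.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn greedy_simd(logits: &[f32]) -> usize {
    greedy_ref(logits)
}

/// Hypothetical safe wrapper: use the AVX2 path only when the CPU reports AVX2.
pub fn greedy_dispatch(logits: &[f32]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at run time.
            return unsafe { greedy_simd(logits) };
        }
    }
    greedy_ref(logits)
}
```

The `unsafe` on the AVX2 variant reflects the `#[target_feature]` contract. Calling such a function on a CPU without AVX2 is undefined behavior, so the wrapper checks with `is_x86_feature_detected!` first. The PTX backend is left out of the sketch because it returns assembly source for a GPU launch rather than running on the host.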

Functions

greedy_avx2
AVX2 greedy sampling – delegates to scalar.
greedy_scalar
Greedy sampling: return the index of the maximum logit.
sample_scalar
Full sampling pipeline: apply temperature, softmax, then greedy (scalar reference).
sampling_ptx
PTX assembly for greedy sampling (argmax reduction).
temperature_avx2
AVX2 temperature scaling – delegates to scalar.
temperature_scalar
Apply temperature scaling to logits in place: logits[i] /= temperature.
top_k_scalar
Top-K filtering: zero out all probabilities except the K highest.
top_p_scalar
Top-P (nucleus) filtering: retain the minimal set of tokens whose cumulative probability exceeds the threshold.
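The sketch below shows top-k and top-p filtering over a probability slice, assuming the input is already softmax-normalized. The helper names and signatures (`top_k_filter`, `top_p_filter`, `renormalize`) are hypothetical. Two behaviors are choices made for this sketch only, and sampling-algorithms-v1.yaml may define them differently: ties go to the lower index, and the top-p cutoff uses a strict `>` comparison against `p`.

```rust
/// Indices sorted by descending probability; the stable sort keeps lower indices first on ties.
fn ranked(probs: &[f32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    idx
}

/// Hypothetical top-k: zero every probability outside the k highest.
pub fn top_k_filter(probs: &mut [f32], k: usize) {
    if k >= probs.len() {
        return; // nothing to drop
    }
    let idx = ranked(probs);
    for &i in &idx[k..] {
        probs[i] = 0.0;
    }
}

/// Hypothetical top-p: keep the smallest high-probability prefix whose
/// cumulative mass exceeds p, and zero the rest.
pub fn top_p_filter(probs: &mut [f32], p: f32) {
    let idx = ranked(probs);
    let mut cumulative = 0.0f32;
    let mut keep = idx.len(); // if the mass never exceeds p, keep everything
    for (rank, &i) in idx.iter().enumerate() {
        cumulative += probs[i];
        if cumulative > p {
            keep = rank + 1;
            break;
        }
    }
    for &i in &idx[keep..] {
        probs[i] = 0.0;
    }
}

/// Rescale surviving probabilities so they sum to 1 (no-op on an all-zero slice).
pub fn renormalize(probs: &mut [f32]) {
    let sum: f32 = probs.iter().sum();
    if sum > 0.0 {
        for x in probs.iter_mut() {
            *x /= sum;
        }
    }
}
```

This page does not say whether `top_k_scalar` or `top_p_scalar` renormalize the surviving mass. `renormalize` is included only because a subsequent random draw usually expects a distribution that sums to 1.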