Sampling algorithms kernel.
Matches sampling-algorithms-v1.yaml.
Greedy, top-k, top-p, and temperature sampling for autoregressive generation.
Each function provides one of three backends:
- fn {name}_scalar(...) – Pure Rust scalar reference (ground truth)
- unsafe fn {name}_avx2(...) – AVX2 SIMD implementation
- fn sampling_ptx() -> &'static str – PTX assembly source string
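The scalar/AVX2 pairing above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: the names argmax_scalar, argmax_avx2, and argmax, their signatures, and the Option return type are all assumptions made for the example. The AVX2 variant simply delegates to the scalar reference, as the greedy and temperature AVX2 entries in this module are documented to do.

```rust
/// Scalar reference: index of the largest logit (first occurrence on ties).
/// Hypothetical signature; the crate's real `greedy_scalar` may differ.
pub fn argmax_scalar(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &x) in logits.iter().enumerate() {
        if x.is_nan() {
            continue; // assumption: NaN logits are ignored
        }
        if best.map_or(true, |(_, b)| x > b) {
            best = Some((i, x));
        }
    }
    best.map(|(i, _)| i)
}

/// AVX2 entry point that delegates to the scalar reference.
/// It is `unsafe` because the caller must guarantee the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn argmax_avx2(logits: &[f32]) -> Option<usize> {
    argmax_scalar(logits)
}

/// Hypothetical safe dispatcher: use the AVX2 variant only when the CPU
/// reports support at runtime, otherwise fall back to the scalar reference.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { argmax_avx2(logits) };
        }
    }
    argmax_scalar(logits)
}
```

Keeping the scalar function as ground truth means any later SIMD implementation can be checked against it on the same inputs.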
Functions
- greedy_avx2 ⚠ - AVX2 greedy sampling – delegates to scalar.
- greedy_scalar - Greedy sampling: return the index of the maximum logit.
- sample_scalar - Full sampling pipeline: apply temperature, softmax, then greedy (scalar reference).
- sampling_ptx - PTX assembly for greedy sampling (argmax reduction).
- temperature_avx2 ⚠ - AVX2 temperature scaling – delegates to scalar.
- temperature_scalar - Apply temperature scaling to logits in-place: logits[i] /= temperature.
- top_k_scalar - Top-K filtering: zero out all probabilities except the K highest.
- top_p_scalar - Top-P (nucleus) filtering: retain the minimal set of tokens whose cumulative probability exceeds threshold.
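The operations listed above can be sketched in plain Rust as follows. This is an illustration under stated assumptions, not the crate's implementation: every function name and signature here is hypothetical, the temperature is assumed to be positive, and the filters zero entries without renormalizing the survivors (renormalization is left to the caller).

```rust
/// Divide every logit by `temperature` in place (assumed > 0).
pub fn scale_by_temperature(logits: &mut [f32], temperature: f32) {
    for x in logits.iter_mut() {
        *x /= temperature;
    }
}

/// Numerically stable in-place softmax: subtract the max before exponentiating.
pub fn softmax(values: &mut [f32]) {
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for x in values.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    if sum > 0.0 {
        for x in values.iter_mut() {
            *x /= sum;
        }
    }
}

/// Indices ordered by descending probability (stable: ties keep index order).
fn indices_by_prob_desc(probs: &[f32]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    order
}

/// Top-K: zero every probability outside the `k` largest.
pub fn keep_top_k(probs: &mut [f32], k: usize) {
    for &i in indices_by_prob_desc(probs).iter().skip(k) {
        probs[i] = 0.0;
    }
}

/// Top-P: keep the shortest descending-probability prefix whose cumulative
/// mass exceeds `threshold`, and zero everything after it.
pub fn keep_top_p(probs: &mut [f32], threshold: f32) {
    let order = indices_by_prob_desc(probs);
    let mut cumulative = 0.0f32;
    let mut kept = 0;
    for &i in &order {
        cumulative += probs[i];
        kept += 1;
        if cumulative > threshold {
            break;
        }
    }
    for &i in &order[kept..] {
        probs[i] = 0.0;
    }
}

/// Temperature, then softmax, then greedy argmax, mirroring the order
/// described for `sample_scalar`. Returns `None` for an empty slice.
pub fn sample_sketch(logits: &mut [f32], temperature: f32) -> Option<usize> {
    scale_by_temperature(logits, temperature);
    softmax(logits);
    indices_by_prob_desc(logits).first().copied()
}
```

For example, applying keep_top_p with a threshold of 0.7 to the probabilities [0.1, 0.5, 0.15, 0.25] keeps 0.5 and 0.25 (cumulative mass 0.75) and zeroes the other two entries.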