Module top_k

GPU top-K dispatch: returns the K largest elements of a float array.

Used by the Q8 lm_head rerank path to avoid a full 1 MB logits readback. After the Q8 matmul writes logits for the full vocabulary, this kernel selects the top K on the GPU; only K * 8 bytes of (index, value) pairs are read back to the CPU for exact F32 reranking.
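
As a rough illustration of what the kernel computes and how small the readback is, here is a CPU-side reference sketch. It is not the shader in TOP_K_SHADER_SOURCE. The `IndexValue` struct, `readback_bytes`, and `top_k_reference` are hypothetical names, and the pair layout (a `u32` index plus an `f32` value) is an assumption chosen to match the K * 8 bytes figure above.

```rust
/// Hypothetical pair layout: a u32 index plus an f32 value (8 bytes total).
/// This is an assumption consistent with "K * 8 bytes"; the real layout may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct IndexValue {
    pub index: u32,
    pub value: f32,
}

/// Number of bytes read back from the GPU for `k` pairs under the assumed layout.
pub fn readback_bytes(k: usize) -> usize {
    k * std::mem::size_of::<IndexValue>()
}

/// CPU reference: the `k` largest logits as (index, value) pairs.
/// Like the GPU path, the returned pairs are in no particular order.
pub fn top_k_reference(logits: &[f32], k: usize) -> Vec<IndexValue> {
    let mut pairs: Vec<IndexValue> = logits
        .iter()
        .enumerate()
        .map(|(i, &v)| IndexValue { index: i as u32, value: v })
        .collect();
    let k = k.min(pairs.len());
    if k == 0 {
        return Vec::new();
    }
    if k < pairs.len() {
        // Partition with a descending comparator so the k largest values
        // end up in pairs[..k]; this does not fully sort them.
        pairs.select_nth_unstable_by(k - 1, |a, b| b.value.total_cmp(&a.value));
    }
    pairs.truncate(k);
    pairs
}
```

Under this assumed layout, `readback_bytes(8)` is 64 bytes, compared with the full logits buffer that would otherwise be copied back.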

Output order is NOT guaranteed. Callers that need sorted order should sort the results themselves. The rerank path does not need a sort because it takes the argmax over the reranked logits.
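
A caller that does need ordering can sort the returned pairs on the CPU, while an argmax-only consumer can skip the sort entirely. The sketch below assumes the pairs have already been read back as `(u32, f32)` tuples; `sort_descending` and `argmax_index` are hypothetical helper names, not part of this module.

```rust
/// Sort (index, value) pairs by value, largest first.
/// Assumes pairs were already read back from the GPU as (u32, f32) tuples.
pub fn sort_descending(pairs: &mut [(u32, f32)]) {
    pairs.sort_unstable_by(|a, b| b.1.total_cmp(&a.1));
}

/// Return the index of the pair with the largest value, or None if empty.
/// A single linear scan; no sorting is required.
pub fn argmax_index(pairs: &[(u32, f32)]) -> Option<u32> {
    pairs
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|p| p.0)
}
```

Both helpers use `f32::total_cmp`, which gives a total order over floats, so NaN values cannot cause a comparison panic.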

Statics

TOP_K_SHADER_SOURCE

Functions

dispatch_top_k_f32
Dispatch a top-K selection on the GPU.
register