GPU top-K dispatch: returns the K largest elements of an f32 array.
Used by the Q8 lm_head rerank path to avoid a full 1 MB logits readback. After the Q8 matmul writes logits for the full vocabulary, this kernel selects the top K on the GPU; only K * 8 bytes of (index, value) pairs are read back to the CPU for exact F32 reranking.
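To make the readback saving concrete, here is a minimal sketch of the size arithmetic. The `TopKEntry` layout (a `u32` index followed by an `f32` value) and both helper names are assumptions for illustration; the description above only states that each pair takes 8 bytes.

```rust
/// One (index, value) pair as it might be read back from the GPU.
/// Assumed layout: u32 index + f32 value = 8 bytes. Hypothetical type,
/// not taken from this module.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TopKEntry {
    pub index: u32,
    pub value: f32,
}

/// Bytes read back when only the top-K pairs are returned.
pub fn topk_readback_bytes(k: usize) -> usize {
    k * std::mem::size_of::<TopKEntry>()
}

/// Bytes read back when the full F32 logits vector is returned.
pub fn full_readback_bytes(vocab_size: usize) -> usize {
    vocab_size * std::mem::size_of::<f32>()
}
```

Under these assumptions, a vocabulary of 262,144 entries costs 1,048,576 bytes for a full readback, while K = 64 costs 512 bytes.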
Output order is NOT guaranteed. Callers that need sorted output should sort the returned pairs themselves. The rerank path does not need sorted input, because it takes the argmax over the reranked logits.
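The contract above can be sketched with a CPU reference. This is not the GPU kernel; it is an illustrative stand-in that uses the standard library's `select_nth_unstable_by` to return the K largest pairs in unspecified order. The function names and the `exact_logit` callback (standing in for the exact F32 rerank) are hypothetical.

```rust
/// CPU reference: the K largest (index, value) pairs, in unspecified order.
pub fn top_k_unsorted(logits: &[f32], k: usize) -> Vec<(u32, f32)> {
    let mut pairs: Vec<(u32, f32)> = logits
        .iter()
        .enumerate()
        .map(|(i, &v)| (i as u32, v))
        .collect();
    let k = k.min(pairs.len());
    if k == 0 {
        return Vec::new();
    }
    if k < pairs.len() {
        // Partition so the k largest values occupy pairs[..k]; no full sort.
        pairs.select_nth_unstable_by(k - 1, |a, b| b.1.total_cmp(&a.1));
    }
    pairs.truncate(k);
    pairs
}

/// For callers that need sorted output: sort by value, largest first.
pub fn sort_descending(pairs: &mut [(u32, f32)]) {
    pairs.sort_unstable_by(|a, b| b.1.total_cmp(&a.1));
}

/// Rerank: pick the candidate index whose exact logit is largest.
/// Candidate order does not matter, so no sort is needed here.
pub fn rerank_argmax(
    candidates: &[(u32, f32)],
    exact_logit: impl Fn(u32) -> f32,
) -> Option<u32> {
    candidates
        .iter()
        .map(|&(i, _)| (i, exact_logit(i)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

A caller that only wants the best token after reranking can pass the unsorted candidates straight to `rerank_argmax`; a caller that wants a ranked list would call `sort_descending` first.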
Statics
Functions
- dispatch_top_k_f32 - Dispatch a top-K selection on the GPU.
- register