Gather throughput microbenchmark dispatch.
Provides two kernels for measuring KV cache read throughput:
- gather_bench_nibble - Simulates TurboQuant SDPA: unpack 4-bit nibble indices, then gather from a 16-entry centroid table.
- gather_bench_f16_seq - Baseline: sequential F16 read + widen to F32.
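To make the two access patterns concrete, here is a minimal CPU-side sketch in Rust of what each kernel measures. It is not the MSL source shipped in this module. The function names, the low-nibble-first packing order, and the hand-written half-precision conversion are all assumptions made for illustration.

```rust
/// Hypothetical CPU reference for the nibble-gather pattern.
/// Assumption: each byte packs two 4-bit indices, low nibble first.
fn gather_nibble(packed: &[u8], centroids: &[f32; 16]) -> Vec<f32> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &byte in packed {
        // Each nibble is in 0..=15, so it always indexes inside the table.
        out.push(centroids[(byte & 0x0F) as usize]);
        out.push(centroids[(byte >> 4) as usize]);
    }
    out
}

/// Widen one IEEE 754 binary16 value (given as raw bits) to f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1F) as u32;
    let frac = (h & 0x3FF) as u32;
    let bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: value = frac * 2^-24.
        (0, _) => {
            let v = frac as f32 / 16_777_216.0;
            return if sign == 1 { -v } else { v };
        }
        // Infinity and NaN keep an all-ones exponent.
        (0x1F, _) => (sign << 31) | 0x7F80_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// Hypothetical CPU reference for the baseline: sequential read, widen to F32.
fn widen_f16_seq(src: &[u16]) -> Vec<f32> {
    src.iter().map(|&h| f16_bits_to_f32(h)).collect()
}
```

The gather path reads half a byte per element plus a small table lookup, while the baseline reads two bytes per element with no indirection. That difference in memory traffic and access pattern is what the benchmark compares.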
The throughput ratio between the two kernels determines whether nibble-gather meets the ADR-007 gate of ≥ 50% of sequential F16 throughput.
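The gate check itself reduces to a ratio comparison. The sketch below assumes throughput is expressed as elements per second and that both kernels process the same element count. The helper names are hypothetical and are not part of this module's API.

```rust
/// Threshold stated for the ADR-007 gate: nibble-gather must reach
/// at least 50% of sequential F16 throughput.
const GATE_MIN_RATIO: f64 = 0.5;

/// Hypothetical helper: elements processed per second.
fn throughput(elements: u64, seconds: f64) -> f64 {
    elements as f64 / seconds
}

/// Hypothetical helper: true when the nibble-gather throughput is at least
/// GATE_MIN_RATIO times the sequential F16 throughput.
fn meets_gate(nibble_tp: f64, f16_seq_tp: f64) -> bool {
    // A non-positive baseline means the measurement is unusable, so fail the gate.
    f16_seq_tp > 0.0 && nibble_tp / f16_seq_tp >= GATE_MIN_RATIO
}
```

For instance, if the baseline processes 1,000,000 elements in 1 ms and the nibble kernel needs 2 ms for the same count, the ratio is exactly 0.5 and the gate passes.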
Statics§
- GATHER_BENCH_SHADER_SOURCE - MSL source for the gather benchmark kernels (embedded at compile time).
Functions§
- dispatch_gather_f16_seq - Dispatch the sequential F16 read kernel.
- dispatch_gather_nibble - Dispatch the nibble-gather kernel.
- register - Register both gather benchmark kernels with the given registry.