Module gather_bench

Gather throughput microbenchmark dispatch.

Provides two kernels for measuring KV cache read throughput:

  • gather_bench_nibble — Simulates TurboQuant SDPA: unpack 4-bit nibble indices then gather from a 16-entry centroid table.
  • gather_bench_f16_seq — Baseline: sequential F16 read + widen to F32.
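
As a rough CPU-side illustration of the work each kernel performs per element, the two access patterns can be sketched in plain Rust. This is not the embedded MSL source. The function names `gather_nibble_reference` and `f16_bits_to_f32` are hypothetical, and the low-nibble-first unpacking order is an assumption that the actual shader may not share.

```rust
/// Hypothetical CPU reference for the nibble-gather pattern: each byte holds
/// two 4-bit indices into a 16-entry centroid table.
/// Assumption: the low nibble is the first element, the high nibble the second.
fn gather_nibble_reference(packed: &[u8], centroids: &[f32; 16]) -> Vec<f32> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &byte in packed {
        out.push(centroids[(byte & 0x0f) as usize]); // low nibble
        out.push(centroids[(byte >> 4) as usize]); // high nibble
    }
    out
}

/// Hypothetical CPU reference for the baseline's widening step: convert raw
/// IEEE 754 binary16 bits to f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;
    let bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: value is frac * 2^-24.
        (0, _) => {
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign == 1 { -v } else { v };
        }
        // Infinity or NaN: keep the payload, widen the exponent to all ones.
        (0x1f, _) => (sign << 31) | 0x7f80_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127 and shift the mantissa.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}
```

Under these assumptions, the packed bytes `[0x21, 0xF0]` select table indices 1, 2, 0 and 15, in that order. The nibble path reads a quarter of the bytes per element that the F16 path does, but adds a dependent table lookup for every element.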

The ratio of nibble-gather throughput to sequential F16 throughput determines whether nibble-gather meets the ADR-007 gate: it must reach at least 50% of sequential F16 throughput.
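
The gate comparison itself is simple arithmetic. A minimal sketch, assuming both throughputs are measured in the same unit (for example elements per second); the helper names `throughput_ratio` and `meets_gate` are hypothetical and not part of this module:

```rust
/// Minimum nibble-gather / sequential-F16 throughput ratio (the 50% gate).
const GATE_MIN_RATIO: f64 = 0.5;

/// Ratio of nibble-gather throughput to sequential F16 throughput.
/// Returns None when the inputs cannot produce a meaningful ratio.
fn throughput_ratio(nibble: f64, f16_seq: f64) -> Option<f64> {
    if nibble.is_finite() && f16_seq.is_finite() && f16_seq > 0.0 {
        Some(nibble / f16_seq)
    } else {
        None
    }
}

/// True when nibble-gather reaches at least 50% of the F16 baseline.
fn meets_gate(nibble: f64, f16_seq: f64) -> bool {
    throughput_ratio(nibble, f16_seq).map_or(false, |r| r >= GATE_MIN_RATIO)
}
```

Both measurements must use the same unit and the same element count for the ratio to be meaningful. An unusable baseline measurement is treated here as a failed gate rather than a pass.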

Statics

GATHER_BENCH_SHADER_SOURCE
MSL source for the gather benchmark kernels (embedded at compile time).

Functions

dispatch_gather_f16_seq
Dispatch the sequential F16 read kernel.
dispatch_gather_nibble
Dispatch the nibble-gather kernel.
register
Register both gather benchmark kernels with the given registry.