Tiling Compute Blocks (TCB) - Work Partitioning for High-Performance Kernels
TCBs represent the fundamental unit of work partitioning within ComputeBrick kernels.
While a ComputeBrick defines a logical operation (e.g., Q4_K MatMul), a TCB defines
the physical execution strategy—how data is partitioned across the memory hierarchy.
§Architecture
Tiling occurs at three levels:
- Macro-Tile (L3/Global Memory): Partitioning across CPU sockets or GPU SMs
- Midi-Tile (L2/Shared Memory): Partitioning within a thread block or Rayon task
- Micro-Tile (Registers): Smallest unit processed by SIMD or CUDA warps
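The three levels can be pictured as nested loop blocking. The following is a minimal sketch, not the crate's implementation: the tile sizes `MACRO`, `MIDI`, and `MICRO` and the functions `tiled_matmul` and `micro_kernel` are hypothetical names chosen for illustration, and a plain row-major `f32` matrix multiply stands in for a real kernel.

```rust
use std::ops::Range;

// Hypothetical tile edge lengths, one per level (not taken from the crate).
const MACRO: usize = 64; // macro-tile: L3 / global memory
const MIDI: usize = 16; // midi-tile: L2 / shared memory
const MICRO: usize = 4; // micro-tile: registers

/// Computes C += A * B for row-major A (m x k), B (k x n), C (m x n),
/// visiting the output in macro -> midi -> micro tile order.
fn tiled_matmul(a: &[f32], b: &[f32], c: &mut [f32], m: usize, k: usize, n: usize) {
    for i0 in (0..m).step_by(MACRO) {
        for j0 in (0..n).step_by(MACRO) {
            // Clamp the macro-tile to the matrix bounds.
            let (i0e, j0e) = ((i0 + MACRO).min(m), (j0 + MACRO).min(n));
            for i1 in (i0..i0e).step_by(MIDI) {
                for j1 in (j0..j0e).step_by(MIDI) {
                    // Clamp the midi-tile to its macro-tile.
                    let (i1e, j1e) = ((i1 + MIDI).min(i0e), (j1 + MIDI).min(j0e));
                    for i2 in (i1..i1e).step_by(MICRO) {
                        for j2 in (j1..j1e).step_by(MICRO) {
                            // Clamp the micro-tile to its midi-tile.
                            let (i2e, j2e) = ((i2 + MICRO).min(i1e), (j2 + MICRO).min(j1e));
                            micro_kernel(a, b, c, k, n, i2..i2e, j2..j2e);
                        }
                    }
                }
            }
        }
    }
}

/// Smallest unit of work: one micro-tile of C, accumulated over the full K dimension.
/// A real kernel would keep this tile in SIMD registers.
fn micro_kernel(
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    k: usize,
    n: usize,
    rows: Range<usize>,
    cols: Range<usize>,
) {
    for i in rows {
        for j in cols.clone() {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] += acc;
        }
    }
}
```

Each output element belongs to exactly one micro-tile, so the result matches an untiled loop; only the traversal order, and therefore the cache behaviour, changes.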
§Modules
- geometry - TcbGeometry dimensions and level definitions
- config - TilingConfig and backend selection
- calculator - TcbIndexCalculator for index computation
- packing - Memory layout packing utilities
- prefetch - Prefetch locality hints
- q4k_matvec - Q4_K quantized matrix-vector tiling
- error - TilingError types
Structs§
- TcbGeometry - Dimensions for a Tiling Compute Block
- TcbIndexCalculator - Index calculator for hierarchical tiling
- TiledQ4KMatvec - Tiled Q4_K MatVec executor
- TilingConfig - Complete tiling configuration for a kernel
- TilingStats - Statistics for a tiled operation
Enums§
- PackingLayout - Memory layout for packed matrices
- PrefetchLocality - Prefetch locality hint
- TcbLevel - Tiling hierarchy level
- TilingBackend - Backend target for tiling configuration
- TilingError - Tiling configuration errors
Constants§
- Q4K_SUPERBLOCK_BYTES
- Q4K_SUPERBLOCK_SIZE - Q4_K superblock constants (per GGML specification)
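For orientation, here is a sketch of the Q4_K superblock as GGML's `block_q4_K` defines it: 256 weights stored in 144 bytes, with eight 6-bit scale/min pairs packed into 12 bytes. The constant and function names below (`SUPERBLOCK_SIZE`, `SUPERBLOCK_BYTES`, `scale_min_6bit`) are hypothetical stand-ins. This page does not state the values of the crate's constants or the signature of its extraction helper, so it is an assumption that they match this layout.

```rust
/// Weights per Q4_K superblock (QK_K in GGML).
const SUPERBLOCK_SIZE: usize = 256;

/// Bytes per superblock in GGML's layout:
///   2  (d,    f16 super-scale)
/// + 2  (dmin, f16 super-min)
/// + 12 (eight 6-bit scales and eight 6-bit mins, packed)
/// + 128 (256 four-bit quants, two per byte)
const SUPERBLOCK_BYTES: usize = 2 + 2 + 12 + SUPERBLOCK_SIZE / 2;

/// Unpacks the 6-bit (scale, min) pair for sub-block `j` (0..8) from the
/// 12-byte packed scales array, following GGML's bit layout.
fn scale_min_6bit(j: usize, scales: &[u8; 12]) -> (u8, u8) {
    assert!(j < 8, "a superblock has 8 sub-blocks");
    if j < 4 {
        // Sub-blocks 0..4: low 6 bits of bytes 0..4 (scale) and 4..8 (min).
        (scales[j] & 0x3F, scales[j + 4] & 0x3F)
    } else {
        // Sub-blocks 4..8: low 4 bits come from bytes 8..12, the high 2 bits
        // are borrowed from the top bits of bytes 0..4 (scale) and 4..8 (min).
        let scale = (scales[j + 4] & 0x0F) | ((scales[j - 4] >> 6) << 4);
        let min = (scales[j + 4] >> 4) | ((scales[j] >> 6) << 4);
        (scale, min)
    }
}
```

With 8 sub-blocks per superblock, each scale/min pair covers 32 weights.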
Functions§
- extract_scale_min_6bit - Extract 6-bit scale and min values from packed scales array
- f16_to_f32 - Convert 2 bytes (f16 IEEE 754) to f32
- optimal_prefetch_distance - Calculate optimal prefetch distance based on tile geometry and cache level
- pack_a_index - Calculate packed index for panel-major A layout
- pack_b_index - Calculate packed index for panel-major B layout
- swizzle_index - Apply XOR swizzling for shared memory bank conflict avoidance
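Several of the helpers listed above correspond to well-known techniques. The sketch below illustrates those techniques only: this page does not show the crate's signatures, so the names `half_bytes_to_f32`, `panel_a_index`, `panel_b_index`, and `xor_swizzle`, their parameters, the little-endian byte order, and the BLIS-style panel layout are all assumptions.

```rust
/// Decodes an IEEE 754 binary16 value from two bytes (assumed little-endian).
fn half_bytes_to_f32(bytes: [u8; 2]) -> f32 {
    let h = u16::from_le_bytes(bytes);
    let sign = ((h >> 15) as u32) << 31;
    let exp = ((h >> 10) & 0x1F) as u32;
    let frac = (h & 0x3FF) as u32;
    let bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign,
        // Subnormal: the value is frac * 2^-24, which f32 represents exactly.
        (0, _) => {
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign != 0 { -v } else { v };
        }
        // Infinity or NaN: all-ones exponent, payload shifted into place.
        (0x1F, _) => sign | 0x7F80_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127, widen the fraction.
        _ => sign | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// Panel-major index for A (m x k): rows are grouped into panels of `mr` rows,
/// and each panel is stored column by column.
fn panel_a_index(row: usize, col: usize, mr: usize, k: usize) -> usize {
    (row / mr) * (mr * k) + col * mr + (row % mr)
}

/// Panel-major index for B (k x n): columns are grouped into panels of `nr`
/// columns, and each panel is stored row by row.
fn panel_b_index(row: usize, col: usize, nr: usize, k: usize) -> usize {
    (col / nr) * (nr * k) + row * nr + (col % nr)
}

/// XOR swizzle for a tile whose row width equals the bank count.
/// `banks` must be a power of two and `col < banks`. Elements in the same
/// logical column of different rows then land in different banks.
fn xor_swizzle(row: usize, col: usize, banks: usize) -> usize {
    debug_assert!(banks.is_power_of_two() && col < banks);
    col ^ (row % banks)
}
```

Panel-major packing lets a micro-kernel read its operands with unit stride. The XOR swizzle permutes columns within each row, so a column-wise access across `banks` consecutive rows touches every bank once.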