pub struct TilingConfig {
    pub name: String,
    pub macro_tile: TcbGeometry,
    pub midi_tile: TcbGeometry,
    pub micro_tile: TcbGeometry,
    pub backend: TilingBackend,
}
Complete tiling configuration for a kernel.
Contains geometry for all three tiling levels, enabling hierarchical cache-aware execution.
Fields
name: String
Kernel name for identification
macro_tile: TcbGeometry
Macro-tile geometry (L3/Global)
midi_tile: TcbGeometry
Midi-tile geometry (L2/Shared)
micro_tile: TcbGeometry
Micro-tile geometry (Registers)
backend: TilingBackend
Target backend
Implementations
impl TilingConfig
pub fn gpu_q4k_matvec() -> Self
Create configuration for GPU Q4_K MatVec
Optimized for single-token generation where M=1.
pub fn gpu_q4k_matmul() -> Self
Create configuration for GPU Q4_K MatMul (batched)
Optimized for prefill where M > 1.
pub fn gpu_softmax() -> Self
Create configuration for GPU Softmax
pub fn cpu_avx512_matmul() -> Self
Create configuration for CPU AVX-512 MatMul
Optimized for 512-bit wide SIMD:
- 16 floats per ZMM register
- 32 ZMM registers available
- 4×16 micro-kernel uses 8 registers (4 accumulators + 4 scratch)
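The register arithmetic in the list above can be checked with a small sketch. The constants and helper names here (`ZMM_BITS`, `f32_lanes_per_zmm`, `accumulator_registers`) are illustrative assumptions and are not part of this crate; the scratch count of 4 is taken from the description rather than derived.

```rust
/// Width of one ZMM register in bits (AVX-512).
const ZMM_BITS: u32 = 512;
/// Number of ZMM registers available, per the description above.
const ZMM_COUNT: u32 = 32;

/// Number of f32 lanes that fit in one ZMM register.
fn f32_lanes_per_zmm() -> u32 {
    ZMM_BITS / 32
}

/// Accumulator registers needed for an `mr x nr` f32 micro-kernel,
/// assuming each output row of `nr` floats is held in whole registers.
fn accumulator_registers(mr: u32, nr: u32) -> u32 {
    let lanes = f32_lanes_per_zmm();
    mr * ((nr + lanes - 1) / lanes)
}

fn main() {
    let accumulators = accumulator_registers(4, 16);
    let scratch = 4; // assumption: scratch count copied from the description
    let used = accumulators + scratch;
    println!(
        "lanes={} accumulators={} used={} free={}",
        f32_lanes_per_zmm(),
        accumulators,
        used,
        ZMM_COUNT - used
    );
}
```

Under these assumptions a 4×16 tile needs one accumulator register per row, which leaves most of the register file free for loads and broadcasts.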
pub fn cpu_avx512_q4k_matvec() -> Self
Create configuration for CPU AVX-512 Q4_K MatVec
Optimized for Q4_K quantized inference with 512-bit SIMD. Key differences from AVX2:
- 64-byte alignment, matching the cache-line size
- 4×1 micro-kernel processes 4 rows simultaneously
- K=256 aligned to Q4_K superblock
pub fn cpu_avx512_vnni_q4k_q8k() -> Self
Create configuration for AVX-512 VNNI Q4K×Q8K integer dot product
AVX-512 VNNI (Vector Neural Network Instructions) provides:
- VPDPBUSD: 8-bit unsigned × 8-bit signed multiply-add to i32
- VPDPWSSD: 16-bit signed × 16-bit signed multiply-add to i32
This enables pure integer Q4K×Q8K without intermediate f32 conversion.
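To make the VPDPBUSD semantics concrete, here is a scalar reference model of a single 32-bit lane, followed by a dot product built on it. This is only a sketch of what the instruction computes, not the kernel this configuration drives; the function names `dpbusd_lane` and `dot_u8_i8` are hypothetical, and the sample inputs are arbitrary bytes rather than real Q4_K or Q8_K block data.

```rust
/// Scalar model of one 32-bit lane of VPDPBUSD: four u8 x i8 products
/// are widened to i32, summed, and added to the i32 accumulator
/// (wrapping, since the non-saturating form does not clamp).
fn dpbusd_lane(acc: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut sum = 0i32;
    for i in 0..4 {
        sum += i32::from(a[i]) * i32::from(b[i]);
    }
    acc.wrapping_add(sum)
}

/// Integer dot product over byte slices, consuming four bytes per step.
/// Assumes both slices have the same length and that it is a multiple of 4.
fn dot_u8_i8(a: &[u8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len() % 4, 0);
    let mut acc = 0i32;
    for (ca, cb) in a.chunks_exact(4).zip(b.chunks_exact(4)) {
        acc = dpbusd_lane(
            acc,
            [ca[0], ca[1], ca[2], ca[3]],
            [cb[0], cb[1], cb[2], cb[3]],
        );
    }
    acc
}

fn main() {
    // Arbitrary sample data: unsigned bytes on one side, signed on the other.
    let unsigned_side = [1u8, 2, 3, 4, 15, 0, 7, 8];
    let signed_side = [1i8, -1, 2, -2, 3, 100, -4, 5];
    println!("{}", dot_u8_i8(&unsigned_side, &signed_side));
}
```

The whole accumulation stays in i32, which is the property the text refers to: no f32 conversion is needed until the block scales are applied.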
pub fn cpu_avx2_matmul() -> Self
Create configuration for CPU AVX2 MatMul
pub fn cpu_avx2_q4k_matvec() -> Self
Create configuration for CPU Q4_K MatVec (AVX2)
pub fn cpu_rmsnorm() -> Self
Create configuration for RMSNorm (CPU)
pub fn validate(&self) -> Result<(), TilingError>
Validate that tiling configuration is internally consistent
pub fn num_macro_tiles(&self, m: u32, n: u32) -> u32
Calculate total number of macro-tiles for given problem size
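One plausible way to count macro-tiles is ceiling division along each dimension. The sketch below assumes that partial edge tiles count as whole tiles and uses a hypothetical `TileShape` with `tile_m` and `tile_n` fields, since the fields of `TcbGeometry` are not shown on this page; the actual method may compute this differently.

```rust
/// Hypothetical stand-in for a macro-tile footprint; the real
/// `TcbGeometry` fields are not documented here.
struct TileShape {
    tile_m: u32,
    tile_n: u32,
}

/// Ceiling division for a positive divisor.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// Macro-tiles needed to cover an `m x n` problem, assuming a partial
/// tile at an edge is counted as a full tile.
fn num_macro_tiles(shape: &TileShape, m: u32, n: u32) -> u32 {
    ceil_div(m, shape.tile_m) * ceil_div(n, shape.tile_n)
}

fn main() {
    let shape = TileShape { tile_m: 128, tile_n: 256 };
    // Single-token case (m = 1) and a larger problem with ragged edges.
    println!("{}", num_macro_tiles(&shape, 1, 4096));
    println!("{}", num_macro_tiles(&shape, 300, 1000));
}
```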
pub fn midi_tiles_per_macro(&self) -> u32
Calculate total number of midi-tiles within a macro-tile
pub fn micro_tiles_per_midi(&self) -> u32
Calculate total number of micro-tiles within a midi-tile
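The two per-level counts can be pictured as the same nesting calculation applied twice. The sketch below assumes each tile level divides the one above it evenly and models tiles as a hypothetical two-dimensional `TileShape`; the tile sizes in `main` are made up for illustration and are not the values any constructor on this page returns.

```rust
/// Hypothetical 2-D tile shape; not the crate's `TcbGeometry`.
#[derive(Clone, Copy)]
struct TileShape {
    m: u32,
    n: u32,
}

/// Number of `inner` tiles that fit in one `outer` tile, or `None` when
/// the inner tile is empty or does not divide the outer tile evenly.
fn tiles_within(outer: TileShape, inner: TileShape) -> Option<u32> {
    if inner.m == 0 || inner.n == 0 {
        return None;
    }
    if outer.m % inner.m != 0 || outer.n % inner.n != 0 {
        return None;
    }
    Some((outer.m / inner.m) * (outer.n / inner.n))
}

fn main() {
    // Illustrative sizes only.
    let macro_tile = TileShape { m: 128, n: 128 };
    let midi_tile = TileShape { m: 32, n: 32 };
    let micro_tile = TileShape { m: 4, n: 16 };

    println!("midi per macro: {:?}", tiles_within(macro_tile, midi_tile));
    println!("micro per midi: {:?}", tiles_within(midi_tile, micro_tile));
}
```

Returning `None` for a non-dividing pair mirrors the kind of consistency condition a validation step might enforce, though the checks performed by `validate` are not specified here.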
Trait Implementations
impl Clone for TilingConfig
fn clone(&self) -> TilingConfig
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.