Tensor Tiling: L2-cache-friendly tiled matrix multiplication engine.
Provides a tiled matmul implementation that operates on tiles that fit within the L2 cache, improving locality for large matrices.
§Determinism
- Tile iteration order is deterministic (row-major over tiles).
- Summation within each tile always follows the same fixed accumulation order.
- Same inputs → bit-identical outputs on the same platform.
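The guarantees above can be illustrated with a minimal sketch. It is not the crate's implementation: the function name `tiled_matmul_sketch`, its signature, and the row-major slice layout are assumptions made for this example. It visits output tiles in row-major order and accumulates along the shared dimension in ascending order, so repeated runs on the same inputs perform the same floating-point operations in the same sequence.

```rust
/// Hypothetical helper for illustration only; not part of the documented API.
/// Computes C = A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
pub fn tiled_matmul_sketch(
    a: &[f64],
    b: &[f64],
    m: usize,
    k: usize,
    n: usize,
    tile: usize,
) -> Vec<f64> {
    assert!(tile > 0, "tile size must be positive");
    assert_eq!(a.len(), m * k, "A must hold m * k elements");
    assert_eq!(b.len(), k * n, "B must hold k * n elements");

    let mut c = vec![0.0; m * n];

    // Row-major iteration over output tiles: tile rows outer, tile columns inner.
    for i0 in (0..m).step_by(tile) {
        let i_end = (i0 + tile).min(m);
        for j0 in (0..n).step_by(tile) {
            let j_end = (j0 + tile).min(n);
            // Walk the shared dimension in ascending tile order.
            for p0 in (0..k).step_by(tile) {
                let p_end = (p0 + tile).min(k);
                for i in i0..i_end {
                    // Within a tile, p always ascends, so every C[i][j]
                    // receives its partial products in one fixed order.
                    for p in p0..p_end {
                        let a_ip = a[i * k + p];
                        for j in j0..j_end {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}
```

In this particular sketch each output element accumulates its products in ascending order of the shared index regardless of the tile size, so the result does not depend on the tile size either. That is a property of the sketch, not a documented guarantee of `TiledMatmul`.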
§Tile Size
Default tile size is 64×64 (64 × 64 × 8 bytes = 32 KiB per tile at f64, which fits in most L2 caches).
Configurable via TiledMatmul::with_tile_size().
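When choosing a non-default tile size, the memory footprint can be estimated with simple arithmetic. The helpers below are hypothetical (`tile_bytes` and `working_set_bytes` are not part of the documented API), and the number of tiles resident at once is an assumption about the kernel, not something this page states.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes occupied by one square tile of f64 values.
pub const fn tile_bytes(tile: usize) -> usize {
    tile * tile * size_of::<f64>()
}

/// Hypothetical helper: bytes needed if `resident_tiles` tiles are live at once.
/// A kernel that keeps one tile each of A, B, and C in cache would pass 3;
/// that count is an assumption made for this example.
pub const fn working_set_bytes(tile: usize, resident_tiles: usize) -> usize {
    resident_tiles * tile_bytes(tile)
}
```

For the default, `tile_bytes(64)` is 32768 bytes (32 KiB), and under the three-tile assumption `working_set_bytes(64, 3)` is 98304 bytes (96 KiB), which is worth comparing against the target machine's L2 capacity before increasing the tile size.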
Structs§
- TiledMatmul - Tiled matrix multiplication engine.