Module tensor_tiled

Tensor tiling: an L2-cache-friendly tiled matrix multiplication engine.

Provides a tiled matmul implementation that works on tiles sized to fit within the L2 cache, improving data locality for large matrices.
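To make the idea concrete, here is a minimal sketch of a tiled multiply over row-major `f64` slices. The function `tiled_matmul_sketch` and its parameters are hypothetical and are not part of this module's API; the actual kernel inside `TiledMatmul` may be organized differently.

```rust
/// Hypothetical helper, not part of the `tensor_tiled` API: multiplies a
/// row-major `m x k` matrix `a` by a row-major `k x n` matrix `b`,
/// visiting the output in `tile x tile` blocks.
pub fn tiled_matmul_sketch(
    a: &[f64],
    b: &[f64],
    m: usize,
    k: usize,
    n: usize,
    tile: usize,
) -> Vec<f64> {
    assert!(tile > 0, "tile size must be non-zero");
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0; m * n];

    // Row-major walk over output tiles, then over the shared dimension.
    for i0 in (0..m).step_by(tile) {
        let i1 = (i0 + tile).min(m);
        for j0 in (0..n).step_by(tile) {
            let j1 = (j0 + tile).min(n);
            for p0 in (0..k).step_by(tile) {
                let p1 = (p0 + tile).min(k);
                // The inner kernel touches one tile each of A, B and C.
                for i in i0..i1 {
                    for p in p0..p1 {
                        let a_ip = a[i * k + p];
                        for j in j0..j1 {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}
```

Edge tiles are clamped with `min`, so matrix dimensions do not need to be multiples of the tile size.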

§Determinism

Tile iteration order is deterministic (row-major over tiles).
The summation within each tile uses the same accumulation order on every run.
Same inputs → bit-identical outputs on the same platform.
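The fixed order matters because floating-point addition is not associative, so reordering the terms of a sum can change the result. The sketch below illustrates this with two hypothetical helpers, `sum_in_order` and `bit_identical`, which are illustrative names and do not come from this module.

```rust
/// Hypothetical helper: sums `xs` strictly left to right.
pub fn sum_in_order(xs: &[f64]) -> f64 {
    let mut acc = 0.0;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Hypothetical helper: true when both slices have the same length and
/// every pair of elements has the same IEEE 754 bit pattern.
pub fn bit_identical(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

Summing `[1e16, 1.0, -1e16]` left to right yields `0.0`, while the reordered `[1e16, -1e16, 1.0]` yields `1.0`. Repeating the same order always reproduces the same bits, which is the property a check like `bit_identical` would verify across runs.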

§Tile Size

Default tile size is 64×64 (32 KB per tile at f64, fits in most L2 caches). Configurable via TiledMatmul::with_tile_size().
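The 32 KB figure follows from 64 × 64 elements × 8 bytes per `f64` = 32,768 bytes. A small sketch of that arithmetic is shown below; both functions are hypothetical helpers, and the three-tile working set (one tile each of the two inputs and the output) is an assumption about the kernel, not something this page states.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes occupied by one square `tile x tile` block of `f64`.
pub const fn tile_bytes_f64(tile: usize) -> usize {
    tile * tile * size_of::<f64>()
}

/// Hypothetical helper: bytes needed if one tile each of A, B and C is
/// resident at the same time (an assumption about the kernel).
pub const fn working_set_bytes_f64(tile: usize) -> usize {
    3 * tile_bytes_f64(tile)
}
```

Under that assumption, the default tile size needs 96 KiB of cache for the three resident tiles, which is worth comparing against the target CPU's L2 capacity before choosing a larger value for `TiledMatmul::with_tile_size()`.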

Structs§

TiledMatmul: Tiled matrix multiplication engine.