//! Tiling Compute Blocks (TCB) - Work Partitioning for High-Performance Kernels
//!
//! TCBs represent the fundamental unit of work partitioning within `ComputeBrick` kernels.
//! While a `ComputeBrick` defines a logical operation (e.g., Q4_K MatMul), a TCB defines
//! the physical execution strategy—how data is partitioned across the memory hierarchy.
//!
//! # Architecture
//!
//! Tiling occurs at three levels:
//! 1. **Macro-Tile (L3/Global Memory)**: Partitioning across CPU sockets or GPU SMs
//! 2. **Midi-Tile (L2/Shared Memory)**: Partitioning within a thread block or Rayon task
//! 3. **Micro-Tile (Registers)**: Smallest unit of work, held in registers and processed by SIMD instructions or a CUDA warp
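//!
//! The three levels nest: each macro-tile is split into midi-tiles, and each midi-tile
//! into micro-tiles. The sketch below is a minimal illustration, under stated assumptions,
//! of how a flat row index could be resolved into coordinates at every level.
//! `TileSizes`, `TileCoord`, and `decompose` are hypothetical names, not the
//! `TcbIndexCalculator` API. The sketch also assumes that each level's tile size evenly
//! divides the size of the level above it.
//!
//! ```rust
//! /// Hypothetical per-level tile extents, in matrix rows.
//! struct TileSizes {
//!     macro_rows: usize,
//!     midi_rows: usize,
//!     micro_rows: usize,
//! }
//!
//! /// Position of a single row at every level of the (hypothetical) tiling hierarchy.
//! #[derive(Debug, PartialEq)]
//! struct TileCoord {
//!     macro_tile: usize,
//!     midi_tile: usize,
//!     micro_tile: usize,
//!     offset: usize,
//! }
//!
//! /// Maps a flat row index to (macro, midi, micro, offset) coordinates.
//! /// Assumes `micro_rows` divides `midi_rows` and `midi_rows` divides `macro_rows`.
//! fn decompose(row: usize, t: &TileSizes) -> TileCoord {
//!     // Macro level: which socket / SM share owns this row.
//!     let macro_tile = row / t.macro_rows;
//!     let in_macro = row % t.macro_rows;
//!     // Midi level: which Rayon task / thread block within the macro-tile.
//!     let midi_tile = in_macro / t.midi_rows;
//!     let in_midi = in_macro % t.midi_rows;
//!     // Micro level: which register block within the midi-tile, and the row inside it.
//!     TileCoord {
//!         macro_tile,
//!         midi_tile,
//!         micro_tile: in_midi / t.micro_rows,
//!         offset: in_midi % t.micro_rows,
//!     }
//! }
//! ```
//!
//! For example, with 1024-row macro-tiles, 64-row midi-tiles, and 4-row micro-tiles,
//! row 1234 resolves to macro-tile 1, midi-tile 3, micro-tile 4, offset 2. Because the
//! mapping uses only integer division and remainder at each level, it involves no branching.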
//!
//! # Modules
//!
//! - `geometry` - `TcbGeometry` dimensions and tiling-level definitions
//! - `config` - `TilingConfig` and backend selection
//! - `calculator` - `TcbIndexCalculator` for tile index computation
//! - `packing` - Memory layout packing utilities
//! - `prefetch` - Prefetch locality hints
//! - `q4k_matvec` - Q4_K quantized matrix-vector tiling
//! - `error` - `TilingError` types
pub use TcbIndexCalculator;
pub use ;
pub use TilingError;
pub use ;
pub use ;
pub use ;
pub use ;