Solves full reductions by loading blocks into shared memory. Handles memory movement, bound checks, plane specialization.
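To illustrate the idea of reducing over the shared dimension block by block, here is a minimal CPU-side sketch in plain Rust. It does not use this crate's API: `blocked_matmul`, `lhs_stage` and `rhs_stage` are hypothetical names, and the two `Vec` buffers merely stand in for shared (stage) memory. The bound check pads out-of-range elements with zeros, which is one common way to handle a `k` that is not a multiple of the block size; whether this module does the same is an assumption.

```rust
/// Multiplies row-major `lhs` (m x k) by `rhs` (k x n), reducing over `k`
/// in blocks of `tile` elements. Hypothetical helper, not part of this crate.
fn blocked_matmul(
    lhs: &[f32],
    rhs: &[f32],
    m: usize,
    n: usize,
    k: usize,
    tile: usize,
) -> Vec<f32> {
    assert!(tile > 0);
    assert_eq!(lhs.len(), m * k);
    assert_eq!(rhs.len(), k * n);

    let mut out = vec![0.0f32; m * n];
    // Stand-ins for stage (shared) memory: one block of each input.
    let mut lhs_stage = vec![0.0f32; m * tile];
    let mut rhs_stage = vec![0.0f32; tile * n];

    for k0 in (0..k).step_by(tile) {
        // Load step: copy one block from "global" to "stage" memory,
        // zero-padding any element that falls past the end of `k`.
        for row in 0..m {
            for t in 0..tile {
                let kk = k0 + t;
                lhs_stage[row * tile + t] = if kk < k { lhs[row * k + kk] } else { 0.0 };
            }
        }
        for t in 0..tile {
            let kk = k0 + t;
            for col in 0..n {
                rhs_stage[t * n + col] = if kk < k { rhs[kk * n + col] } else { 0.0 };
            }
        }

        // Compute step: accumulate this block's partial products.
        for row in 0..m {
            for col in 0..n {
                let mut acc = 0.0f32;
                for t in 0..tile {
                    acc += lhs_stage[row * tile + t] * rhs_stage[t * n + col];
                }
                out[row * n + col] += acc;
            }
        }
    }
    out
}
```

Calling `blocked_matmul(&lhs, &rhs, 2, 2, 3, 2)` on a 2x3 and a 3x2 matrix exercises the padded final block, since 3 is not a multiple of the block size 2.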
Re-exports§
pub use quantization::*;
Modules§
- args
- global_memory
- load - Loaders read from global memory and write to stage memory
- multi_stage
- quantization
- single_stage
Structs§
- LoadSpecializationConfig - Configuration for how each input tensor (Lhs and Rhs) is loaded, specifying the plane roles responsible for loading them.
- MaxLoaderPlanes - Maximal number of planes each loader can handle to divide its workload evenly
- PlaneRoleConfig - Contains the number of planes in each role and the rule to distinguish planes based on their plane id
- PlaneWriter - Writes tiles from out shared memory to output global memory using a plane for each tile
- SpecializedLoadingSides - Aggregates loading sides for both main flow and load-only roles
- Specializer - Specialization information in cube functions
- UnitWriter - Writes tiles from out shared memory to output global memory using a unit for each tile
- ZeroAccumulatorLoader - Accumulator loader that zeros the accumulator
- ZeroAccumulatorLoaderExpand
Enums§
- LoadingSides - Specifies which input(s) a plane role participates in loading.
- RoleRule - Rule to distinguish a plane’s role based on its plane id
- RoleRuleConfig - Comptime version of RoleRule
- SpecializationTensorConfig - Determines which types of planes are responsible for loading a tensor.
- SpecializerKind - Comptime information of specializer
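As a rough illustration of distinguishing a plane's role from its plane id, the sketch below partitions plane ids into two contiguous ranges. Every name in it (`Role`, `RoleSplit`, `role_of`) is hypothetical and chosen for this example only. The actual variants and fields of `RoleRule` and `PlaneRoleConfig` are not shown on this page, and the choice to place main-flow planes first is an assumption.

```rust
/// Hypothetical role of a plane. Not this crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    /// Participates in computation (and possibly loading).
    MainFlow,
    /// Only loads data for the main-flow planes.
    LoadOnly,
}

/// Hypothetical role split: how many planes each role gets.
#[derive(Debug, Clone, Copy)]
struct RoleSplit {
    num_main_flow: u32,
    num_load_only: u32,
}

impl RoleSplit {
    /// Total number of planes covered by this split.
    fn total(&self) -> u32 {
        self.num_main_flow + self.num_load_only
    }

    /// Assumed rule: ids below `num_main_flow` are main flow, the next
    /// `num_load_only` ids are load-only, anything beyond is out of range.
    fn role_of(&self, plane_id: u32) -> Option<Role> {
        if plane_id < self.num_main_flow {
            Some(Role::MainFlow)
        } else if plane_id < self.total() {
            Some(Role::LoadOnly)
        } else {
            None
        }
    }
}
```

With `RoleSplit { num_main_flow: 4, num_load_only: 2 }`, plane ids 0 to 3 map to `Role::MainFlow`, ids 4 and 5 map to `Role::LoadOnly`, and any larger id yields `None`.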
Traits§
- AccumulatorLoader - Loads an accumulator with pre-defined data
- CopyMechanism - Allows copying a slice of data from global to shared memory asynchronously
- GlobalConfig - Configuration for the global matmul level.
- GlobalMatmul - Provides matrix multiplication operations at the global level.
- GlobalMatmulFamily - A family of matmuls working with any precision.
- GlobalWriter - Responsible for writing the accumulated stage matmul output to global memory