Solves full reductions by loading blocks into shared memory. Handles memory movement, bounds checks, and plane specialization.
Modules§
- args
- memory
- multi_stage
- plane_write
- read - Readers read from global memory and write to stage memory
- single_stage
Structs§
- LoadSpecializationConfig - Configuration for how each input tensor (Lhs and Rhs) is loaded, specifying the plane roles responsible for loading them.
- MaxGlobalReaderPlanes - Maximal number of planes each reader can handle to divide its workload evenly
- PartitionedStage - Layoutless stage for current writers. Tile only depends on the unit index, not the out tile.
- PartitionedStageExpand
- PartitionedStageFamily
- PlaneRoleConfig - Contains the number of planes in each role and the rule to distinguish planes based on their plane id
- PlaneWriter - Writes tiles from out shared memory to output global memory using a plane for each tile
- PlaneWriterExpand
- PlaneWriterFamily
- SpecializedLoadingSides - Aggregates loading sides for both main flow and load only roles
- Specializer - Specialization information in cube functions
- UnitWriter - Writes tiles from out shared memory to output global memory using a unit for each tile
- UnitWriterExpand
- UnitWriterFamily
Enums§
- LoadingSides - Specifies which input(s) a plane role participates in loading.
- RoleRule - Rule to distinguish a plane’s role based on its plane id
- RoleRuleConfig - Comptime version of RoleRule
- SpecializationTensorConfig - Determines which types of planes are responsible for loading a tensor.
- SpecializerKind - Comptime information of specializer
- WriteEvent - Events that occur during the process of storing tiles to a stage and executing writes
- WriteEventExpand
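
As a rough illustration of what a rule that distinguishes a plane's role from its plane id could look like, here is a minimal standalone sketch. The names `PlaneRole` and `SketchRoleRule`, their variants, and the `role_of` method are hypothetical and invented for this example. They are not this module's actual `RoleRule` API, whose variants are not listed on this page.

```rust
/// Hypothetical role of a plane (not the crate's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlaneRole {
    /// The plane takes part in the main computation flow.
    MainFlow,
    /// The plane only loads data.
    LoadOnly,
}

/// Hypothetical rule mapping a plane id to a role (an assumption, not `RoleRule`).
#[derive(Debug, Clone, Copy)]
pub enum SketchRoleRule {
    /// Plane ids `0..num_main_flow` are main flow; all later ids are load only.
    MainFlowFirst { num_main_flow: u32 },
    /// Plane ids `0..num_load_only` are load only; all later ids are main flow.
    LoadOnlyFirst { num_load_only: u32 },
}

impl SketchRoleRule {
    /// Returns the role assigned to `plane_id` under this rule.
    pub fn role_of(self, plane_id: u32) -> PlaneRole {
        match self {
            SketchRoleRule::MainFlowFirst { num_main_flow } => {
                if plane_id < num_main_flow {
                    PlaneRole::MainFlow
                } else {
                    PlaneRole::LoadOnly
                }
            }
            SketchRoleRule::LoadOnlyFirst { num_load_only } => {
                if plane_id < num_load_only {
                    PlaneRole::LoadOnly
                } else {
                    PlaneRole::MainFlow
                }
            }
        }
    }
}
```

Under these assumptions, `SketchRoleRule::MainFlowFirst { num_main_flow: 4 }.role_of(5)` yields `PlaneRole::LoadOnly`, since plane 5 falls past the four main-flow planes.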
Traits§
- CopyMechanism - Allows copying a slice of data from global to shared memory asynchronously
- GlobalConfig - Configuration for the global matmul level.
- GlobalMatmul - Provides matrix multiplication operations at the global level.
- GlobalMatmulFamily - A family of matmuls working with any precision.
- GlobalWriter - Responsible for writing the accumulated stage matmul output to global memory
- GlobalWriterFamily
- WriteEventListener - Function that is called at each WriteEvent
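
The listener idea, a function called at each write event, can be sketched in plain Rust as follows. Everything in this block is an assumption made for illustration: the `SketchWriteEvent` variants, the `SketchWriteEventListener` trait, and the `store_tiles` driver are hypothetical and do not reproduce the crate's `WriteEvent` or `WriteEventListener` definitions.

```rust
/// Hypothetical events emitted while tiles are stored (not the crate's `WriteEvent`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchWriteEvent {
    /// Storing is about to start.
    Begin,
    /// The tile with the given index has been stored.
    TileStored { tile: u32 },
    /// All tiles have been stored.
    Finish,
}

/// Hypothetical listener invoked once per event.
pub trait SketchWriteEventListener {
    fn on_event(&mut self, event: SketchWriteEvent);
}

/// Example listener that records which tiles were stored and whether the run finished.
#[derive(Debug, Default)]
pub struct TileRecorder {
    pub stored_tiles: Vec<u32>,
    pub finished: bool,
}

impl SketchWriteEventListener for TileRecorder {
    fn on_event(&mut self, event: SketchWriteEvent) {
        match event {
            SketchWriteEvent::Begin => self.stored_tiles.clear(),
            SketchWriteEvent::TileStored { tile } => self.stored_tiles.push(tile),
            SketchWriteEvent::Finish => self.finished = true,
        }
    }
}

/// Hypothetical driver: notifies the listener before, during, and after storing tiles.
pub fn store_tiles<L: SketchWriteEventListener>(num_tiles: u32, listener: &mut L) {
    listener.on_event(SketchWriteEvent::Begin);
    for tile in 0..num_tiles {
        // A real implementation would store the tile here; the sketch only notifies.
        listener.on_event(SketchWriteEvent::TileStored { tile });
    }
    listener.on_event(SketchWriteEvent::Finish);
}
```

The design choice illustrated here is that the code storing tiles stays unaware of what happens on each event. A listener such as `TileRecorder` decides how to react, so the same storing loop could drive different write strategies.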