Module global


Solves full reductions by loading blocks into shared memory. Handles memory movement, bounds checks, and plane specialization.

Re-exports

LoadSpecializationConfig: Configuration for how each input tensor (Lhs and Rhs) is loaded, specifying the plane roles responsible for loading them.
MaxLoaderPlanes: Maximum number of planes each loader can use while still dividing its workload evenly
PlaneRoleConfig: Contains the number of planes in each role and the rule to distinguish planes based on their plane id
PlaneWriter: Writes tiles from output shared memory to output global memory using a plane for each tile
SpecializedLoadingSides: Aggregates loading sides for both main flow and load only roles
Specializer: Specialization information in cube functions
UnitWriter: Writes tiles from output shared memory to output global memory using a unit for each tile
ZeroAccumulatorLoader: Accumulator loader that zeros the accumulator
ZeroAccumulatorLoaderExpand
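
To make the "divide its workload evenly" wording of MaxLoaderPlanes concrete, here is a minimal sketch of one possible reading: given a number of loading tasks and a number of available planes, pick the largest plane count that divides the tasks with no remainder. The function `max_even_planes` and its parameters are hypothetical and are not part of this module's API; the actual type may compute its limit differently.

```rust
/// Hypothetical helper, NOT part of this crate: returns the largest plane
/// count `p <= available_planes` such that `num_tasks` splits evenly
/// across `p` planes. Returns 0 when no planes are available.
fn max_even_planes(num_tasks: u32, available_planes: u32) -> u32 {
    // Search downward so the first match is the largest valid count.
    // For any non-empty range, `p == 1` always divides, so the fallback
    // is only reached when `available_planes` is 0.
    (1..=available_planes)
        .rev()
        .find(|&p| num_tasks % p == 0)
        .unwrap_or(0)
}
```

Under this reading, 12 tasks with 8 available planes would use 6 planes, since 7 and 8 leave a remainder while 6 gives each plane exactly 2 tasks.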

LoadingSides: Specifies which input(s) a plane role participates in loading.
RoleRule: Rule to distinguish a plane’s role based on its plane id
RoleRuleConfig: Comptime version of RoleRule
SpecializationTensorConfig: Determines which types of planes are responsible for loading a tensor.
SpecializerKind: Comptime information of specializer
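
The enums above describe two ideas: a rule that maps a plane id to a role, and a per-role choice of which inputs that role loads. The sketch below illustrates how the two might combine. Every name in it (`SketchRole`, `SketchRoleRule`, `SketchLoadingSides`, `plane_loads_lhs`) is a hypothetical stand-in chosen for illustration; the variants and signatures of the real RoleRule and LoadingSides may differ.

```rust
/// Hypothetical stand-in for a plane role; not the crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchRole {
    MainFlow,
    LoadOnly,
}

/// Hypothetical rule assigning a role from the plane id alone.
#[derive(Debug, Clone, Copy)]
enum SketchRoleRule {
    /// Every plane takes part in the main flow.
    MainFlowOnly,
    /// Planes with id below `load_only_count` are load-only.
    LoadOnlyFirst { load_only_count: u32 },
}

impl SketchRoleRule {
    fn role_of(self, plane_id: u32) -> SketchRole {
        match self {
            SketchRoleRule::MainFlowOnly => SketchRole::MainFlow,
            SketchRoleRule::LoadOnlyFirst { load_only_count } => {
                if plane_id < load_only_count {
                    SketchRole::LoadOnly
                } else {
                    SketchRole::MainFlow
                }
            }
        }
    }
}

/// Hypothetical stand-in: which input(s) a role helps load.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchLoadingSides {
    Both,
    Lhs,
    Rhs,
    Neither,
}

impl SketchLoadingSides {
    fn includes_lhs(self) -> bool {
        matches!(self, SketchLoadingSides::Both | SketchLoadingSides::Lhs)
    }
}

/// Decides whether a given plane loads Lhs, given the sides assigned
/// to each role (assumed here to be configured per role).
fn plane_loads_lhs(
    rule: SketchRoleRule,
    main_flow: SketchLoadingSides,
    load_only: SketchLoadingSides,
    plane_id: u32,
) -> bool {
    match rule.role_of(plane_id) {
        SketchRole::MainFlow => main_flow.includes_lhs(),
        SketchRole::LoadOnly => load_only.includes_lhs(),
    }
}
```

In this sketch, with a `LoadOnlyFirst` rule of two load-only planes, main-flow planes loading only Rhs and load-only planes loading only Lhs, planes 0 and 1 load Lhs while planes 2 and above do not.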

AccumulatorLoader: Loads an accumulator with pre-defined data
CopyMechanism: Allows copying a slice of data from global to shared memory asynchronously
GlobalConfig: Configuration for the global matmul level.
GlobalMatmul: Provides matrix multiplication operations at the global level.
GlobalMatmulFamily: A family of matmuls working with any precision.
GlobalWriter: Responsible for writing the accumulated stage matmul output to global memory