Module global

Solves full reductions by loading blocks into shared memory. Handles memory movement, bounds checks, and plane specialization.
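
The idea of reducing block by block can be illustrated on the CPU. The following is a minimal sketch, not this module's API: `blocked_matmul`, `block_k`, and the two `*_stage` buffers are hypothetical names, with plain vectors standing in for shared memory. Each iteration loads one block of the reduction dimension into the stage buffers, zero-filling out-of-bounds positions, then accumulates the partial product.

```rust
/// Hypothetical CPU stand-in for a block-wise reduction (not the crate's API).
/// Multiplies row-major `lhs` (m x k) by row-major `rhs` (k x n).
fn blocked_matmul(
    lhs: &[f32],
    rhs: &[f32],
    m: usize,
    n: usize,
    k: usize,
    block_k: usize,
) -> Vec<f32> {
    assert!(block_k > 0);
    assert_eq!(lhs.len(), m * k);
    assert_eq!(rhs.len(), k * n);

    let mut out = vec![0.0f32; m * n];
    // Plain vectors play the role of shared (stage) memory in this sketch.
    let mut lhs_stage = vec![0.0f32; m * block_k];
    let mut rhs_stage = vec![0.0f32; block_k * n];

    let mut k0 = 0;
    while k0 < k {
        // Load step: copy one block along k, zero-filling past the bound.
        for i in 0..m {
            for kk in 0..block_k {
                let src = k0 + kk;
                lhs_stage[i * block_k + kk] = if src < k { lhs[i * k + src] } else { 0.0 };
            }
        }
        for kk in 0..block_k {
            let src = k0 + kk;
            for j in 0..n {
                rhs_stage[kk * n + j] = if src < k { rhs[src * n + j] } else { 0.0 };
            }
        }

        // Compute step: accumulate the partial product of the staged block.
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0f32;
                for kk in 0..block_k {
                    acc += lhs_stage[i * block_k + kk] * rhs_stage[kk * n + j];
                }
                out[i * n + j] += acc;
            }
        }
        k0 += block_k;
    }
    out
}
```

Calling it with a `block_k` that does not divide `k` exercises the zero-fill path; the padded positions contribute nothing to the sum, so the result matches an unblocked product.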

Re-exports§

pub use quantization::*;

Modules§

args
global_memory
load
Loaders read from global memory and write to stage memory
multi_stage
quantization
single_stage

Structs§

LoadSpecializationConfig
Configuration for how each input tensor (Lhs and Rhs) is loaded, specifying the plane roles responsible for loading them.
MaxLoaderPlanes
Maximum number of planes each loader can handle to divide its workload evenly
PlaneRoleConfig
Contains the number of planes in each role and the rule to distinguish planes based on their plane id
PlaneWriter
Writes tiles from the output shared memory to output global memory, using one plane per tile
SpecializedLoadingSides
Aggregates loading sides for both main-flow and load-only roles
Specializer
Specialization information in cube functions
UnitWriter
Writes tiles from the output shared memory to output global memory, using one unit per tile
ZeroAccumulatorLoader
Accumulator loader that zeros the accumulator
ZeroAccumulatorLoaderExpand

Enums§

LoadingSides
Specifies which input(s) a plane role participates in loading.
RoleRule
Rule to distinguish a plane’s role based on its plane id
RoleRuleConfig
Comptime version of RoleRule
SpecializationTensorConfig
Determines which types of planes are responsible for loading a tensor.
SpecializerKind
Comptime information of specializer
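
As an illustration of distinguishing a plane's role from its plane id, here is a minimal sketch. It does not reproduce the crate's `RoleRule` or `PlaneRoleConfig`: the types `SketchRole` and `SketchRoleRule`, their variants, and the method `role_of` are all hypothetical, and the threshold-based layout is an assumption made for the example.

```rust
/// Hypothetical roles: a plane either takes part in the main flow
/// or only loads data. Not the crate's own types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchRole {
    MainFlow,
    LoadOnly,
}

/// Hypothetical rule mapping a plane id to a role via a threshold.
#[derive(Clone, Copy, Debug)]
enum SketchRoleRule {
    /// Every plane is a main-flow plane.
    MainFlowOnly,
    /// Plane ids below `load_only_planes` are load-only.
    LoadOnlyFirst { load_only_planes: u32 },
    /// Plane ids at or above `main_flow_planes` are load-only.
    LoadOnlyLast { main_flow_planes: u32 },
}

impl SketchRoleRule {
    /// Returns the role assigned to `plane_id` under this rule.
    fn role_of(self, plane_id: u32) -> SketchRole {
        match self {
            SketchRoleRule::MainFlowOnly => SketchRole::MainFlow,
            SketchRoleRule::LoadOnlyFirst { load_only_planes } => {
                if plane_id < load_only_planes {
                    SketchRole::LoadOnly
                } else {
                    SketchRole::MainFlow
                }
            }
            SketchRoleRule::LoadOnlyLast { main_flow_planes } => {
                if plane_id < main_flow_planes {
                    SketchRole::MainFlow
                } else {
                    SketchRole::LoadOnly
                }
            }
        }
    }
}
```

A single comparison against a threshold keeps the decision cheap and uniform across a plane, which is why this sketch uses contiguous id ranges rather than an arbitrary lookup table.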

Traits§

AccumulatorLoader
Loads an accumulator with pre-defined data
CopyMechanism
Allows copying a slice of data from global to shared memory asynchronously
GlobalConfig
Configuration for the global matmul level.
GlobalMatmul
Provides matrix multiplication operations at the global level.
GlobalMatmulFamily
A family of matmuls working with any precision.
GlobalWriter
Responsible for writing the accumulated stage matmul output to global memory