pub struct MmaDefinition<A: CubeType, B: CubeType, CD: CubeType> { /* private fields */ }
Defines a matrix multiply-accumulate (MMA) operation: the input element types (A and B), the accumulator/output element type (CD), and the shape.
Implementations§
impl<A: CubePrimitive, B: CubePrimitive, CD: CubePrimitive> MmaDefinition<A, B, CD>
pub fn new(m: u32, n: u32, k: u32) -> Self
Creates a new matrix definition for use with the manual matrix-multiply-accumulate (MMA) function.
You must declare the shape (m, n, k) used for the execution. The shape of each individual matrix follows from its MatrixIdent:
- MatrixIdent::A Shape => (M, K)
- MatrixIdent::B Shape => (K, N)
- MatrixIdent::Accumulator Shape => (M, N)
Not all shapes are supported, and the permitted shapes depend on the element type.
The register layout for manual MMA is determined by the runtime and must be handled manually.
Use line_layout to check the data layout of each matrix.
Refer to the NVIDIA documentation for details.
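The shape rule above can be sketched in plain Rust. This is a standalone model, not the library's code: `Ident` is a hypothetical local stand-in for MatrixIdent, and `shape` and `num_elems` are illustrative free functions that take m, n, and k explicitly.

```rust
/// Hypothetical stand-in for the library's `MatrixIdent`,
/// defined locally so the sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ident {
    A,
    B,
    Accumulator,
}

/// (rows, cols) of the matrix named by `ident` for an m x n x k MMA.
fn shape(ident: Ident, m: u32, n: u32, k: u32) -> (u32, u32) {
    match ident {
        Ident::A => (m, k),
        Ident::B => (k, n),
        Ident::Accumulator => (m, n),
    }
}

/// Total element count of that matrix, i.e. rows * cols.
fn num_elems(ident: Ident, m: u32, n: u32, k: u32) -> u32 {
    let (rows, cols) = shape(ident, m, n, k);
    rows * cols
}
```

For a 16 x 8 x 16 definition, this model gives A a shape of (16, 16) and both B and the accumulator 128 elements each.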
pub fn new_scaled<S: CubePrimitive>(m: u32, n: u32, k: u32, scale_factor: u32) -> Self
Creates a new matrix definition for use with the manual block-scaled matrix-multiply-accumulate function (execute_scaled). S is the element type of the scales.
You must declare the shape (m, n, k) used for the execution. The shape of each individual matrix follows from its MatrixIdent:
- MatrixIdent::A Shape => (M, K)
- MatrixIdent::B Shape => (K, N)
- MatrixIdent::Accumulator Shape => (M, N)
Not all shapes are supported, and the permitted shapes depend on the element type.
The register layout for manual MMA is determined by the runtime and must be handled manually.
Use line_layout to check the data layout of each matrix.
Refer to the NVIDIA documentation for details.
pub fn num_elems(&self, ident: MatrixIdent) -> u32
Returns the total number of elements in the matrix identified by ident.
pub fn elems_per_lane(&self, ident: MatrixIdent) -> u32
Returns the number of elements handled by each lane. These elements should be packed into Lines of size line_size, following the layout given by line_layout.
§Note
“Lane” here refers to the unit relative to a plane, to distinguish it from a unit relative to a cube.
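As a rough mental model, the per-lane count can be pictured as the matrix divided evenly across the lanes of a plane. The source does not state this formula, so the even split and the `plane_size` parameter below are assumptions. The function is a plain-Rust illustration, not the library method.

```rust
/// Elements each lane would hold if `num_elems` were divided evenly over
/// `plane_size` lanes. ASSUMPTION: an even split. Returns `None` when the
/// split is not even, because this simple model does not cover that case.
fn elems_per_lane(num_elems: u32, plane_size: u32) -> Option<u32> {
    if plane_size == 0 || num_elems % plane_size != 0 {
        return None;
    }
    Some(num_elems / plane_size)
}
```

Under this assumption, a 16 x 16 matrix (256 elements) spread over a 32-lane plane leaves 8 elements per lane.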
pub fn lines_per_lane(&self, ident: MatrixIdent) -> u32
Returns the number of lines per lane, where each line has size line_size and layout line_layout.
§Note
“Lane” here refers to the unit relative to a plane, to distinguish it from a unit relative to a cube.
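The relationship between elements, lines, and line size can be sketched as follows. This assumes the lane's elements divide evenly into lines. `Vec<f32>` stands in for the library's Line type, and both function names are illustrative rather than part of the API.

```rust
/// Number of lines needed to hold `elems_per_lane` elements in lines of
/// `line_size` elements. ASSUMPTION: the elements divide evenly into lines.
fn lines_per_lane(elems_per_lane: usize, line_size: usize) -> usize {
    assert!(line_size > 0 && elems_per_lane % line_size == 0);
    elems_per_lane / line_size
}

/// Split one lane's elements into consecutive lines of `line_size` elements.
/// `Vec<f32>` is a plain stand-in for a line of values.
fn pack_into_lines(elems: &[f32], line_size: usize) -> Vec<Vec<f32>> {
    assert!(line_size > 0 && elems.len() % line_size == 0);
    elems.chunks(line_size).map(|chunk| chunk.to_vec()).collect()
}
```

With 8 elements per lane and a line size of 2, the lane holds 4 lines. The order of elements inside each line still depends on the layout reported for that matrix.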
pub fn line_layout(&self, ident: MatrixIdent) -> MatrixLayout
Returns the layout of each line in this matrix (row-major or column-major).
pub fn line_size(&self, ident: MatrixIdent) -> u32
Returns the number of elements in each line passed to the execute function.
pub fn position_of_nth(&self, lane_id: u32, elem_idx: u32, ident: MatrixIdent) -> (u32, u32)
Returns the coordinates of the nth element handled by lane lane_id, where n is elem_idx.
Each lane holds elems_per_lane elements, in chunks of line_size.
Returns (row_idx, col_idx).
§Note
“Lane” here refers to the unit relative to a plane, to distinguish it from a unit relative to a cube.
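To make the lane-to-coordinate idea concrete, here is a plain-Rust sketch for one specific case: a 16 x 8 accumulator spread over 32 lanes with 4 elements each, modeled on the fragment layout described in NVIDIA's PTX documentation for m16n8 MMA shapes. It only illustrates the kind of mapping involved. The function names are hypothetical, and what position_of_nth returns on a given runtime may differ.

```rust
/// Illustrative mapping only: (row, col) of element `elem_idx` held by
/// `lane_id` in a 16x8 accumulator, modeled on the m16n8 fragment layout
/// in NVIDIA's PTX documentation. Not the library's implementation.
fn position_of_nth_m16n8(lane_id: u32, elem_idx: u32) -> (u32, u32) {
    let group = lane_id / 4; // 8 groups of 4 lanes each
    let lane_in_group = lane_id % 4;
    let row = group + 8 * (elem_idx / 2); // elements 2 and 3 sit 8 rows lower
    let col = lane_in_group * 2 + (elem_idx % 2);
    (row, col)
}

/// Gather every lane's 4 registers into one row-major 16x8 matrix.
fn gather(registers: &[[f32; 4]; 32]) -> Vec<f32> {
    let mut out = vec![0.0; 16 * 8];
    for (lane, regs) in registers.iter().enumerate() {
        for (i, &value) in regs.iter().enumerate() {
            let (row, col) = position_of_nth_m16n8(lane as u32, i as u32);
            out[(row * 8 + col) as usize] = value;
        }
    }
    out
}
```

In this model, the 32 lanes times 4 elements cover each of the 128 matrix positions exactly once, which makes a useful sanity check for any mapping of this kind.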
pub fn scales_index(&self, lane_id: u32, ident: MatrixIdent) -> u32
Index of the scales for this thread, along the non-major dimension of the matrix.
Each thread loads all scales in the major direction into a single Line.
pub fn scales_count(&self) -> u32
Number of scales in each line (not the line size!). Line size may include padding bytes.
pub fn scales_line_size(&self) -> u32
Line size for the scale factors. May be larger than the total number of scales.
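The difference between the scale count and the scale line size can be pictured as a fixed-size line whose leading slots carry real scales. The sketch below assumes the scales sit at the front of the line and the padding at the end, which the source does not specify. `pack_scales`, the `u8` element type, and the `pad` value are all hypothetical.

```rust
/// Place `scales` into a line of `line_size` slots. ASSUMPTION: scales occupy
/// the leading slots and the unused trailing slots hold `pad`.
fn pack_scales(scales: &[u8], line_size: usize, pad: u8) -> Vec<u8> {
    assert!(scales.len() <= line_size, "line must hold every scale");
    let mut line = vec![pad; line_size];
    line[..scales.len()].copy_from_slice(scales);
    line
}
```

Here `scales.len()` plays the role of the scale count and `line_size` the role of the scale line size, so the line can be longer than the data it carries.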
pub fn load_matrix<E: CubePrimitive>(&self, row: &Slice<Line<E>>, ident: MatrixIdent, num_matrices: u32, transpose: bool) -> Array<Line<E>>
Loads one or more matrix registers using intrinsic instructions. CUDA only.
The number of matrices must be 1, 2, or 4. The rows of the nth matrix are passed by the 8 lanes starting at lane n * 8. All slice starts must be valid, even for non-participating lanes.
The slice determines the starting address of the 16-byte row loaded by this unit, with the row index being UNIT_POS_PLANE % 8.
The number of elements per row is determined by the element size.
§Constraints:
- Address must be aligned to 16 bytes
- Address must be in shared memory
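The lane and row bookkeeping described above can be checked with a small plain-Rust model. It restates the rules in the text (8 row-supplying lanes per matrix, 16-byte rows, 16-byte alignment). The helper names and constants are illustrative and not part of the library.

```rust
const ROW_BYTES: usize = 16;
const ROWS_PER_MATRIX: u32 = 8;

/// For a lane, return (matrix index, row index) if that lane supplies a row
/// address, or `None` if it does not participate for this `num_matrices`.
fn row_for_lane(lane_id: u32, num_matrices: u32) -> Option<(u32, u32)> {
    assert!(
        matches!(num_matrices, 1 | 2 | 4),
        "num_matrices must be 1, 2, or 4"
    );
    let matrix = lane_id / ROWS_PER_MATRIX;
    if matrix < num_matrices {
        Some((matrix, lane_id % ROWS_PER_MATRIX))
    } else {
        None
    }
}

/// Number of elements of type `E` that fit in one 16-byte row.
fn elems_per_row<E>() -> usize {
    ROW_BYTES / std::mem::size_of::<E>()
}

/// Check the 16-byte alignment constraint on a byte address.
fn is_row_aligned(addr: usize) -> bool {
    addr % ROW_BYTES == 0
}
```

For example, with 4 matrices lane 19 supplies row 3 of matrix 2, while with 2 matrices the same lane does not participate (its slice start must still be valid). A 2-byte element type gives 8 elements per row and a 4-byte type gives 4.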
pub fn execute(&self, registers_a: &Array<Line<A>>, registers_b: &Array<Line<B>>, registers_c: &Array<Line<CD>>) -> Array<Line<CD>>
Executes a low-level MMA operation with manually managed registers. The register layout and index mapping can be retrieved from the MmaDefinition (see line_layout and position_of_nth).
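When testing a kernel built on this operation, a host-side reference for the whole-matrix result is handy. The sketch below computes D = A x B + C on row-major slices in plain Rust. It uses `f32` throughout for simplicity and says nothing about how registers are distributed across lanes. `mma_reference` is a hypothetical helper.

```rust
/// Reference D = A * B + C on row-major slices.
/// `a` is m x k, `b` is k x n, `c` and the result are m x n.
fn mma_reference(a: &[f32], b: &[f32], c: &[f32], m: usize, n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    let mut d = c.to_vec(); // start from the accumulator
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                d[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
    d
}
```

Comparing gathered device results against such a reference, within a tolerance suited to the element types, is one way to catch layout mistakes.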
pub fn execute_scaled<S: CubePrimitive>(&self, registers_a: &Array<Line<A>>, registers_b: &Array<Line<B>>, registers_c: &Array<Line<CD>>, scales_a: Line<S>, scales_b: Line<S>) -> Array<Line<CD>>
Executes a low-level block-scaled MMA operation with manually managed registers. The register layout and index mapping can be retrieved from the MmaDefinition (see line_layout, position_of_nth, and scales_index).
pub fn __expand_new( scope: &mut Scope, m: u32, n: u32, k: u32, ) -> <Self as CubeType>::ExpandType
pub fn __expand_new_scaled<S: CubePrimitive>( scope: &mut Scope, m: u32, n: u32, k: u32, scale_factor: u32, ) -> <Self as CubeType>::ExpandType
pub fn __expand_num_elems( scope: &mut Scope, this: <Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> u32
pub fn __expand_elems_per_lane( scope: &mut Scope, this: <Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> u32
pub fn __expand_lines_per_lane( scope: &mut Scope, this: <Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> u32
pub fn __expand_line_layout( scope: &mut Scope, this: <Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> MatrixLayout
pub fn __expand_line_size( scope: &mut Scope, this: <Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> u32
pub fn __expand_position_of_nth( scope: &mut Scope, this: <Self as CubeType>::ExpandType, lane_id: <u32 as CubeType>::ExpandType, elem_idx: <u32 as CubeType>::ExpandType, ident: MatrixIdent, ) -> <(u32, u32) as CubeType>::ExpandType
pub fn __expand_scales_index( scope: &mut Scope, this: <Self as CubeType>::ExpandType, lane_id: <u32 as CubeType>::ExpandType, ident: MatrixIdent, ) -> <u32 as CubeType>::ExpandType
pub fn __expand_scales_count( scope: &mut Scope, this: <Self as CubeType>::ExpandType, ) -> u32
pub fn __expand_scales_line_size( scope: &mut Scope, this: <Self as CubeType>::ExpandType, ) -> u32
pub fn __expand_load_matrix<E: CubePrimitive>( scope: &mut Scope, this: <Self as CubeType>::ExpandType, row: <Slice<Line<E>> as CubeType>::ExpandType, ident: MatrixIdent, num_matrices: u32, transpose: bool, ) -> <Array<Line<E>> as CubeType>::ExpandType
pub fn __expand_execute( scope: &mut Scope, this: <Self as CubeType>::ExpandType, registers_a: <Array<Line<A>> as CubeType>::ExpandType, registers_b: <Array<Line<B>> as CubeType>::ExpandType, registers_c: <Array<Line<CD>> as CubeType>::ExpandType, ) -> <Array<Line<CD>> as CubeType>::ExpandType
pub fn __expand_execute_scaled<S: CubePrimitive>( scope: &mut Scope, this: <Self as CubeType>::ExpandType, registers_a: <Array<Line<A>> as CubeType>::ExpandType, registers_b: <Array<Line<B>> as CubeType>::ExpandType, registers_c: <Array<Line<CD>> as CubeType>::ExpandType, scales_a: <Line<S> as CubeType>::ExpandType, scales_b: <Line<S> as CubeType>::ExpandType, ) -> <Array<Line<CD>> as CubeType>::ExpandType
Trait Implementations§
impl<A: Clone + CubeType, B: Clone + CubeType, CD: Clone + CubeType> Clone for MmaDefinition<A, B, CD>
fn clone(&self) -> MmaDefinition<A, B, CD>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.