Struct MmaDefinition

pub struct MmaDefinition<A: CubeType, B: CubeType, CD: CubeType> { /* private fields */ }

Defines a matrix multiplication operation, including the input and output types and the shape.

Implementations§

impl<A: Scalar, B: Scalar, CD: Scalar> MmaDefinition<A, B, CD>

pub fn new(m: usize, n: usize, k: usize) -> Self

Creates a new matrix definition for use with the manual matrix-multiply-and-accumulate function execute_manual_mma().

You must declare the shape (m, n, k) used for the execution. The shape of each individual matrix then follows from its MatrixIdent:

MatrixIdent::A Shape => (M, K)
MatrixIdent::B Shape => (K, N)
MatrixIdent::Accumulator Shape => (M, N)
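
The shape rule above can be modelled in plain Rust. This is only a minimal sketch: Ident, shape_of and num_elems are hypothetical stand-ins written for illustration, not CubeCL items, and the 16x8x16 shape in the usage note is just an example value.

```rust
/// Hypothetical stand-in for `MatrixIdent`; not the CubeCL type itself.
#[derive(Clone, Copy, Debug)]
enum Ident {
    A,
    B,
    Accumulator,
}

/// Returns (rows, cols) of the matrix selected by `ident` for an
/// m x n x k multiply-accumulate, following the table above.
fn shape_of(ident: Ident, m: usize, n: usize, k: usize) -> (usize, usize) {
    match ident {
        Ident::A => (m, k),
        Ident::B => (k, n),
        Ident::Accumulator => (m, n),
    }
}

/// Total element count of one matrix: rows * cols.
fn num_elems(ident: Ident, m: usize, n: usize, k: usize) -> usize {
    let (rows, cols) = shape_of(ident, m, n, k);
    rows * cols
}
```

For m = 16, n = 8, k = 16, shape_of gives (16, 16) for A, (16, 8) for B and (16, 8) for the accumulator.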

Not all shapes are supported, and the permitted shapes depend on the element type. Layout for manual MMA is determined by the runtime and must be handled manually. Use Self::vector_layout to check the correct data layout for each element.

Refer to the NVIDIA documentation.

pub fn new_scaled<S: CubePrimitive>( m: usize, n: usize, k: usize, scale_factor: usize, ) -> Self

Creates a new matrix definition for use with the manual matrix-multiply-and-accumulate function execute_manual_mma().

You must declare the shape (m, n, k) used for the execution. The shape of each individual matrix then follows from its MatrixIdent:

MatrixIdent::A Shape => (M, K)
MatrixIdent::B Shape => (K, N)
MatrixIdent::Accumulator Shape => (M, N)

Not all shapes are supported, and the permitted shapes depend on the element type. Layout for manual MMA is determined by the runtime and must be handled manually. Use Self::vector_layout to check the correct data layout for each element.

Refer to the NVIDIA documentation.

pub fn num_elems(&self, ident: MatrixIdent) -> usize

Number of elements in the matrix selected by ident.

pub fn elems_per_lane(&self, ident: MatrixIdent) -> usize

Returns the number of elements handled by each lane. These elements should be packed into Vectors of size Self::vector_size, laid out according to Self::vector_layout.

§Note

“Lane” here refers to the unit relative to a plane, to distinguish it from a unit relative to a cube.

pub fn vectors_per_lane(&self, ident: MatrixIdent) -> usize

Returns the number of vectors per lane, each of size Self::vector_size and with layout Self::vector_layout.

§Note

“Lane” here refers to the unit relative to a plane, to distinguish it from a unit relative to a cube.
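
The relationship between the per-lane counts can be sketched as follows. This is a plain-Rust model rather than the CubeCL implementation. It assumes the matrix is split evenly across the lanes of a plane. The function names mirror the methods above but are hypothetical free functions, and the plane size of 32 in the usage note is an assumption.

```rust
/// Elements each lane holds, assuming the matrix is split evenly across
/// the plane (an assumption of this sketch, not a documented guarantee).
fn elems_per_lane(num_elems: usize, plane_size: usize) -> usize {
    assert!(plane_size > 0 && num_elems % plane_size == 0);
    num_elems / plane_size
}

/// Vectors each lane holds once its elements are packed into vectors of
/// `vector_size` elements.
fn vectors_per_lane(elems_per_lane: usize, vector_size: usize) -> usize {
    assert!(vector_size > 0 && elems_per_lane % vector_size == 0);
    elems_per_lane / vector_size
}
```

Under these assumptions, a 16x8 accumulator (128 elements) on a 32-lane plane gives 4 elements per lane, which is 2 vectors per lane for a vector size of 2.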

pub fn vector_layout(&self, ident: MatrixIdent) -> MatrixLayout

The layout of each vector in this matrix (row-major or column-major).

pub fn vector_size(&self, ident: MatrixIdent) -> VectorSize

Number of elements in each vector passed to the execute function. Represents the maximum number of contiguous elements held by the thread.

pub fn position_of_nth( &self, lane_id: u32, elem_idx: u32, ident: MatrixIdent, ) -> (u32, u32)

Returns the coordinates of the nth element handled by lane lane_id. Each lane holds Self::elems_per_lane elements in chunks of Self::vector_size. The result is (row_idx, col_idx).

§Note

“Lane” here refers to the unit relative to a plane, to distinguish it from a unit relative to a cube.
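
To give a feel for the kind of mapping this method encodes, the sketch below reproduces the 16x8 accumulator fragment layout that NVIDIA's PTX documentation describes for the m16n8k16 shape. It is illustrative only: accumulator_position_m16n8 is a hypothetical helper, and the mapping returned by the runtime for other shapes, identifiers, or backends may differ, so real kernels should query position_of_nth instead.

```rust
/// (row, col) of the `elem_idx`-th accumulator element held by `lane_id`
/// in a 16x8 accumulator fragment spread over 32 lanes, 4 elements each.
/// Assumed layout (from NVIDIA's PTX documentation), not taken from CubeCL.
fn accumulator_position_m16n8(lane_id: u32, elem_idx: u32) -> (u32, u32) {
    assert!(lane_id < 32 && elem_idx < 4);
    let group_id = lane_id / 4; // 8 groups of 4 lanes
    let lane_in_group = lane_id % 4;
    // Elements 0 and 1 sit in the top 8 rows, elements 2 and 3 in the bottom 8.
    let row = if elem_idx < 2 { group_id } else { group_id + 8 };
    // Each lane owns two adjacent columns; the low bit of elem_idx picks one.
    let col = lane_in_group * 2 + (elem_idx & 1);
    (row, col)
}
```

In this layout the four elements of a lane form two contiguous pairs, which matches the idea of a lane holding its elements in vector-sized chunks.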

pub fn scales_index(&self, lane_id: u32, ident: MatrixIdent) -> u32

Index of the scales for this thread, along the non-major dimension of the matrix. Each thread loads all scales in the major direction into a single Vector.

pub fn scales_count(&self) -> usize

Number of scales in each vector (not the vector size!). Vector size may include padding bytes.

pub fn scales_vector_size(&self) -> VectorSize

Vector size for the scale factors. May be larger than the total number of scales.

pub fn load_matrix<E: CubePrimitive, NO: Size>( &self, row: &Slice<E>, ident: MatrixIdent, num_matrices: usize, transpose: bool, ) -> Array<Vector<E::Scalar, NO>>

Loads one or more matrix registers using intrinsic instructions. CUDA only. The number of matrices must be 1, 2, or 4. The rows for the nth matrix are passed by the 8 lanes starting at lane n * 8. All slice starts must be valid, even for non-participating lanes. The slice determines the starting address of the 16-byte row loaded by this unit, with the row index being UNIT_POS_PLANE % 8. The number of elements per row is determined by the element size.

§Constraints:

Address must be aligned to 16 bytes.
Address must be in shared memory.
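
The lane-to-row rules described above can be sketched in plain Rust. Everything here is a hypothetical model for illustration (ROW_BYTES, elems_per_row, row_source and is_row_aligned are not CubeCL items). It also assumes that lanes at or beyond num_matrices * 8 are the non-participating ones.

```rust
/// Bytes in the row handled by a single lane, as stated above.
const ROW_BYTES: usize = 16;

/// Number of elements of type `E` that fit in one 16-byte row.
fn elems_per_row<E>() -> usize {
    ROW_BYTES / std::mem::size_of::<E>()
}

/// For a participating lane, returns (matrix index, row index within that
/// matrix). Returns `None` for lanes at or beyond `num_matrices * 8`
/// (assumed here to be the non-participating lanes).
fn row_source(lane_id: usize, num_matrices: usize) -> Option<(usize, usize)> {
    assert!(matches!(num_matrices, 1 | 2 | 4), "num_matrices must be 1, 2, or 4");
    if lane_id < num_matrices * 8 {
        Some((lane_id / 8, lane_id % 8))
    } else {
        None
    }
}

/// Checks the 16-byte alignment constraint on a row's starting address.
fn is_row_aligned(address: usize) -> bool {
    address % ROW_BYTES == 0
}
```

With four matrices, lane 19 supplies row 3 of matrix 2. With two-byte elements, each row holds 8 elements.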

pub fn load_matrix_inplace<E: Scalar, N: Size>( &self, row: &Slice<E>, fragment: &mut Array<Vector<E, N>>, ident: MatrixIdent, num_matrices: usize, transpose: bool, )

pub fn store_matrix<E: CubePrimitive, N: Size>( &self, row: &mut Slice<E, ReadWrite>, registers: &Array<Vector<E::Scalar, N>>, ident: MatrixIdent, num_matrices: usize, transpose: bool, )

Stores one or more matrix registers using intrinsic instructions. CUDA only. The number of matrices must be 1, 2, or 4. The rows for the nth matrix are passed by the 8 lanes starting at lane n * 8. All slice starts must be valid, even for non-participating lanes. The slice determines the starting address of the 16-byte row stored by this unit, with the row index being UNIT_POS_PLANE % 8. The number of elements per row is determined by the element size.

§Constraints:

Address must be aligned to 16 bytes.
Address must be in shared memory.

pub fn execute<NA: Size, NB: Size, NC: Size>( &self, registers_a: &Array<Vector<A, NA>>, registers_b: &Array<Vector<B, NB>>, registers_c: &Array<Vector<CD, NC>>, ) -> Array<Vector<CD, NC>>

Executes a low-level MMA operation with manually managed registers. The register layout and index mapping can be retrieved from the MmaDefinition.
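
When checking a kernel built on this method, a CPU reference for the same arithmetic can be useful. The sketch below is a hypothetical helper, not part of CubeCL. It computes D = A * B + C on row-major slices with A of shape (M, K), B of shape (K, N), and C and D of shape (M, N). It uses f32 throughout for simplicity and ignores register layout entirely.

```rust
/// CPU reference for D = A * B + C on row-major slices.
/// `a` is m x k, `b` is k x n, `c` and the result are m x n.
fn mma_reference(a: &[f32], b: &[f32], c: &[f32], m: usize, n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    // Start from the accumulator and add every product term to it.
    let mut d = c.to_vec();
    for row in 0..m {
        for col in 0..n {
            for i in 0..k {
                d[row * n + col] += a[row * k + i] * b[i * n + col];
            }
        }
    }
    d
}
```

Comparing device results against such a reference, element by element via position_of_nth, is one way to validate a manual register layout.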

pub fn execute_inplace<NA: Size, NB: Size, NC: Size>( &self, registers_a: &Array<Vector<A, NA>>, registers_b: &Array<Vector<B, NB>>, registers_c: &mut Array<Vector<CD, NC>>, )

pub fn execute_scaled<S: Scalar, NA: Size, NB: Size, NC: Size, NS: Size>( &self, registers_a: &Array<Vector<A, NA>>, registers_b: &Array<Vector<B, NB>>, registers_c: &Array<Vector<CD, NC>>, scales_a: Vector<S, NS>, scales_b: Vector<S, NS>, ) -> Array<Vector<CD, NC>>

Executes a low-level block-scaled MMA operation with manually managed registers. The register layout and index mapping can be retrieved from the MmaDefinition.
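
A CPU reference can also help with reasoning about block scaling. The sketch below is a hypothetical helper under stated assumptions, not the CubeCL semantics: it assumes every block_len consecutive elements along K share one scale, stored per row of A and per column of B, and it uses f32 for both data and scales. The parameter block_len and the scale storage order are inventions of this example.

```rust
/// CPU reference for a block-scaled D = (A scaled) * (B scaled) + C.
/// Assumed storage: `scales_a` is m x (k / block_len) row-major and
/// `scales_b` is (k / block_len) x n row-major.
fn scaled_mma_reference(
    a: &[f32], b: &[f32], c: &[f32],
    scales_a: &[f32], scales_b: &[f32],
    m: usize, n: usize, k: usize, block_len: usize,
) -> Vec<f32> {
    assert!(block_len > 0 && k % block_len == 0);
    let blocks = k / block_len;
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    assert_eq!(scales_a.len(), m * blocks);
    assert_eq!(scales_b.len(), blocks * n);
    let mut d = c.to_vec();
    for row in 0..m {
        for col in 0..n {
            for i in 0..k {
                // All elements in the same k-block share one scale each.
                let block = i / block_len;
                let scaled_a = a[row * k + i] * scales_a[row * blocks + block];
                let scaled_b = b[i * n + col] * scales_b[block * n + col];
                d[row * n + col] += scaled_a * scaled_b;
            }
        }
    }
    d
}
```

With all scales set to 1.0 this reduces to the unscaled D = A * B + C.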

pub fn __expand_new( scope: &mut Scope, m: usize, n: usize, k: usize, ) -> <Self as CubeType>::ExpandType

pub fn __expand_new_scaled<S: CubePrimitive>( scope: &mut Scope, m: usize, n: usize, k: usize, scale_factor: usize, ) -> <Self as CubeType>::ExpandType

pub fn __expand_num_elems( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> usize

pub fn __expand_elems_per_lane( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> usize

pub fn __expand_vectors_per_lane( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> usize

pub fn __expand_vector_layout( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> MatrixLayout

pub fn __expand_vector_size( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, ident: MatrixIdent, ) -> VectorSize

pub fn __expand_position_of_nth( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, lane_id: <u32 as CubeType>::ExpandType, elem_idx: <u32 as CubeType>::ExpandType, ident: MatrixIdent, ) -> <(u32, u32) as CubeType>::ExpandType

pub fn __expand_scales_index( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, lane_id: <u32 as CubeType>::ExpandType, ident: MatrixIdent, ) -> <u32 as CubeType>::ExpandType

pub fn __expand_scales_count( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, ) -> usize

pub fn __expand_scales_vector_size( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, ) -> VectorSize

pub fn __expand_load_matrix<E: CubePrimitive, NO: Size>( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, row: <Slice<E> as CubeType>::ExpandType, ident: MatrixIdent, num_matrices: usize, transpose: bool, ) -> <Array<Vector<E::Scalar, NO>> as CubeType>::ExpandType

pub fn __expand_load_matrix_inplace<E: Scalar, N: Size>( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, row: <Slice<E> as CubeType>::ExpandType, fragment: <Array<Vector<E, N>> as CubeType>::ExpandType, ident: MatrixIdent, num_matrices: usize, transpose: bool, ) -> <() as CubeType>::ExpandType

pub fn __expand_store_matrix<E: CubePrimitive, N: Size>( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, row: <Slice<E, ReadWrite> as CubeType>::ExpandType, registers: <Array<Vector<E::Scalar, N>> as CubeType>::ExpandType, ident: MatrixIdent, num_matrices: usize, transpose: bool, ) -> <() as CubeType>::ExpandType

pub fn __expand_execute<NA: Size, NB: Size, NC: Size>( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, registers_a: <Array<Vector<A, NA>> as CubeType>::ExpandType, registers_b: <Array<Vector<B, NB>> as CubeType>::ExpandType, registers_c: <Array<Vector<CD, NC>> as CubeType>::ExpandType, ) -> <Array<Vector<CD, NC>> as CubeType>::ExpandType

pub fn __expand_execute_inplace<NA: Size, NB: Size, NC: Size>( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, registers_a: <Array<Vector<A, NA>> as CubeType>::ExpandType, registers_b: <Array<Vector<B, NB>> as CubeType>::ExpandType, registers_c: <Array<Vector<CD, NC>> as CubeType>::ExpandType, ) -> <() as CubeType>::ExpandType

pub fn __expand_execute_scaled<S: Scalar, NA: Size, NB: Size, NC: Size, NS: Size>( scope: &mut Scope, this: &<Self as CubeType>::ExpandType, registers_a: <Array<Vector<A, NA>> as CubeType>::ExpandType, registers_b: <Array<Vector<B, NB>> as CubeType>::ExpandType, registers_c: <Array<Vector<CD, NC>> as CubeType>::ExpandType, scales_a: <Vector<S, NS> as CubeType>::ExpandType, scales_b: <Vector<S, NS> as CubeType>::ExpandType, ) -> <Array<Vector<CD, NC>> as CubeType>::ExpandType

Trait Implementations§

impl<A: Clone + CubeType, B: Clone + CubeType, CD: Clone + CubeType> Clone for MmaDefinition<A, B, CD>

fn clone(&self) -> MmaDefinition<A, B, CD>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<A: CubeType, B: CubeType, CD: CubeType> CubeType for MmaDefinition<A, B, CD>

type ExpandType = MmaDefinitionExpand<A, B, CD>

impl<A: Copy + CubeType, B: Copy + CubeType, CD: Copy + CubeType> Copy for MmaDefinition<A, B, CD>

Auto Trait Implementations§

impl<A, B, CD> Freeze for MmaDefinition<A, B, CD>

impl<A, B, CD> RefUnwindSafe for MmaDefinition<A, B, CD>
where A: RefUnwindSafe, B: RefUnwindSafe, CD: RefUnwindSafe,

impl<A, B, CD> Send for MmaDefinition<A, B, CD>
where A: Send, B: Send, CD: Send,

impl<A, B, CD> Sync for MmaDefinition<A, B, CD>
where A: Sync, B: Sync, CD: Sync,

impl<A, B, CD> Unpin for MmaDefinition<A, B, CD>
where A: Unpin, B: Unpin, CD: Unpin,

impl<A, B, CD> UnsafeUnpin for MmaDefinition<A, B, CD>

impl<A, B, CD> UnwindSafe for MmaDefinition<A, B, CD>
where A: UnwindSafe, B: UnwindSafe, CD: UnwindSafe,
Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<C> CloneExpand for C
where C: Clone,

fn __expand_clone_method(&self, _scope: &mut Scope) -> C

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoComptime for T

fn comptime(self) -> Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> TuneInputs for T
where T: Clone + Send + Sync + 'static,

type At<'a> = T

The concrete input type at lifetime 'a.
