Struct Shared

pub struct Shared<E: CubePrimitive> { /* private fields */ }

Implementations

pub fn __expand_new_lined(scope: &mut Scope, line_size: u32) -> <SharedMemory<Line<T>> as CubeType>::ExpandType

Methods from Deref<Target = Barrier>

pub fn tma_load_1d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, x: i32)

Copy a tile from a global memory source to a shared memory destination, with the provided offsets.

pub fn tma_load_2d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, y: i32, x: i32)

Copy a tile from a global memory source to a shared memory destination, with the provided offsets.
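The coordinates passed to the tma_load_* functions locate a tile inside the tensor described by the TensorMap. As a rough CPU-side sketch of which elements a 2D load selects, the hypothetical helper tile_load_2d below copies a tile out of a row-major tensor. It assumes that x is the innermost (contiguous) dimension, that the tile shape is fixed by the tensor map (passed here explicitly as tile_h and tile_w), and that out-of-bounds elements are zero-filled. It is not part of cubecl; the real copy is performed asynchronously by the TMA hardware.

```rust
/// Hypothetical CPU model of a 2D tile load (not the cubecl API).
/// Copies a `tile_h x tile_w` tile whose top-left corner is at (y, x) in a
/// row-major `height x width` tensor. Assumption: out-of-bounds elements,
/// including those at negative coordinates, are filled with zero.
fn tile_load_2d(
    src: &[f32],
    height: usize,
    width: usize,
    y: i32,
    x: i32,
    tile_h: usize,
    tile_w: usize,
) -> Vec<f32> {
    assert_eq!(src.len(), height * width, "source must hold height * width elements");
    let mut tile = vec![0.0f32; tile_h * tile_w];
    for ty in 0..tile_h {
        for tx in 0..tile_w {
            // Global coordinates of this tile element; x indexes the contiguous dimension.
            let gy = y as i64 + ty as i64;
            let gx = x as i64 + tx as i64;
            let in_bounds = gy >= 0 && gx >= 0 && (gy as usize) < height && (gx as usize) < width;
            if in_bounds {
                tile[ty * tile_w + tx] = src[gy as usize * width + gx as usize];
            }
            // Otherwise the element keeps its zero fill.
        }
    }
    tile
}
```

For a 3 x 4 tensor holding the values 0..12, loading a 2 x 2 tile at (y, x) = (1, 2) yields [6, 7, 10, 11], while a tile at (2, 3) overhangs the tensor and yields [11, 0, 0, 0].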

pub fn tma_load_3d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, z: i32, y: i32, x: i32)

Copy a tile from a global memory source to a shared memory destination, with the provided offsets.

pub fn tma_load_4d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, w: i32, z: i32, y: i32, x: i32)

Copy a tile from a global memory source to a shared memory destination, with the provided offsets.

pub fn tma_load_5d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, v: i32, w: i32, z: i32, y: i32, x: i32)

Copy a tile from a global memory source to a shared memory destination, with the provided offsets.

pub fn tma_load_im2col_3d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, n: i32, w: i32, c: i32, w_offset: u16)

Copy a tile from a global memory source to a shared memory destination in im2col mode, using the provided tensor coordinates and im2col offsets.

pub fn tma_load_im2col_4d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, n: i32, h: i32, w: i32, c: i32, h_offset: u16, w_offset: u16)

Copy a tile from a global memory source to a shared memory destination in im2col mode, using the provided tensor coordinates and im2col offsets.

pub fn tma_load_im2col_5d<C: CubePrimitive>(&self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, n: i32, d: i32, h: i32, w: i32, c: i32, d_offset: u16, h_offset: u16, w_offset: u16)

Copy a tile from a global memory source to a shared memory destination in im2col mode, using the provided tensor coordinates and im2col offsets.

pub fn init_manual(&self, arrival_count: u32)

Initializes the barrier with the given arrival_count: the number of times arrive, or one of its variants, must be called before the barrier advances.

If every unit in the cube arrives on the barrier, use CUBE_DIM as the arrival count. Otherwise, use the number of units that will actually arrive, which may be only a subset of the cube.

Note

No synchronization or election is performed; this is a raw initialization. For shared barriers, ensure that only one unit performs the initialization, and synchronize the cube afterwards. Bulk copy operations may impose additional synchronization requirements, such as sync_async_proxy_shared().
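To make the arrival-count semantics concrete, here is a minimal single-threaded model of one barrier. It assumes the usual phase behaviour: each arrival decrements a pending count, and when that count reaches zero the phase completes and the count is reset to arrival_count. BarrierModel and is_complete are hypothetical names, the u32 returned by arrive stands in for BarrierToken, and a real barrier is updated concurrently by many units rather than by sequential calls.

```rust
/// Hypothetical single-threaded model of barrier arrival counting
/// (not cubecl's implementation).
struct BarrierModel {
    arrival_count: u32, // arrivals needed to complete one phase
    pending: u32,       // arrivals still missing in the current phase
    phase: u32,         // index of the current, not yet completed, phase
}

impl BarrierModel {
    /// Models init_manual: set how many arrivals complete a phase.
    fn init_manual(arrival_count: u32) -> Self {
        assert!(arrival_count > 0, "a phase needs at least one arrival");
        BarrierModel { arrival_count, pending: arrival_count, phase: 0 }
    }

    /// Models arrive: records one arrival and returns the phase the caller
    /// arrived in (a stand-in for BarrierToken).
    fn arrive(&mut self) -> u32 {
        let token = self.phase;
        self.pending -= 1;
        if self.pending == 0 {
            // Last expected arrival: the phase completes and the barrier re-arms.
            self.phase += 1;
            self.pending = self.arrival_count;
        }
        token
    }

    /// A wait on `token` could return once the barrier has moved past that phase.
    fn is_complete(&self, token: u32) -> bool {
        self.phase > token
    }
}
```

With an arrival count of 4 (as if CUBE_DIM were 4), the first three arrivals leave phase 0 incomplete, and the fourth completes it and re-arms the barrier for phase 1.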

pub fn memcpy_async<C: CubePrimitive>(&self, source: &Slice<Line<C>>, destination: &mut SliceMut<Line<C>>)

Copy the source slice to the destination.

Safety

The whole source slice is copied, so ensure that the source length is less than or equal to the destination length.
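The Safety requirement amounts to a length precondition. The sequential sketch below uses a hypothetical memcpy_model helper rather than the cubecl API: it states the precondition as an assertion and copies the source into the front of the destination. The asynchronous GPU copy is not described as performing such a check, so on the device the caller remains responsible for it.

```rust
/// Hypothetical sequential model of the memcpy_async contract (not the cubecl API):
/// the entire source is copied into the front of the destination.
fn memcpy_model<T: Copy>(source: &[T], destination: &mut [T]) {
    // The precondition from the Safety section, made explicit.
    assert!(
        source.len() <= destination.len(),
        "source length ({}) exceeds destination length ({})",
        source.len(),
        destination.len()
    );
    // Elements past source.len() in the destination are left untouched.
    destination[..source.len()].copy_from_slice(source);
}
```

Copying [1, 2, 3] into a zeroed five-element destination leaves it as [1, 2, 3, 0, 0]; a six-element source would violate the precondition.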

pub fn memcpy_async_cooperative<C: CubePrimitive>(&self, source: &Slice<Line<C>>, destination: &mut SliceMut<Line<C>>)

Copy the source slice to the destination.

Safety

The whole source slice is copied, so ensure that the source length is less than or equal to the destination length.

pub fn memcpy_async_tx<C: CubePrimitive>(&self, source: &Slice<Line<C>>, destination: &mut SliceMut<Line<C>>)

Copy the source slice to the destination. Completion is tracked through the barrier's transaction count, as with TMA loads, so pair this with expect_tx or arrive_and_expect_tx.

Safety

The whole source slice is copied, so ensure that the source length is less than or equal to the destination length.

pub fn arrive(&self) -> BarrierToken

Arrive at the barrier, decrementing its pending arrival count. The returned BarrierToken can be passed to wait.

pub fn arrive_and_expect_tx(&self, arrival_count: u32, transaction_count: u32) -> BarrierToken

Arrive at the barrier, decrementing the arrival count, and additionally increment the expected transaction count by transaction_count.

pub fn expect_tx(&self, expected_count: u32)

Increments the barrier's expected transaction count by expected_count.
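The transaction-count variants can be pictured as a phase that waits on two counters at once: pending arrivals and pending transaction bytes. The sketch below is a hypothetical single-threaded model, assuming the common CUDA mbarrier convention that the count is measured in bytes, that expect_tx raises it, that each completed async copy lowers it, and that the phase completes only once both counters reach zero. TxBarrierModel, arrive_n, and complete_tx are illustrative names, not cubecl APIs.

```rust
/// Hypothetical single-threaded model of a barrier phase that tracks both
/// arrivals and transaction bytes (not cubecl's implementation).
struct TxBarrierModel {
    pending_arrivals: u32,
    // Signed, because a copy may land before its bytes are expected.
    pending_tx: i64,
    phase_done: bool,
}

impl TxBarrierModel {
    fn new(arrival_count: u32) -> Self {
        assert!(arrival_count > 0, "a phase needs at least one arrival");
        TxBarrierModel { pending_arrivals: arrival_count, pending_tx: 0, phase_done: false }
    }

    /// The phase completes only when both counters are zero.
    fn update(&mut self) {
        if self.pending_arrivals == 0 && self.pending_tx == 0 {
            self.phase_done = true;
        }
    }

    /// Models expect_tx: the phase must now also wait for `bytes` more bytes.
    fn expect_tx(&mut self, bytes: u32) {
        self.pending_tx += i64::from(bytes);
    }

    /// Records `count` arrivals; a plain arrive corresponds to count = 1.
    fn arrive_n(&mut self, count: u32) {
        assert!(count <= self.pending_arrivals, "more arrivals than expected");
        self.pending_arrivals -= count;
        self.update();
    }

    /// Models arrive_and_expect_tx: raise the expected bytes, then arrive.
    fn arrive_and_expect_tx(&mut self, arrival_count: u32, transaction_count: u32) {
        self.expect_tx(transaction_count);
        self.arrive_n(arrival_count);
    }

    /// Models an async copy (a TMA load or memcpy_async_tx) landing `bytes` bytes.
    fn complete_tx(&mut self, bytes: u32) {
        self.pending_tx -= i64::from(bytes);
        self.update();
    }
}
```

A common pattern is for one elected unit to call arrive_and_expect_tx with the total size of the tile before the copy is issued, so the phase cannot complete until the data has actually landed, even though the arrival itself happens early.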

pub fn arrive_and_wait(&self)

Arrive at the barrier, then wait until all data is loaded.

pub fn wait(&self, token: BarrierToken)

Wait at the barrier until all arrivals are done, using a token returned by arrive or arrive_and_expect_tx.

pub fn wait_parity(&self, phase: u32)

Wait at the barrier until the given phase is completed. This does not require a token, but the caller must track the phase manually.
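Because wait_parity takes no token, the caller has to know which phase to wait for. The minimal model below assumes the standard mbarrier parity convention: waiting on parity p succeeds once the phase with that parity has completed, that is, once the barrier's current phase parity differs from p. It shows the usual pattern of starting at 0 and flipping the tracked phase after every completed wait. ParityBarrierModel and try_wait_parity are hypothetical, non-blocking stand-ins, not cubecl APIs.

```rust
/// Hypothetical non-blocking model of parity-based waiting
/// (not cubecl's implementation).
struct ParityBarrierModel {
    arrival_count: u32,
    pending: u32,
    phase: u32, // index of the current, not yet completed, phase
}

impl ParityBarrierModel {
    fn new(arrival_count: u32) -> Self {
        assert!(arrival_count > 0, "a phase needs at least one arrival");
        ParityBarrierModel { arrival_count, pending: arrival_count, phase: 0 }
    }

    fn arrive(&mut self) {
        self.pending -= 1;
        if self.pending == 0 {
            // Phase complete: advance and re-arm for the next phase.
            self.phase += 1;
            self.pending = self.arrival_count;
        }
    }

    /// True when wait_parity(parity) would return: the phase with this
    /// parity has completed, so the current phase parity differs from it.
    fn try_wait_parity(&self, parity: u32) -> bool {
        (self.phase & 1) != (parity & 1)
    }
}
```

In a loop that reuses one barrier, the caller keeps a local phase variable, waits on it, and toggles it with phase ^= 1 after each completed wait; only the low bit matters, so the variable never needs to count phases.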

pub fn commit_copy_async(&self)

Makes all previous copy_async operations visible on the barrier. Call it once, after all copies have been dispatched and before reading from the shared memory.

This does not count as an arrival toward the barrier's arrival count, so arrive or arrive_and_wait must still be called afterwards.

Trait Implementations

impl<T: CubePrimitive> AsMut<T> for Shared<T>

fn as_mut(&mut self) -> &mut T

Converts this type into a mutable reference of the (usually inferred) input type.

impl<T: CubePrimitive> AsRef<T> for Shared<T>

Type inference doesn't allow operations like assign to work normally on Shared<T>, so as_ref or as_mut must be called manually in those cases. Barrier operations and similar APIs should take AsRef, so that the conversion happens automatically.
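The AsRef approach is ordinary Rust and can be sketched with stand-in types. Barrier and Shared below are hypothetical, simplified definitions rather than cubecl's; the point is that a function bounded on AsRef<Barrier> accepts both the wrapper and the plain type without an explicit conversion at the call site.

```rust
/// Hypothetical stand-in for a barrier type (not cubecl's Barrier).
struct Barrier {
    id: u32,
}

/// Hypothetical stand-in for Shared<T>, implementing AsRef<T> like the impl above.
struct Shared<T> {
    inner: T,
}

impl<T> AsRef<T> for Shared<T> {
    fn as_ref(&self) -> &T {
        &self.inner
    }
}

// The standard library has no blanket `impl<T> AsRef<T> for T`, so the plain
// type needs its own impl to satisfy an AsRef<Barrier> bound.
impl AsRef<Barrier> for Barrier {
    fn as_ref(&self) -> &Barrier {
        self
    }
}

/// A "barrier operation" that accepts anything convertible to &Barrier.
fn barrier_id<B: AsRef<Barrier>>(barrier: &B) -> u32 {
    barrier.as_ref().id
}
```

Both barrier_id(&plain) and barrier_id(&shared) compile without calling as_ref at the call site, which is the convenience the note above aims for.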
