pub struct Shared<E: CubePrimitive> { /* private fields */ }
Implementations§
pub fn new() -> Self
pub fn __expand_new(scope: &mut Scope) -> <Self as CubeType>::ExpandType
pub fn __expand_default(scope: &mut Scope) -> <Self as CubeType>::ExpandType
pub fn new_lined(line_size: u32) -> SharedMemory<Line<T>>
pub fn __expand_new_lined( scope: &mut Scope, line_size: u32, ) -> <SharedMemory<Line<T>> as CubeType>::ExpandType
Methods from Deref<Target = Barrier>§
pub fn tma_load_1d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, x: i32, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
pub fn tma_load_2d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, y: i32, x: i32, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
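The tile semantics described above can be pictured with a small host-side model in plain Rust. This is only a sketch and not the CubeCL API: model_tile_load_2d, its explicit shape parameters, and the zero fill for out-of-bounds elements are assumptions made for illustration. In the real call the tile shape would come from the TensorMap rather than from function arguments.

```rust
/// Host-side model of a 2D tile load (hypothetical helper, not part of CubeCL).
/// Copies a `tile_h` x `tile_w` tile whose top-left corner sits at (`y`, `x`)
/// in a row-major `source` of shape `rows` x `cols` into a contiguous buffer.
/// Assumption: elements that fall outside the source are filled with zero.
fn model_tile_load_2d(
    source: &[f32],
    rows: usize,
    cols: usize,
    tile_h: usize,
    tile_w: usize,
    y: i32,
    x: i32,
) -> Vec<f32> {
    assert_eq!(source.len(), rows * cols);
    let mut tile = vec![0.0f32; tile_h * tile_w];
    for ty in 0..tile_h {
        for tx in 0..tile_w {
            // Global coordinates of this tile element; may be negative.
            let gy = y as i64 + ty as i64;
            let gx = x as i64 + tx as i64;
            let in_bounds =
                gy >= 0 && gx >= 0 && (gy as usize) < rows && (gx as usize) < cols;
            if in_bounds {
                tile[ty * tile_w + tx] = source[gy as usize * cols + gx as usize];
            }
        }
    }
    tile
}
```

The offsets are passed outermost first (y, then x), matching the parameter order of tma_load_2d. The signed coordinate type is what lets the model express a tile that starts before the first row or column.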
pub fn tma_load_3d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, z: i32, y: i32, x: i32, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
pub fn tma_load_4d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, w: i32, z: i32, y: i32, x: i32, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
pub fn tma_load_5d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, v: i32, w: i32, z: i32, y: i32, x: i32, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
pub fn tma_load_im2col_3d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, n: i32, w: i32, c: i32, w_offset: u16, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
pub fn tma_load_im2col_4d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, n: i32, h: i32, w: i32, c: i32, h_offset: u16, w_offset: u16, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
pub fn tma_load_im2col_5d<C: CubePrimitive>( &self, source: &TensorMap<C>, destination: &mut SliceMut<Line<C>>, n: i32, d: i32, h: i32, w: i32, c: i32, d_offset: u16, h_offset: u16, w_offset: u16, )
Copy a tile from a global memory source to a shared memory destination, with
the provided offsets.
pub fn init_manual(&self, arrival_count: u32)
Initializes a barrier with a given arrival_count. This is the number of
times arrive or one of its variants needs to be called before the barrier advances.
If all units in the cube arrive on the barrier, use CUBE_DIM as the arrival count. For
other purposes, only a subset may need to arrive.
§Note
No synchronization or election is performed; this is raw initialization. For shared barriers,
ensure that only one unit performs the initialization, and synchronize the cube afterwards.
Bulk copy operations may have additional synchronization requirements, such as
sync_async_proxy_shared().
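The role of the arrival count can be illustrated with a single-threaded model in plain Rust. This is a sketch under stated assumptions, not the CubeCL Barrier: ModelBarrier, its fields, and the use of a phase number as the token are all hypothetical and only show how a barrier advances once the configured number of arrivals has been recorded.

```rust
/// Host-side model of an arrival-counting barrier (hypothetical, not the CubeCL type).
struct ModelBarrier {
    arrival_count: u32, // arrivals required to complete one phase
    pending: u32,       // arrivals still missing in the current phase
    phase: u32,         // number of phases completed so far
}

impl ModelBarrier {
    /// Plays the role of `init_manual`: raw initialization, no synchronization.
    fn init(arrival_count: u32) -> Self {
        assert!(arrival_count > 0);
        Self { arrival_count, pending: arrival_count, phase: 0 }
    }

    /// Records one arrival and returns the phase it belongs to as a token.
    fn arrive(&mut self) -> u32 {
        let token = self.phase;
        self.pending -= 1;
        if self.pending == 0 {
            // Last expected arrival: advance the phase and re-arm the counter.
            self.phase += 1;
            self.pending = self.arrival_count;
        }
        token
    }

    /// True once the phase named by `token` has completed.
    fn is_complete(&self, token: u32) -> bool {
        self.phase > token
    }
}
```

With an arrival count of 3, the first two arrivals leave the phase open and the third one completes it, which is the behaviour the arrival count is meant to control.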
pub fn memcpy_async<C: CubePrimitive>( &self, source: &Slice<Line<C>>, destination: &mut SliceMut<Line<C>>, )
Copy the source slice to destination.
§Safety
This will try to copy the whole source slice, so make sure source length <= destination length.
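The length requirement can be expressed as an explicit check in a host-side sketch. This is not the CubeCL implementation, and nothing here implies that memcpy_async performs such a check itself; checked_copy is a hypothetical helper that only illustrates the precondition the caller is expected to uphold.

```rust
/// Copies all of `source` into the front of `destination` (hypothetical helper).
/// Returns `Err((source_len, destination_len))` when the source does not fit,
/// leaving `destination` untouched.
fn checked_copy<T: Copy>(source: &[T], destination: &mut [T]) -> Result<(), (usize, usize)> {
    if source.len() > destination.len() {
        return Err((source.len(), destination.len()));
    }
    // The whole source is copied; any remaining destination elements are unchanged.
    destination[..source.len()].copy_from_slice(source);
    Ok(())
}
```

A destination that is longer than the source is fine: only its leading elements are overwritten.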
pub fn memcpy_async_cooperative<C: CubePrimitive>( &self, source: &Slice<Line<C>>, destination: &mut SliceMut<Line<C>>, )
Copy the source slice to destination.
§Safety
This will try to copy the whole source slice, so make sure source length <= destination length.
pub fn memcpy_async_tx<C: CubePrimitive>( &self, source: &Slice<Line<C>>, destination: &mut SliceMut<Line<C>>, )
Copy the source slice to destination. Like TMA, this uses the transaction count, so use it
with expect_tx or arrive_and_expect_tx.
§Safety
This will try to copy the whole source slice, so make sure source length <= destination length.
pub fn arrive(&self) -> BarrierToken
Arrive at the barrier, decrementing the arrival count.
pub fn arrive_and_expect_tx( &self, arrival_count: u32, transaction_count: u32, ) -> BarrierToken
Arrive at the barrier, decrementing the arrival count. Additionally increments the expected transaction count.
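How the two counters interact can be sketched with a host-side model in plain Rust. Everything here is an assumption made for illustration: ModelTxBarrier and complete_tx are hypothetical, the transaction count is assumed to be measured in bytes, and the completion rule (both counters at zero) follows the usual transaction-barrier model rather than anything stated on this page.

```rust
/// Host-side model of one barrier phase that tracks arrivals and an expected
/// transaction count (hypothetical, not the CubeCL type).
struct ModelTxBarrier {
    pending_arrivals: u32,
    pending_tx: u32, // assumed to be a byte count
}

impl ModelTxBarrier {
    fn init(arrival_count: u32) -> Self {
        Self { pending_arrivals: arrival_count, pending_tx: 0 }
    }

    /// Models `arrive_and_expect_tx`: fewer pending arrivals, more expected transactions.
    fn arrive_and_expect_tx(&mut self, arrival_count: u32, transaction_count: u32) {
        self.pending_arrivals -= arrival_count;
        self.pending_tx += transaction_count;
    }

    /// Models an asynchronous copy reporting `bytes` as delivered.
    fn complete_tx(&mut self, bytes: u32) {
        self.pending_tx -= bytes;
    }

    /// The phase is done only when no arrivals and no transactions are outstanding.
    fn phase_complete(&self) -> bool {
        self.pending_arrivals == 0 && self.pending_tx == 0
    }
}
```

In this model a unit that has already arrived still keeps the phase open until every expected byte has been reported, which is why the copy size and the expected count need to agree.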
pub fn arrive_and_wait(&self)
Arrive at the barrier, then wait until all data is loaded.
pub fn wait(&self, token: BarrierToken)
Wait at the barrier until all arrivals are done.
pub fn wait_parity(&self, phase: u32)
Wait at the barrier until the phase is completed. Doesn’t require a token, but needs phase
to be managed manually.
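Manual phase management usually amounts to keeping one parity bit per barrier and flipping it after every completed wait. The sketch below is a host-side model only: PhaseTracker and parity_wait_done are hypothetical names, and the convention that the first phase has parity 0 is an assumption borrowed from common hardware barrier designs, not something this page states.

```rust
/// Caller-side bookkeeping for a parity-based wait (hypothetical helper).
struct PhaseTracker {
    parity: u32, // parity (0 or 1) of the phase to wait on next
}

impl PhaseTracker {
    /// Assumption: the first phase of a freshly initialized barrier has parity 0.
    fn new() -> Self {
        Self { parity: 0 }
    }

    /// Returns the parity to pass to the next wait, then flips it for the one after.
    fn next_parity(&mut self) -> u32 {
        let current = self.parity;
        self.parity ^= 1;
        current
    }
}

/// Model of the wait condition: a wait on `parity` is satisfied once the barrier
/// has moved past a phase with that parity, i.e. the number of completed phases
/// has the opposite parity.
fn parity_wait_done(completed_phases: u32, parity: u32) -> bool {
    (completed_phases & 1) != (parity & 1)
}
```

Because only one bit is tracked, the model cannot tell apart phases that are two steps apart, so the caller has to wait on every phase in order.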
pub fn commit_copy_async(&self)
Makes all previous copy_async operations visible on the barrier.
Should be called once after all copies have been dispatched, before reading from the shared
memory.
Does not count as an arrival toward the barrier arrival count, so arrive or
arrive_and_wait should still be called afterwards.