Module frontend

Cube Frontend Types.

Re-exports§

pub use branch::RangeExpand;
pub use branch::SteppedRangeExpand;
pub use branch::range;
pub use branch::range_stepped;

Modules§

ABSOLUTE_POS
The position of the working unit in the whole cube kernel, without regard to cubes or axes.
ABSOLUTE_POS_X
The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
ABSOLUTE_POS_Y
The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
ABSOLUTE_POS_Z
The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
CUBE_CLUSTER_DIM
The total number of cubes in a cluster.
CUBE_CLUSTER_DIM_X
The dimension of the cluster along the X axis.
CUBE_CLUSTER_DIM_Y
The dimension of the cluster along the Y axis.
CUBE_CLUSTER_DIM_Z
The dimension of the cluster along the Z axis.
CUBE_COUNT
The number of cubes launched.
CUBE_COUNT_X
The number of cubes launched along the X axis.
CUBE_COUNT_Y
The number of cubes launched along the Y axis.
CUBE_COUNT_Z
The number of cubes launched along the Z axis.
CUBE_DIM
The total number of working units in a cube.
CUBE_DIM_X
The dimension of the cube along the X axis.
CUBE_DIM_Y
The dimension of the cube along the Y axis.
CUBE_DIM_Z
The dimension of the cube along the Z axis.
CUBE_POS
The cube position, without regard to axes.
CUBE_POS_CLUSTER
The cube position within the cluster.
CUBE_POS_CLUSTER_X
The cube position in the cluster along the X axis.
CUBE_POS_CLUSTER_Y
The cube position in the cluster along the Y axis.
CUBE_POS_CLUSTER_Z
The cube position in the cluster along the Z axis.
CUBE_POS_X
The cube position along the X axis.
CUBE_POS_Y
The cube position along the Y axis.
CUBE_POS_Z
The cube position along the Z axis.
PLANE_DIM
The total number of working units in a plane.
UNIT_POS
The position of the working unit inside the cube, without regard to axes.
UNIT_POS_PLANE
The relative position of the working unit inside the plane, without regard to cube dimensions.
UNIT_POS_X
The position of the working unit inside the cube along the X axis.
UNIT_POS_Y
The position of the working unit inside the cube along the Y axis.
UNIT_POS_Z
The position of the working unit inside the cube along the Z axis.
add
add_assign
add_assign_array_op
add_assign_op
and
assign
barrier
This module exposes barriers for asynchronous data transfer.
bitand
bitand_assign_array_op
bitand_assign_op
bitor
bitor_assign_array_op
bitor_assign_op
bitxor
bitxor_assign_array_op
bitxor_assign_op
branch
cast
cmma
This module exposes cooperative matrix-multiply and accumulate operations.
comptime_error
copy_bulk
cube_comment
div
div_assign_array_op
div_assign_op
eq
erf
ge
gt
index
index_assign
le
lt
mul
mul_assign_array_op
mul_assign_op
ne
neg
not
or
pipeline
This module exposes pipelining utilities for multi-stage asynchronous data copies with latency hiding. Producers are the threads that call producer_acquire and producer_commit; consumers are the threads that call consumer_wait and consumer_release.
plane_all
Module containing the expand function for plane_all().
plane_any
Module containing the expand function for plane_any().
plane_ballot
Module containing the expand function for plane_ballot().
plane_broadcast
Module containing the expand function for plane_broadcast().
plane_elect
Module containing the expand function for plane_elect().
plane_exclusive_prod
Module containing the expand function for plane_exclusive_prod().
plane_exclusive_sum
Module containing the expand function for plane_exclusive_sum().
plane_inclusive_prod
Module containing the expand function for plane_inclusive_prod().
plane_inclusive_sum
Module containing the expand function for plane_inclusive_sum().
plane_max
Module containing the expand function for plane_max().
plane_min
Module containing the expand function for plane_min().
plane_prod
Module containing the expand function for plane_prod().
plane_sum
Module containing the expand function for plane_sum().
rem
rem_assign_array_op
rem_assign_op
select
select_many
set_polyfill
Expand module of set_polyfill().
shl
shl_assign_array_op
shl_assign_op
shr
shr_assign_array_op
shr_assign_op
sub
sub_assign_array_op
sub_assign_op
synchronization
tma_group_commit
tma_group_wait
tma_group_wait_read
tma_store_2d
tma_store_3d
tma_store_4d
tma_store_5d

Macros§

debug_print
Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
debug_print_expand
Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.

Structs§

Array
A contiguous array of elements.
ArrayCompilationArg
ArrayHandleRef
Tensor representation with a reference to the server handle.
Atomic
An atomic numerical type wrapping a normal numeric primitive. Enables the use of atomic operations, while disabling normal operations. In WGSL, this is a separate type; on CUDA/SPIR-V it can theoretically be bitcast to a normal number, but this isn’t recommended.
ComptimeCell
A cell that can store and mutate cube types during comptime.
ComptimeCellExpand
Expand type of ComptimeCell.
ExpandElementTyped
Expand type associated with a type.
FastMath
Unchecked optimizations for float operations. May cause precision differences, or undefined behaviour if the relevant conditions are not followed.
FloatExpand
IntExpand
Line
A contiguous list of elements that supports auto-vectorized operations.
Registry
Similar to a map, except that the keys are stored at comptime while the values can be runtime variables.
ScalarArg
Sequence
A sequence of cube types that is inlined during compilation.
SequenceArg
SequenceCompilationArg
SequenceExpand
Expand type of Sequence.
SharedMemory
Slice
A read-only contiguous list of elements.
SliceMut
A read-write contiguous list of elements.
Tensor
The tensor type is similar to the array type, however it comes with more metadata such as stride and shape.
TensorCompilationArg
Compilation argument for a tensor.
TensorHandleRef
Tensor representation with a reference to the server handle, the strides and the shape.
TensorMap
A CUDA CUtensorMap object. Represents a tensor encoded with a lot of metadata, and is an opaque packed object at runtime. Does not support retrieving any shapes or strides, nor does it give access to the pointer. So these need to be passed separately in an aliased Tensor if needed.
TensorMapArg
Grid constant tensor map, currently only maps to CUDA tensormap. May be interleaved or swizzled, but last dimension must be contiguous (since strides don’t include the last dimension).
TensorMapCompilationArg
Compilation argument for a tensor map.

Enums§

ArrayArg
OobFill
The value to use when filling out-of-bounds values.
TensorArg
Argument to be used for tensors passed as arguments to kernels.
TensorMapFormat
Format of TensorMap.
TensorMapInterleave
Interleave setting for TensorMap.
TensorMapPrefetch
Additional prefetching to perform during load. Specifies the L2 fetch size, which indicates the byte granularity at which L2 requests are filled from DRAM.
TensorMapSwizzle
Data are organized in a specific order in global memory; however, this may not match the order in which the application accesses data in shared memory. This difference in data organization may cause bank conflicts when shared memory is accessed. In order to avoid this problem, data can be loaded to shared memory with shuffling across shared memory banks. When interleave is TensorMapInterleave::B32, swizzle must be TensorMapSwizzle::B32. Other interleave modes can have any swizzling pattern.

Constants§

ABSOLUTE_POS
The position of the working unit in the whole cube kernel, without regard to cubes or axes.
ABSOLUTE_POS_X
The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
ABSOLUTE_POS_Y
The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
ABSOLUTE_POS_Z
The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
CUBE_CLUSTER_DIM
The total number of cubes in a cluster.
CUBE_CLUSTER_DIM_X
The dimension of the cluster along the X axis.
CUBE_CLUSTER_DIM_Y
The dimension of the cluster along the Y axis.
CUBE_CLUSTER_DIM_Z
The dimension of the cluster along the Z axis.
CUBE_COUNT
The number of cubes launched.
CUBE_COUNT_X
The number of cubes launched along the X axis.
CUBE_COUNT_Y
The number of cubes launched along the Y axis.
CUBE_COUNT_Z
The number of cubes launched along the Z axis.
CUBE_DIM
The total number of working units in a cube.
CUBE_DIM_X
The dimension of the cube along the X axis.
CUBE_DIM_Y
The dimension of the cube along the Y axis.
CUBE_DIM_Z
The dimension of the cube along the Z axis.
CUBE_POS
The cube position, without regard to axes.
CUBE_POS_CLUSTER
The cube position within the cluster.
CUBE_POS_CLUSTER_X
The cube position in the cluster along the X axis.
CUBE_POS_CLUSTER_Y
The cube position in the cluster along the Y axis.
CUBE_POS_CLUSTER_Z
The cube position in the cluster along the Z axis.
CUBE_POS_X
The cube position along the X axis.
CUBE_POS_Y
The cube position along the Y axis.
CUBE_POS_Z
The cube position along the Z axis.
PLANE_DIM
The total number of working units in a plane.
UNIT_POS
The position of the working unit inside the cube, without regard to axes.
UNIT_POS_PLANE
The relative position of the working unit inside the plane, without regard to cube dimensions.
UNIT_POS_X
The position of the working unit inside the cube along the X axis.
UNIT_POS_Y
The position of the working unit inside the cube along the Y axis.
UNIT_POS_Z
The position of the working unit inside the cube along the Z axis.
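The constants above describe a launch topology: units are grouped into cubes, and cubes are laid out over a grid. The following host-side sketch models how these values might relate to one another. It is plain Rust, not CubeCL code, and it rests on two assumptions that the listing does not state: that an absolute index along an axis equals cube position times cube dimension plus unit position (the convention used by most GPU programming models), and that linearized positions use an X-fastest ordering. The `Dim3` type and all function names are hypothetical.

```rust
/// Hypothetical host-side model of a 3D extent or position (not a CubeCL type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dim3 {
    x: u32,
    y: u32,
    z: u32,
}

/// Models ABSOLUTE_POS_X/Y/Z for a single axis.
/// Assumption: absolute = cube_pos * cube_dim + unit_pos.
fn absolute_pos_axis(cube_pos: u32, cube_dim: u32, unit_pos: u32) -> u32 {
    cube_pos * cube_dim + unit_pos
}

/// Models CUBE_DIM: the total number of working units in a cube.
fn cube_dim_total(cube_dim: Dim3) -> u32 {
    cube_dim.x * cube_dim.y * cube_dim.z
}

/// Models UNIT_POS: the position of a unit inside its cube, ignoring axes.
/// Assumption: X varies fastest, then Y, then Z.
fn unit_pos_linear(unit_pos: Dim3, cube_dim: Dim3) -> u32 {
    unit_pos.x + unit_pos.y * cube_dim.x + unit_pos.z * cube_dim.x * cube_dim.y
}
```

Under these assumptions, a unit at X position 5 in cube 2 of a launch with 64 units per cube along X has an absolute X index of 2 * 64 + 5 = 133.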

Traits§

Abs
ArgSettings
Defines the argument settings used to launch a kernel.
BitwiseNot
BoolOps
Extension trait for bool.
Cast
Enables elegant casting from any CubeElem to any other CubeElem.
Ceil
Clamp
CompilationArg
Argument used during the compilation of kernels.
Cos
CountOnes
CubeComptime
A type that can be used as a kernel comptime argument. Note that a type doesn’t need to implement CubeComptime to be used as a comptime argument. However, implementing it facilitates the declaration of generic cube types.
CubeDebug
CubeIndex
Fake indexing, so that indexing into scalars can be rewritten as calls to this fake function in the non-expanded function.
CubeIndexMut
CubeLaunch
A CubeType that can be used as a kernel argument, such as Array or Tensor.
CubePrimitive
Form of CubeType that encapsulates all primitive types: Numeric, UInt, Bool.
CubeType
Types used in a cube function must implement this trait.
Dot
Erf
Exp
ExpandElementBaseInit
FindFirstSet
Float
Floating point numbers. Used as input in float kernels.
Floor
Index
Init
Trait to be implemented by cube type implementations.
Int
Signed or unsigned integer. Used as input in int kernels.
IntoRuntime
Trait useful to convert a comptime value into a runtime value.
LaunchArg
Defines a type that can be used as argument to a kernel.
LaunchArgExpand
Defines how a launch argument can be expanded.
LeadingZeros
List
Type from which we can read values in cube functions. For a mutable version, see ListMut.
ListExpand
Expand version of CubeRead.
ListMut
Type for which we can read and write values in cube functions. For an immutable version, see List.
ListMutExpand
Expand version of CubeWrite.
Log
Log1p
Magnitude
Max
Min
MulHi
Normalize
Numeric
Type that encompasses both (unsigned or signed) integers and floats. Used in kernels that should work for both.
OptionExt
Powf
Recip
RegistryQuery
To find an item from the registry, the query must be able to be translated to the actual key type.
Reinterpret
Enables reinterpreting the bits of any value as any other type of the same size.
Remainder
ReverseBits
Round
ScalarArgSettings
Similar to ArgSettings, however only for scalar types that don’t depend on the Runtime trait.
Sin
SizedContainer
SliceOperator
SliceOperatorExpand
Sqrt
Tanh
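
The difference between Cast and Reinterpret is easy to miss from the one-line summaries: a cast converts the numeric value, while a reinterpretation keeps the bit pattern and changes only the type. The sketch below illustrates that distinction with standard-library Rust on the host. It does not use the CubeCL traits themselves, and the function names are hypothetical.

```rust
/// Value-converting cast: the numeric value is converted, so the bits change.
/// Models what a Cast-style conversion from f32 to u32 does.
fn cast_f32_to_u32(value: f32) -> u32 {
    value as u32
}

/// Bit-preserving reinterpretation: the 32 bits are kept, only the type changes.
/// Models what a Reinterpret-style conversion between same-size types does.
fn reinterpret_f32_as_u32(value: f32) -> u32 {
    value.to_bits()
}

/// Inverse reinterpretation, from raw bits back to f32.
fn reinterpret_u32_as_f32(bits: u32) -> f32 {
    f32::from_bits(bits)
}
```

For example, casting 1.0 to u32 gives 1, whereas reinterpreting it gives 0x3F80_0000, the IEEE 754 single-precision encoding of 1.0.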

Functions§

array_assign_binary_op_expand
copy_bulk
Bulk copy length elements between two array-likes without intermediates.
debug_call_expand
Calls a function and inserts debug symbols if debug is enabled.
debug_source_expand
Adds a source instruction if debug is enabled.
debug_var_expand
Registers a name for an expand if possible.
erf
expand_checked_index_assign
expand_erf
expand_himul_64
expand_himul_sim
fma
Fused multiply-add A*B+C.
fma_expand
Expand method of fma.
init_expand
plane_all
Perform a reduce all operation across all units in a plane.
plane_any
Perform a reduce any operation across all units in a plane.
plane_ballot
Perform a ballot operation across all units in a plane. Returns a set of 32-bit bitfields as a Line, with each element containing the value from 32 invocations. Note that line size will always be set to 4 even for PLANE_DIM <= 64, because we can’t retrieve the actual plane size at expand time. Use the runtime PLANE_DIM to index appropriately.
plane_broadcast
Broadcasts the value from the specified plane unit at the given index to all active units within that plane.
plane_elect
Returns true if the cube unit has the lowest plane_unit_id among the active units in the plane.
plane_exclusive_prod
Perform an exclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::one(). Also known as “exclusive prefix product” or “exclusive scan”.
plane_exclusive_sum
Perform an exclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::zero(). Also known as “exclusive prefix sum” or “exclusive scan”.
plane_inclusive_prod
Perform an inclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, including this unit’s value. Also known as “prefix product” or “inclusive scan”.
plane_inclusive_sum
Perform an inclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, including this unit’s value. Also known as “prefix sum” or “inclusive scan”.
plane_max
Perform a reduce max operation across all units in a plane.
plane_min
Perform a reduce min operation across all units in a plane.
plane_prod
Perform a reduce prod operation across all units in a plane.
plane_sum
Perform a reduce sum operation across all units in a plane.
printf_expand
Prints a formatted message using the print debug layer in Vulkan, or printf in CUDA.
select
Executes both branches, then selects a value based on the condition. This should be branchless, but might depend on the compiler.
select_many
Same as select() but with lines instead.
set_polyfill
Change the meaning of the given cube primitive type during compilation.
slice_expand
spanned_expand
Calls an intrinsic op and inserts debug symbols if debug is enabled.
tma_group_commit
Commit an async tensor operation. The underlying mechanism is poorly documented, but this must be called after a write and not after reads.
tma_group_wait
Wait until at most max_pending TMA copy operations are in flight.
tma_group_wait_read
Wait until TMA copy operations have finished reading from shared memory, with at most max_pending operations left unfinished.
tma_store_2d
Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_3d
Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_4d
Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_5d
Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
unary_expand
unary_expand_fixed_output
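
The plane reductions, scans, and ballot listed above are described in terms of values held by the units of a plane. The sketch below is a sequential host-side model of those semantics in plain Rust, where index i of a slice stands for the value held by unit i. It is an illustration only, not how CubeCL implements these operations, and every function name is hypothetical. The ballot model additionally assumes that unit i maps to bit i % 32 of word i / 32, which the listing does not specify.

```rust
/// Models plane_sum: every unit receives the sum over all units.
fn plane_sum_model(values: &[u32]) -> u32 {
    values.iter().sum()
}

/// Models plane_inclusive_sum: unit i receives values[0] + ... + values[i].
fn plane_inclusive_sum_model(values: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    values
        .iter()
        .map(|&v| {
            acc += v;
            acc
        })
        .collect()
}

/// Models plane_exclusive_sum: unit i receives values[0] + ... + values[i - 1];
/// unit 0 receives zero.
fn plane_exclusive_sum_model(values: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    values
        .iter()
        .map(|&v| {
            let before = acc;
            acc += v;
            before
        })
        .collect()
}

/// Models plane_exclusive_prod: unit i receives the product of all values to
/// its "left"; unit 0 receives one.
fn plane_exclusive_prod_model(values: &[u32]) -> Vec<u32> {
    let mut acc = 1u32;
    values
        .iter()
        .map(|&v| {
            let before = acc;
            acc *= v;
            before
        })
        .collect()
}

/// Models plane_ballot: packs one predicate per unit into four 32-bit words.
/// Assumption: unit i sets bit (i % 32) of word (i / 32). Units past 128 are
/// ignored, matching the fixed line size of 4.
fn plane_ballot_model(predicates: &[bool]) -> [u32; 4] {
    let mut words = [0u32; 4];
    for (unit, &pred) in predicates.iter().enumerate().take(128) {
        if pred {
            words[unit / 32] |= 1u32 << (unit % 32);
        }
    }
    words
}
```

With the input [1, 2, 3, 4], the inclusive sum model yields [1, 3, 6, 10], the exclusive sum model yields [0, 1, 3, 6], and the exclusive product model yields [1, 1, 2, 6]. In the ballot model, only the first PLANE_DIM bits carry meaning, which is why the runtime PLANE_DIM is needed to index the result.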

Type Aliases§

NumericExpand