Module frontend

Cube Frontend Types.

Re-exports§

pub use branch::RangeExpand;
pub use branch::SteppedRangeExpand;
pub use branch::range;
pub use branch::range_stepped;

Modules§

ABSOLUTE_POS: The position of the working unit in the whole cube kernel, without regard to cubes and axes.
ABSOLUTE_POS_X: The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
ABSOLUTE_POS_Y: The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
ABSOLUTE_POS_Z: The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
CUBE_CLUSTER_DIM: The total number of cubes in a cluster.
CUBE_CLUSTER_DIM_X: The dimension of the cluster along the X axis.
CUBE_CLUSTER_DIM_Y: The dimension of the cluster along the Y axis.
CUBE_CLUSTER_DIM_Z: The dimension of the cluster along the Z axis.
CUBE_COUNT: The number of cubes launched.
CUBE_COUNT_X: The number of cubes launched along the X axis.
CUBE_COUNT_Y: The number of cubes launched along the Y axis.
CUBE_COUNT_Z: The number of cubes launched along the Z axis.
CUBE_DIM: The total number of working units in a cube.
CUBE_DIM_X: The dimension of the cube along the X axis.
CUBE_DIM_Y: The dimension of the cube along the Y axis.
CUBE_DIM_Z: The dimension of the cube along the Z axis.
CUBE_POS: The cube position, without regard to axis.
CUBE_POS_CLUSTER: The cube position within the cluster.
CUBE_POS_CLUSTER_X: The cube position in the cluster along the X axis.
CUBE_POS_CLUSTER_Y: The cube position in the cluster along the Y axis.
CUBE_POS_CLUSTER_Z: The cube position in the cluster along the Z axis.
CUBE_POS_X: The cube position along the X axis.
CUBE_POS_Y: The cube position along the Y axis.
CUBE_POS_Z: The cube position along the Z axis.
PLANE_DIM: The total number of working units in a plane.
UNIT_POS: The position of the working unit inside the cube, without regard to axis.
UNIT_POS_PLANE: The relative position of the working unit inside the plane, without regard to cube dimensions.
UNIT_POS_X: The position of the working unit inside the cube along the X axis.
UNIT_POS_Y: The position of the working unit inside the cube along the Y axis.
UNIT_POS_Z: The position of the working unit inside the cube along the Z axis.
add
add_assign
add_assign_array_op
add_assign_op
and
assign
barrier: This module exposes barriers for asynchronous data transfer.
bitand
bitand_assign_array_op
bitand_assign_op
bitor
bitor_assign_array_op
bitor_assign_op
bitxor
bitxor_assign_array_op
bitxor_assign_op
branch
cast
cmma: This module exposes cooperative matrix-multiply and accumulate operations.
comptime_error
copy_bulk
cube_comment
div
div_assign_array_op
div_assign_op
div_ceil
eq
erf
ge
gt
index
index_assign
index_unchecked
le
lt
mul
mul_assign_array_op
mul_assign_op
ne
neg
not
or
plane_all: Module containing the expand function for plane_all().
plane_any: Module containing the expand function for plane_any().
plane_ballot: Module containing the expand function for plane_ballot().
plane_broadcast: Module containing the expand function for plane_broadcast().
plane_elect: Module containing the expand function for plane_elect().
plane_exclusive_prod: Module containing the expand function for plane_exclusive_prod().
plane_exclusive_sum: Module containing the expand function for plane_exclusive_sum().
plane_inclusive_prod: Module containing the expand function for plane_inclusive_prod().
plane_inclusive_sum: Module containing the expand function for plane_inclusive_sum().
plane_max: Module containing the expand function for plane_max().
plane_min: Module containing the expand function for plane_min().
plane_prod: Module containing the expand function for plane_prod().
plane_shuffle: Module containing the expand function for plane_shuffle().
plane_shuffle_down: Module containing the expand function for plane_shuffle_down().
plane_shuffle_up: Module containing the expand function for plane_shuffle_up().
plane_shuffle_xor: Module containing the expand function for plane_shuffle_xor().
plane_sum: Module containing the expand function for plane_sum().
rem
rem_assign_array_op
rem_assign_op
select
select_many
set_polyfill: Expand module of set_polyfill().
shl
shl_assign_array_op
shl_assign_op
shr
shr_assign_array_op
shr_assign_op
sub
sub_assign_array_op
sub_assign_op
synchronization
tma_group_commit
tma_group_wait
tma_group_wait_read
tma_store_1d
tma_store_2d
tma_store_3d
tma_store_4d
tma_store_5d

Macros§

debug_print: Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
debug_print_expand: Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.

Structs§

Array: A contiguous array of elements.
ArrayCompilationArg
ArrayHandleRef: Tensor representation with a reference to the server handle.
Atomic: An atomic numerical type wrapping a normal numeric primitive. Enables the use of atomic operations, while disabling normal operations. In WGSL, this is a separate type; on CUDA/SPIR-V it can theoretically be bitcast to a normal number, but this isn’t recommended.
ComptimeCell: A cell that can store and mutate a cube type during comptime.
ComptimeCellExpand: Expand type of ComptimeCell.
ElemExpand: A fake element type that can be configured to map to any other element type.
ExpandElementTyped: Expand type associated with a type.
IntExpand
Line: A contiguous list of elements that supports auto-vectorized operations.
ReadOnly
ReadWrite
Registry: Similar to a map, except that the keys are stored at comptime while the values can be runtime variables.
RuntimeCell
RuntimeCellExpand
ScalarArg
ScalarCompilationArg
Sequence: A sequence of cube types that is inlined during compilation.
SequenceArg
SequenceCompilationArg
SequenceExpand: Expand type of Sequence.
SharedMemory
Slice: A read-only contiguous list of elements.
SliceExpand
Tensor: The tensor type is similar to the array type, however it comes with more metadata such as stride and shape.
TensorCompilationArg: Compilation argument for a tensor.
TensorHandleRef: Tensor representation with a reference to the server handle, the strides and the shape.
TensorMap: A CUDA CUtensorMap object. Represents a tensor encoded with a lot of metadata, and is an opaque packed object at runtime. It does not support retrieving shapes or strides, nor does it give access to the pointer, so these must be passed separately in an aliased Tensor if needed.
TensorMapArg: Grid constant tensor map, currently only maps to CUDA tensormap. May be interleaved or swizzled, but last dimension must be contiguous (since strides don’t include the last dimension).
TensorMapCompilationArg: Compilation argument for a tensor map.

Enums§

ArrayArg
FastMath: Unchecked optimizations for float operations. May cause precision differences, or undefined behaviour if the relevant conditions are not followed.
OobFill: The value to use when filling out-of-bounds elements.
SliceOrigin
SliceOriginExpand
TensorArg: Argument to be used for tensors passed as arguments to kernels.
TensorMapFormat: Format of TensorMap.
TensorMapInterleave: Interleave setting for TensorMap.
TensorMapPrefetch: Additional prefetching to perform during load. Specifies the L2 fetch size, which indicates the byte granularity at which L2 requests are filled from DRAM.
TensorMapSwizzle: Data are organized in a specific order in global memory; however, this may not match the order in which the application accesses data in shared memory. This difference in data organization may cause bank conflicts when shared memory is accessed. In order to avoid this problem, data can be loaded to shared memory with shuffling across shared memory banks. When interleave is TensorMapInterleave::B32, swizzle must be TensorMapSwizzle::B32. Other interleave modes can have any swizzling pattern.

Constants§

ABSOLUTE_POS: The position of the working unit in the whole cube kernel, without regard to cubes and axes.
ABSOLUTE_POS_X: The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
ABSOLUTE_POS_Y: The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
ABSOLUTE_POS_Z: The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
CUBE_CLUSTER_DIM: The total number of cubes in a cluster.
CUBE_CLUSTER_DIM_X: The dimension of the cluster along the X axis.
CUBE_CLUSTER_DIM_Y: The dimension of the cluster along the Y axis.
CUBE_CLUSTER_DIM_Z: The dimension of the cluster along the Z axis.
CUBE_COUNT: The number of cubes launched.
CUBE_COUNT_X: The number of cubes launched along the X axis.
CUBE_COUNT_Y: The number of cubes launched along the Y axis.
CUBE_COUNT_Z: The number of cubes launched along the Z axis.
CUBE_DIM: The total number of working units in a cube.
CUBE_DIM_X: The dimension of the cube along the X axis.
CUBE_DIM_Y: The dimension of the cube along the Y axis.
CUBE_DIM_Z: The dimension of the cube along the Z axis.
CUBE_POS: The cube position, without regard to axis.
CUBE_POS_CLUSTER: The cube position within the cluster.
CUBE_POS_CLUSTER_X: The cube position in the cluster along the X axis.
CUBE_POS_CLUSTER_Y: The cube position in the cluster along the Y axis.
CUBE_POS_CLUSTER_Z: The cube position in the cluster along the Z axis.
CUBE_POS_X: The cube position along the X axis.
CUBE_POS_Y: The cube position along the Y axis.
CUBE_POS_Z: The cube position along the Z axis.
PLANE_DIM: The total number of working units in a plane.
UNIT_POS: The position of the working unit inside the cube, without regard to axis.
UNIT_POS_PLANE: The relative position of the working unit inside the plane, without regard to cube dimensions.
UNIT_POS_X: The position of the working unit inside the cube along the X axis.
UNIT_POS_Y: The position of the working unit inside the cube along the Y axis.
UNIT_POS_Z: The position of the working unit inside the cube along the Z axis.
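
The relationship between these constants can be sketched with a CPU-side model in plain Rust. This is only an illustration under assumptions: the `Dim3` struct and the functions `linearize`, `absolute_pos_axes`, and `absolute_pos` are hypothetical names that are not part of this module, and the X-fastest linearization order is assumed rather than stated by the descriptions above.

```rust
/// Hypothetical helper type (not part of the crate): a 3-component
/// extent or position.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dim3 {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

/// Flatten a 3D position inside an extent into a single index, with X
/// varying fastest. The ordering is an assumption made for illustration.
pub fn linearize(pos: Dim3, extent: Dim3) -> u32 {
    pos.x + pos.y * extent.x + pos.z * extent.x * extent.y
}

/// Model of ABSOLUTE_POS_X/Y/Z: the per-axis index of a unit in the whole
/// launch, built from the cube position, the cube dimensions, and the unit
/// position inside its cube.
pub fn absolute_pos_axes(cube_pos: Dim3, cube_dim: Dim3, unit_pos: Dim3) -> Dim3 {
    Dim3 {
        x: cube_pos.x * cube_dim.x + unit_pos.x,
        y: cube_pos.y * cube_dim.y + unit_pos.y,
        z: cube_pos.z * cube_dim.z + unit_pos.z,
    }
}

/// Model of ABSOLUTE_POS: the per-axis absolute index flattened over the
/// full grid of units (CUBE_COUNT_* times CUBE_DIM_* on each axis).
pub fn absolute_pos(cube_pos: Dim3, cube_dim: Dim3, cube_count: Dim3, unit_pos: Dim3) -> u32 {
    let abs = absolute_pos_axes(cube_pos, cube_dim, unit_pos);
    let total = Dim3 {
        x: cube_count.x * cube_dim.x,
        y: cube_count.y * cube_dim.y,
        z: cube_count.z * cube_dim.z,
    };
    linearize(abs, total)
}
```

In this model, `linearize(unit_pos, cube_dim)` plays the role of UNIT_POS and `linearize(cube_pos, cube_count)` plays the role of CUBE_POS.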

Traits§

Abs
ArgSettings: Defines the argument settings used to launch a kernel.
BitwiseNot
BoolOps: Extension trait for bool.
Cast: Enables casting from any CubeElem to any other CubeElem.
Ceil
Clamp
CompilationArg: Argument used during the compilation of kernels.
Cos
CountOnes
CubeComptime: A type that can be used as a kernel comptime argument. Note that a type doesn’t need to implement CubeComptime to be used as a comptime argument. However, implementing it facilitates the declaration of generic cube types.
CubeDebug
CubeIndex: Fake indexation so we can rewrite indexes into scalars as calls to this fake function in the non-expanded function
CubeIndexExpand
CubeIndexMut
CubeIndexMutExpand
CubePrimitive: Form of CubeType that encapsulates all primitive types: Numeric, UInt, Bool
CubeType: Types used in a cube function must implement this trait
Dot
Erf
Exp
ExpandElementIntoMut
FindFirstSet
Float: Floating point numbers. Used as input in float kernels
Floor
Index
Int: Signed or unsigned integer. Used as input in int kernels
IntoMut: Convert an expand type to a version with mutable registers when necessary.
IntoRuntime: Trait useful to convert a comptime value into runtime value.
IsInf
IsNan
LaunchArg: Defines how a launch argument can be expanded.
LeadingZeros
Lined
LinedExpand
List: Type from which we can read values in cube functions. For a mutable version, see ListMut.
ListExpand: Type from which we can read values in cube functions. For a mutable version, see ListMut.
ListMut: Type for which we can read and write values in cube functions. For an immutable version, see List.
ListMutExpand: Type for which we can read and write values in cube functions. For an immutable version, see List.
Log
Log1p
Magnitude
Max
Min
MulHi
Normalize
Numeric: Type that encompasses both (unsigned or signed) integers and floats. Used in kernels that should work for both.
OptionExt
Powf
Powi
Recip
RegistryQuery: To find an item from the registry, the query must be able to be translated to the actual key type.
Reinterpret: Enables reinterpreting the bits of any value as any other type of the same size.
Remainder
ReverseBits
Round
SaturatingAdd
SaturatingSub
ScalarArgSettings: Similar to ArgSettings, however only for scalar types that don’t depend on the Runtime trait.
Sin
SizedContainer
SliceMutOperator
SliceMutOperatorExpand
SliceOperator
SliceOperatorExpand
SliceVisibility
Sqrt
Tanh
Trunc

Functions§

array_assign_binary_op_expand
copy_bulk: Bulk copy length elements between two array-likes without intermediates.
debug_call_expand: Calls a function and inserts debug symbols if debug is enabled.
debug_source_expand: Adds source instruction if debug is enabled
debug_var_expand: Registers name for an expand if possible
div_ceil
erf
expand_checked_index_assign
expand_erf
expand_himul_64
expand_himul_sim
fma: Fused multiply-add A*B+C.
fma_expand: Expand method of fma.
init_expand
plane_all: Perform a reduce all operation across all units in a plane.
plane_any: Perform a reduce any operation across all units in a plane.
plane_ballot: Perform a ballot operation across all units in a plane. Returns a set of 32-bit bitfields as a Line, with each element containing the value from 32 invocations. Note that line size will always be set to 4 even for PLANE_DIM <= 64, because we can’t retrieve the actual plane size at expand time. Use the runtime PLANE_DIM to index appropriately.
plane_broadcast: Broadcasts the value from the specified plane unit at the given index to all active units within that plane.
plane_elect: Returns true if the cube unit has the lowest plane_unit_id among active units in the plane.
plane_exclusive_prod: Perform an exclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::one(). Also known as “exclusive prefix product” or “exclusive scan”.
plane_exclusive_sum: Perform an exclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::zero(). Also known as “exclusive prefix sum” or “exclusive scan”.
plane_inclusive_prod: Perform an inclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, including this unit’s value. Also known as “prefix product” or “inclusive scan”.
plane_inclusive_sum: Perform an inclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, including this unit’s value. Also known as “prefix sum” or “inclusive scan”.
plane_max: Perform a reduce max operation across all units in a plane.
plane_min: Perform a reduce min operation across all units in a plane.
plane_prod: Perform a reduce prod operation across all units in a plane.
plane_shuffle: Perform an arbitrary lane shuffle operation across the plane. Each unit reads the value from the specified source lane.
plane_shuffle_down: Perform a shuffle down operation across the plane. Each unit reads the value from a unit with a higher lane ID (current_id + delta). Units at the end will read from themselves if (lane_id + delta >= plane_dim).
plane_shuffle_up: Perform a shuffle up operation across the plane. Each unit reads the value from a unit with a lower lane ID (current_id - delta). Units with lane_id < delta will read from themselves (no change).
plane_shuffle_xor: Perform a shuffle XOR operation across the plane. Each unit exchanges its value with another unit at an index determined by XOR with the mask. This is useful for butterfly reduction patterns.
plane_sum: Perform a reduce sum operation across all units in a plane.
printf_expand: Prints a formatted message using the print debug layer in Vulkan, or printf in CUDA.
select: Executes both branches, then selects a value based on the condition. This should be branchless, but might depend on the compiler.
select_many: Same as select() but with lines instead.
set_polyfill: Change the meaning of the given cube primitive type during compilation.
spanned_expand: Calls an intrinsic op and inserts debug symbols if debug is enabled.
tma_group_commit: Commit an async tensor operation. The underlying behaviour is poorly documented, but it needs to be called after a write and not after reads.
tma_group_wait: Wait until at most max_pending TMA copy operations are in flight.
tma_group_wait_read: Wait until TMA copy operations have finished reading from shared memory, with at most max_pending operations being unfinished.
tma_store_1d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_2d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_3d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_4d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_5d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
unary_expand
unary_expand_fixed_output
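
The semantics described for the plane scan, shuffle, and ballot functions can be illustrated with a CPU-side reference model in plain Rust, where a slice stands in for the values held by the units of one plane (index = lane ID). This is a sketch under assumptions, not the crate's API: every function name below is hypothetical, all lanes are treated as active, and the out-of-range behaviour of the XOR shuffle (a lane keeps its own value) is an assumption.

```rust
/// Exclusive scan: lane i receives the sum of lanes 0..i; lane 0 receives 0.
pub fn exclusive_sum(values: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    values
        .iter()
        .map(|&v| {
            let before = acc;
            acc += v;
            before
        })
        .collect()
}

/// Inclusive scan: lane i receives the sum of lanes 0..=i.
pub fn inclusive_sum(values: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    values
        .iter()
        .map(|&v| {
            acc += v;
            acc
        })
        .collect()
}

/// Shuffle down: lane i reads lane i + delta, or itself when that is past the end.
pub fn shuffle_down(values: &[u32], delta: usize) -> Vec<u32> {
    (0..values.len())
        .map(|i| if i + delta < values.len() { values[i + delta] } else { values[i] })
        .collect()
}

/// Shuffle up: lane i reads lane i - delta, or itself when i < delta.
pub fn shuffle_up(values: &[u32], delta: usize) -> Vec<u32> {
    (0..values.len())
        .map(|i| if i >= delta { values[i - delta] } else { values[i] })
        .collect()
}

/// Shuffle XOR: lane i reads lane i ^ mask (assumed: itself if out of range).
pub fn shuffle_xor(values: &[u32], mask: usize) -> Vec<u32> {
    (0..values.len())
        .map(|i| {
            let j = i ^ mask;
            if j < values.len() { values[j] } else { values[i] }
        })
        .collect()
}

/// Butterfly reduction built on shuffle_xor: after log2(n) rounds every
/// lane holds the total. Requires a power-of-two plane size.
pub fn butterfly_sum(values: &[u32]) -> Vec<u32> {
    assert!(values.len().is_power_of_two());
    let mut current = values.to_vec();
    let mut mask = current.len() / 2;
    while mask > 0 {
        let partner = shuffle_xor(&current, mask);
        for (v, p) in current.iter_mut().zip(partner) {
            *v += p;
        }
        mask /= 2;
    }
    current
}

/// Ballot: pack one predicate bit per lane into four 32-bit words,
/// mirroring the fixed line size of 4 (at most 128 lanes are considered).
pub fn ballot(predicates: &[bool]) -> [u32; 4] {
    let mut bits = [0u32; 4];
    for (lane, &p) in predicates.iter().enumerate().take(128) {
        if p {
            bits[lane / 32] |= 1u32 << (lane % 32);
        }
    }
    bits
}
```

When reading a ballot result, only the first PLANE_DIM bits are meaningful, which is why the runtime plane size is needed to index the returned words.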

Type Aliases§

FloatExpand: A fake float element type that can be configured to map to any other element type.
NumericExpand: A fake numeric element type that can be configured to map to any other element type.
SliceMut
