Module frontend

Cube Frontend Types.

Re-exports§

pub use branch::*;
pub use synchronization::*;

Modules§

ABSOLUTE_POS: The position of the working unit in the whole cube kernel, without regard to cubes or axes.
ABSOLUTE_POS_X: The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
ABSOLUTE_POS_Y: The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
ABSOLUTE_POS_Z: The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
CUBE_CLUSTER_DIM: The total number of cubes in a cluster.
CUBE_CLUSTER_DIM_X: The dimension of the cluster along the X axis.
CUBE_CLUSTER_DIM_Y: The dimension of the cluster along the Y axis.
CUBE_CLUSTER_DIM_Z: The dimension of the cluster along the Z axis.
CUBE_COUNT: The number of cubes launched.
CUBE_COUNT_X: The number of cubes launched along the X axis.
CUBE_COUNT_Y: The number of cubes launched along the Y axis.
CUBE_COUNT_Z: The number of cubes launched along the Z axis.
CUBE_DIM: The total number of working units in a cube.
CUBE_DIM_X: The dimension of the cube along the X axis.
CUBE_DIM_Y: The dimension of the cube along the Y axis.
CUBE_DIM_Z: The dimension of the cube along the Z axis.
CUBE_POS: The cube position, without regard to axis.
CUBE_POS_CLUSTER: The cube position within the cluster.
CUBE_POS_CLUSTER_X: The cube position in the cluster along the X axis.
CUBE_POS_CLUSTER_Y: The cube position in the cluster along the Y axis.
CUBE_POS_CLUSTER_Z: The cube position in the cluster along the Z axis.
CUBE_POS_X: The cube position along the X axis.
CUBE_POS_Y: The cube position along the Y axis.
CUBE_POS_Z: The cube position along the Z axis.
PLANE_DIM: The total number of working units in a plane.
PLANE_POS: The position of the plane within the cube (plane/warp/subgroup index).
UNIT_POS: The position of the working unit inside the cube, without regard to axis.
UNIT_POS_PLANE: The relative position of the working unit inside the plane, without regard to cube dimensions.
UNIT_POS_X: The position of the working unit inside the cube along the X axis.
UNIT_POS_Y: The position of the working unit inside the cube along the Y axis.
UNIT_POS_Z: The position of the working unit inside the cube along the Z axis.
add
add_assign
add_assign_array_op
add_assign_op
and
assign
barrier: This module exposes barriers for asynchronous data transfer.
bitand
bitand_assign_array_op
bitand_assign_op
bitor
bitor_assign_array_op
bitor_assign_op
bitxor
bitxor_assign_array_op
bitxor_assign_op
branch
cast
clamp
clamp_max
clamp_min
cmma: This module exposes cooperative matrix-multiply and accumulate operations.
comptime: Module containing compile-time information about the current runtime.
comptime_error
copy_bulk
cube_comment
div
div_assign_array_op
div_assign_op
div_ceil
eq
erf
fma: Expand method of fma().
ge
gt
hypot
index
index_assign
index_unchecked
le
lt
max
min
mul
mul_assign_array_op
mul_assign_op
ne
neg
not
or
plane_all: Module containing the expand function for plane_all().
plane_any: Module containing the expand function for plane_any().
plane_ballot: Module containing the expand function for plane_ballot().
plane_broadcast: Module containing the expand function for plane_broadcast().
plane_elect: Module containing the expand function for plane_elect().
plane_exclusive_prod: Module containing the expand function for plane_exclusive_prod().
plane_exclusive_sum: Module containing the expand function for plane_exclusive_sum().
plane_inclusive_prod: Module containing the expand function for plane_inclusive_prod().
plane_inclusive_sum: Module containing the expand function for plane_inclusive_sum().
plane_max: Module containing the expand function for plane_max().
plane_min: Module containing the expand function for plane_min().
plane_prod: Module containing the expand function for plane_prod().
plane_shuffle: Module containing the expand function for plane_shuffle().
plane_shuffle_down: Module containing the expand function for plane_shuffle_down().
plane_shuffle_up: Module containing the expand function for plane_shuffle_up().
plane_shuffle_xor: Module containing the expand function for plane_shuffle_xor().
plane_sum: Module containing the expand function for plane_sum().
push_validation_error
rem
rem_assign_array_op
rem_assign_op
rhypot
select
select_many
set_polyfill: Expand module of set_polyfill().
shl
shl_assign_array_op
shl_assign_op
shr
shr_assign_array_op
shr_assign_op
storage_type_of
sub
sub_assign_array_op
sub_assign_op
synchronization
tma_group_commit
tma_group_wait
tma_group_wait_read
tma_store_1d
tma_store_2d
tma_store_3d
tma_store_4d
tma_store_5d
type_of

Macros§

debug_print: Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
debug_print_expand: Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
define_scalar: Define a custom type to be used for a comptime scalar type. Useful for cases where generics can’t work.
define_size: Define a custom type to be used for a comptime size. Useful for cases where generics can’t work.

Structs§

Array: A contiguous array of elements.
ArrayBinding: Tensor representation with a reference to the server handle.
ArrayCompilationArg
Atomic: An atomic numerical type wrapping a normal numeric primitive. Enables the use of atomic operations, while disabling normal operations. In WGSL, this is a separate type - on CUDA/SPIR-V it can theoretically be bitcast to a normal number, but this isn’t recommended.
ComptimeCell: A cell that can store and mutate a cube type during comptime.
ComptimeCellExpand: Expand type of ComptimeCell.
Const
DynamicScalar: A fake element type that can be configured to map to any other element type.
DynamicSize: A fake constant type that can be configured to map to any comptime value.
Im2col: Im2col indexing. Loads a “column” (not the same column as im2col) of pixels into shared memory, with a certain offset (kernel position). The corners are the bounds to load pixels from at offset 0, so the top left corner of the kernel. The offset is added to the corner offsets, so a (-1, -1) corner will stop the bounding box at (1, 1) for kernel offset (2, 2).
Im2colArgs: Args for im2col tensor maps
Im2colCompilationArg
Im2colExpand
Im2colLaunch
Im2colWide: 1D im2col, not properly supported yet
Im2colWideArgs: Args for im2col wide tensor maps
Im2colWideCompilationArg
Im2colWideExpand
Im2colWideLaunch
InputScalar: A way to define an input scalar without a generic attached to it.
InputScalarCompilationArg
InputScalarExpand
NativeExpand: Expand type of a native GPU type, i.e. scalar primitives, arrays, shared memory.
OptionExpand
OrderingExpand
ReadOnly
ReadWrite
Registry: Similar to a map, except that the keys are stored at comptime while the values can be runtime variables.
RuntimeCell
RuntimeCellExpand
Sequence: A sequence of cube types that is inlined during compilation.
SequenceArg
SequenceCompilationArg
SequenceExpand: Expand type of Sequence.
Shared
SharedMemory
Slice: A read-only contiguous list of elements
SliceExpand
Tensor: The tensor type is similar to the array type, however it comes with more metadata such as stride and shape.
TensorBinding: Tensor representation with a reference to the server handle, the strides and the shape.
TensorCompilationArg: Compilation argument for a tensor.
TensorMap: A CUDA CUtensorMap object. Represents a tensor encoded with a lot of metadata, and is an opaque packed object at runtime. Does not support retrieving any shapes or strides, nor does it give access to the pointer. So these need to be passed separately in an aliased Tensor if needed.
TensorMapArg: Grid constant tensor map, currently only maps to CUDA tensormap. May be interleaved or swizzled, but last dimension must be contiguous (since strides don’t include the last dimension).
Tiled: Regular tiled tensor map
TiledArgs: Args for tiled tensor maps
TiledCompilationArg
TiledExpand
TiledLaunch
Vector: A contiguous list of elements that supports auto-vectorized operations.
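
The Tensor entry above says a tensor carries stride and shape metadata on top of what an array provides. As a rough illustration of what that metadata is for, here is a plain-Rust CPU sketch of strided indexing. It does not use the CubeCL API, and the helper names `contiguous_strides` and `linear_offset` are hypothetical, as is the row-major layout assumed for the contiguous case.

```rust
/// Strides (in elements) for a contiguous, row-major layout of `shape`.
/// Assumption: the last axis is the fastest-varying one.
fn contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk from the second-to-last axis back to the first.
    for axis in (0..shape.len().saturating_sub(1)).rev() {
        strides[axis] = strides[axis + 1] * shape[axis + 1];
    }
    strides
}

/// Linear element offset of a multi-dimensional index, given per-axis strides.
fn linear_offset(index: &[usize], strides: &[usize]) -> usize {
    assert_eq!(index.len(), strides.len(), "rank mismatch");
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

With shape `[2, 3, 4]` the contiguous strides are `[12, 4, 1]`, so index `[1, 2, 3]` maps to offset 23. A non-contiguous view (for example a transpose) keeps the same buffer and only permutes the strides, which is why the metadata travels with the tensor.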

Enums§

ArrayArg
ComptimeOption
ComptimeOptionArgs
ComptimeOptionCompilationArg
ComptimeOptionExpand
OobFill: What value to use when filling out of bounds values
OptionArgs
OptionCompilationArg
SliceOrigin
SliceOriginExpand
TensorArg: Argument to be used for tensors passed as arguments to kernels.
TensorMapFormat: Format of TensorMap
TensorMapInterleave: Interleave setting for TensorMap
TensorMapPrefetch: Additional prefetching to perform during load. Specifies the L2 fetch size, which indicates the byte granularity at which L2 requests are filled from DRAM.
TensorMapSwizzle: Data are organized in a specific order in global memory; however, this may not match the order in which the application accesses data in shared memory. This difference in data organization may cause bank conflicts when shared memory is accessed. In order to avoid this problem, data can be loaded to shared memory with shuffling across shared memory banks. When interleave is TensorMapInterleave::B32, swizzle must be TensorMapSwizzle::B32. Other interleave modes can have any swizzling pattern.

Constants§

ABSOLUTE_POS: The position of the working unit in the whole cube kernel, without regard to cubes or axes.
ABSOLUTE_POS_X: The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
ABSOLUTE_POS_Y: The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
ABSOLUTE_POS_Z: The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
CUBE_CLUSTER_DIM: The total number of cubes in a cluster.
CUBE_CLUSTER_DIM_X: The dimension of the cluster along the X axis.
CUBE_CLUSTER_DIM_Y: The dimension of the cluster along the Y axis.
CUBE_CLUSTER_DIM_Z: The dimension of the cluster along the Z axis.
CUBE_COUNT: The number of cubes launched.
CUBE_COUNT_X: The number of cubes launched along the X axis.
CUBE_COUNT_Y: The number of cubes launched along the Y axis.
CUBE_COUNT_Z: The number of cubes launched along the Z axis.
CUBE_DIM: The total number of working units in a cube.
CUBE_DIM_X: The dimension of the cube along the X axis.
CUBE_DIM_Y: The dimension of the cube along the Y axis.
CUBE_DIM_Z: The dimension of the cube along the Z axis.
CUBE_POS: The cube position, without regard to axis.
CUBE_POS_CLUSTER: The cube position within the cluster.
CUBE_POS_CLUSTER_X: The cube position in the cluster along the X axis.
CUBE_POS_CLUSTER_Y: The cube position in the cluster along the Y axis.
CUBE_POS_CLUSTER_Z: The cube position in the cluster along the Z axis.
CUBE_POS_X: The cube position along the X axis.
CUBE_POS_Y: The cube position along the Y axis.
CUBE_POS_Z: The cube position along the Z axis.
PLANE_DIM: The total number of working units in a plane.
PLANE_POS: The position of the plane within the cube (plane/warp/subgroup index).
UNIT_POS: The position of the working unit inside the cube, without regard to axis.
UNIT_POS_PLANE: The relative position of the working unit inside the plane, without regard to cube dimensions.
UNIT_POS_X: The position of the working unit inside the cube along the X axis.
UNIT_POS_Y: The position of the working unit inside the cube along the Y axis.
UNIT_POS_Z: The position of the working unit inside the cube along the Z axis.
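
The position constants above are related by simple integer arithmetic. The following plain-Rust sketch shows one plausible way they fit together. It runs on the CPU and does not call CubeCL. The `Dim3` type and the function names are hypothetical, and two layout choices are assumptions rather than documented facts: that X is the fastest-varying axis when a position is linearized, and that planes are cut from consecutive linear unit positions.

```rust
/// A three-component size or position (x, y, z). Hypothetical helper type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dim3 {
    x: u32,
    y: u32,
    z: u32,
}

/// Absolute position along one axis: the cube's offset plus the unit's
/// position inside its cube (e.g. ABSOLUTE_POS_X from CUBE_POS_X,
/// CUBE_DIM_X and UNIT_POS_X).
fn absolute_pos_axis(cube_pos: u32, cube_dim: u32, unit_pos: u32) -> u32 {
    cube_pos * cube_dim + unit_pos
}

/// Linearized unit position inside the cube (UNIT_POS).
/// Assumption: X varies fastest, then Y, then Z.
fn unit_pos(unit: Dim3, cube_dim: Dim3) -> u32 {
    unit.z * cube_dim.x * cube_dim.y + unit.y * cube_dim.x + unit.x
}

/// (PLANE_POS, UNIT_POS_PLANE) for a linear unit position.
/// Assumption: each plane covers `plane_dim` consecutive unit positions.
fn plane_coords(unit_pos: u32, plane_dim: u32) -> (u32, u32) {
    (unit_pos / plane_dim, unit_pos % plane_dim)
}
```

For a cube of dimensions (8, 4, 2), the unit at (3, 2, 1) has linear position 1·32 + 2·8 + 3 = 51. With a plane size of 32 that places it in plane 1 at lane 19 under the stated assumptions.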

Traits§

Abs
AbsExpand
AddAssignExpand
AddExpand
ArcCos
ArcCosExpand
ArcCosh
ArcCoshExpand
ArcSin
ArcSinExpand
ArcSinh
ArcSinhExpand
ArcTan
ArcTan2
ArcTan2Expand
ArcTanExpand
ArcTanh
ArcTanhExpand
AsMutExpand
AsRefExpand
Assign
BoolOps: Extension trait for bool.
Cast: Enable elegant casting from any to any CubeElem
Ceil
CeilExpand
CloneExpand
CompilationArg: Argument used during the compilation of kernels.
ComptimeIndex: Workaround for comptime indexing, since the helper that replaces index operators doesn’t know about whether a variable is comptime. Has the same signature in unexpanded code, so it will automatically dispatch the correct one.
ComptimeIndexMut
Cos
CosExpand
Cosh
CoshExpand
CountOnes
CountOnesExpand
CubeAdd
CubeAddAssign
CubeComptime: A type that can be used as a kernel comptime argument. Note that a type doesn’t need to implement CubeComptime to be used as a comptime argument. However, this facilitates the declaration of generic cube types.
CubeDebug
CubeDiv
CubeDivAssign
CubeEnum
CubeIndex: Fake indexation so we can rewrite indexes into scalars as calls to this fake function in the non-expanded function
CubeIndexExpand
CubeIndexMut
CubeIndexMutExpand
CubeMul
CubeMulAssign
CubeNot
CubeOption: Extensions for Option
CubeOptionDefault: Extensions for Option that require default
CubeOrd
CubeOrdering
CubePrimitive: Form of CubeType that encapsulates all primitive types: Numeric, UInt, Bool
CubePrimitiveExpand
CubeRem
CubeRemAssign
CubeSub
CubeSubAssign
CubeType: Types used in a cube function must implement this trait
DefaultExpand
Degrees
DegreesExpand
DivAssignExpand
DivCeil
DivCeilExpand
DivExpand
Dot
DotExpand
Erf
ErfExpand
Exp
ExpExpand
FindFirstSet
FindFirstSetExpand
Float: Floating point numbers. Used as input in float kernels
FloatBits
FloatBitsExpand
FloatOps
FloatOpsExpand
Floor
FloorExpand
Hypot
HypotExpand
Int: Signed or unsigned integer. Used as input in int kernels
IntoComptime: Trait for marking a function return value as comptime when the compiler can’t infer it.
IntoMut: Convert an expand type to a version with mutable registers when necessary.
IntoRuntime: Trait useful to convert a comptime value into runtime value.
InverseSqrt
InverseSqrtExpand
IsInf
IsInfExpand
IsNan
IsNanExpand
LaunchArg: Defines how a launch argument can be expanded.
LeadingZeros
LeadingZerosExpand
List: Type from which we can read values in cube functions. For a mutable version, see ListMut.
ListExpand: Type from which we can read values in cube functions. For a mutable version, see ListMut.
ListMut: Type for which we can read and write values in cube functions. For an immutable version, see List.
ListMutExpand: Type for which we can read and write values in cube functions. For an immutable version, see List.
Log
Log1p
Log1pExpand
LogExpand
Magnitude
MagnitudeExpand
MulAssignExpand
MulExpand
MulHi
MulHiExpand
NativeAssign: Trait for native types that can be assigned. For non-native composites, use the normal Assign.
Normalize
NormalizeExpand
NotExpand
Numeric: Type that encompasses both (unsigned or signed) integers and floats. Used in kernels that should work for both.
OneExpand
OptionExt
OrdExpand
Powf
PowfExpand
Powi
PowiExpand
Radians
RadiansExpand
Recip
RecipExpand
RegistryQuery: To find an item from the registry, the query must be able to be translated to the actual key type.
Reinterpret: Enables reinterpreting the bits from any value to any other type of the same size.
RemAssignExpand
RemExpand
Remainder
RemainderExpand
ReverseBits
ReverseBitsExpand
Rhypot
RhypotExpand
Round
RoundExpand
SaturatingAdd
SaturatingAddExpand
SaturatingSub
SaturatingSubExpand
Scalar: Marker trait for scalar primitives. Should be implemented for all scalar CubePrimitives, but not for Vector or non-standard primitives like Barrier. Alternatively, treat these as types that can be stored in a Vector.
ScalarArgSettings: Similar to ArgSettings, however only for scalar types that don’t depend on the Runtime trait.
Sin
SinExpand
Sinh
SinhExpand
Size
SizedContainer
SliceMutOperator
SliceMutOperatorExpand
SliceOperator
SliceOperatorExpand
SliceVisibility
Sqrt
SqrtExpand
SubAssignExpand
SubExpand
Tan
TanExpand
Tanh
TanhExpand
TensorMapKind
TrailingZeros
TrailingZerosExpand
Trunc
TruncExpand
VectorSum
VectorSumExpand
Vectorized
VectorizedExpand
ZeroExpand
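
Two of the traits above, Cast and Reinterpret, are easy to confuse: one converts a value, the other keeps the bits and changes the type. The sketch below illustrates the distinction with standard-library Rust on the CPU. It does not use the CubeCL traits themselves, and the function names are hypothetical.

```rust
/// Value conversion: the number is converted, so the bit pattern changes.
/// Rust's `as` truncates toward zero and saturates at the integer bounds.
fn cast_f32_to_u32(value: f32) -> u32 {
    value as u32
}

/// Bit reinterpretation: the 32 bits are kept as-is and only the type changes.
/// Both types must have the same size (here, 4 bytes).
fn reinterpret_f32_as_u32(value: f32) -> u32 {
    value.to_bits()
}

/// The inverse reinterpretation, from raw bits back to a float.
fn reinterpret_u32_as_f32(bits: u32) -> f32 {
    f32::from_bits(bits)
}
```

Casting `1.0_f32` gives the integer 1, while reinterpreting it gives `0x3F80_0000`, the IEEE 754 encoding of 1.0. Reinterpreting there and back returns the original value unchanged.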

Functions§

array_assign_binary_op_expand
copy_bulk: Bulk copy length elements between two array-likes without intermediates.
debug_call_expand: Calls a function and inserts debug symbols if debug is enabled.
debug_source_expand: Adds source instruction if debug is enabled
debug_var_expand: Registers name for an expand if possible
div_ceil
erf
expand_erf
expand_himul_64
expand_himul_sim
expand_hypot
expand_rhypot
fast_math_expand
fma: Fused multiply-add A*B+C.
hypot: Computes the hypotenuse of a right triangle given the lengths of the other two sides.
init_expand
into_mut_assign
max: The maximum of two values, not requiring Ord. Provided for clarity in certain cases, though clamp_min may sometimes be more clear.
min: The minimum of two values, not requiring Ord. Provided for clarity in certain cases, though clamp_max may sometimes be more clear.
plane_all: Perform a reduce all operation across all units in a plane.
plane_any: Perform a reduce any operation across all units in a plane.
plane_ballot: Perform a ballot operation across all units in a plane. Returns a set of 32-bit bitfields as a Vector, with each element containing the value from 32 invocations. Note that vector size will always be set to 4 even for PLANE_DIM <= 64, because we can’t retrieve the actual plane size at expand time. Use the runtime PLANE_DIM to index appropriately.
plane_broadcast: Broadcasts the value from the specified plane unit at the given index to all active units within that plane. Requires a constant index. For non-constant indices, use plane_shuffle().
plane_elect: Returns true if the cube unit has the lowest plane_unit_id among active units in the plane.
plane_exclusive_prod: Perform an exclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::one(). Also known as “exclusive prefix product” or “exclusive scan”.
plane_exclusive_sum: Perform an exclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::zero(). Also known as “exclusive prefix sum” or “exclusive scan”.
plane_inclusive_prod: Perform an inclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, including this unit’s value. Also known as “prefix product” or “inclusive scan”.
plane_inclusive_sum: Perform an inclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, including this unit’s value. Also known as “prefix sum” or “inclusive scan”.
plane_max: Perform a reduce max operation across all units in a plane.
plane_min: Perform a reduce min operation across all units in a plane.
plane_prod: Perform a reduce prod operation across all units in a plane.
plane_shuffle: Perform an arbitrary lane shuffle operation across the plane. Each unit reads the value from the specified source lane.
plane_shuffle_down: Perform a shuffle down operation across the plane. Each unit reads the value from a unit with a higher lane ID (current_id + delta). Units at the end will read from themselves if (lane_id + delta >= plane_dim).
plane_shuffle_up: Perform a shuffle up operation across the plane. Each unit reads the value from a unit with a lower lane ID (current_id - delta). Units with lane_id < delta will read from themselves (no change).
plane_shuffle_xor: Perform a shuffle XOR operation across the plane. Each unit exchanges its value with another unit at an index determined by XOR with the mask. This is useful for butterfly reduction patterns.
plane_sum: Perform a reduce sum operation across all units in a plane.
printf_expand: Prints a formatted message using the print debug layer in Vulkan, or printf in CUDA.
push_validation_error: Push a validation error that will make kernel compilation fail.
rhypot: Computes the reciprocal of the hypotenuse of a right triangle given the lengths of the other two sides.
select: Executes both branches, then selects a value based on the condition. This should be branchless, but might depend on the compiler.
select_many: Same as select() but with vectors instead.
set_polyfill: Change the meaning of the given cube primitive type during compilation.
spanned_expand: Calls an intrinsic op and inserts debug symbols if debug is enabled.
storage_type_of
tma_group_commit: Commit an async tensor operation. The underlying mechanism is poorly documented, but it must be called after a write and not after reads.
tma_group_wait: Wait until at most max_pending TMA copy operations are in flight.
tma_group_wait_read: Wait until TMA copy operations have finished reading from shared memory, with at most max_pending operations being unfinished.
tma_store_1d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_2d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_3d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_4d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
tma_store_5d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
type_of
unary_expand
unary_expand_fixed_output
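
The plane scan and shuffle functions above are described in terms of what each lane receives. The following plain-Rust sketch emulates those descriptions on the CPU, treating a slice as one plane with one element per lane. It is not the CubeCL implementation: the `_cpu` function names are hypothetical, and the out-of-range behavior of the XOR shuffle (a lane keeps its own value) is an assumption, since the entry above does not specify it.

```rust
/// Exclusive prefix sum: lane i receives the sum of lanes 0..i,
/// so lane 0 receives zero.
fn plane_exclusive_sum_cpu(values: &[i32]) -> Vec<i32> {
    let mut acc = 0;
    values.iter().map(|&v| { let out = acc; acc += v; out }).collect()
}

/// Inclusive prefix sum: lane i receives the sum of lanes 0..=i.
fn plane_inclusive_sum_cpu(values: &[i32]) -> Vec<i32> {
    let mut acc = 0;
    values.iter().map(|&v| { acc += v; acc }).collect()
}

/// Shuffle down: lane i reads lane i + delta, or itself when out of range.
fn plane_shuffle_down_cpu(values: &[i32], delta: usize) -> Vec<i32> {
    (0..values.len())
        .map(|lane| if lane + delta < values.len() { values[lane + delta] } else { values[lane] })
        .collect()
}

/// Shuffle up: lane i reads lane i - delta, or itself when i < delta.
fn plane_shuffle_up_cpu(values: &[i32], delta: usize) -> Vec<i32> {
    (0..values.len())
        .map(|lane| if lane >= delta { values[lane - delta] } else { values[lane] })
        .collect()
}

/// Shuffle XOR: lane i reads lane i ^ mask.
/// Assumption: a lane keeps its own value if the partner is out of range.
fn plane_shuffle_xor_cpu(values: &[i32], mask: usize) -> Vec<i32> {
    (0..values.len())
        .map(|lane| if lane ^ mask < values.len() { values[lane ^ mask] } else { values[lane] })
        .collect()
}

/// Butterfly all-reduce built from XOR shuffles: after log2(n) rounds every
/// lane holds the total. Requires a power-of-two number of lanes.
fn butterfly_sum_cpu(values: &[i32]) -> Vec<i32> {
    assert!(values.len().is_power_of_two(), "lane count must be a power of two");
    let mut current = values.to_vec();
    let mut mask = current.len() / 2;
    while mask > 0 {
        let partner = plane_shuffle_xor_cpu(&current, mask);
        for (own, other) in current.iter_mut().zip(partner) {
            *own += other;
        }
        mask /= 2;
    }
    current
}
```

For lanes holding `[1, 2, 3, 4]`, the exclusive sum is `[0, 1, 3, 6]` and the inclusive sum is `[1, 3, 6, 10]`. Shuffling down by 1 gives `[2, 3, 4, 4]`, shuffling up by 1 gives `[1, 1, 2, 3]`, and the butterfly reduction leaves 10 in every lane, which matches what plane_sum is described as returning.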

Type Aliases§

SharedExpand
SharedMemoryExpand
SliceMut
