Cube Frontend Types.
Re-exports§
pub use branch::*;
pub use synchronization::*;
Modules§
- ABSOLUTE_POS: The position of the working unit in the whole cube kernel, without regard to cubes or axes.
- ABSOLUTE_POS_X: The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
- ABSOLUTE_POS_Y: The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
- ABSOLUTE_POS_Z: The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
- CUBE_CLUSTER_DIM: The total number of cubes in a cluster.
- CUBE_CLUSTER_DIM_X: The dimension of the cluster along the X axis.
- CUBE_CLUSTER_DIM_Y: The dimension of the cluster along the Y axis.
- CUBE_CLUSTER_DIM_Z: The dimension of the cluster along the Z axis.
- CUBE_COUNT: The number of cubes launched.
- CUBE_COUNT_X: The number of cubes launched along the X axis.
- CUBE_COUNT_Y: The number of cubes launched along the Y axis.
- CUBE_COUNT_Z: The number of cubes launched along the Z axis.
- CUBE_DIM: The total number of working units in a cube.
- CUBE_DIM_X: The dimension of the cube along the X axis.
- CUBE_DIM_Y: The dimension of the cube along the Y axis.
- CUBE_DIM_Z: The dimension of the cube along the Z axis.
- CUBE_POS: The cube position, without regard to axes.
- CUBE_POS_CLUSTER: The cube position within the cluster.
- CUBE_POS_CLUSTER_X: The cube position in the cluster along the X axis.
- CUBE_POS_CLUSTER_Y: The cube position in the cluster along the Y axis.
- CUBE_POS_CLUSTER_Z: The cube position in the cluster along the Z axis.
- CUBE_POS_X: The cube position along the X axis.
- CUBE_POS_Y: The cube position along the Y axis.
- CUBE_POS_Z: The cube position along the Z axis.
- PLANE_DIM: The total number of working units in a plane.
- PLANE_POS: The position of the plane within the cube (plane/warp/subgroup index).
- UNIT_POS: The position of the working unit inside the cube, without regard to axes.
- UNIT_POS_PLANE: The relative position of the working unit inside the plane, without regard to cube dimensions.
- UNIT_POS_X: The position of the working unit inside the cube along the X axis.
- UNIT_POS_Y: The position of the working unit inside the cube along the Y axis.
- UNIT_POS_Z: The position of the working unit inside the cube along the Z axis.
- add
- add_assign
- add_assign_array_op
- add_assign_op
- and
- assign
- barrier: This module exposes barrier for asynchronous data transfer.
- bitand
- bitand_assign_array_op
- bitand_assign_op
- bitor
- bitor_assign_array_op
- bitor_assign_op
- bitxor
- bitxor_assign_array_op
- bitxor_assign_op
- branch
- cast
- clamp
- clamp_max
- clamp_min
- cmma: This module exposes cooperative matrix-multiply and accumulate operations.
- comptime: Module containing compile-time information about the current runtime.
- comptime_error
- copy_bulk
- cube_comment
- div
- div_assign_array_op
- div_assign_op
- div_ceil
- eq
- erf
- fma: Expand method of fma().
- ge
- gt
- hypot
- index
- index_assign
- index_unchecked
- le
- lt
- max
- min
- mul
- mul_assign_array_op
- mul_assign_op
- ne
- neg
- not
- or
- plane_all: Module containing the expand function for plane_all().
- plane_any: Module containing the expand function for plane_any().
- plane_ballot: Module containing the expand function for plane_ballot().
- plane_broadcast: Module containing the expand function for plane_broadcast().
- plane_elect: Module containing the expand function for plane_elect().
- plane_exclusive_prod: Module containing the expand function for plane_exclusive_prod().
- plane_exclusive_sum: Module containing the expand function for plane_exclusive_sum().
- plane_inclusive_prod: Module containing the expand function for plane_inclusive_prod().
- plane_inclusive_sum: Module containing the expand function for plane_inclusive_sum().
- plane_max: Module containing the expand function for plane_max().
- plane_min: Module containing the expand function for plane_min().
- plane_prod: Module containing the expand function for plane_prod().
- plane_shuffle: Module containing the expand function for plane_shuffle().
- plane_shuffle_down: Module containing the expand function for plane_shuffle_down().
- plane_shuffle_up: Module containing the expand function for plane_shuffle_up().
- plane_shuffle_xor: Module containing the expand function for plane_shuffle_xor().
- plane_sum: Module containing the expand function for plane_sum().
- push_validation_error
- rem
- rem_assign_array_op
- rem_assign_op
- rhypot
- select
- select_many
- set_polyfill: Expand module of set_polyfill().
- shl
- shl_assign_array_op
- shl_assign_op
- shr
- shr_assign_array_op
- shr_assign_op
- storage_type_of
- sub
- sub_assign_array_op
- sub_assign_op
- synchronization
- tma_group_commit
- tma_group_wait
- tma_group_wait_read
- tma_store_1d
- tma_store_2d
- tma_store_3d
- tma_store_4d
- tma_store_5d
- type_of
Macros§
- debug_print: Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
- debug_print_expand: Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
- define_scalar: Define a custom type to be used for a comptime scalar type. Useful for cases where generics can’t work.
- define_size: Define a custom type to be used for a comptime size. Useful for cases where generics can’t work.
Structs§
- Array: A contiguous array of elements.
- ArrayBinding: Tensor representation with a reference to the server handle.
- ArrayCompilationArg
- Atomic: An atomic numerical type wrapping a normal numeric primitive. Enables the use of atomic operations, while disabling normal operations. In WGSL, this is a separate type; on CUDA/SPIR-V it can theoretically be bitcast to a normal number, but this isn’t recommended.
- ComptimeCell: A cell that can store and mutate a cube type during comptime.
- ComptimeCellExpand: Expand type of ComptimeCell.
- Const
- DynamicScalar: A fake element type that can be configured to map to any other element type.
- DynamicSize: A fake constant type that can be configured to map to any comptime value.
- Im2col: Im2col indexing. Loads a “column” (not the same column as im2col) of pixels into shared memory, with a certain offset (kernel position). The corners are the bounds to load pixels from at offset 0, so the top left corner of the kernel. The offset is added to the corner offsets, so a (-1, -1) corner will stop the bounding box at (1, 1) for kernel offset (2, 2).
- Im2colArgs: Args for im2col tensor maps.
- Im2colCompilationArg
- Im2colExpand
- Im2colLaunch
- Im2colWide: 1D im2col, not properly supported yet.
- Im2colWideArgs: Args for im2col wide tensor maps.
- Im2colWideCompilationArg
- Im2colWideExpand
- Im2colWideLaunch
- InputScalar: A way to define an input scalar without a generic attached to it.
- InputScalarCompilationArg
- InputScalarExpand
- NativeExpand: Expand type of a native GPU type, i.e. scalar primitives, arrays, shared memory.
- OptionExpand
- OrderingExpand
- ReadOnly
- ReadWrite
- Registry: Similar to a map, except that the keys are stored at comptime while the values can be runtime variables.
- RuntimeCell
- RuntimeCellExpand
- Sequence: A sequence of cube types that is inlined during compilation.
- SequenceArg
- SequenceCompilationArg
- SequenceExpand: Expand type of Sequence.
- Shared
- SharedMemory
- Slice: A read-only contiguous list of elements.
- SliceExpand
- Tensor: The tensor type is similar to the array type, but it comes with more metadata such as stride and shape.
- TensorBinding: Tensor representation with a reference to the server handle, the strides and the shape.
- TensorCompilationArg: Compilation argument for a tensor.
- TensorMap: A CUDA CUtensorMap object. Represents a tensor encoded with a lot of metadata, and is an opaque packed object at runtime. Does not support retrieving any shapes or strides, nor does it give access to the pointer, so these need to be passed separately in an aliased Tensor if needed.
- TensorMapArg: Grid constant tensor map, currently only maps to CUDA tensormap. May be interleaved or swizzled, but the last dimension must be contiguous (since strides don’t include the last dimension).
- Tiled: Regular tiled tensor map.
- TiledArgs: Args for tiled tensor maps.
- TiledCompilationArg
- TiledExpand
- TiledLaunch
- Vector: A contiguous list of elements that supports auto-vectorized operations.
Enums§
- ArrayArg
- ComptimeOption
- ComptimeOptionArgs
- ComptimeOptionCompilationArg
- ComptimeOptionExpand
- OobFill: What value to use when filling out-of-bounds values.
- OptionArgs
- OptionCompilationArg
- SliceOrigin
- SliceOriginExpand
- TensorArg: Argument to be used for tensors passed as arguments to kernels.
- TensorMapFormat: Format of TensorMap.
- TensorMapInterleave: Interleave setting for TensorMap.
- TensorMapPrefetch: Additional prefetching to perform during load. Specifies the L2 fetch size, which indicates the byte granularity at which L2 requests are filled from DRAM.
- TensorMapSwizzle: Data are organized in a specific order in global memory; however, this may not match the order in which the application accesses data in shared memory. This difference in data organization may cause bank conflicts when shared memory is accessed. To avoid this problem, data can be loaded to shared memory with shuffling across shared memory banks. When interleave is TensorMapInterleave::B32, swizzle must be TensorMapSwizzle::B32. Other interleave modes can have any swizzling pattern.
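
The interleave/swizzle constraint stated for TensorMapSwizzle can be sketched as a small validity check. This is only an illustration in plain Rust: the `Interleave` and `Swizzle` enums below are hypothetical, partial stand-ins (only the `B32` variants are named in the text above), and `swizzle_is_valid` is not a function of this crate.

```rust
/// Hypothetical stand-in for TensorMapInterleave (only `B32` is taken from the docs).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Interleave {
    None,
    B32,
}

/// Hypothetical stand-in for TensorMapSwizzle (only `B32` is taken from the docs).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Swizzle {
    None,
    B32,
    B64,
}

/// Returns true when the pair respects the documented rule:
/// interleave B32 requires swizzle B32; other interleave modes accept any swizzle.
pub fn swizzle_is_valid(interleave: Interleave, swizzle: Swizzle) -> bool {
    match interleave {
        Interleave::B32 => swizzle == Swizzle::B32,
        _ => true,
    }
}
```

A check of this shape could run on the host before a tensor map is built, so that an invalid combination is rejected early rather than at launch.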
Constants§
- ABSOLUTE_POS: The position of the working unit in the whole cube kernel, without regard to cubes or axes.
- ABSOLUTE_POS_X: The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
- ABSOLUTE_POS_Y: The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
- ABSOLUTE_POS_Z: The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
- CUBE_CLUSTER_DIM: The total number of cubes in a cluster.
- CUBE_CLUSTER_DIM_X: The dimension of the cluster along the X axis.
- CUBE_CLUSTER_DIM_Y: The dimension of the cluster along the Y axis.
- CUBE_CLUSTER_DIM_Z: The dimension of the cluster along the Z axis.
- CUBE_COUNT: The number of cubes launched.
- CUBE_COUNT_X: The number of cubes launched along the X axis.
- CUBE_COUNT_Y: The number of cubes launched along the Y axis.
- CUBE_COUNT_Z: The number of cubes launched along the Z axis.
- CUBE_DIM: The total number of working units in a cube.
- CUBE_DIM_X: The dimension of the cube along the X axis.
- CUBE_DIM_Y: The dimension of the cube along the Y axis.
- CUBE_DIM_Z: The dimension of the cube along the Z axis.
- CUBE_POS: The cube position, without regard to axes.
- CUBE_POS_CLUSTER: The cube position within the cluster.
- CUBE_POS_CLUSTER_X: The cube position in the cluster along the X axis.
- CUBE_POS_CLUSTER_Y: The cube position in the cluster along the Y axis.
- CUBE_POS_CLUSTER_Z: The cube position in the cluster along the Z axis.
- CUBE_POS_X: The cube position along the X axis.
- CUBE_POS_Y: The cube position along the Y axis.
- CUBE_POS_Z: The cube position along the Z axis.
- PLANE_DIM: The total number of working units in a plane.
- PLANE_POS: The position of the plane within the cube (plane/warp/subgroup index).
- UNIT_POS: The position of the working unit inside the cube, without regard to axes.
- UNIT_POS_PLANE: The relative position of the working unit inside the plane, without regard to cube dimensions.
- UNIT_POS_X: The position of the working unit inside the cube along the X axis.
- UNIT_POS_Y: The position of the working unit inside the cube along the Y axis.
- UNIT_POS_Z: The position of the working unit inside the cube along the Z axis.
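
How these topology constants relate to one another can be sketched on the host in plain Rust. The `Topology` struct and its methods are hypothetical names introduced for illustration, not part of this crate. The per-axis formula (cube position times cube dimension plus unit position) and the X-fastest linearization are assumptions modelled on the usual GPU convention; the listing above does not state them.

```rust
/// Hypothetical host-side model of one working unit's coordinates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Topology {
    pub cube_count: [u32; 3], // stands in for CUBE_COUNT_X/Y/Z
    pub cube_dim: [u32; 3],   // stands in for CUBE_DIM_X/Y/Z
    pub cube_pos: [u32; 3],   // stands in for CUBE_POS_X/Y/Z
    pub unit_pos: [u32; 3],   // stands in for UNIT_POS_X/Y/Z
}

impl Topology {
    /// Total working units in a cube (CUBE_DIM): product of the per-axis dimensions.
    pub fn cube_dim_total(&self) -> u32 {
        self.cube_dim.iter().product()
    }

    /// Total cubes launched (CUBE_COUNT): product of the per-axis counts.
    pub fn cube_count_total(&self) -> u32 {
        self.cube_count.iter().product()
    }

    /// Assumed per-axis absolute index: cube_pos * cube_dim + unit_pos.
    pub fn absolute_pos_axes(&self) -> [u32; 3] {
        let mut out = [0u32; 3];
        for axis in 0..3 {
            out[axis] = self.cube_pos[axis] * self.cube_dim[axis] + self.unit_pos[axis];
        }
        out
    }

    /// Assumed linear position inside the cube (UNIT_POS), with X varying fastest.
    pub fn unit_pos_linear(&self) -> u32 {
        let [x, y, z] = self.unit_pos;
        x + y * self.cube_dim[0] + z * self.cube_dim[0] * self.cube_dim[1]
    }

    /// Assumed linear absolute position (ABSOLUTE_POS), with X varying fastest
    /// over the whole launch grid.
    pub fn absolute_pos_linear(&self) -> u32 {
        let [x, y, z] = self.absolute_pos_axes();
        let width = self.cube_count[0] * self.cube_dim[0];
        let height = self.cube_count[1] * self.cube_dim[1];
        x + y * width + z * width * height
    }
}
```

For example, with 4x2x1 cubes of 8x4x1 units each, the unit at (5, 2, 0) in cube (3, 1, 0) has per-axis absolute indices (29, 6, 0) under these assumptions.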
Traits§
- Abs
- AbsExpand
- AddAssignExpand
- AddExpand
- ArcCos
- ArcCosExpand
- ArcCosh
- ArcCoshExpand
- ArcSin
- ArcSinExpand
- ArcSinh
- ArcSinhExpand
- ArcTan
- ArcTan2
- ArcTan2Expand
- ArcTanExpand
- ArcTanh
- ArcTanhExpand
- AsMutExpand
- AsRefExpand
- Assign
- BoolOps: Extension trait for bool.
- Cast: Enable elegant casting from any to any CubeElem.
- Ceil
- CeilExpand
- CloneExpand
- CompilationArg: Argument used during the compilation of kernels.
- ComptimeIndex: Workaround for comptime indexing, since the helper that replaces index operators doesn’t know whether a variable is comptime. Has the same signature in unexpanded code, so it will automatically dispatch the correct one.
- ComptimeIndexMut
- Cos
- CosExpand
- Cosh
- CoshExpand
- CountOnes
- CountOnesExpand
- CubeAdd
- CubeAddAssign
- CubeComptime: A type that can be used as a kernel comptime argument. Note that a type doesn’t need to implement CubeComptime to be used as a comptime argument. However, this facilitates the declaration of generic cube types.
- CubeDebug
- CubeDiv
- CubeDivAssign
- CubeEnum
- CubeIndex: Fake indexation so we can rewrite indexes into scalars as calls to this fake function in the non-expanded function.
- CubeIndexExpand
- CubeIndexMut
- CubeIndexMutExpand
- CubeMul
- CubeMulAssign
- CubeNot
- CubeOption: Extensions for Option.
- CubeOptionDefault: Extensions for Option that require default.
- CubeOrd
- CubeOrdering
- CubePrimitive: Form of CubeType that encapsulates all primitive types: Numeric, UInt, Bool.
- CubePrimitiveExpand
- CubeRem
- CubeRemAssign
- CubeSub
- CubeSubAssign
- CubeType: Types used in a cube function must implement this trait.
- DefaultExpand
- Degrees
- DegreesExpand
- DivAssignExpand
- DivCeil
- DivCeilExpand
- DivExpand
- Dot
- DotExpand
- Erf
- ErfExpand
- Exp
- ExpExpand
- FindFirstSet
- FindFirstSetExpand
- Float: Floating point numbers. Used as input in float kernels.
- FloatBits
- FloatBitsExpand
- FloatOps
- FloatOpsExpand
- Floor
- FloorExpand
- Hypot
- HypotExpand
- Int: Signed or unsigned integer. Used as input in int kernels.
- IntoComptime: Trait for marking a function return value as comptime when the compiler can’t infer it.
- IntoMut: Convert an expand type to a version with mutable registers when necessary.
- IntoRuntime: Trait useful to convert a comptime value into a runtime value.
- InverseSqrt
- InverseSqrtExpand
- IsInf
- IsInfExpand
- IsNan
- IsNanExpand
- LaunchArg: Defines how a launch argument can be expanded.
- LeadingZeros
- LeadingZerosExpand
- List: Type from which we can read values in cube functions. For a mutable version, see ListMut.
- ListExpand: Type from which we can read values in cube functions. For a mutable version, see ListMut.
- ListMut: Type for which we can read and write values in cube functions. For an immutable version, see List.
- ListMutExpand: Type for which we can read and write values in cube functions. For an immutable version, see List.
- Log
- Log1p
- Log1pExpand
- LogExpand
- Magnitude
- MagnitudeExpand
- MulAssignExpand
- MulExpand
- MulHi
- MulHiExpand
- NativeAssign: Trait for native types that can be assigned. For non-native composites, use the normal Assign.
- Normalize
- NormalizeExpand
- NotExpand
- Numeric: Type that encompasses both (unsigned or signed) integers and floats. Used in kernels that should work for both.
- OneExpand
- OptionExt
- OrdExpand
- Powf
- PowfExpand
- Powi
- PowiExpand
- Radians
- RadiansExpand
- Recip
- RecipExpand
- RegistryQuery: To find an item from the registry, the query must be able to be translated to the actual key type.
- Reinterpret: Enables reinterpreting the bits from any value to any other type of the same size.
- RemAssignExpand
- RemExpand
- Remainder
- RemainderExpand
- ReverseBits
- ReverseBitsExpand
- Rhypot
- RhypotExpand
- Round
- RoundExpand
- SaturatingAdd
- SaturatingAddExpand
- SaturatingSub
- SaturatingSubExpand
- Scalar: Marker trait for scalar primitives. Should be implemented for all scalar CubePrimitives, but not for Vector or non-standard primitives like Barrier. Alternatively, treat these as types that can be stored in a Vector.
- ScalarArgSettings: Similar to ArgSettings, but only for scalar types that don’t depend on the Runtime trait.
- Sin
- SinExpand
- Sinh
- SinhExpand
- Size
- SizedContainer
- SliceMutOperator
- SliceMutOperatorExpand
- SliceOperator
- SliceOperatorExpand
- SliceVisibility
- Sqrt
- SqrtExpand
- SubAssignExpand
- SubExpand
- Tan
- TanExpand
- Tanh
- TanhExpand
- TensorMapKind
- TrailingZeros
- TrailingZerosExpand
- Trunc
- TruncExpand
- VectorSum
- VectorSumExpand
- Vectorized
- VectorizedExpand
- ZeroExpand
Functions§
- array_assign_binary_op_expand
- copy_bulk: Bulk copy length elements between two array-likes without intermediates.
- debug_call_expand: Calls a function and inserts debug symbols if debug is enabled.
- debug_source_expand: Adds source instruction if debug is enabled.
- debug_var_expand: Registers name for an expand if possible.
- div_ceil
- erf
- expand_erf
- expand_himul_64
- expand_himul_sim
- expand_hypot
- expand_rhypot
- fast_math_expand
- fma: Fused multiply-add A*B+C.
- hypot: Computes the hypotenuse of a right triangle given the lengths of the other two sides.
- init_expand
- into_mut_assign
- max: The maximum of two values, not requiring Ord. Provided for clarity in certain cases, though clamp_min may sometimes be more clear.
- min: The minimum of two values, not requiring Ord. Provided for clarity in certain cases, though clamp_max may sometimes be more clear.
- plane_all: Perform a reduce all operation across all units in a plane.
- plane_any: Perform a reduce any operation across all units in a plane.
- plane_ballot: Perform a ballot operation across all units in a plane. Returns a set of 32-bit bitfields as a Vector, with each element containing the value from 32 invocations. Note that vector size will always be set to 4 even for PLANE_DIM <= 64, because we can’t retrieve the actual plane size at expand time. Use the runtime PLANE_DIM to index appropriately.
- plane_broadcast: Broadcasts the value from the specified plane unit at the given index to all active units within that plane. Requires a constant index. For non-constant indices, use plane_shuffle().
- plane_elect: Returns true if the cube unit has the lowest plane_unit_id among active units in the plane.
- plane_exclusive_prod: Perform an exclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::one(). Also known as “exclusive prefix product” or “exclusive scan”.
- plane_exclusive_sum: Perform an exclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::zero(). Also known as “exclusive prefix sum” or “exclusive scan”.
- plane_inclusive_prod: Perform an inclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, including this unit’s value. Also known as “prefix product” or “inclusive scan”.
- plane_inclusive_sum: Perform an inclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, including this unit’s value. Also known as “prefix sum” or “inclusive scan”.
- plane_max: Perform a reduce max operation across all units in a plane.
- plane_min: Perform a reduce min operation across all units in a plane.
- plane_prod: Perform a reduce prod operation across all units in a plane.
- plane_shuffle: Perform an arbitrary lane shuffle operation across the plane. Each unit reads the value from the specified source lane.
- plane_shuffle_down: Perform a shuffle down operation across the plane. Each unit reads the value from a unit with a higher lane ID (current_id + delta). Units at the end will read from themselves if (lane_id + delta >= plane_dim).
- plane_shuffle_up: Perform a shuffle up operation across the plane. Each unit reads the value from a unit with a lower lane ID (current_id - delta). Units with lane_id < delta will read from themselves (no change).
- plane_shuffle_xor: Perform a shuffle XOR operation across the plane. Each unit exchanges its value with another unit at an index determined by XOR with the mask. This is useful for butterfly reduction patterns.
- plane_sum: Perform a reduce sum operation across all units in a plane.
- printf_expand: Prints a formatted message using the print debug layer in Vulkan, or printf in CUDA.
- push_validation_error: Push a validation error that will cause kernel compilation to fail.
- rhypot: Computes the reciprocal of the hypotenuse of a right triangle given the lengths of the other two sides.
- select: Executes both branches, then selects a value based on the condition. This should be branchless, but might depend on the compiler.
- select_many: Same as select() but with vectors instead.
- set_polyfill: Change the meaning of the given cube primitive type during compilation.
- spanned_expand: Calls an intrinsic op and inserts debug symbols if debug is enabled.
- storage_type_of
- tma_group_commit: Commit an async tensor operation. Not sure how this works (poor docs), but it needs to be called after a write and not after reads.
- tma_group_wait: Wait until at most max_pending TMA copy operations are in flight.
- tma_group_wait_read: Wait until TMA copy operations have finished reading from shared memory, with at most max_pending operations being unfinished.
- tma_store_1d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- tma_store_2d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- tma_store_3d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- tma_store_4d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- tma_store_5d: Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- type_of
- unary_expand
- unary_expand_fixed_output
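
The semantics described for the plane scan, shuffle and ballot functions can be modelled on the host by treating a plane as a slice with one value per lane. The following is a minimal sketch in plain Rust, not this crate's implementation: every function name is a hypothetical stand-in, all lanes are assumed active, and the bit layout in `ballot` (lane i maps to bit i % 32 of element i / 32) and the out-of-range behaviour in `shuffle_xor` are assumptions.

```rust
/// Exclusive prefix sum: lane i receives the sum of lanes 0..i (lane 0 gets zero).
pub fn exclusive_sum(values: &[i64]) -> Vec<i64> {
    let mut acc = 0;
    values.iter().map(|v| { let out = acc; acc += v; out }).collect()
}

/// Inclusive prefix sum: lane i receives the sum of lanes 0..=i.
pub fn inclusive_sum(values: &[i64]) -> Vec<i64> {
    let mut acc = 0;
    values.iter().map(|v| { acc += v; acc }).collect()
}

/// Shuffle down: lane i reads lane i + delta, or itself when that is past the end.
pub fn shuffle_down(values: &[i64], delta: usize) -> Vec<i64> {
    (0..values.len())
        .map(|lane| if lane + delta < values.len() { values[lane + delta] } else { values[lane] })
        .collect()
}

/// Shuffle up: lane i reads lane i - delta, or itself when i < delta.
pub fn shuffle_up(values: &[i64], delta: usize) -> Vec<i64> {
    (0..values.len())
        .map(|lane| if lane >= delta { values[lane - delta] } else { values[lane] })
        .collect()
}

/// Shuffle XOR: lane i reads lane i ^ mask (assumed: itself if out of range).
pub fn shuffle_xor(values: &[i64], mask: usize) -> Vec<i64> {
    (0..values.len())
        .map(|lane| {
            let src = lane ^ mask;
            if src < values.len() { values[src] } else { values[lane] }
        })
        .collect()
}

/// Butterfly sum built from shuffle XOR; expects a power-of-two plane size.
/// After log2(n) rounds every lane holds the full sum.
pub fn butterfly_sum(values: &[i64]) -> Vec<i64> {
    let mut cur = values.to_vec();
    let mut mask = cur.len() / 2;
    while mask > 0 {
        let other = shuffle_xor(&cur, mask);
        cur = cur.iter().zip(&other).map(|(a, b)| a + b).collect();
        mask /= 2;
    }
    cur
}

/// Ballot: packs one predicate per lane into four 32-bit fields (up to 128 lanes).
pub fn ballot(preds: &[bool]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for (lane, &p) in preds.iter().enumerate().take(128) {
        if p {
            out[lane / 32] |= 1u32 << (lane % 32);
        }
    }
    out
}
```

With a four-lane plane holding 1, 2, 3, 4, the exclusive sum is 0, 1, 3, 6, the inclusive sum is 1, 3, 6, 10, and the butterfly sum leaves 10 in every lane. The ballot result always has four elements, which mirrors the note that the vector size is fixed at 4 regardless of the actual plane size.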