Cube Frontend Types.
Re-exports§
- pub use branch::RangeExpand;
- pub use branch::SteppedRangeExpand;
- pub use branch::range;
- pub use branch::range_stepped;
Modules§
- ABSOLUTE_POS
- The position of the working unit in the whole cube kernel, without regard to cubes or axes.
- ABSOLUTE_POS_X
- The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
- ABSOLUTE_POS_Y
- The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
- ABSOLUTE_POS_Z
- The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
- CUBE_CLUSTER_DIM
- The total number of cubes in a cluster.
- CUBE_CLUSTER_DIM_X
- The dimension of the cluster along the X axis.
- CUBE_CLUSTER_DIM_Y
- The dimension of the cluster along the Y axis.
- CUBE_CLUSTER_DIM_Z
- The dimension of the cluster along the Z axis.
- CUBE_COUNT
- The number of cubes launched.
- CUBE_COUNT_X
- The number of cubes launched along the X axis.
- CUBE_COUNT_Y
- The number of cubes launched along the Y axis.
- CUBE_COUNT_Z
- The number of cubes launched along the Z axis.
- CUBE_DIM
- The total number of working units in a cube.
- CUBE_DIM_X
- The dimension of the cube along the X axis.
- CUBE_DIM_Y
- The dimension of the cube along the Y axis.
- CUBE_DIM_Z
- The dimension of the cube along the Z axis.
- CUBE_POS
- The cube position, without regard to axis.
- CUBE_POS_CLUSTER
- The cube position within the cluster.
- CUBE_POS_CLUSTER_X
- The cube position in the cluster along the X axis.
- CUBE_POS_CLUSTER_Y
- The cube position in the cluster along the Y axis.
- CUBE_POS_CLUSTER_Z
- The cube position in the cluster along the Z axis.
- CUBE_POS_X
- The cube position along the X axis.
- CUBE_POS_Y
- The cube position along the Y axis.
- CUBE_POS_Z
- The cube position along the Z axis.
- PLANE_DIM
- The total number of working units in a plane.
- UNIT_POS
- The position of the working unit inside the cube, without regard to axis.
- UNIT_POS_PLANE
- The relative position of the working unit inside the plane, without regard to cube dimensions.
- UNIT_POS_X
- The position of the working unit inside the cube along the X axis.
- UNIT_POS_Y
- The position of the working unit inside the cube along the Y axis.
- UNIT_POS_Z
- The position of the working unit inside the cube along the Z axis.
- add
- add_assign
- add_assign_array_op
- add_assign_op
- and
- assign
- barrier
- This module exposes barriers for asynchronous data transfer.
- bitand
- bitand_assign_array_op
- bitand_assign_op
- bitor
- bitor_assign_array_op
- bitor_assign_op
- bitxor
- bitxor_assign_array_op
- bitxor_assign_op
- branch
- cast
- cmma
- This module exposes cooperative matrix-multiply and accumulate operations.
- comptime_error
- copy_bulk
- cube_comment
- div
- div_assign_array_op
- div_assign_op
- div_ceil
- eq
- erf
- ge
- gt
- index
- index_assign
- index_unchecked
- le
- lt
- mul
- mul_assign_array_op
- mul_assign_op
- ne
- neg
- not
- or
- plane_all
- Module containing the expand function for plane_all().
- plane_any
- Module containing the expand function for plane_any().
- plane_ballot
- Module containing the expand function for plane_ballot().
- plane_broadcast
- Module containing the expand function for plane_broadcast().
- plane_elect
- Module containing the expand function for plane_elect().
- plane_exclusive_prod
- Module containing the expand function for plane_exclusive_prod().
- plane_exclusive_sum
- Module containing the expand function for plane_exclusive_sum().
- plane_inclusive_prod
- Module containing the expand function for plane_inclusive_prod().
- plane_inclusive_sum
- Module containing the expand function for plane_inclusive_sum().
- plane_max
- Module containing the expand function for plane_max().
- plane_min
- Module containing the expand function for plane_min().
- plane_prod
- Module containing the expand function for plane_prod().
- plane_shuffle
- Module containing the expand function for plane_shuffle().
- plane_shuffle_down
- Module containing the expand function for plane_shuffle_down().
- plane_shuffle_up
- Module containing the expand function for plane_shuffle_up().
- plane_shuffle_xor
- Module containing the expand function for plane_shuffle_xor().
- plane_sum
- Module containing the expand function for plane_sum().
- rem
- rem_assign_array_op
- rem_assign_op
- select
- select_many
- set_polyfill
- Expand module of set_polyfill().
- shl
- shl_assign_array_op
- shl_assign_op
- shr
- shr_assign_array_op
- shr_assign_op
- sub
- sub_assign_array_op
- sub_assign_op
- synchronization
- tma_group_commit
- tma_group_wait
- tma_group_wait_read
- tma_store_1d
- tma_store_2d
- tma_store_3d
- tma_store_4d
- tma_store_5d
Macros§
- debug_print
- Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
- debug_print_expand
- Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
Structs§
- Array
- A contiguous array of elements.
- ArrayCompilationArg
- ArrayHandleRef
- Tensor representation with a reference to the server handle.
- Atomic
- An atomic numerical type wrapping a normal numeric primitive. Enables the use of atomic operations, while disabling normal operations. In WGSL, this is a separate type - on CUDA/SPIR-V it can theoretically be bitcast to a normal number, but this isn’t recommended.
- ComptimeCell
- A cell that can store and mutate a cube type during comptime.
- ComptimeCellExpand
- Expand type of ComptimeCell.
- ElemExpand
- A fake element type that can be configured to map to any other element type.
- ExpandElementTyped
- Expand type associated with a type.
- IntExpand
- Line
- A contiguous list of elements that supports auto-vectorized operations.
- ReadOnly
- ReadWrite
- Registry
- Similar to a map, except that the keys are stored at comptime while the values can be runtime variables.
- RuntimeCell
- RuntimeCellExpand
- ScalarArg
- ScalarCompilationArg
- Sequence
- A sequence of cube types that is inlined during compilation.
- SequenceArg
- SequenceCompilationArg
- SequenceExpand
- Expand type of Sequence.
- SharedMemory
- Slice
- A read-only contiguous list of elements.
- SliceExpand
- Tensor
- The tensor type is similar to the array type; however, it comes with more metadata such as stride and shape.
- TensorCompilationArg
- Compilation argument for a tensor.
- TensorHandleRef
- Tensor representation with a reference to the server handle, the strides and the shape.
- TensorMap
- A CUDA CUtensorMap object. Represents a tensor encoded with a lot of metadata, and is an opaque packed object at runtime. Does not support retrieving any shapes or strides, nor does it give access to the pointer, so these need to be passed separately in an aliased Tensor if needed.
- TensorMapArg
- Grid constant tensor map, currently only maps to CUDA tensormap. May be interleaved or swizzled, but last dimension must be contiguous (since strides don’t include the last dimension).
- TensorMapCompilationArg
- Compilation argument for a tensor map.
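
The Tensor entry above mentions stride and shape metadata. As a rough illustration of what that metadata is for, here is a minimal sketch in plain Rust (not CubeCL code) of how strides map a multi-dimensional index to a linear offset. The helper names `contiguous_strides` and `offset` are hypothetical and exist only for this example; a row-major layout is assumed for the contiguous case.

```rust
/// Row-major (contiguous) strides for a shape: the last dimension has stride 1.
/// Hypothetical helper for illustration only.
fn contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Linear offset of a multi-dimensional index: the sum of index[d] * strides[d].
/// Hypothetical helper for illustration only.
fn offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}

fn main() {
    let shape = [2, 3, 4];
    let strides = contiguous_strides(&shape);
    println!("strides = {:?}", strides);
    println!("offset of [1, 2, 3] = {}", offset(&[1, 2, 3], &strides));

    // A transposed view of a contiguous 2x3 buffer needs no data movement:
    // only the strides change (shape [3, 2], strides [1, 3]).
    println!("transposed offset of [2, 1] = {}", offset(&[2, 1], &[1, 3]));
}
```

Keeping strides separate from the shape is what lets a view such as a transpose reuse the same buffer, which is the kind of information a plain array does not carry.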
Enums§
- ArrayArg 
- FastMath 
- Unchecked optimizations for float operations. May cause precision differences, or undefined behaviour if the relevant conditions are not followed.
- OobFill
- What value to use when filling out-of-bounds values.
- SliceOrigin
- SliceOriginExpand
- TensorArg
- Argument to be used for tensors passed as arguments to kernels.
- TensorMapFormat
- Format of [TensorMap]
- TensorMapInterleave
- Interleave setting for [TensorMap]
- TensorMapPrefetch
- Additional prefetching to perform during load. Specifies the L2 fetch size, which indicates the byte granularity at which L2 requests are filled from DRAM.
- TensorMapSwizzle
- Data are organized in a specific order in global memory; however, this may not match the order in which the application accesses data in shared memory. This difference in data organization may cause bank conflicts when shared memory is accessed. In order to avoid this problem, data can be loaded to shared memory with shuffling across shared memory banks. When interleave is TensorMapInterleave::B32, swizzle must be TensorMapSwizzle::B32. Other interleave modes can have any swizzling pattern.
Constants§
- ABSOLUTE_POS
- The position of the working unit in the whole cube kernel, without regard to cubes or axes.
- ABSOLUTE_POS_X
- The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
- ABSOLUTE_POS_Y
- The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
- ABSOLUTE_POS_Z
- The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
- CUBE_CLUSTER_DIM
- The total number of cubes in a cluster.
- CUBE_CLUSTER_DIM_X
- The dimension of the cluster along the X axis.
- CUBE_CLUSTER_DIM_Y
- The dimension of the cluster along the Y axis.
- CUBE_CLUSTER_DIM_Z
- The dimension of the cluster along the Z axis.
- CUBE_COUNT
- The number of cubes launched.
- CUBE_COUNT_X
- The number of cubes launched along the X axis.
- CUBE_COUNT_Y
- The number of cubes launched along the Y axis.
- CUBE_COUNT_Z
- The number of cubes launched along the Z axis.
- CUBE_DIM
- The total number of working units in a cube.
- CUBE_DIM_X
- The dimension of the cube along the X axis.
- CUBE_DIM_Y
- The dimension of the cube along the Y axis.
- CUBE_DIM_Z
- The dimension of the cube along the Z axis.
- CUBE_POS
- The cube position, without regard to axis.
- CUBE_POS_CLUSTER
- The cube position within the cluster.
- CUBE_POS_CLUSTER_X
- The cube position in the cluster along the X axis.
- CUBE_POS_CLUSTER_Y
- The cube position in the cluster along the Y axis.
- CUBE_POS_CLUSTER_Z
- The cube position in the cluster along the Z axis.
- CUBE_POS_X
- The cube position along the X axis.
- CUBE_POS_Y
- The cube position along the Y axis.
- CUBE_POS_Z
- The cube position along the Z axis.
- PLANE_DIM
- The total number of working units in a plane.
- UNIT_POS
- The position of the working unit inside the cube, without regard to axis.
- UNIT_POS_PLANE
- The relative position of the working unit inside the plane, without regard to cube dimensions.
- UNIT_POS_X
- The position of the working unit inside the cube along the X axis.
- UNIT_POS_Y
- The position of the working unit inside the cube along the Y axis.
- UNIT_POS_Z
- The position of the working unit inside the cube along the Z axis.
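
The relationships between the position constants above can be modelled on the CPU. The following is a minimal sketch in plain Rust, not CubeCL code. It assumes the conventional layout in which the absolute position along an axis equals `CUBE_POS * CUBE_DIM + UNIT_POS` for that axis, and in which the axis-free positions are linearizations with X varying fastest. The `Dim3` type and both helper functions are hypothetical names introduced for this example.

```rust
/// Hypothetical three-axis value used only in this sketch.
#[derive(Clone, Copy)]
struct Dim3 {
    x: u32,
    y: u32,
    z: u32,
}

/// Per-axis absolute position, assuming
/// ABSOLUTE_POS_X = CUBE_POS_X * CUBE_DIM_X + UNIT_POS_X (same for Y and Z).
fn absolute_pos_axes(cube_pos: Dim3, cube_dim: Dim3, unit_pos: Dim3) -> Dim3 {
    Dim3 {
        x: cube_pos.x * cube_dim.x + unit_pos.x,
        y: cube_pos.y * cube_dim.y + unit_pos.y,
        z: cube_pos.z * cube_dim.z + unit_pos.z,
    }
}

/// Collapse a three-axis position into one index, with X varying fastest (assumed).
fn linearize(pos: Dim3, extent: Dim3) -> u32 {
    pos.x + pos.y * extent.x + pos.z * extent.x * extent.y
}

fn main() {
    let cube_dim = Dim3 { x: 8, y: 4, z: 1 }; // units per cube
    let cube_count = Dim3 { x: 2, y: 2, z: 1 }; // cubes launched
    let cube_pos = Dim3 { x: 1, y: 0, z: 0 };
    let unit_pos = Dim3 { x: 3, y: 2, z: 0 };

    // Model of UNIT_POS: position inside the cube without regard to axis.
    let unit_linear = linearize(unit_pos, cube_dim);

    // Model of ABSOLUTE_POS_X / _Y / _Z.
    let abs = absolute_pos_axes(cube_pos, cube_dim, unit_pos);

    // Model of ABSOLUTE_POS: linearize over the whole launch grid.
    let total = Dim3 {
        x: cube_count.x * cube_dim.x,
        y: cube_count.y * cube_dim.y,
        z: cube_count.z * cube_dim.z,
    };
    let abs_linear = linearize(abs, total);

    println!("UNIT_POS = {unit_linear}");
    println!("ABSOLUTE_POS_X/Y/Z = {}/{}/{}", abs.x, abs.y, abs.z);
    println!("ABSOLUTE_POS = {abs_linear}");
}
```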
Traits§
- Abs
- ArgSettings
- Defines the argument settings used to launch a kernel.
- BitwiseNot
- BoolOps
- Extension trait for bool.
- Cast
- Enables elegant casting from any CubeElem to any other CubeElem.
- Ceil
- Clamp
- CompilationArg
- Argument used during the compilation of kernels.
- Cos
- CountOnes
- CubeComptime
- A type that can be used as a kernel comptime argument.
Note that a type doesn’t need to implement CubeComptime to be used as a comptime argument. However, implementing it facilitates the declaration of generic cube types.
- CubeDebug
- CubeIndex
- Fake indexation so we can rewrite indexes into scalars as calls to this fake function in the non-expanded function.
- CubeIndexExpand
- CubeIndexMut
- CubeIndexMutExpand
- CubePrimitive
- Form of CubeType that encapsulates all primitive types: Numeric, UInt, Bool
- CubeType
- Types used in a cube function must implement this trait.
- Dot
- Erf
- Exp
- ExpandElementIntoMut
- FindFirstSet
- Float
- Floating point numbers. Used as input in float kernels.
- Floor
- Index
- Int
- Signed or unsigned integer. Used as input in int kernels.
- IntoMut
- Convert an expand type to a version with mutable registers when necessary.
- IntoRuntime
- Trait useful to convert a comptime value into a runtime value.
- IsInf
- IsNan
- LaunchArg 
- Defines how a launch argument can be expanded.
- LeadingZeros 
- Lined
- LinedExpand 
- List
- Type from which we can read values in cube functions. For a mutable version, see ListMut.
- ListExpand 
- Type from which we can read values in cube functions. For a mutable version, see ListMut.
- ListMut
- Type for which we can read and write values in cube functions. For an immutable version, see List.
- ListMutExpand 
- Type for which we can read and write values in cube functions. For an immutable version, see List.
- Log
- Log1p
- Magnitude
- Max
- Min
- MulHi
- Normalize
- Numeric
- Type that encompasses both (unsigned or signed) integers and floats. Used in kernels that should work for both.
- OptionExt
- Powf
- Powi
- Recip
- RegistryQuery
- To find an item from the registry, the query must be able to be translated to the actual key type.
- Reinterpret
- Enables reinterpreting the bits of any value as any other type of the same size.
- Remainder
- ReverseBits
- Round
- SaturatingAdd
- SaturatingSub
- ScalarArgSettings
- Similar to ArgSettings, however only for scalar types that don’t depend on the Runtime trait.
- Sin
- SizedContainer
- SliceMutOperator
- SliceMutOperatorExpand
- SliceOperator
- SliceOperatorExpand
- SliceVisibility
- Sqrt
- Tanh
- Trunc
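
The Cast and Reinterpret traits listed above describe two different kinds of conversion. The difference can be illustrated with a minimal sketch in plain Rust using only the standard library, not the CubeCL traits themselves: a cast converts the numeric value, while a reinterpretation keeps the bit pattern and changes only the type. The function names are hypothetical and chosen for this example.

```rust
/// Numeric conversion: the value is preserved as closely as the target type allows.
fn cast_f32_to_u32(x: f32) -> u32 {
    x as u32
}

/// Bit reinterpretation: the 32 bits are kept as-is and only the type changes.
fn reinterpret_f32_as_u32(x: f32) -> u32 {
    x.to_bits()
}

/// The inverse reinterpretation, from raw bits back to a float.
fn reinterpret_u32_as_f32(bits: u32) -> f32 {
    f32::from_bits(bits)
}

fn main() {
    let x = 1.0_f32;
    println!("cast:        {}", cast_f32_to_u32(x));
    println!("reinterpret: {:#010x}", reinterpret_f32_as_u32(x));

    // Reinterpreting there and back is lossless because both types are 32 bits wide.
    assert_eq!(reinterpret_u32_as_f32(reinterpret_f32_as_u32(x)), x);
}
```

The same-size requirement in the Reinterpret description is what makes the round trip lossless; a cast gives no such guarantee.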
Functions§
- array_assign_binary_op_expand
- copy_bulk
- Bulk copy length elements between two array-likes without intermediates.
- debug_call_expand
- Calls a function and inserts debug symbols if debug is enabled.
- debug_source_expand
- Adds source instruction if debug is enabled.
- debug_var_expand
- Registers name for an expand if possible.
- div_ceil
- erf
- expand_checked_index_assign
- expand_erf
- expand_himul_64
- expand_himul_sim
- fma
- Fused multiply-add A*B+C.
- fma_expand
- Expand method of fma.
- init_expand
- plane_all
- Perform a reduce all operation across all units in a plane.
- plane_any
- Perform a reduce any operation across all units in a plane.
- plane_ballot
- Perform a ballot operation across all units in a plane.
Returns a set of 32-bit bitfields as a Line, with each element containing the value from 32 invocations. Note that line size will always be set to 4 even for PLANE_DIM <= 64, because we can’t retrieve the actual plane size at expand time. Use the runtime PLANE_DIM to index appropriately.
- plane_broadcast
- Broadcasts the value from the specified plane unit at the given index to all active units within that plane.
- plane_elect
- Returns true if the cube unit has the lowest plane_unit_id among the active units in the plane.
- plane_exclusive_prod
- Perform an exclusive product operation across all units in a plane.
This multiplies all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::one(). Also known as “exclusive prefix product” or “exclusive scan”.
- plane_exclusive_sum
- Perform an exclusive sum operation across all units in a plane.
This sums all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::zero(). Also known as “exclusive prefix sum” or “exclusive scan”.
- plane_inclusive_prod
- Perform an inclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, including this unit’s value. Also known as “prefix product” or “inclusive scan”.
- plane_inclusive_sum
- Perform an inclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, including this unit’s value. Also known as “prefix sum” or “inclusive scan”.
- plane_max
- Perform a reduce max operation across all units in a plane.
- plane_min
- Perform a reduce min operation across all units in a plane.
- plane_prod
- Perform a reduce prod operation across all units in a plane.
- plane_shuffle
- Perform an arbitrary lane shuffle operation across the plane. Each unit reads the value from the specified source lane.
- plane_shuffle_down
- Perform a shuffle down operation across the plane. Each unit reads the value from a unit with a higher lane ID (current_id + delta). Units at the end will read from themselves if (lane_id + delta >= plane_dim).
- plane_shuffle_up
- Perform a shuffle up operation across the plane. Each unit reads the value from a unit with a lower lane ID (current_id - delta). Units with lane_id < delta will read from themselves (no change).
- plane_shuffle_xor
- Perform a shuffle XOR operation across the plane. Each unit exchanges its value with another unit at an index determined by XOR with the mask. This is useful for butterfly reduction patterns.
- plane_sum
- Perform a reduce sum operation across all units in a plane.
- printf_expand
- Prints a formatted message using the print debug layer in Vulkan, or printf in CUDA.
- select
- Executes both branches, then selects a value based on the condition. This should be branchless, but might depend on the compiler.
- select_many
- Same as select() but with lines instead.
- set_polyfill
- Change the meaning of the given cube primitive type during compilation.
- spanned_expand
- Calls an intrinsic op and inserts debug symbols if debug is enabled.
- tma_group_commit
- Commit an async tensor operation. The underlying mechanism is poorly documented, but it must be called after a write and not after reads.
- tma_group_wait
- Wait until at most max_pending TMA copy operations are in flight.
- tma_group_wait_read
- Wait until TMA copy operations have finished reading from shared memory, with at most max_pending operations being unfinished.
- tma_store_1d
- Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with [memcpy_async_tensor_commit] and [memcpy_async_tensor_wait_read].
- tma_store_2d
- Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with [memcpy_async_tensor_commit] and [memcpy_async_tensor_wait_read].
- tma_store_3d
- Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with [memcpy_async_tensor_commit] and [memcpy_async_tensor_wait_read].
- tma_store_4d
- Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with [memcpy_async_tensor_commit] and [memcpy_async_tensor_wait_read].
- tma_store_5d
- Copy a tile from a shared memory src to a global memory dst, with the provided offsets. Should be combined with [memcpy_async_tensor_commit] and [memcpy_async_tensor_wait_read].
- unary_expand
- unary_expand_fixed_output
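
The plane scan, shuffle and ballot functions listed above can be modelled on the CPU to make their semantics concrete. The following is a minimal sketch in plain Rust, not CubeCL code: a slice stands in for the values held by the units of one plane, and every function name is hypothetical. The out-of-range rule for the XOR shuffle and the bit order used in the ballot (bit `lane % 32` of word `lane / 32`) are assumptions of this sketch rather than documented behaviour.

```rust
/// Exclusive prefix sum: lane i gets the sum of lanes 0..i, so lane 0 gets zero.
fn exclusive_sum(lanes: &[u32]) -> Vec<u32> {
    let mut acc = 0;
    lanes.iter().map(|&v| { let out = acc; acc += v; out }).collect()
}

/// Inclusive prefix sum: lane i gets the sum of lanes 0..=i.
fn inclusive_sum(lanes: &[u32]) -> Vec<u32> {
    let mut acc = 0;
    lanes.iter().map(|&v| { acc += v; acc }).collect()
}

/// Shuffle down: lane i reads lane i + delta, or itself when that is out of range.
fn shuffle_down(lanes: &[u32], delta: usize) -> Vec<u32> {
    let n = lanes.len();
    (0..n).map(|i| if i + delta < n { lanes[i + delta] } else { lanes[i] }).collect()
}

/// Shuffle up: lane i reads lane i - delta, or itself when i < delta.
fn shuffle_up(lanes: &[u32], delta: usize) -> Vec<u32> {
    (0..lanes.len()).map(|i| if i >= delta { lanes[i - delta] } else { lanes[i] }).collect()
}

/// Shuffle XOR: lane i reads lane i ^ mask.
/// Assumption: an out-of-range partner falls back to the lane's own value.
fn shuffle_xor(lanes: &[u32], mask: usize) -> Vec<u32> {
    let n = lanes.len();
    (0..n).map(|i| if (i ^ mask) < n { lanes[i ^ mask] } else { lanes[i] }).collect()
}

/// Butterfly all-reduce built from XOR shuffles; expects a power-of-two lane count.
/// After log2(n) steps every lane holds the total.
fn butterfly_sum(lanes: &[u32]) -> Vec<u32> {
    let mut vals = lanes.to_vec();
    let mut mask = lanes.len() / 2;
    while mask > 0 {
        let partner = shuffle_xor(&vals, mask);
        for (v, p) in vals.iter_mut().zip(partner) {
            *v += p;
        }
        mask /= 2;
    }
    vals
}

/// Ballot: pack one predicate per lane into four 32-bit words (up to 128 lanes).
/// Assumption: lane i maps to bit i % 32 of word i / 32.
fn ballot(preds: &[bool]) -> [u32; 4] {
    let mut words = [0u32; 4];
    for (lane, &p) in preds.iter().enumerate().take(128) {
        if p {
            words[lane / 32] |= 1 << (lane % 32);
        }
    }
    words
}

fn main() {
    let lanes = [1, 2, 3, 4];
    println!("exclusive: {:?}", exclusive_sum(&lanes));
    println!("inclusive: {:?}", inclusive_sum(&lanes));
    println!("down by 1: {:?}", shuffle_down(&lanes, 1));
    println!("up by 1:   {:?}", shuffle_up(&lanes, 1));
    println!("butterfly: {:?}", butterfly_sum(&lanes));
    println!("ballot:    {:?}", ballot(&[true, false, true, true]));
}
```

The butterfly loop shows why the XOR shuffle suits reductions: each step halves the distance between partners, so a plane of n units needs only log2(n) exchanges for every unit to hold the full sum.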
Type Aliases§
- FloatExpand 
- A fake float element type that can be configured to map to any other element type.
- NumericExpand 
- A fake numeric element type that can be configured to map to any other element type.
- SliceMut