Cube Frontend Types.
Re-exports
pub use branch::RangeExpand;
pub use branch::SteppedRangeExpand;
pub use branch::range;
pub use branch::range;
pub use branch::range_stepped;
pub use branch::range_stepped;
Modules
- ABSOLUTE_POS - The position of the working unit in the whole cube kernel, without regard to cubes or axes.
- ABSOLUTE_POS_X - The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
- ABSOLUTE_POS_Y - The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
- ABSOLUTE_POS_Z - The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
- CUBE_CLUSTER_DIM - The total number of cubes in a cluster.
- CUBE_CLUSTER_DIM_X - The dimension of the cluster along the X axis.
- CUBE_CLUSTER_DIM_Y - The dimension of the cluster along the Y axis.
- CUBE_CLUSTER_DIM_Z - The dimension of the cluster along the Z axis.
- CUBE_COUNT - The number of cubes launched.
- CUBE_COUNT_X - The number of cubes launched along the X axis.
- CUBE_COUNT_Y - The number of cubes launched along the Y axis.
- CUBE_COUNT_Z - The number of cubes launched along the Z axis.
- CUBE_DIM - The total number of working units in a cube.
- CUBE_DIM_X - The dimension of the cube along the X axis.
- CUBE_DIM_Y - The dimension of the cube along the Y axis.
- CUBE_DIM_Z - The dimension of the cube along the Z axis.
- CUBE_POS - The cube position, without regard to axes.
- CUBE_POS_CLUSTER - The cube position within the cluster.
- CUBE_POS_CLUSTER_X - The cube position in the cluster along the X axis.
- CUBE_POS_CLUSTER_Y - The cube position in the cluster along the Y axis.
- CUBE_POS_CLUSTER_Z - The cube position in the cluster along the Z axis.
- CUBE_POS_X - The cube position along the X axis.
- CUBE_POS_Y - The cube position along the Y axis.
- CUBE_POS_Z - The cube position along the Z axis.
- PLANE_DIM - The total number of working units in a plane.
- UNIT_POS - The position of the working unit inside the cube, without regard to axes.
- UNIT_POS_PLANE - The relative position of the working unit inside the plane, without regard to cube dimensions.
- UNIT_POS_X - The position of the working unit inside the cube along the X axis.
- UNIT_POS_Y - The position of the working unit inside the cube along the Y axis.
- UNIT_POS_Z - The position of the working unit inside the cube along the Z axis.
- add
- add_assign
- add_assign_array_op
- add_assign_op
- and
- assign
- barrier - This module exposes barriers for asynchronous data transfer.
- bitand
- bitand_assign_array_op
- bitand_assign_op
- bitor
- bitor_assign_array_op
- bitor_assign_op
- bitxor
- bitxor_assign_array_op
- bitxor_assign_op
- branch
- cast
- cmma - This module exposes cooperative matrix-multiply and accumulate operations.
- comptime_error
- copy_bulk
- cube_comment
- div
- div_assign_array_op
- div_assign_op
- eq
- erf
- ge
- gt
- index
- index_assign
- le
- lt
- mul
- mul_assign_array_op
- mul_assign_op
- ne
- neg
- not
- or
- pipeline - This module exposes pipelining utilities for multi-stage asynchronous data copies with latency hiding. Producers are the threads that call producer_acquire and producer_commit; consumers are the threads that call consumer_wait and consumer_release.
- plane_all - Module containing the expand function for plane_all().
- plane_any - Module containing the expand function for plane_any().
- plane_ballot - Module containing the expand function for plane_ballot().
- plane_broadcast - Module containing the expand function for plane_broadcast().
- plane_elect - Module containing the expand function for plane_elect().
- plane_exclusive_prod - Module containing the expand function for plane_exclusive_prod().
- plane_exclusive_sum - Module containing the expand function for plane_exclusive_sum().
- plane_inclusive_prod - Module containing the expand function for plane_inclusive_prod().
- plane_inclusive_sum - Module containing the expand function for plane_inclusive_sum().
- plane_max - Module containing the expand function for plane_max().
- plane_min - Module containing the expand function for plane_min().
- plane_prod - Module containing the expand function for plane_prod().
- plane_sum - Module containing the expand function for plane_sum().
- rem
- rem_assign_array_op
- rem_assign_op
- select
- select_many
- set_polyfill - Expand module of set_polyfill().
- shl
- shl_assign_array_op
- shl_assign_op
- shr
- shr_assign_array_op
- shr_assign_op
- sub
- sub_assign_array_op
- sub_assign_op
- synchronization
- tma_group_commit
- tma_group_wait
- tma_group_wait_read
- tma_store_2d
- tma_store_3d
- tma_store_4d
- tma_store_5d
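The producer/consumer split described for the pipeline module can be pictured with a small single-threaded model. This is a hypothetical sketch, not the CubeCL API: Pipeline and its producer_acquire, producer_commit, consumer_wait and consumer_release methods are plain Rust stand-ins operating on a ring of stage buffers, and the "asynchronous copy" is an ordinary synchronous copy. What it shows is the scheduling pattern: the producer fills up to num_stages buffers ahead of the consumer, which on a GPU is what lets the copy of a later tile overlap with computation on the current one.

```rust
/// Hypothetical single-threaded model of a multi-stage copy pipeline.
/// Each stage is a buffer slot; `None` means the slot is free.
struct Pipeline {
    stages: Vec<Option<Vec<f32>>>,
    next_produce: usize,
    next_consume: usize,
}

impl Pipeline {
    fn new(num_stages: usize) -> Self {
        Pipeline { stages: vec![None; num_stages], next_produce: 0, next_consume: 0 }
    }

    /// Returns true if the producer's next slot is free and can be claimed.
    fn producer_acquire(&self) -> bool {
        let slot = self.next_produce % self.stages.len();
        self.stages[slot].is_none()
    }

    /// "Copies" a tile into the claimed slot and makes it visible to consumers.
    fn producer_commit(&mut self, tile: &[f32]) {
        let slot = self.next_produce % self.stages.len();
        self.stages[slot] = Some(tile.to_vec());
        self.next_produce += 1;
    }

    /// Returns the oldest committed tile (a real pipeline would block here).
    fn consumer_wait(&self) -> &[f32] {
        let slot = self.next_consume % self.stages.len();
        self.stages[slot].as_deref().expect("no committed stage to consume")
    }

    /// Frees the consumed slot so the producer can reuse it.
    fn consumer_release(&mut self) {
        let slot = self.next_consume % self.stages.len();
        self.stages[slot] = None;
        self.next_consume += 1;
    }
}

/// Sums all tiles while keeping up to `num_stages` tiles loaded ahead of the consumer.
fn pipelined_sum(tiles: &[Vec<f32>], num_stages: usize) -> f32 {
    let mut pipe = Pipeline::new(num_stages);
    let mut loaded = 0;
    // Prologue: fill every free stage before consuming anything.
    while loaded < tiles.len() && pipe.producer_acquire() {
        pipe.producer_commit(&tiles[loaded]);
        loaded += 1;
    }
    let mut total = 0.0;
    for _ in 0..tiles.len() {
        total += pipe.consumer_wait().iter().sum::<f32>();
        pipe.consumer_release();
        // Steady state: refill the stage that was just released.
        if loaded < tiles.len() && pipe.producer_acquire() {
            pipe.producer_commit(&tiles[loaded]);
            loaded += 1;
        }
    }
    total
}

fn main() {
    let tiles: Vec<Vec<f32>> = (0..5).map(|i| vec![i as f32; 4]).collect();
    println!("{}", pipelined_sum(&tiles, 2));
}
```

With two stages, tile i + 2 is loaded as soon as tile i is released, so the result does not depend on the number of stages, only the amount of overlap does.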
Macros
- debug_print - Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
- debug_print_expand - Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
Structs
- Array - A contiguous array of elements.
- ArrayCompilationArg
- ArrayHandleRef - Array representation with a reference to the server handle.
- Atomic - An atomic numerical type wrapping a normal numeric primitive. Enables the use of atomic operations, while disabling normal operations. In WGSL, this is a separate type; on CUDA/SPIR-V it can theoretically be bitcast to a normal number, but this isn’t recommended.
- ComptimeCell - A cell that can store and mutate cube types during comptime.
- ComptimeCellExpand - Expand type of ComptimeCell.
- ExpandElementTyped - Expand type associated with a type.
- FastMath - Unchecked optimizations for float operations. May cause precision differences, or undefined behaviour if the relevant conditions are not followed.
- FloatExpand
- IntExpand
- Line - A contiguous list of elements that supports auto-vectorized operations.
- Registry - Similar to a map, except that the keys are stored at comptime while the values can be runtime variables.
- ScalarArg
- Sequence - A sequence of cube types that is inlined during compilation.
- SequenceArg
- SequenceCompilationArg
- SequenceExpand - Expand type of Sequence.
- SharedMemory
- Slice - A read-only contiguous list of elements.
- SliceMut - A read-write contiguous list of elements.
- Tensor - The tensor type is similar to the array type, but it carries more metadata, such as strides and shape.
- TensorCompilationArg - Compilation argument for a tensor.
- TensorHandleRef - Tensor representation with a reference to the server handle, the strides and the shape.
- TensorMap - A CUDA CUtensorMap object. Represents a tensor encoded with a lot of metadata, and is an opaque packed object at runtime. It does not support retrieving shapes or strides, nor does it give access to the pointer, so these need to be passed separately in an aliased Tensor if needed.
- TensorMapArg - Grid constant tensor map, currently only mapped to a CUDA tensor map. May be interleaved or swizzled, but the last dimension must be contiguous (since strides don’t include the last dimension).
- TensorMapCompilationArg - Compilation argument for a tensor map.
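Line is described as a contiguous list of elements that supports auto-vectorized operations. As a rough CPU analogy only (not CubeCL's implementation), a fixed-size array wrapper whose arithmetic operators apply element-wise shows the idea: one operator applied to the wrapper stands for the same operation on every lane. The Line4 type and the as_lines helper are hypothetical, and CubeCL's line size is not fixed at 4.

```rust
use std::ops::{Add, Mul};

/// Hypothetical 4-lane "line": each operator applies to every lane at once.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Line4([f32; 4]);

impl Add for Line4 {
    type Output = Line4;
    fn add(self, rhs: Line4) -> Line4 {
        // One logical operation, applied lane by lane.
        Line4(std::array::from_fn(|i| self.0[i] + rhs.0[i]))
    }
}

impl Mul for Line4 {
    type Output = Line4;
    fn mul(self, rhs: Line4) -> Line4 {
        Line4(std::array::from_fn(|i| self.0[i] * rhs.0[i]))
    }
}

/// Views a contiguous slice as lines of 4 (the length must be a multiple of 4).
fn as_lines(data: &[f32]) -> Vec<Line4> {
    assert!(data.len() % 4 == 0, "length must be a multiple of the line size");
    data.chunks_exact(4)
        .map(|c| Line4([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    let a = as_lines(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
    let b = as_lines(&[1.0; 8]);
    // Eight multiply-adds expressed as two line operations.
    let c: Vec<Line4> = a.iter().zip(&b).map(|(x, y)| *x * *y + *y).collect();
    println!("{:?}", c);
}
```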
Enums
- ArrayArg
- OobFill - Which value to use when filling out-of-bounds values.
- TensorArg - Argument to be used for tensors passed as arguments to kernels.
- TensorMapFormat - Format of TensorMap.
- TensorMapInterleave - Interleave setting for TensorMap.
- TensorMapPrefetch - Additional prefetching to perform during load. Specifies the L2 fetch size, which indicates the byte granularity at which L2 requests are filled from DRAM.
- TensorMapSwizzle - Data are organized in a specific order in global memory; however, this may not match the order in which the application accesses data in shared memory. This difference in data organization may cause bank conflicts when shared memory is accessed. To avoid this problem, data can be loaded to shared memory with shuffling across shared memory banks. When interleave is TensorMapInterleave::B32, swizzle must be TensorMapSwizzle::B32. Other interleave modes can have any swizzling pattern.
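The swizzling idea behind TensorMapSwizzle can be made concrete with the XOR pattern commonly described for 128-byte swizzling on NVIDIA hardware: within each 128-byte row, the index of the 16-byte chunk is XORed with the row index modulo 8. The sketch below is a simplified model under that assumption (one 128-byte row per tile row, 32 banks of 4 bytes each); swizzle_128b and banks_of_chunk are hypothetical helpers, not CubeCL or CUDA functions. It checks the property the description relies on: reading the same logical chunk column down 8 rows lands on 8 different physical chunk positions, and therefore on different banks, instead of hitting the same banks 8 times.

```rust
/// Hypothetical model of a 128-byte XOR swizzle: each row is 128 bytes,
/// split into eight 16-byte chunks; the chunk index is XORed with (row % 8).
fn swizzle_128b(row: usize, chunk: usize) -> usize {
    assert!(chunk < 8, "a 128-byte row holds eight 16-byte chunks");
    chunk ^ (row % 8)
}

/// Shared-memory banks covered by a 16-byte chunk position, assuming
/// 32 banks of 4 bytes each (so one 128-byte row spans every bank once).
fn banks_of_chunk(chunk_pos: usize) -> std::ops::Range<usize> {
    let first = chunk_pos * 16 / 4;
    first..first + 4
}

fn main() {
    let column = 3; // logical chunk column read by 8 consecutive rows
    let plain: Vec<usize> = (0..8).map(|_| column).collect();
    let swizzled: Vec<usize> = (0..8).map(|row| swizzle_128b(row, column)).collect();
    println!("without swizzle: {:?}", plain);
    println!("with swizzle:    {:?}", swizzled);
    println!("banks for row 0: {:?}", banks_of_chunk(swizzled[0]));
}
```

Because XOR with a fixed value is a bijection on 0..8, each row is still a permutation of its own chunks, so no data is lost or duplicated; only its placement across banks changes.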
Constants
- ABSOLUTE_POS - The position of the working unit in the whole cube kernel, without regard to cubes or axes.
- ABSOLUTE_POS_X - The index of the working unit in the whole cube kernel along the X axis, without regard to cubes.
- ABSOLUTE_POS_Y - The index of the working unit in the whole cube kernel along the Y axis, without regard to cubes.
- ABSOLUTE_POS_Z - The index of the working unit in the whole cube kernel along the Z axis, without regard to cubes.
- CUBE_CLUSTER_DIM - The total number of cubes in a cluster.
- CUBE_CLUSTER_DIM_X - The dimension of the cluster along the X axis.
- CUBE_CLUSTER_DIM_Y - The dimension of the cluster along the Y axis.
- CUBE_CLUSTER_DIM_Z - The dimension of the cluster along the Z axis.
- CUBE_COUNT - The number of cubes launched.
- CUBE_COUNT_X - The number of cubes launched along the X axis.
- CUBE_COUNT_Y - The number of cubes launched along the Y axis.
- CUBE_COUNT_Z - The number of cubes launched along the Z axis.
- CUBE_DIM - The total number of working units in a cube.
- CUBE_DIM_X - The dimension of the cube along the X axis.
- CUBE_DIM_Y - The dimension of the cube along the Y axis.
- CUBE_DIM_Z - The dimension of the cube along the Z axis.
- CUBE_POS - The cube position, without regard to axes.
- CUBE_POS_CLUSTER - The cube position within the cluster.
- CUBE_POS_CLUSTER_X - The cube position in the cluster along the X axis.
- CUBE_POS_CLUSTER_Y - The cube position in the cluster along the Y axis.
- CUBE_POS_CLUSTER_Z - The cube position in the cluster along the Z axis.
- CUBE_POS_X - The cube position along the X axis.
- CUBE_POS_Y - The cube position along the Y axis.
- CUBE_POS_Z - The cube position along the Z axis.
- PLANE_DIM - The total number of working units in a plane.
- UNIT_POS - The position of the working unit inside the cube, without regard to axes.
- UNIT_POS_PLANE - The relative position of the working unit inside the plane, without regard to cube dimensions.
- UNIT_POS_X - The position of the working unit inside the cube along the X axis.
- UNIT_POS_Y - The position of the working unit inside the cube along the Y axis.
- UNIT_POS_Z - The position of the working unit inside the cube along the Z axis.
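The positional constants follow the usual GPU launch hierarchy. Assuming CubeCL uses the same conventions as CUDA's blockIdx/blockDim/threadIdx with X varying fastest when positions are linearized (an assumption worth checking against the generated code for your backend), their relationships can be written out on the CPU. LaunchShape and its methods are hypothetical helpers named after the constants, not CubeCL items.

```rust
/// Hypothetical description of a 3D launch: cubes per axis and units per cube.
#[derive(Clone, Copy)]
struct LaunchShape {
    cube_count: [u32; 3], // models CUBE_COUNT_X, _Y, _Z
    cube_dim: [u32; 3],   // models CUBE_DIM_X, _Y, _Z
}

impl LaunchShape {
    /// Models CUBE_DIM: total working units in one cube.
    fn cube_dim(&self) -> u32 {
        self.cube_dim.iter().product()
    }

    /// Models CUBE_POS: linear cube index in the grid, X varying fastest (assumed).
    fn cube_pos(&self, cube: [u32; 3]) -> u32 {
        let [cx, cy, _] = self.cube_count;
        cube[2] * cx * cy + cube[1] * cx + cube[0]
    }

    /// Models UNIT_POS: linear unit index inside a cube, X varying fastest (assumed).
    fn unit_pos(&self, unit: [u32; 3]) -> u32 {
        let [dx, dy, _] = self.cube_dim;
        unit[2] * dx * dy + unit[1] * dx + unit[0]
    }

    /// Models ABSOLUTE_POS_X/Y/Z: per axis, cube position * cube size + unit position.
    fn absolute_pos_xyz(&self, cube: [u32; 3], unit: [u32; 3]) -> [u32; 3] {
        [0usize, 1, 2].map(|a| cube[a] * self.cube_dim[a] + unit[a])
    }

    /// Models ABSOLUTE_POS: the absolute coordinates linearized over the whole grid.
    fn absolute_pos(&self, cube: [u32; 3], unit: [u32; 3]) -> u32 {
        let [x, y, z] = self.absolute_pos_xyz(cube, unit);
        let width = self.cube_count[0] * self.cube_dim[0];
        let height = self.cube_count[1] * self.cube_dim[1];
        z * width * height + y * width + x
    }
}

fn main() {
    let launch = LaunchShape { cube_count: [4, 2, 1], cube_dim: [8, 4, 1] };
    let (cube, unit) = ([2, 1, 0], [3, 2, 0]);
    println!("CUBE_DIM = {}", launch.cube_dim());
    println!("CUBE_POS = {}", launch.cube_pos(cube));
    println!("UNIT_POS = {}", launch.unit_pos(unit));
    println!("ABSOLUTE_POS_XYZ = {:?}", launch.absolute_pos_xyz(cube, unit));
    println!("ABSOLUTE_POS = {}", launch.absolute_pos(cube, unit));
}
```

Under this model, a purely one-dimensional launch (Y and Z extents of 1) reduces ABSOLUTE_POS to CUBE_POS_X * CUBE_DIM_X + UNIT_POS_X.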
Traits
- Abs
- ArgSettings - Defines the argument settings used to launch a kernel.
- BitwiseNot
- BoolOps - Extension trait for bool.
- Cast - Enables casting from any CubeElem to any other CubeElem.
- Ceil
- Clamp
- CompilationArg - Argument used during the compilation of kernels.
- Cos
- CountOnes
- CubeComptime - A type that can be used as a kernel comptime argument. Note that a type doesn’t need to implement CubeComptime to be used as a comptime argument; however, implementing it facilitates the declaration of generic cube types.
- CubeDebug
- CubeIndex - Fake indexation, so that indexes into scalars can be rewritten as calls to this fake function in the non-expanded function.
- CubeIndexMut
- CubeLaunch - A CubeType that can be used as a kernel argument, such as Array or Tensor.
- CubePrimitive - Form of CubeType that encapsulates all primitive types: Numeric, UInt, Bool.
- CubeType - Types used in a cube function must implement this trait.
- Dot
- Erf
- Exp
- ExpandElementBaseInit
- FindFirstSet
- Float - Floating-point numbers. Used as input in float kernels.
- Floor
- Index
- Init - Trait to be implemented by cube type implementations.
- Int - Signed or unsigned integer. Used as input in int kernels.
- IntoRuntime - Trait for converting a comptime value into a runtime value.
- LaunchArg - Defines a type that can be used as an argument to a kernel.
- LaunchArgExpand - Defines how a launch argument can be expanded.
- LeadingZeros
- List - Type from which we can read values in cube functions. For a mutable version, see ListMut.
- ListExpand - Expand version of CubeRead.
- ListMut - Type for which we can read and write values in cube functions. For an immutable version, see List.
- ListMutExpand - Expand version of CubeWrite.
- Log
- Log1p
- Magnitude
- Max
- Min
- MulHi
- Normalize
- Numeric - Type that encompasses both (unsigned or signed) integers and floats. Used in kernels that should work for both.
- OptionExt
- Powf
- Recip
- RegistryQuery - To find an item in the registry, the query must be translatable to the actual key type.
- Reinterpret - Enables reinterpreting the bits of any value as any other type of the same size.
- Remainder
- ReverseBits
- Round
- ScalarArgSettings - Similar to ArgSettings, but only for scalar types that don’t depend on the Runtime trait.
- Sin
- SizedContainer
- SliceOperator
- SliceOperatorExpand
- Sqrt
- Tanh
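Several of the traits above (CountOnes, LeadingZeros, ReverseBits, FindFirstSet, Reinterpret) name bit-level operations that have direct counterparts on Rust's primitive types. The sketch below uses only the standard library to show what each computes on a 32-bit value. It assumes FindFirstSet follows the common ffs convention (1-based index of the lowest set bit, 0 when no bit is set); that is an assumption about CubeCL's semantics, not something stated in this listing.

```rust
/// Conventional "find first set": 1-based index of the lowest set bit, 0 if none.
/// (Assumed semantics; CubeCL's FindFirstSet may differ.)
fn find_first_set(x: u32) -> u32 {
    if x == 0 { 0 } else { x.trailing_zeros() + 1 }
}

/// Reinterprets the bits of an f32 as a u32 of the same size (no numeric conversion).
fn reinterpret_f32_as_u32(x: f32) -> u32 {
    x.to_bits()
}

fn main() {
    let x: u32 = 0b1011_0000;
    println!("count_ones     = {}", x.count_ones());
    println!("leading_zeros  = {}", x.leading_zeros());
    println!("reverse_bits   = {:#034b}", x.reverse_bits());
    println!("find_first_set = {}", find_first_set(x));
    println!("1.0f32 bits    = {:#010x}", reinterpret_f32_as_u32(1.0));
}
```

The last line illustrates the difference between a cast and a reinterpretation: casting 1.0 to an integer gives 1, whereas reinterpreting its bits gives the IEEE 754 encoding 0x3f800000.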
Functions
- array_assign_binary_op_expand
- copy_bulk - Bulk copy length elements between two array-likes without intermediates.
- debug_call_expand - Calls a function and inserts debug symbols if debug is enabled.
- debug_source_expand - Adds source instruction if debug is enabled.
- debug_var_expand - Registers name for an expand if possible.
- erf
- expand_checked_index_assign
- expand_erf
- expand_himul_64
- expand_himul_sim
- fma - Fused multiply-add A*B+C.
- fma_expand - Expand method of fma.
- init_expand
- plane_all - Perform a reduce all operation across all units in a plane.
- plane_any - Perform a reduce any operation across all units in a plane.
- plane_ballot - Perform a ballot operation across all units in a plane. Returns a set of 32-bit bitfields as a Line, with each element containing the value from 32 invocations. Note that the line size will always be set to 4, even for PLANE_DIM <= 64, because the actual plane size can’t be retrieved at expand time. Use the runtime PLANE_DIM to index appropriately.
- plane_broadcast - Broadcasts the value from the specified plane unit at the given index to all active units within that plane.
- plane_elect - Returns true if the cube unit has the lowest plane_unit_id among active units in the plane.
- plane_exclusive_prod - Perform an exclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::one(). Also known as “exclusive prefix product” or “exclusive scan”.
- plane_exclusive_sum - Perform an exclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, excluding this unit’s value. The 0th unit will be set to E::zero(). Also known as “exclusive prefix sum” or “exclusive scan”.
- plane_inclusive_prod - Perform an inclusive product operation across all units in a plane. This multiplies all values to the “left” of the unit, including this unit’s value. Also known as “prefix product” or “inclusive scan”.
- plane_inclusive_sum - Perform an inclusive sum operation across all units in a plane. This sums all values to the “left” of the unit, including this unit’s value. Also known as “prefix sum” or “inclusive scan”.
- plane_max - Perform a reduce max operation across all units in a plane.
- plane_min - Perform a reduce min operation across all units in a plane.
- plane_prod - Perform a reduce prod operation across all units in a plane.
- plane_sum - Perform a reduce sum operation across all units in a plane.
- printf_expand - Prints a formatted message using the print debug layer in Vulkan, or printf in CUDA.
- select - Executes both branches, then selects a value based on the condition. This should be branchless, but might depend on the compiler.
- select_many - Same as select() but with lines instead.
- set_polyfill - Change the meaning of the given cube primitive type during compilation.
- slice_expand
- spanned_expand - Calls an intrinsic op and inserts debug symbols if debug is enabled.
- tma_group_commit - Commit an async tensor operation. Its exact semantics are poorly documented, but it must be called after a write, though not after reads.
- tma_group_wait - Wait until at most max_pending TMA copy operations are in flight.
- tma_group_wait_read - Wait until TMA copy operations have finished reading from shared memory, with at most max_pending operations still unfinished.
- tma_store_2d - Copy a tile from shared memory src to global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- tma_store_3d - Copy a tile from shared memory src to global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- tma_store_4d - Copy a tile from shared memory src to global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- tma_store_5d - Copy a tile from shared memory src to global memory dst, with the provided offsets. Should be combined with memcpy_async_tensor_commit and memcpy_async_tensor_wait_read.
- unary_expand
- unary_expand_fixed_output
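The plane scan and ballot descriptions can be checked against a plain CPU model in which a plane is a slice holding one value per unit and every unit is active. This is a hypothetical simulation, not CubeCL code; the function names only mirror the listed functions. The ballot model assumes the usual ballot layout (unit i sets bit i % 32 of word i / 32) and always returns four 32-bit words, matching the note that the line size is always 4.

```rust
/// Inclusive scan: unit i receives the sum of values 0..=i ("prefix sum").
fn plane_inclusive_sum(values: &[i64]) -> Vec<i64> {
    let mut acc: i64 = 0;
    values.iter().map(|v| { acc += v; acc }).collect()
}

/// Exclusive scan: unit i receives the sum of values 0..i; unit 0 gets zero.
fn plane_exclusive_sum(values: &[i64]) -> Vec<i64> {
    let mut acc: i64 = 0;
    values.iter().map(|v| { let before = acc; acc += v; before }).collect()
}

/// Exclusive product: unit i receives the product of values 0..i; unit 0 gets one.
fn plane_exclusive_prod(values: &[i64]) -> Vec<i64> {
    let mut acc: i64 = 1;
    values.iter().map(|v| { let before = acc; acc *= v; before }).collect()
}

/// Ballot: bit (i % 32) of word (i / 32) is set when unit i's predicate is true.
/// Always four words, so planes of up to 128 units fit.
fn plane_ballot(predicates: &[bool]) -> [u32; 4] {
    assert!(predicates.len() <= 128, "model supports at most 128 units");
    let mut words = [0u32; 4];
    for (i, &p) in predicates.iter().enumerate() {
        if p {
            words[i / 32] |= 1u32 << (i % 32);
        }
    }
    words
}

fn main() {
    let values: [i64; 8] = [3, 1, 4, 1, 5, 9, 2, 6];
    println!("inclusive sum:  {:?}", plane_inclusive_sum(&values));
    println!("exclusive sum:  {:?}", plane_exclusive_sum(&values));
    println!("exclusive prod: {:?}", plane_exclusive_prod(&values));
    let even: Vec<bool> = values.iter().map(|v| v % 2 == 0).collect();
    println!("ballot(even):   {:?}", plane_ballot(&even));
}
```

For sums, each exclusive result equals the inclusive result minus the unit’s own value, which is a quick sanity check when comparing a GPU scan against a reference.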