Crate cubecl_core

Re-exports§

pub use frontend::cmma;
pub use cubecl_ir as ir;
pub use codegen::*;

Modules§

backtrace
Backtrace module to build error reports.
benchmark
Module for benchmark timings
bytes
Utilities module to manipulate bytes.
cache
Cache module for an efficient in-memory and persistent database.
client
Compute client module.
codegen
compute
device
Device module.
format
Format utilities.
frontend
Cube Frontend Types.
future
Future utilities with a compatible API across native, non-std, and wasm environments.
io
Input Output utilities.
map
Map utilities and implementations.
post_processing
prelude
profile
Module for profiling any executable part
quant
Quantization primitives required outside of cubecl-quant
rand
Types for random number generation in both std and non-std environments.
reader
Useful when you need to read async data without having to decorate each function with async notation.
server
Compute server module.
stream_id
Stream id related utilities.
stub
Stub types for both std and non-std environments.
tune
Autotune module

Macros§

comment
Insert a literal comment into the kernel source code.
comptime
Mark the contents of this macro as compile-time values, turning off all expansion for this code and using it verbatim.
comptime_type
Makes the function return a compile-time value. Useful in a cube trait when part of the trait should return comptime values.
debug_print
Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
debug_print_expand
Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
intrinsic
Mark the contents of this macro as an intrinsic, turning off all expansion for this code and calling it with the scope.
terminate
Terminate the execution of the kernel for the current unit.
unexpanded

Structs§

CubeDim
The number of units across all 3 axes, totalling the number of working units in a cube.
CubeTuneId
ID used to identify a Just-in-Time environment.
MemoryUsage
Amount of memory in use by this allocator and statistics on how much memory is reserved and wasted in total.
e2m1
A 4-bit floating point type with 2 exponent bits and 1 mantissa bit.
e2m3
A 6-bit floating point type with 2 exponent bits and 3 mantissa bits.
e2m1x2
A 4-bit floating point type with 2 exponent bits and 1 mantissa bit. Packed with two elements per value, to allow for conversion to/from bytes. Care must be taken to ensure the shape is adjusted appropriately.
e3m2
A 6-bit floating point type with 3 exponent bits and 2 mantissa bits.
e4m3
An 8-bit floating point type with 4 exponent bits and 3 mantissa bits.
e5m2
An 8-bit floating point type with 5 exponent bits and 2 mantissa bits.
flex32
A floating point type with relaxed precision: at least f16, at most f32.
tf32
A 19-bit floating point type implementing the tfloat32 format.
ue8m0
An 8-bit unsigned floating point type with 8 exponent bits and no mantissa bits. Used for scaling factors.
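As a quick illustration of the CubeDim entry above, the sketch below is a hypothetical stand-in (not the actual `cubecl_core::CubeDim` API): it only shows how a cube's three axes multiply into its total number of working units. The struct name and `num_elems` method are assumptions for illustration.

```rust
// Hypothetical sketch, NOT the real cubecl_core::CubeDim:
// illustrates how the three axes of a cube dimension multiply
// into the total number of working units in a cube.
#[derive(Clone, Copy, Debug)]
struct CubeDimSketch {
    x: u32,
    y: u32,
    z: u32,
}

impl CubeDimSketch {
    /// Total working units = product of the three axes.
    fn num_elems(&self) -> u32 {
        self.x * self.y * self.z
    }
}

fn main() {
    let dim = CubeDimSketch { x: 32, y: 8, z: 1 };
    // 32 * 8 * 1 = 256 working units per cube.
    println!("{}", dim.num_elems());
}
```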

Enums§

CompilationError
JIT compilation error.
CubeCount
Specifies the number of cubes to be dispatched for a kernel.
ExecutionMode
The kind of execution to be performed.
LineSizeError
MemoryConfiguration
High level configuration of memory management.

Traits§

Compiler
Compiles the representation into its own representation that can be formatted into tokens.
CubeElement
The base element trait for the jit backend.
CubeScalar
CubeTask
Kernel trait with the ComputeShader that will be compiled and cached based on the provided id.
Runtime
Runtime for the CubeCL.

Functions§

calculate_cube_count_elemwise
Calculate the number of cubes required to execute an operation where one cube unit is assigned to one element.
tensor_line_size
tensor_line_size_parallel
Find the maximum line size usable for parallel vectorization along the given axis from the supported line sizes or return 1 if vectorization is impossible.
tensor_line_size_perpendicular
Find the maximum line size usable for perpendicular vectorization along the given axis from the supported line sizes or return 1 if vectorization is impossible.
tensor_vectorization_factor
try_tensor_line_size_parallel
Like tensor_line_size_parallel but does not assume 1 is supported.
try_tensor_line_size_perpendicular
Like tensor_line_size_perpendicular but does not assume 1 is supported.
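The two function families above follow simple arithmetic that can be sketched in plain Rust. These are hypothetical helpers with assumed names and signatures, not the actual `cubecl_core` functions: elementwise cube counts come from a ceiling division of elements by units per cube, and line-size selection picks the largest supported line size that evenly divides the axis length, falling back to 1.

```rust
// Hypothetical sketches (not the real cubecl_core signatures) of the
// arithmetic behind calculate_cube_count_elemwise and the
// tensor_line_size_* helpers.

/// Elementwise dispatch: one unit per element, so the number of cubes
/// is the ceiling division of the element count by the units per cube.
fn cube_count_elemwise_sketch(num_elems: usize, cube_size: usize) -> usize {
    num_elems.div_ceil(cube_size)
}

/// Vectorization: pick the largest supported line size that evenly
/// divides the axis length, or fall back to 1 when none does.
fn line_size_sketch(supported: &[u8], axis_len: usize) -> u8 {
    supported
        .iter()
        .copied()
        .filter(|&s| axis_len % s as usize == 0)
        .max()
        .unwrap_or(1)
}

fn main() {
    // 1_000_000 elements with 256 units per cube -> 3907 cubes.
    println!("{}", cube_count_elemwise_sketch(1_000_000, 256));
    // Axis of length 12: 8 does not divide it, 4 does.
    println!("{}", line_size_sketch(&[8, 4, 2], 12));
}
```

Note the fallback to 1 mirrors the non-`try_` variants documented above; the `try_` variants differ precisely in not assuming that a line size of 1 is supported.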

Type Aliases§

RuntimeArg
Runtime arguments to launch a kernel.

Attribute Macros§

cube
Mark a cube function, trait or implementation for expansion.
derive_cube_comptime
Attribute macro to define a type that can be used as a kernel comptime argument. This derives Debug, Hash, PartialEq, Eq, Clone, and Copy.

Derive Macros§

AutotuneKey
Implements display and initialization for autotune keys.
CubeLaunch
Derive macro to define a cube type that is launched with a kernel
CubeType
Derive macro to define a cube type that is not launched