Crate cubecl_core

Re-exports§

pub use frontend::cmma;
pub use cubecl_ir as ir;
pub use codegen::*;

Modules§

backtrace
Backtrace module to build error reports.
benchmark
Module for benchmark timings
bytes
Utilities module to manipulate bytes.
cache
Cache module for an efficient in-memory and persistent database.
client
Compute client module.
codegen
compute
device
Device module.
format
Format utilities.
frontend
Cube Frontend Types.
future
Future utilities with a compatible API across native, non-std, and wasm environments.
io
Input Output utilities.
map
Map utilities and implementations.
post_processing
prelude
profile
Module for profiling any executable part
quant
Quantization primitives required outside of cubecl-quant
rand
Types for random number generation in both std and non-std environments.
reader
Useful when you need to read async data without having to decorate each function with async notation.
server
Compute server module.
stream_id
Stream id related utilities.
stub
Stub types for both std and non-std environments.
tune
Autotune module

Macros§

comment
Insert a literal comment into the kernel source code.
comptime
Mark the contents of this macro as compile-time values, turning off all expansion for this code and using it verbatim.
comptime_type
Makes the function return a compile-time value. Useful in a cube trait when part of the trait should return comptime values.
debug_print
Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
debug_print_expand
Print a formatted message using the target’s debug print facilities. The format string is target specific, but Vulkan and CUDA both use the C++ conventions. WGSL isn’t currently supported.
intrinsic
Mark the contents of this macro as an intrinsic, turning off all expansion for this code and calling it with the scope.
terminate
Terminate the execution of the kernel for the current unit.
unexpanded

Structs§

CubeDim
The number of units across all 3 axes, totalling the number of working units in a cube.
CubeTuneId
ID used to identify a Just-in-Time environment.
MemoryUsage
Amount of memory in use by this allocator and statistics on how much memory is reserved and wasted in total.
e2m1
A 4-bit floating point type with 2 exponent bits and 1 mantissa bit.
e2m3
A 6-bit floating point type with 2 exponent bits and 3 mantissa bits.
e2m1x2
A 4-bit floating point type with 2 exponent bits and 1 mantissa bit. Packed with two elements per value, to allow for conversion to/from bytes. Care must be taken to ensure the shape is adjusted appropriately.
e3m2
A 6-bit floating point type with 3 exponent bits and 2 mantissa bits.
e4m3
An 8-bit floating point type with 4 exponent bits and 3 mantissa bits.
e5m2
An 8-bit floating point type with 5 exponent bits and 2 mantissa bits.
flex32
A floating point type with relaxed precision: at least f16, at most f32.
tf32
A 19-bit floating point type implementing the tfloat32 format.
ue8m0
An 8-bit unsigned floating point type with 8 exponent bits and no mantissa bits. Used for scaling factors.
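As a quick illustration of the CubeDim entry above, the sketch below is a hypothetical stand-in (not the actual `cubecl_core::CubeDim` API): it only shows how a cube's three axes multiply into its total number of working units. The struct name and `num_elems` method are assumptions for illustration.

```rust
// Hypothetical sketch, NOT the real cubecl_core::CubeDim:
// illustrates how the three axes of a cube dimension multiply
// into the total number of working units in a cube.
#[derive(Clone, Copy, Debug)]
struct CubeDimSketch {
    x: u32,
    y: u32,
    z: u32,
}

impl CubeDimSketch {
    /// Total working units = product of the three axes.
    fn num_elems(&self) -> u32 {
        self.x * self.y * self.z
    }
}

fn main() {
    let dim = CubeDimSketch { x: 32, y: 8, z: 1 };
    // 32 * 8 * 1 = 256 working units per cube.
    println!("{}", dim.num_elems());
}
```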

Enums§

CompilationError
JIT compilation error.
CubeCount
Specifies the number of cubes to be dispatched for a kernel.
ExecutionMode
The kind of execution to be performed.
LineSizeError
MemoryConfiguration
High level configuration of memory management.

Traits§

Compiler
Compiles the representation into its own representation that can be formatted into tokens.
CubeElement
The base element trait for the jit backend.
CubeScalar
CubeTask
Kernel trait with the ComputeShader that will be compiled and cached based on the provided id.
Runtime
Runtime for the CubeCL.

Functions§

calculate_cube_count_elemwise
Calculate the number of cubes required to execute an operation where one cube unit is assigned to one element.
tensor_line_size
tensor_line_size_parallel
Find the maximum line size usable for parallel vectorization along the given axis from the supported line sizes or return 1 if vectorization is impossible.
tensor_line_size_perpendicular
Find the maximum line size usable for perpendicular vectorization along the given axis from the supported line sizes or return 1 if vectorization is impossible.
tensor_vectorization_factor
try_tensor_line_size_parallel
Like tensor_line_size_parallel but does not assume 1 is supported.
try_tensor_line_size_perpendicular
Like tensor_line_size_perpendicular but does not assume 1 is supported.
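The two function families above follow simple arithmetic that can be sketched in plain Rust. These are hypothetical helpers with assumed names and signatures, not the actual `cubecl_core` functions: elementwise cube counts come from a ceiling division of elements by units per cube, and line-size selection picks the largest supported line size that evenly divides the axis length, falling back to 1.

```rust
// Hypothetical sketches (not the real cubecl_core signatures) of the
// arithmetic behind calculate_cube_count_elemwise and the
// tensor_line_size_* helpers.

/// Elementwise dispatch: one unit per element, so the number of cubes
/// is the ceiling division of the element count by the units per cube.
fn cube_count_elemwise_sketch(num_elems: usize, cube_size: usize) -> usize {
    num_elems.div_ceil(cube_size)
}

/// Vectorization: pick the largest supported line size that evenly
/// divides the axis length, or fall back to 1 when none does.
fn line_size_sketch(supported: &[u8], axis_len: usize) -> u8 {
    supported
        .iter()
        .copied()
        .filter(|&s| axis_len % s as usize == 0)
        .max()
        .unwrap_or(1)
}

fn main() {
    // 1_000_000 elements with 256 units per cube -> 3907 cubes.
    println!("{}", cube_count_elemwise_sketch(1_000_000, 256));
    // Axis of length 12: 8 does not divide it, 4 does.
    println!("{}", line_size_sketch(&[8, 4, 2], 12));
}
```

Note the fallback to 1 mirrors the non-`try_` variants documented above; the `try_` variants differ precisely in not assuming that a line size of 1 is supported.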

Type Aliases§

RuntimeArg
Runtime arguments to launch a kernel.

Attribute Macros§

cube
Mark a cube function, trait or implementation for expansion.
derive_cube_comptime
Attribute macro to define a type that can be used as a kernel comptime argument. This derives Debug, Hash, PartialEq, Eq, Clone, and Copy.

Derive Macros§

AutotuneKey
Implements display and initialization for autotune keys.
CubeLaunch
Derive macro to define a cube type that is launched with a kernel
CubeType
Derive macro to define a cube type that is not launched