Portable GPU backend for ferrotorch via CubeCL.
CubeCL compiles a single kernel definition to CUDA PTX, AMD HIP/ROCm, and
WGPU (Vulkan/Metal/DX12). This crate wraps CubeCL’s runtime and dispatches
real #[cube] kernels to the active backend, with no CPU fallbacks.
§Feature flags
| Feature | Backend | GPU vendors |
|---|---|---|
| cuda | NVIDIA CUDA via PTX | NVIDIA |
| wgpu | WGPU (Vulkan/Metal) | AMD, Intel, Apple, … |
| rocm | AMD HIP (native) | AMD |
Enable at least one backend feature to use GPU acceleration. Without any
backend feature, CubeRuntime::new returns
FerrotorchError::DeviceUnavailable and CubeRuntime::auto returns
None.
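The no-backend behaviour described above can be sketched without the crate itself. The block below is a minimal, self-contained model, not the crate's implementation: `Backend`, `EnabledFeatures`, `SketchError`, `auto_backend`, and `new_backend` are hypothetical names, the CUDA > ROCm > WGPU priority is an assumption, and plain booleans stand in for `cfg!(feature = "...")` checks.

```rust
/// Hypothetical stand-in for the backends listed in the feature table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cuda,
    Rocm,
    Wgpu,
}

/// Which backend features were compiled in. A real crate would derive these
/// from `cfg!(feature = "...")`; booleans keep the sketch runnable as-is.
#[derive(Debug, Clone, Copy, Default)]
pub struct EnabledFeatures {
    pub cuda: bool,
    pub rocm: bool,
    pub wgpu: bool,
}

/// Hypothetical error mirroring the "device unavailable" case.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchError {
    DeviceUnavailable,
}

/// Auto-selection: returns `None` when no backend feature is enabled.
/// ASSUMPTION: native vendor backends are preferred over WGPU.
pub fn auto_backend(features: EnabledFeatures) -> Option<Backend> {
    if features.cuda {
        Some(Backend::Cuda)
    } else if features.rocm {
        Some(Backend::Rocm)
    } else if features.wgpu {
        Some(Backend::Wgpu)
    } else {
        None
    }
}

/// Explicit selection: fails when the requested backend was not compiled in.
pub fn new_backend(
    requested: Backend,
    features: EnabledFeatures,
) -> Result<Backend, SketchError> {
    let compiled_in = match requested {
        Backend::Cuda => features.cuda,
        Backend::Rocm => features.rocm,
        Backend::Wgpu => features.wgpu,
    };
    if compiled_in {
        Ok(requested)
    } else {
        Err(SketchError::DeviceUnavailable)
    }
}
```

The sketch only models compile-time feature gating. Whether a matching device is actually present at run time is a separate check that it does not attempt to cover.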
§Example
use ferrotorch_cubecl::{CubeDevice, CubeRuntime};

// Auto-detect the best available backend
if let Some(rt) = CubeRuntime::auto() {
    println!("Using device: {:?}", rt.device());
}
§REQ status (per .design/ferrotorch-cubecl/lib.md)
Full evidence rows (impl + non-test production consumer + upstream cites) live in the design doc; this synopsis is a one-line summary per REQ.
| REQ | Status | Evidence |
|---|---|---|
| REQ-1 (public module surface) | SHIPPED | pub mod grammar/kernels/ops/quant/runtime/storage in lib.rs; consumer ferrotorch-xpu/src/lib.rs imports ferrotorch_cubecl::{CubeDevice, CubeRuntime, upload_f32, wrap_kernel_output} |
| REQ-2 (feature-flag wiring) | SHIPPED | cuda/wgpu/rocm feature gates in Cargo.toml + make_client cfg arms in runtime.rs; no-backend path pinned by runtime_construction_errors_without_backend in ops.rs |
| REQ-3 (boundary re-exports) | SHIPPED | pub use runtime::* / storage::* / quant::* / grammar::* in lib.rs; consumers ferrotorch-xpu/src/lib.rs + ferrotorch-grammar/src/gpu_dispatch.rs reach names via ferrotorch_cubecl::Foo |
| REQ-4 (crate-internal launch helpers) | SHIPPED | pub(crate) fn elementwise_launch_dims + pub(crate) fn debug_assert_handle_capacity in lib.rs; consumers kernels::run_unary/run_binary_handle, quant::dequantize_q4_0_to_gpu, grammar::compute_token_mask_dfa_to_gpu |
| REQ-5 (lint baseline) | SHIPPED | #![warn(clippy::all, clippy::pedantic)] + #![deny(rust_2018_idioms, missing_debug_implementations)] at top of lib.rs; verified by cargo clippy -p ferrotorch-cubecl --no-default-features -- -D warnings |
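
REQ-4 names two crate-internal helpers whose bodies are not shown on this page. As an illustration of what helpers of this kind commonly do, the sketch below computes a one-dimensional launch shape by ceiling division and checks that a buffer is large enough for a launch. All names (`LaunchDims`, `launch_dims_sketch`, `has_capacity`) and the clamp-to-one-block rule are assumptions for illustration; the real `elementwise_launch_dims` and `debug_assert_handle_capacity` may differ.

```rust
/// Hypothetical 1-D launch shape: number of blocks and threads per block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LaunchDims {
    pub blocks: u32,
    pub threads_per_block: u32,
}

/// Covers `num_elements` with blocks of `threads_per_block` threads using
/// ceiling division, so the last (possibly partial) block is still launched.
/// ASSUMPTION: an empty input still launches one block.
pub fn launch_dims_sketch(num_elements: usize, threads_per_block: u32) -> LaunchDims {
    assert!(threads_per_block > 0, "threads_per_block must be non-zero");
    let per_block = threads_per_block as usize;
    let blocks = num_elements.div_ceil(per_block).max(1);
    LaunchDims {
        blocks: u32::try_from(blocks).expect("block count exceeds u32::MAX"),
        threads_per_block,
    }
}

/// Returns true when a buffer of `buffer_len_bytes` can hold `num_elements`
/// items of `elem_size` bytes each. Multiplication overflow counts as
/// "does not fit".
pub fn has_capacity(buffer_len_bytes: usize, num_elements: usize, elem_size: usize) -> bool {
    num_elements
        .checked_mul(elem_size)
        .is_some_and(|needed| needed <= buffer_len_bytes)
}
```

With this shape, a kernel body typically guards on `index < num_elements` so the surplus threads in the final block do nothing.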
Re-exports§
pub use runtime::CubeClient;
pub use runtime::CubeDevice;
pub use runtime::CubeRuntime;
pub use storage::CubeclStorageHandle;
pub use storage::cubecl_handle_of;
pub use storage::upload_f32;
pub use storage::wrap_kernel_output;
pub use quant::GgufBlockKind;
pub use quant::dequantize_q4_0_to_gpu;
pub use quant::dequantize_q4_1_to_gpu;
pub use quant::dequantize_q5_0_to_gpu;
pub use quant::dequantize_q5_1_to_gpu;
pub use quant::dequantize_q8_0_to_gpu;
pub use quant::dequantize_q8_1_to_gpu;
pub use quant::split_q4_0_blocks;
pub use quant::split_q4_1_blocks;
pub use quant::split_q5_0_blocks;
pub use quant::split_q5_1_blocks;
pub use quant::split_q8_0_blocks;
pub use quant::split_q8_1_blocks;
pub use grammar::DfaMaskInputs;
pub use grammar::compute_token_mask_dfa_to_gpu;
pub use grammar::kernel_compute_token_mask_dfa;
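
For readers unfamiliar with what the Q4_0 re-exports operate on, here is a CPU reference sketch of Q4_0 dequantization. It assumes the commonly documented GGML Q4_0 layout (32 weights per block, one scale, 16 bytes of packed 4-bit values, low nibbles first) and assumes the per-block scale has already been converted from f16 to f32. The function and constant names are hypothetical and this is not the crate's GPU kernel.

```rust
/// Weights per Q4_0 block (assumed GGML layout).
pub const Q4_0_BLOCK_LEN: usize = 32;
/// Packed bytes per block: two 4-bit values per byte.
pub const Q4_0_PACKED_BYTES: usize = Q4_0_BLOCK_LEN / 2;

/// CPU reference: expands Q4_0 blocks into f32 weights.
///
/// `scales` holds one scale per block (already widened to f32) and `packed`
/// holds 16 bytes per block. Byte `j` of a block stores element `j` in its
/// low nibble and element `j + 16` in its high nibble; each nibble is offset
/// by 8 so that stored values 0..=15 map to -8..=7.
pub fn dequantize_q4_0_reference(scales: &[f32], packed: &[u8]) -> Vec<f32> {
    assert_eq!(
        packed.len(),
        scales.len() * Q4_0_PACKED_BYTES,
        "expected 16 packed bytes per scale"
    );
    let mut out = vec![0.0_f32; scales.len() * Q4_0_BLOCK_LEN];
    let blocks = scales.iter().zip(packed.chunks_exact(Q4_0_PACKED_BYTES));
    for (block, (&scale, bytes)) in blocks.enumerate() {
        let base = block * Q4_0_BLOCK_LEN;
        for (j, &byte) in bytes.iter().enumerate() {
            let low = i32::from(byte & 0x0F) - 8;
            let high = i32::from(byte >> 4) - 8;
            out[base + j] = low as f32 * scale;
            out[base + j + Q4_0_PACKED_BYTES] = high as f32 * scale;
        }
    }
    out
}
```

A reference like this is mainly useful as a test oracle: a GPU result can be compared element by element against it on small inputs.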
Modules§
- grammar: GPU constrained-decoding token-mask computation.
- kernels: CubeCL kernel definitions used by ferrotorch-cubecl.
- ops: Portable GPU operations that dispatch through a real CubeCL ComputeClient and run #[cube] kernels on the selected backend.
- quant: GGUF quantized-weight dequantization on the GPU.
- runtime: Unified runtime selection for CubeCL backends.
- storage: Concrete CubeStorageHandle implementation for ferrotorch-cubecl.
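
The grammar module's token-mask computation can be pictured with a small CPU reference. The sketch below assumes a byte-level DFA stored as a dense transition table and marks a token as allowed when all of its bytes can be consumed from the current state without reaching a dead state. `ByteDfa`, `DEAD`, and `token_mask_reference` are hypothetical names, and the actual inputs carried by `DfaMaskInputs` are not shown on this page.

```rust
/// Sentinel for "no transition" (hypothetical encoding).
pub const DEAD: u32 = u32::MAX;

/// Byte-level DFA with a dense table: `transitions[state * 256 + byte]`.
#[derive(Debug, Clone)]
pub struct ByteDfa {
    transitions: Vec<u32>,
}

impl ByteDfa {
    /// Creates a DFA with `num_states` states and no transitions.
    pub fn new(num_states: usize) -> Self {
        Self { transitions: vec![DEAD; num_states * 256] }
    }

    /// Adds the edge `from --byte--> to`.
    pub fn add_transition(&mut self, from: u32, byte: u8, to: u32) {
        self.transitions[from as usize * 256 + usize::from(byte)] = to;
    }

    /// Consumes `bytes` from `start`; `None` if a dead state is reached.
    pub fn walk(&self, start: u32, bytes: &[u8]) -> Option<u32> {
        let mut state = start;
        for &byte in bytes {
            let next = self.transitions[state as usize * 256 + usize::from(byte)];
            if next == DEAD {
                return None;
            }
            state = next;
        }
        Some(state)
    }
}

/// One mask entry per vocabulary token: true when the token keeps the DFA
/// alive from `state`. On a GPU, each token would map to one thread.
pub fn token_mask_reference(dfa: &ByteDfa, state: u32, vocab: &[&[u8]]) -> Vec<bool> {
    vocab
        .iter()
        .map(|token| dfa.walk(state, token).is_some())
        .collect()
}
```

Because every token is checked independently, the loop over the vocabulary is the natural axis to parallelize.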