Crate ferrotorch_cubecl


Portable GPU backend for ferrotorch via CubeCL.

CubeCL compiles a single kernel definition to CUDA PTX, AMD HIP/ROCm, and WGPU (Vulkan/Metal/DX12). This crate wraps CubeCL’s runtime and dispatches real #[cube] kernels to the active backend, with no CPU fallbacks.

§Feature flags

| Feature | Backend | GPU vendors |
|---------|---------|-------------|
| cuda | NVIDIA CUDA via PTX | NVIDIA |
| wgpu | WGPU (Vulkan/Metal) | AMD, Intel, Apple, … |
| rocm | AMD HIP (native) | AMD |

Enable at least one backend feature to use GPU acceleration. Without any backend feature CubeRuntime::new returns FerrotorchError::DeviceUnavailable and CubeRuntime::auto returns None.
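The no-backend behaviour described above can be illustrated with a small stand-in model. This is a minimal sketch, not the crate's implementation: `Backend`, `SketchError`, `compiled_backends`, `new_runtime`, and `auto_runtime` are hypothetical names, and the preference order (cuda, then rocm, then wgpu) is an assumption made only for illustration.

```rust
// Stand-in types for illustration only; these are NOT the real
// ferrotorch_cubecl definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cuda,
    Rocm,
    Wgpu,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SketchError {
    DeviceUnavailable,
}

/// Backends compiled into this build. The order is an assumed
/// preference order, not one documented by the crate.
pub fn compiled_backends() -> Vec<Backend> {
    let mut out = Vec::new();
    if cfg!(feature = "cuda") {
        out.push(Backend::Cuda);
    }
    if cfg!(feature = "rocm") {
        out.push(Backend::Rocm);
    }
    if cfg!(feature = "wgpu") {
        out.push(Backend::Wgpu);
    }
    out
}

/// Mirrors the documented `new` behaviour: asking for a backend that was
/// not compiled in yields a "device unavailable" error.
pub fn new_runtime(requested: Backend) -> Result<Backend, SketchError> {
    if compiled_backends().contains(&requested) {
        Ok(requested)
    } else {
        Err(SketchError::DeviceUnavailable)
    }
}

/// Mirrors the documented `auto` behaviour: `None` when no backend
/// feature is enabled, otherwise the first compiled-in backend.
pub fn auto_runtime() -> Option<Backend> {
    compiled_backends().into_iter().next()
}
```

Built without any of the three features, `auto_runtime()` returns `None` and every `new_runtime(..)` call returns `Err(SketchError::DeviceUnavailable)`, which matches the behaviour stated above for the real constructors.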

§Example

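use ferrotorch_cubecl::{CubeDevice, CubeRuntime};

// Auto-detect the best available backend.
if let Some(rt) = CubeRuntime::auto() {
    println!("Using device: {:?}", rt.device());
} else {
    // `auto` returns `None` when no backend feature is enabled.
    println!("No GPU backend available");
}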

§REQ status (per .design/ferrotorch-cubecl/lib.md)

Full evidence rows (impl + non-test production consumer + upstream cites) live in the design doc; this synopsis is a one-line summary per REQ.

| REQ | Status | Evidence |
|-----|--------|----------|
| REQ-1 (public module surface) | SHIPPED | pub mod grammar/kernels/ops/quant/runtime/storage in lib.rs; consumer ferrotorch-xpu/src/lib.rs imports ferrotorch_cubecl::{CubeDevice, CubeRuntime, upload_f32, wrap_kernel_output} |
| REQ-2 (feature-flag wiring) | SHIPPED | cuda/wgpu/rocm feature gates in Cargo.toml + make_client cfg arms in runtime.rs; no-backend path pinned by runtime_construction_errors_without_backend in ops.rs |
| REQ-3 (boundary re-exports) | SHIPPED | pub use runtime::* / storage::* / quant::* / grammar::* in lib.rs; consumers ferrotorch-xpu/src/lib.rs + ferrotorch-grammar/src/gpu_dispatch.rs reach names via ferrotorch_cubecl::Foo |
| REQ-4 (crate-internal launch helpers) | SHIPPED | pub(crate) fn elementwise_launch_dims + pub(crate) fn debug_assert_handle_capacity in lib.rs; consumers kernels::run_unary/run_binary_handle, quant::dequantize_q4_0_to_gpu, grammar::compute_token_mask_dfa_to_gpu |
| REQ-5 (lint baseline) | SHIPPED | #![warn(clippy::all, clippy::pedantic)] + #![deny(rust_2018_idioms, missing_debug_implementations)] at top of lib.rs; verified by cargo clippy -p ferrotorch-cubecl --no-default-features -- -D warnings |
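REQ-4 names two crate-internal helpers, elementwise_launch_dims and debug_assert_handle_capacity, whose signatures are not shown on this page. As a rough illustration of what helpers of this kind usually compute, the sketch below derives a one-dimensional launch size by ceiling division and checks that a buffer is large enough for the requested element count. The function names, parameters, and return types here are assumptions, not the crate's actual API.

```rust
/// Hypothetical sketch: number of cubes (thread blocks) needed so that
/// `cube_dim` threads per cube cover `num_elements`, one element per
/// thread. Returns `(cube_count, cube_dim)`.
pub fn launch_dims_sketch(num_elements: usize, cube_dim: u32) -> (u32, u32) {
    assert!(cube_dim > 0, "cube_dim must be non-zero");
    let n = num_elements as u64;
    let per_cube = u64::from(cube_dim);
    // Ceiling division: round up so a partial final cube is still launched.
    let cubes = (n + per_cube - 1) / per_cube;
    let cubes = u32::try_from(cubes).expect("element count too large for a 1-D launch");
    (cubes, cube_dim)
}

/// Hypothetical sketch: true when a buffer of `handle_bytes` bytes can
/// hold `num_elements` values of `elem_size` bytes each. A debug-only
/// helper could wrap this in `debug_assert!`.
pub fn handle_capacity_ok(handle_bytes: usize, num_elements: usize, elem_size: usize) -> bool {
    num_elements
        .checked_mul(elem_size)
        .map_or(false, |needed| handle_bytes >= needed)
}
```

With 256 threads per cube, 1000 elements need 4 cubes and 1025 elements need 5. Kernels launched this way typically guard each thread with a bounds check, because the final cube may extend past the end of the data.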

Re-exports§

pub use runtime::CubeClient;
pub use runtime::CubeDevice;
pub use runtime::CubeRuntime;
pub use storage::CubeclStorageHandle;
pub use storage::cubecl_handle_of;
pub use storage::upload_f32;
pub use storage::wrap_kernel_output;
pub use quant::GgufBlockKind;
pub use quant::dequantize_q4_0_to_gpu;
pub use quant::dequantize_q4_1_to_gpu;
pub use quant::dequantize_q5_0_to_gpu;
pub use quant::dequantize_q5_1_to_gpu;
pub use quant::dequantize_q8_0_to_gpu;
pub use quant::dequantize_q8_1_to_gpu;
pub use quant::split_q4_0_blocks;
pub use quant::split_q4_1_blocks;
pub use quant::split_q5_0_blocks;
pub use quant::split_q5_1_blocks;
pub use quant::split_q8_0_blocks;
pub use quant::split_q8_1_blocks;
pub use grammar::DfaMaskInputs;
pub use grammar::compute_token_mask_dfa_to_gpu;
pub use grammar::kernel_compute_token_mask_dfa;
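The grammar re-exports compute a token mask from a DFA on the GPU. The fields of DfaMaskInputs and the kernel's data layout are not shown here, so the following is only a CPU reference sketch of the general technique under assumed conventions: a dense transition table indexed by `state * 256 + byte`, a `DEAD` sentinel for rejected transitions, and one mask entry per vocabulary token. All names in it are hypothetical.

```rust
/// Sentinel for "no valid transition" in this sketch (an assumption).
pub const DEAD: u32 = u32::MAX;

/// CPU reference for a DFA token mask. `transitions[state * 256 + byte]`
/// gives the next state. A token is allowed when feeding all of its bytes
/// from `start` never reaches `DEAD`. A GPU version would typically assign
/// one thread per token; here the tokens are walked sequentially.
pub fn token_mask_sketch(transitions: &[u32], start: u32, vocab: &[&[u8]]) -> Vec<bool> {
    vocab
        .iter()
        .map(|token| {
            let mut state = start;
            for &byte in token.iter() {
                state = transitions[state as usize * 256 + byte as usize];
                if state == DEAD {
                    return false; // the grammar rejects this token
                }
            }
            true
        })
        .collect()
}
```

For a single-state DFA that loops on ASCII digits and rejects everything else, the tokens "12" and "7" are allowed while "a1" is masked out.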

Modules§

grammar
GPU constrained-decoding token-mask computation.
kernels
CubeCL kernel definitions used by ferrotorch-cubecl.
ops
Portable GPU operations that dispatch through a real CubeCL ComputeClient and run #[cube] kernels on the selected backend.
quant
GGUF quantized-weight dequantization on the GPU.
runtime
Unified runtime selection for CubeCL backends.
storage
Concrete CubeStorageHandle implementation for ferrotorch-cubecl.
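To make the quant module's job concrete, here is a CPU reference sketch of Q4_0 dequantization using the block layout from ggml: each block stores 32 weights as a little-endian f16 scale followed by 16 bytes of packed 4-bit values, and each weight is `(nibble - 8) * scale`. The function names and the split into separate scale and quant arrays are assumptions for illustration. The crate's split_q4_0_blocks and dequantize_q4_0_to_gpu may use different signatures and layouts.

```rust
/// Weights per Q4_0 block and packed block size in bytes
/// (2-byte f16 scale + 16 bytes holding 32 four-bit quants), as in ggml.
pub const QK4_0: usize = 32;
pub const Q4_0_BLOCK_BYTES: usize = 2 + QK4_0 / 2;

/// Convert IEEE 754 half-precision bits to f32 without external crates.
pub fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = u32::from(bits >> 15) << 31;
    let exp = u32::from((bits >> 10) & 0x1f);
    let frac = u32::from(bits & 0x3ff);
    match exp {
        0 => {
            // Zero or subnormal: value = frac * 2^-24.
            let magnitude = frac as f32 / 16_777_216.0;
            if sign != 0 { -magnitude } else { magnitude }
        }
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)), // inf or NaN
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

/// Hypothetical host-side split: separate per-block scales from the packed
/// nibbles. Returns `None` if `raw` is not a whole number of blocks.
pub fn split_q4_0_sketch(raw: &[u8]) -> Option<(Vec<f32>, Vec<u8>)> {
    if raw.len() % Q4_0_BLOCK_BYTES != 0 {
        return None;
    }
    let blocks = raw.len() / Q4_0_BLOCK_BYTES;
    let mut scales = Vec::with_capacity(blocks);
    let mut quants = Vec::with_capacity(blocks * (QK4_0 / 2));
    for block in raw.chunks_exact(Q4_0_BLOCK_BYTES) {
        scales.push(f16_bits_to_f32(u16::from_le_bytes([block[0], block[1]])));
        quants.extend_from_slice(&block[2..]);
    }
    Some((scales, quants))
}

/// CPU reference dequantization. Low nibbles fill the first half of each
/// block and high nibbles the second half, following ggml's ordering.
pub fn dequantize_q4_0_cpu(scales: &[f32], quants: &[u8]) -> Vec<f32> {
    let half = QK4_0 / 2;
    let mut out = vec![0.0f32; scales.len() * QK4_0];
    for (i, (&d, qs)) in scales.iter().zip(quants.chunks_exact(half)).enumerate() {
        for (j, &byte) in qs.iter().enumerate() {
            let lo = i32::from(byte & 0x0f) - 8;
            let hi = i32::from(byte >> 4) - 8;
            out[i * QK4_0 + j] = lo as f32 * d;
            out[i * QK4_0 + j + half] = hi as f32 * d;
        }
    }
    out
}
```

A reference like this is mainly useful for checking GPU output in tests: dequantize the same blocks on both paths and compare the results element by element.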