§CudaForge
Advanced CUDA kernel builder for Rust with incremental builds, auto-detection, and external dependency support.
§Features
- Compute Capability Detection: Auto-detect from nvidia-smi or environment, with per-file overrides for mixed architectures
- Incremental Builds: Only recompile modified kernels using content hashing
- CUDA Toolkit Auto-Detection: Automatically find nvcc and include paths
- External Dependencies: Built-in CUTLASS support, or fetch any git repo
- Parallel Compilation: Configurable thread percentage for parallel builds
- Flexible Source Selection: Directory, glob, files, or exclude patterns
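The incremental-build feature above relies on content hashing. A minimal sketch of that idea follows, using only the standard library. `HashCache` and `needs_rebuild` are hypothetical names chosen for illustration, not part of the cudaforge API, and the crate's real `BuildCache` may work differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache (not the crate's `BuildCache`): maps a source path
/// to the hash of the contents that were last compiled.
#[derive(Default)]
struct HashCache {
    entries: HashMap<String, u64>,
}

impl HashCache {
    /// Returns true when `contents` differs from what was recorded for
    /// `path`, or when `path` has never been seen. Always records the
    /// new hash so the next call compares against the latest contents.
    fn needs_rebuild(&mut self, path: &str, contents: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        contents.hash(&mut hasher);
        let digest = hasher.finish();
        // `insert` returns the previously stored value, if any.
        self.entries.insert(path.to_string(), digest) != Some(digest)
    }
}
```

With this sketch, the first call for a given path returns true, a second call with identical bytes returns false, and a call after the bytes change returns true again. Hashing contents rather than comparing modification times means that touching a file without editing it does not trigger a recompile. `DefaultHasher` output is not guaranteed to be stable across Rust releases, so a cache persisted to disk would likely need a fixed hash function instead.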
§Quick Start
use cudaforge::KernelBuilder;

fn main() {
    let out_dir = std::env::var("OUT_DIR").expect("OUT_DIR must be set");

    KernelBuilder::new()
        .source_dir("src/kernels")
        .exclude(&["*_test.cu"])
        .arg("-O3")
        .arg("-std=c++17")
        .thread_percentage(0.5)
        .build_lib(format!("{}/libkernels.a", out_dir))
        .expect("CUDA compilation failed");

    println!("cargo:rustc-link-search={}", out_dir);
    println!("cargo:rustc-link-lib=kernels");
}
§Per-Kernel Compute Capability
use cudaforge::KernelBuilder;

KernelBuilder::new()
    .source_glob("src/**/*.cu")
    .with_compute_override("sm90_*.cu", 90) // Hopper kernels
    .with_compute_override("sm80_*.cu", 80) // Ampere kernels
    .build_lib("libkernels.a")?;
§With CUTLASS
use cudaforge::KernelBuilder;

KernelBuilder::new()
    .source_dir("src/kernels")
    .with_cutlass(Some("7127592069c2fe01b041e174ba4345ef9b279671"))
    .arg("-DUSE_CUTLASS")
    .build_lib("libkernels.a")?;
§PTX Generation
use cudaforge::KernelBuilder;

let output = KernelBuilder::new()
    .source_glob("src/**/*.cu")
    .build_ptx()?;
output.write("src/kernels.rs")?;
Structs§
- BuildCache - Build cache for tracking file modifications
- ComputeCapability - Compute capability configuration
- CudaToolkit - CUDA toolkit information
- DependencyManager - Dependency manager for handling multiple external dependencies
- ExternalDependency - External dependency configuration
- GpuArch - GPU architecture specification
- KernelBuilder - Main builder for CUDA kernel compilation
- ParallelConfig - Parallel build configuration
- PtxOutput - Output from PTX compilation
- SourceSelector - Source file selection configuration
Enums§
- Error - Errors that can occur during CUDA kernel building
Functions§
- collect_headers - Collect header files (.cuh) from directories
- detect_compute_cap - Detect compute capability from system
- get_gpu_arch_string - Get GPU architecture string for nvcc (e.g., “sm_90a” or “sm_80”)
- resolve_cutlass_from_cargo_checkouts - Try to resolve CUTLASS from cargo checkouts directory
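
To illustrate what the compute-capability functions listed above deal with, here is a rough sketch of turning a capability string such as the one printed by `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` into an nvcc architecture string. `parse_compute_cap` and `arch_string` are hypothetical helpers, not the crate's `detect_compute_cap` or `get_gpu_arch_string`. The rule that only capability 90 receives the `a` suffix is an assumption inferred from the “sm_90a” and “sm_80” examples above.

```rust
/// Hypothetical helper: parse text like "9.0" or "8.6" into the
/// two-digit form used by nvcc (90, 86). When several GPUs are
/// reported, one per line, only the first line is considered.
fn parse_compute_cap(raw: &str) -> Option<u32> {
    let first = raw.lines().next()?.trim();
    let (major, minor) = first.split_once('.')?;
    let major: u32 = major.parse().ok()?;
    let minor: u32 = minor.parse().ok()?;
    Some(major * 10 + minor)
}

/// Hypothetical helper: build the architecture string passed to nvcc.
/// Assumption: only capability 90 gets the architecture-specific "a"
/// suffix; the real crate may apply a different rule.
fn arch_string(cap: u32) -> String {
    if cap == 90 {
        "sm_90a".to_string()
    } else {
        format!("sm_{}", cap)
    }
}
```

Under these assumptions, `parse_compute_cap("8.6")` yields `Some(86)` and `arch_string(86)` yields `"sm_86"`, while unparseable input yields `None` so the caller can fall back to an environment variable or an explicit setting.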