Crate cudaforge

§CudaForge

CUDA kernel builder for Rust with incremental builds, auto-detection of the CUDA toolkit and compute capability, and external dependency support.

§Features

  • Compute Capability Detection: Auto-detected from nvidia-smi or the environment, with per-file overrides for mixed-architecture builds
  • Incremental Builds: Recompiles only the kernels whose content hash has changed
  • CUDA Toolkit Auto-Detection: Automatically finds nvcc and the CUDA include paths
  • External Dependencies: Built-in CUTLASS support, or fetch any git repository
  • Parallel Compilation: Compiles kernels in parallel, using a configurable percentage of the available threads
  • Flexible Source Selection: Select sources by directory, glob, or explicit file list, and filter them with exclude patterns
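
To make the incremental-build idea concrete, here is a minimal, std-only sketch of content-hash change detection. It is not the crate's `BuildCache` implementation: `HashCache`, `content_hash`, and `needs_rebuild` are hypothetical names, the cache lives only in memory, and `DefaultHasher` stands in for whatever hash the crate actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the bytes of a source file.
/// Assumption: `DefaultHasher` is only a stand-in. Its algorithm may change
/// between Rust releases, so a cache persisted to disk would want a stable hash.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory cache: source path -> hash seen at the last check.
struct HashCache {
    seen: HashMap<String, u64>,
}

impl HashCache {
    fn new() -> Self {
        Self { seen: HashMap::new() }
    }

    /// Records the current hash and returns `true` when the file is new
    /// or its contents differ from the previously recorded hash.
    fn needs_rebuild(&mut self, path: &str, contents: &[u8]) -> bool {
        let hash = content_hash(contents);
        match self.seen.insert(path.to_string(), hash) {
            Some(previous) => previous != hash,
            None => true,
        }
    }
}
```

Hashing contents rather than comparing modification times means that touching a file, or checking it out again with identical contents, does not trigger a recompile.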

§Quick Start

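use cudaforge::KernelBuilder;

// build.rs: compile every kernel under src/kernels into a static library.
fn main() {
    // Cargo sets OUT_DIR for build scripts.
    let out_dir = std::env::var("OUT_DIR").expect("OUT_DIR must be set");

    KernelBuilder::new()
        .source_dir("src/kernels")
        .exclude(&["*_test.cu"]) // skip test-only kernels
        .arg("-O3")
        .arg("-std=c++17")
        .thread_percentage(0.5) // thread percentage for the parallel build
        .build_lib(format!("{}/libkernels.a", out_dir))
        .expect("CUDA compilation failed");

    // Tell Cargo where to find libkernels.a and to link against it.
    println!("cargo:rustc-link-search={}", out_dir);
    println!("cargo:rustc-link-lib=kernels");
}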
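
As a rough illustration of what a thread percentage such as `0.5` could mean, the sketch below scales the number of available hardware threads and never drops below one worker. The helper names `worker_count` and `detected_worker_count` are hypothetical, and rounding down is an assumption; the crate's `ParallelConfig` may clamp or round differently.

```rust
use std::thread;

/// Hypothetical helper: scale `available` threads by `percentage`,
/// clamping the percentage to [0.0, 1.0] and keeping at least one worker.
/// Assumption: fractional results are rounded down.
fn worker_count(available: usize, percentage: f64) -> usize {
    let clamped = percentage.clamp(0.0, 1.0);
    let scaled = (available as f64 * clamped).floor() as usize;
    scaled.max(1)
}

/// Same calculation, using the parallelism reported by the standard library.
/// Falls back to a single thread if the value cannot be determined.
fn detected_worker_count(percentage: f64) -> usize {
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    worker_count(available, percentage)
}
```

Under these assumptions, a 16-thread machine with a percentage of `0.5` would run 8 compile jobs at once, leaving the rest of the machine free for the remainder of the Cargo build.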

§Per-Kernel Compute Capability

use cudaforge::KernelBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    KernelBuilder::new()
        .source_glob("src/**/*.cu")
        .with_compute_override("sm90_*.cu", 90) // Hopper kernels
        .with_compute_override("sm80_*.cu", 80) // Ampere kernels
        .build_lib("libkernels.a")?;
    Ok(())
}
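
The override patterns above can be read as "the first pattern that matches a file decides its compute capability, otherwise the default applies". The sketch below illustrates that reading with a deliberately simple matcher that supports a single `*` and compares against the file name only. Both the matching rules and the first-match-wins order are assumptions, and `wildcard_match` and `compute_cap_for` are hypothetical helpers, not part of the crate.

```rust
/// Match `name` against a pattern containing at most one `*` wildcard.
/// Assumption: real glob matching (`**`, `?`, character classes) is richer.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        Some((prefix, suffix)) => {
            // The length check stops prefix and suffix from overlapping.
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
        None => pattern == name,
    }
}

/// Return the compute capability for `file_name`: the first matching
/// override wins, otherwise `default_cap` is used.
fn compute_cap_for(file_name: &str, overrides: &[(&str, u32)], default_cap: u32) -> u32 {
    overrides
        .iter()
        .find(|(pattern, _)| wildcard_match(pattern, file_name))
        .map(|(_, cap)| *cap)
        .unwrap_or(default_cap)
}
```

With the two overrides from the example and a detected default of 75, `sm90_gemm.cu` would build for 90, `sm80_attn.cu` for 80, and `reduce.cu` for 75.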

§With CUTLASS

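use cudaforge::KernelBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    KernelBuilder::new()
        .source_dir("src/kernels")
        // Pin CUTLASS to a specific commit.
        .with_cutlass(Some("7127592069c2fe01b041e174ba4345ef9b279671"))
        .arg("-DUSE_CUTLASS")
        .build_lib("libkernels.a")?;
    Ok(())
}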

§PTX Generation

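use cudaforge::KernelBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Compile each kernel to PTX instead of a static library.
    let output = KernelBuilder::new()
        .source_glob("src/**/*.cu")
        .build_ptx()?;

    output.write("src/kernels.rs")?;
    Ok(())
}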

Structs§

BuildCache
Build cache for tracking file modifications
ComputeCapability
Compute capability configuration
CudaToolkit
CUDA toolkit information
DependencyManager
Dependency manager for handling multiple external dependencies
ExternalDependency
External dependency configuration
GpuArch
GPU architecture specification
KernelBuilder
Main builder for CUDA kernel compilation
ParallelConfig
Parallel build configuration
PtxOutput
Output from PTX compilation
SourceSelector
Source file selection configuration

Enums§

Error
Errors that can occur during CUDA kernel building

Functions§

collect_headers
Collect header files (.cuh) from directories
detect_compute_cap
Detect compute capability from system
get_gpu_arch_string
Get GPU architecture string for nvcc (e.g., “sm_90a” or “sm_80”)
resolve_cutlass_from_cargo_checkouts
Try to resolve CUTLASS from cargo checkouts directory
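
For orientation, the sketch below shows one way the detection and architecture-string steps described for `detect_compute_cap` and `get_gpu_arch_string` could fit together. It is not the crate's code. It assumes the text being parsed comes from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, which prints one `major.minor` value per GPU, and it assumes that only compute capability 90 receives the architecture-specific `a` suffix, which is just enough to reproduce the two strings quoted above. `parse_compute_cap` and `arch_string` are hypothetical names.

```rust
/// Parse the first non-empty line of `major.minor` text (e.g. "8.6")
/// into a two-digit compute capability (e.g. 86).
/// Returns `None` for empty or malformed input.
fn parse_compute_cap(output: &str) -> Option<u32> {
    let line = output.lines().map(str::trim).find(|l| !l.is_empty())?;
    let (major, minor) = line.split_once('.')?;
    let major: u32 = major.parse().ok()?;
    let minor: u32 = minor.parse().ok()?;
    Some(major * 10 + minor)
}

/// Build an nvcc architecture string such as "sm_80".
/// Assumption: only capability 90 gets the `a` suffix ("sm_90a");
/// the crate's actual rule may cover other architectures.
fn arch_string(compute_cap: u32) -> String {
    if compute_cap == 90 {
        format!("sm_{}a", compute_cap)
    } else {
        format!("sm_{}", compute_cap)
    }
}
```

On a machine with several GPUs this sketch simply takes the first one listed; a real build may instead want the lowest capability present, or an explicit override from the environment.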

Type Aliases§

Bindings
Convenience alias for PTX output
Builder
Convenience alias for the main builder type
Result
Result type alias for CudaForge operations