# CudaForge
Advanced CUDA kernel builder for Rust with incremental builds, auto-detection, and external dependency support.
## Features

- 🚀 **Incremental Builds** - Only recompile modified kernels, using content hashing
- 🔍 **Auto-Detection** - Automatically find the CUDA toolkit, nvcc, and compute capability
- 🎯 **Per-Kernel Compute Cap** - Override compute capability for specific kernels by filename
- 📦 **External Dependencies** - Built-in CUTLASS support, or fetch any git repo
- ⚡ **Parallel Compilation** - Configurable thread percentage for parallel builds
- 📁 **Flexible Sources** - Directory, glob, files, or exclude patterns
## Installation

Add to your `Cargo.toml`:

```toml
[build-dependencies]
cudaforge = "0.1"
```
## Quick Start

### Building a Static Library

```rust
// build.rs (source path and library name are illustrative)
use cudaforge::KernelBuilder;

fn main() {
    KernelBuilder::new()
        .source_dir("kernels")
        .build_lib("kernels")
        .expect("CUDA kernel build failed");
}
```
### Building PTX Files

```rust
// build.rs (the PTX build method name and path are assumptions;
// the builder produces a `PtxOutput`)
use cudaforge::KernelBuilder;

let ptx = KernelBuilder::new()
    .source_dir("kernels")
    .build_ptx()?;
```
## Compute Capability

### Auto-Detection

CudaForge automatically detects compute capability in this order:

1. `CUDA_COMPUTE_CAP` environment variable (supports "90", "90a", "100a")
2. `nvidia-smi --query-gpu=compute_cap`

For sm_90+ architectures, the 'a' suffix is automatically added for async features.
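The detection order and the automatic 'a' suffix can be sketched in plain Rust. This is an illustrative model, not CudaForge's actual implementation:

```rust
// Sketch of compute-capability resolution: env var first, then (omitted
// here) an nvidia-smi query; sm_90+ gets the 'a' suffix when none is given.
use std::env;

/// Append 'a' for 90+ when no suffix is present ("90" -> "90a", "80" -> "80").
fn normalize_cap(cap: &str) -> String {
    if cap.ends_with(|c: char| c.is_ascii_alphabetic()) {
        return cap.to_string(); // explicit suffix like "90a" or "100a" is kept
    }
    match cap.parse::<u32>() {
        Ok(n) if n >= 90 => format!("{n}a"),
        _ => cap.to_string(),
    }
}

fn detect_cap() -> Option<String> {
    // 1. CUDA_COMPUTE_CAP environment variable
    if let Ok(cap) = env::var("CUDA_COMPUTE_CAP") {
        return Some(normalize_cap(&cap));
    }
    // 2. Fall back to `nvidia-smi --query-gpu=compute_cap` (omitted here)
    None
}

fn main() {
    assert_eq!(normalize_cap("80"), "80");
    assert_eq!(normalize_cap("90"), "90a");
    assert_eq!(normalize_cap("90a"), "90a");
    println!("ok");
}
```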
### Per-Kernel Override

Override compute capability for specific kernels using filename patterns:

```rust
// Patterns and capability values are illustrative
KernelBuilder::new()
    .source_dir("kernels")
    .with_compute_override("hopper_", 90)           // Hopper (auto → sm_90a)
    .with_compute_override("ampere_", 80)           // Ampere (sm_80)
    .with_compute_override_arch("legacy_", "sm_75") // Explicit arch string
    .build_lib("kernels")?;
```
### String-Based Architecture

For explicit control over the GPU architecture, including the suffix:

```rust
KernelBuilder::new()
    .compute_cap_arch("sm_90a")  // Explicit sm_90a
    .source_dir("kernels")
    .build_lib("kernels")?;
```
### Numeric (Auto-Suffix)

```rust
KernelBuilder::new()
    .compute_cap(90)             // Auto-selects sm_90a for 90+
    .source_dir("kernels")
    .build_lib("kernels")?;
```
## Source Selection

### Directory (Recursive)

```rust
KernelBuilder::new()
    .source_dir("kernels")       // All .cu files recursively
```
### Glob Pattern

```rust
KernelBuilder::new()
    .source_glob("kernels/**/*.cu")
```
### Specific Files

```rust
// File names are illustrative
KernelBuilder::new()
    .source_files(vec!["kernels/gemm.cu", "kernels/reduce.cu"])
```
### With Exclusions

```rust
// Exclude pattern is illustrative
KernelBuilder::new()
    .source_dir("kernels")
    .exclude("**/test_*.cu")
```
### Watch Additional Files

Track header files that should trigger rebuilds:

```rust
// Header path is illustrative
KernelBuilder::new()
    .source_dir("kernels")
    .watch("kernels/common.cuh")
```
## External Dependencies

### CUTLASS Integration

```rust
// nvcc flags are illustrative
KernelBuilder::new()
    .source_dir("kernels")
    .with_cutlass()
    .arg("--expt-relaxed-constexpr")
    .arg("-O3")
    .build_lib("kernels")?;
```
### Custom Git Repository

Fetch include directories from any git repository:

```rust
// Repository URL and argument shape are illustrative
KernelBuilder::new()
    .source_dir("kernels")
    .with_git_dependency("https://github.com/NVIDIA/cccl.git")
    .build_lib("kernels")?;
```
### Local Include Paths

```rust
// Include paths are illustrative
KernelBuilder::new()
    .source_dir("kernels")
    .include_path("include")
    .include_path("third_party/include")
    .build_lib("kernels")?;
```
## Parallel Compilation

### Thread Percentage

Use a percentage of the available threads:

```rust
KernelBuilder::new()
    .thread_percentage(50)       // 50% of available threads
    .source_dir("kernels")
    .build_lib("kernels")?;
```
### Maximum Threads

Set an absolute limit:

```rust
KernelBuilder::new()
    .max_threads(8)              // Use at most 8 threads
    .source_dir("kernels")
    .build_lib("kernels")?;
```
### Environment Variables

- `CUDAFORGE_THREADS` - Override thread count
- `RAYON_NUM_THREADS` - Alternative for compatibility
- `NVCC_THREADS` - Number of threads for nvcc internal parallelism
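How the percentage, absolute cap, and environment override might combine can be sketched as follows; the precedence shown here (explicit override wins, then percentage, then cap) is an assumption, not CudaForge's documented behavior:

```rust
// Illustrative thread-count resolution. `override_threads` stands in for
// a value read from CUDAFORGE_THREADS / RAYON_NUM_THREADS.
fn resolve_threads(
    available: usize,
    override_threads: Option<usize>,
    percentage: Option<u8>,
    max: Option<usize>,
) -> usize {
    if let Some(n) = override_threads {
        return n.max(1); // explicit env override wins outright
    }
    let mut n = match percentage {
        Some(p) => (available * p as usize / 100).max(1), // e.g. 50% of 16 -> 8
        None => available,
    };
    if let Some(m) = max {
        n = n.min(m); // absolute cap applies last
    }
    n
}

fn main() {
    assert_eq!(resolve_threads(16, None, Some(50), None), 8);
    assert_eq!(resolve_threads(16, Some(4), Some(50), None), 4);
    assert_eq!(resolve_threads(16, None, None, Some(8)), 8);
    println!("ok");
}
```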
## CUDA Toolkit Detection

CudaForge automatically locates the CUDA toolkit in this order:

1. `NVCC` environment variable
2. `nvcc` in `PATH`
3. `CUDA_HOME/bin/nvcc`
4. `/usr/local/cuda/bin/nvcc` - Common installation paths
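The lookup order above can be sketched with the environment injected as closures, which keeps the logic testable; this is an illustrative model, not CudaForge's actual code:

```rust
// Sketch of nvcc lookup: NVCC env var, then PATH, then CUDA_HOME,
// then the common install location.
use std::path::PathBuf;

fn locate_nvcc(
    env: impl Fn(&str) -> Option<String>, // stand-in for std::env::var
    on_path: impl Fn(&str) -> bool,       // stand-in for a PATH search
) -> PathBuf {
    if let Some(p) = env("NVCC") {
        return PathBuf::from(p); // 1. NVCC environment variable
    }
    if on_path("nvcc") {
        return PathBuf::from("nvcc"); // 2. nvcc found on PATH
    }
    if let Some(home) = env("CUDA_HOME") {
        return PathBuf::from(home).join("bin/nvcc"); // 3. CUDA_HOME/bin/nvcc
    }
    PathBuf::from("/usr/local/cuda/bin/nvcc") // 4. common installation path
}

fn main() {
    // No env vars, nothing on PATH -> fall through to the default location.
    assert_eq!(
        locate_nvcc(|_: &str| None::<String>, |_: &str| false),
        PathBuf::from("/usr/local/cuda/bin/nvcc")
    );
    // NVCC set -> used directly.
    assert_eq!(
        locate_nvcc(
            |k: &str| (k == "NVCC").then(|| String::from("/opt/cuda/bin/nvcc")),
            |_: &str| false
        ),
        PathBuf::from("/opt/cuda/bin/nvcc")
    );
    println!("ok");
}
```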
### Manual Override

```rust
// Toolkit path is illustrative
KernelBuilder::new()
    .cuda_root("/usr/local/cuda-12.8")
```
## Incremental Builds
Incremental builds are enabled by default. CudaForge tracks:
- File content hashes (SHA-256)
- Compute capability used
- Compiler arguments
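The rebuild decision can be modeled as a fingerprint over those three inputs. CudaForge uses SHA-256; std's `DefaultHasher` stands in below so the sketch needs no external crates, and the cache shape is an assumption:

```rust
// Sketch of content-hash incremental builds: a kernel is recompiled only
// when the fingerprint of (source, compute cap, compiler args) changes.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn fingerprint(source: &str, compute_cap: &str, args: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    source.hash(&mut h);      // file content
    compute_cap.hash(&mut h); // compute capability used
    args.hash(&mut h);        // compiler arguments
    h.finish()
}

fn needs_rebuild(cache: &HashMap<String, u64>, kernel: &str, fp: u64) -> bool {
    cache.get(kernel) != Some(&fp)
}

fn main() {
    let mut cache = HashMap::new();
    let fp = fingerprint("__global__ void k() {}", "90a", &["-O3"]);
    assert!(needs_rebuild(&cache, "k.cu", fp)); // first build: no cache entry
    cache.insert("k.cu".to_string(), fp);
    assert!(!needs_rebuild(&cache, "k.cu", fp)); // unchanged -> skipped
    // Changing the compute cap changes the fingerprint -> rebuild.
    let fp2 = fingerprint("__global__ void k() {}", "80", &["-O3"]);
    assert!(needs_rebuild(&cache, "k.cu", fp2));
    println!("ok");
}
```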
To disable:

```rust
KernelBuilder::new()
    .no_incremental()
```
## Full Example

```rust
// build.rs (all paths, patterns, and values are illustrative)
use cudaforge::KernelBuilder;

fn main() {
    KernelBuilder::new()
        .source_dir("kernels")
        .exclude("**/experimental/*.cu")
        .watch("kernels/common.cuh")
        .compute_cap(90)                      // auto → sm_90a
        .with_compute_override("ampere_", 80) // per-kernel override
        .include_path("include")
        .thread_percentage(50)
        .build_lib("kernels")
        .expect("CUDA kernel build failed");
}
```
## Multiple Builders

Use multiple builders in sequence for different output types or configurations:

```rust
// build.rs (configurations are illustrative: one library per architecture)
use cudaforge::KernelBuilder;

fn main() {
    KernelBuilder::new()
        .source_dir("kernels")
        .compute_cap(80)
        .build_lib("kernels_sm80")
        .expect("sm_80 build failed");

    KernelBuilder::new()
        .source_dir("kernels")
        .compute_cap(90)
        .build_lib("kernels_sm90")
        .expect("sm_90 build failed");
}
```
## Environment Variables

| Variable | Description |
|---|---|
| `CUDA_COMPUTE_CAP` | Default compute capability (e.g., 80, 90) |
| `NVCC` | Path to nvcc binary |
| `CUDA_HOME` | CUDA installation root |
| `NVCC_CCBIN` | C++ compiler for nvcc |
| `CUDAFORGE_THREADS` | Override thread count |
| `NVCC_THREADS` | nvcc internal parallel threads |
## Docker Builds

> [!IMPORTANT]
> The GPU is NOT accessible during `docker build`, only during `docker run --gpus all`.

When building CUDA kernels inside a Dockerfile, `nvidia-smi` cannot be used to auto-detect the compute capability. You must set `CUDA_COMPUTE_CAP` explicitly:
### Dockerfile Example

```dockerfile
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04

# Install Rust and make cargo available on PATH
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

# Set compute capability for the build
ARG CUDA_COMPUTE_CAP=90
ENV CUDA_COMPUTE_CAP=${CUDA_COMPUTE_CAP}

# Build with explicit compute cap
WORKDIR /app
COPY . .
RUN cargo build --release
```
Build for different GPU architectures:

```bash
# (image tags are illustrative)

# Build for Hopper (sm_90)
docker build --build-arg CUDA_COMPUTE_CAP=90 -t myapp:sm90 .

# Build for Blackwell (sm_100)
docker build --build-arg CUDA_COMPUTE_CAP=100 -t myapp:sm100 .

# Build for Ampere (sm_80)
docker build --build-arg CUDA_COMPUTE_CAP=80 -t myapp:sm80 .
```
### Fail-Fast Mode

For CI/Docker builds, use `require_explicit_compute_cap()` to fail immediately if the compute capability is not set:

```rust
KernelBuilder::new()
    .require_explicit_compute_cap()? // Fails fast if CUDA_COMPUTE_CAP not set
    .source_dir("kernels")
    .build_lib("kernels")?;
```
## Migration from bindgen_cuda

| Old API | New API |
|---|---|
| `Builder::default()` | `KernelBuilder::new()` |
| `.kernel_paths(vec![...])` | `.source_files(vec![...])` |
| `.kernel_paths_glob("...")` | `.source_glob("...")` |
| `.include_paths(vec![...])` | `.include_path("...")` |
| `Bindings` | `PtxOutput` |
Backward-compatibility aliases are available:

- `cudaforge::Builder` → `KernelBuilder`
- `cudaforge::Bindings` → `PtxOutput`
## License
MIT OR Apache-2.0