CudaForge
Advanced CUDA kernel builder for Rust with incremental builds, auto-detection, and external dependency support.
Features
- 🚀 Incremental Builds - Only recompile modified kernels using content hashing
- 🔍 Auto-Detection - Automatically find CUDA toolkit, nvcc, and compute capability
- 🎯 Per-Kernel Compute Cap - Override compute capability for specific kernels by filename
- 📦 External Dependencies - Built-in CUTLASS support, or fetch any git repo
- ⚡ Parallel Compilation - Configurable thread percentage for parallel builds
- 📁 Flexible Sources - Directory, glob, files, or exclude patterns
Installation
Add to your Cargo.toml:
```toml
[build-dependencies]
cudaforge = "0.1"
```
Quick Start
Building a Static Library
```rust
// build.rs (the source path shown is illustrative)
use cudaforge::KernelBuilder;

fn main() {
    KernelBuilder::new()
        .source_dir("kernels")
        .build_lib()
        .unwrap();
}
```
Building PTX Files
```rust
// build.rs (sketch: the PTX build method name is assumed here)
use cudaforge::{KernelBuilder, PtxOutput};

fn main() {
    let _ptx: PtxOutput = KernelBuilder::new()
        .source_dir("kernels")
        .build_ptx()
        .unwrap();
}
```
Compute Capability
Auto-Detection
CudaForge automatically detects compute capability in this order:
1. `CUDA_COMPUTE_CAP` environment variable (supports "90", "90a", "100a")
2. `nvidia-smi --query-gpu=compute_cap`
For sm_90+ architectures, the 'a' suffix is automatically added for async features.
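The auto-suffix rule can be sketched as follows (a minimal illustration of the behavior described above, not CudaForge's actual code; `arch_for` is a hypothetical helper):

```rust
// Illustrative sketch of the auto-suffix rule: sm_90 and above get the
// 'a' suffix so async features are available; earlier archs are unchanged.
fn arch_for(cap: u32) -> String {
    if cap >= 90 {
        format!("sm_{cap}a")
    } else {
        format!("sm_{cap}")
    }
}

fn main() {
    assert_eq!(arch_for(90), "sm_90a");
    assert_eq!(arch_for(100), "sm_100a");
    assert_eq!(arch_for(80), "sm_80");
}
```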
Per-Kernel Override
Override compute capability for specific kernels using filename patterns:
```rust
// Pattern and capability arguments below are illustrative.
KernelBuilder::new()
    .source_dir("kernels")
    .with_compute_override("*hopper*", 90)           // Hopper (auto → sm_90a)
    .with_compute_override("*ampere*", 80)           // Ampere (sm_80)
    .with_compute_override_arch("*turing*", "sm_75") // Explicit arch string
    .build_lib()?;
```
String-Based Architecture
For explicit control over GPU architecture including suffix:
```rust
KernelBuilder::new()
    .compute_cap_arch("sm_90a") // Explicit sm_90a
    .source_dir("kernels")      // path illustrative
    .build_lib()?;
```
Numeric (Auto-Suffix)
```rust
KernelBuilder::new()
    .compute_cap(90) // Auto-selects sm_90a for 90+
    .source_dir("kernels")
    .build_lib()?;
```
Source Selection
Directory (Recursive)
```rust
KernelBuilder::new()
    .source_dir("kernels") // All .cu files recursively; path illustrative
```
Glob Pattern
```rust
KernelBuilder::new()
    .source_glob("kernels/**/*.cu") // pattern illustrative
```
Specific Files
```rust
KernelBuilder::new()
    .source_files(vec!["kernels/gemm.cu", "kernels/softmax.cu"]) // files illustrative
```
With Exclusions
```rust
KernelBuilder::new()
    .source_dir("kernels")
    .exclude("**/*_test.cu") // pattern illustrative
```
Watch Additional Files
Track header files that should trigger rebuilds:
```rust
KernelBuilder::new()
    .source_dir("kernels")
    .watch("kernels/common.cuh") // path illustrative
```
External Dependencies
CUTLASS Integration
```rust
KernelBuilder::new()
    .source_dir("kernels")
    .with_cutlass()
    .arg("--expt-relaxed-constexpr") // nvcc flags shown are illustrative
    .arg("-std=c++17")
    .build_lib()?;
```
Custom Git Repository
Fetch include directories from any git repository:
```rust
// The repository URL and path arguments are illustrative, and the
// exact signature of with_git_dependency is sketched here.
KernelBuilder::new()
    .source_dir("kernels")
    .with_git_dependency(
        "https://github.com/example/my_lib.git",
        vec!["include"], // include_paths
        vec![],          // extra_paths
    )
    .build_lib()?;
```
`include_paths` are added to the nvcc command line as `-I` include directories.

`extra_paths` are fetched into the sparse checkout but are not added as include directories automatically. Use them when your build needs additional source trees, generated files, templates, or other repo content beyond headers.
If you need to compile source files from the fetched dependency, you can fetch the checkout root and reference files from there:
```rust
// Sketch: arguments and the returned path layout are illustrative.
let builder = KernelBuilder::new()
    .source_dir("kernels")
    .with_git_dependency(
        "https://github.com/example/my_lib.git",
        vec!["include"],
        vec!["src"],
    );
let my_lib_root = builder.fetch_git_dependency()?; // checkout root
let builder = builder.source_files(vec![my_lib_root.join("src/extra_kernels.cu")]);
```
Local Include Paths
```rust
KernelBuilder::new()
    .source_dir("kernels")
    .include_path("include")             // paths illustrative
    .include_path("third_party/headers")
    .build_lib()?;
```
Parallel Compilation
Thread Percentage
Use a percentage of available threads:
```rust
KernelBuilder::new()
    .thread_percentage(50) // 50% of available threads
    .source_dir("kernels")
    .build_lib()?;
```
Maximum Threads
Set an absolute limit:
```rust
KernelBuilder::new()
    .max_threads(8) // Use at most 8 threads
    .source_dir("kernels")
    .build_lib()?;
```
Environment Variables
- `CUDAFORGE_THREADS` - Override thread count
- `RAYON_NUM_THREADS` - Alternative for compatibility
Pattern-Based Threading
Enable multiple nvcc threads only for specific files (supports globs):
```rust
// Pattern and thread-count arguments are illustrative.
KernelBuilder::new()
    .nvcc_thread_patterns(&["*attention*"], 4) // Use 4 nvcc threads for matching files
    .build_lib()?;
```
CUDA Toolkit Detection
CudaForge automatically locates the CUDA toolkit in this order:
1. `NVCC` environment variable
2. `nvcc` in `PATH`
3. `CUDA_HOME/bin/nvcc`
4. Common installation paths such as `/usr/local/cuda/bin/nvcc`
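The lookup order can be sketched roughly as follows (a simplified illustration; `find_nvcc` is a hypothetical helper, not CudaForge's API, and the PATH search is elided):

```rust
use std::path::PathBuf;

// Hypothetical sketch of the detection order described above.
fn find_nvcc(nvcc_env: Option<&str>, cuda_home: Option<&str>) -> Option<PathBuf> {
    // 1. NVCC environment variable wins outright
    if let Some(p) = nvcc_env {
        return Some(PathBuf::from(p));
    }
    // 2. `nvcc` on PATH (search elided for brevity)
    // 3. CUDA_HOME/bin/nvcc
    if let Some(home) = cuda_home {
        let p = PathBuf::from(home).join("bin").join("nvcc");
        if p.exists() {
            return Some(p);
        }
    }
    // 4. Common installation paths
    let fallback = PathBuf::from("/usr/local/cuda/bin/nvcc");
    fallback.exists().then_some(fallback)
}

fn main() {
    // Report what would be picked on this machine.
    let nvcc = std::env::var("NVCC").ok();
    let home = std::env::var("CUDA_HOME").ok();
    println!("{:?}", find_nvcc(nvcc.as_deref(), home.as_deref()));
}
```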
Manual Override
```rust
KernelBuilder::new()
    .cuda_root("/opt/cuda-12.8") // path illustrative
```
Incremental Builds
Incremental builds are enabled by default. CudaForge tracks:
- File content hashes (SHA-256)
- Compute capability used
- Compiler arguments
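The skip-if-unchanged decision can be sketched like this (illustrative only; CudaForge hashes content with SHA-256, while this sketch uses the standard library's `DefaultHasher` as a stand-in):

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative cache key over the three tracked inputs:
// file content, compute capability, and compiler arguments.
fn cache_key(content: &str, compute_cap: &str, args: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    compute_cap.hash(&mut h);
    args.hash(&mut h);
    h.finish()
}

fn needs_rebuild(cache: &HashMap<String, u64>, file: &str, key: u64) -> bool {
    cache.get(file) != Some(&key)
}

fn main() {
    let mut cache = HashMap::new();
    let key = cache_key("__global__ void k() {}", "sm_90a", &["-O3"]);
    assert!(needs_rebuild(&cache, "k.cu", key)); // first build: compile
    cache.insert("k.cu".to_string(), key);
    assert!(!needs_rebuild(&cache, "k.cu", key)); // unchanged: skipped
    // Changing any tracked input (content, cap, or args) forces a recompile.
    let key2 = cache_key("__global__ void k() {}", "sm_90a", &["-O2"]);
    assert!(needs_rebuild(&cache, "k.cu", key2));
}
```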
To disable:
```rust
KernelBuilder::new()
    .no_incremental()
```
Full Example
```rust
// build.rs (paths and values below are illustrative)
use cudaforge::KernelBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    KernelBuilder::new()
        .source_dir("kernels")
        .exclude("**/experimental/*.cu")
        .include_path("include")
        .compute_cap(90)       // auto-selects sm_90a
        .thread_percentage(50)
        .build_lib()?;
    Ok(())
}
```
Multiple Builders
Use multiple builders in sequence for different output types or configurations:
```rust
use cudaforge::KernelBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Static library from one source tree (paths illustrative)
    KernelBuilder::new()
        .source_dir("kernels/lib")
        .build_lib()?;

    // PTX for runtime-loaded kernels (PTX build method name assumed)
    KernelBuilder::new()
        .source_dir("kernels/ptx")
        .build_ptx()?;
    Ok(())
}
```
Environment Variables
| Variable | Description |
|---|---|
| `CUDA_COMPUTE_CAP` | Default compute capability (e.g., 80, 90) |
| `NVCC` | Path to nvcc binary |
| `CUDA_HOME` | CUDA installation root |
| `NVCC_CCBIN` | C++ compiler for nvcc |
| `CUDAFORGE_THREADS` | Override thread count |
Docker Builds
> [!IMPORTANT]
> The GPU is NOT accessible during `docker build`, only during `docker run --gpus all`.
When building CUDA kernels inside a Dockerfile, `nvidia-smi` cannot be used to auto-detect compute capability, so you must set `CUDA_COMPUTE_CAP` explicitly:
Dockerfile Example
```dockerfile
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04

# Install Rust (rustup installs to ~/.cargo/bin, which later RUN steps
# won't have on PATH unless we add it)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

# Set compute capability for the build
ARG CUDA_COMPUTE_CAP=90
ENV CUDA_COMPUTE_CAP=${CUDA_COMPUTE_CAP}

# Build with explicit compute cap
WORKDIR /app
COPY . .
RUN cargo build --release
```
Build for different GPU architectures by overriding the build arg:
```shell
# Build for Hopper (sm_90)
docker build --build-arg CUDA_COMPUTE_CAP=90 .

# Build for Blackwell (sm_100)
docker build --build-arg CUDA_COMPUTE_CAP=100 .

# Build for Ampere (sm_80)
docker build --build-arg CUDA_COMPUTE_CAP=80 .
```
Fail-Fast Mode
For CI/Docker builds, use `require_explicit_compute_cap()` to fail immediately if compute capability is not set:
```rust
KernelBuilder::new()
    .require_explicit_compute_cap()? // Fails fast if CUDA_COMPUTE_CAP not set
    .source_dir("kernels")
    .build_lib()?;
```
Migration from bindgen_cuda
| Old API | New API |
|---|---|
| `Builder::default()` | `KernelBuilder::new()` |
| `.kernel_paths(vec![...])` | `.source_files(vec![...])` |
| `.kernel_paths_glob("...")` | `.source_glob("...")` |
| `.include_paths(vec![...])` | `.include_path("...")` |
| `Bindings` | `PtxOutput` |
Backward compatibility aliases are available:
- `cudaforge::Builder` → `KernelBuilder`
- `cudaforge::Bindings` → `PtxOutput`
License
MIT OR Apache-2.0