trueno-gpu
Pure Rust PTX generation for NVIDIA CUDA - no LLVM, no nvcc, no external dependencies.
Philosophy
Own the Stack - Build everything from first principles for complete control, auditability, and reproducibility.
Features
- Pure Rust PTX Generation: Generate PTX assembly directly from Rust code
- No External Dependencies: No LLVM, nvcc, or CUDA toolkit required for code generation
- Builder Pattern API: Ergonomic API for constructing PTX modules and kernels
- Hand-Optimized Kernels: Pre-built kernels for common ML operations
Quick Start
use ;
// Build a vector addition kernel
let module = new
.version
.target
.address_size;
let ptx_source = module.emit;
assert!;
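The builder chain above configures the three directives that open a PTX module: `.version`, `.target`, and `.address_size`. As a rough illustration of the pattern, here is a minimal, self-contained sketch. `ModuleHeader` and its method signatures are hypothetical stand-ins written for this example; they are not the crate's actual API, and the default values are assumptions.

```rust
/// Hypothetical stand-in for a PTX module builder (NOT the trueno-gpu API).
/// It only models the module header, not kernels or instructions.
#[derive(Debug, Clone)]
pub struct ModuleHeader {
    version: (u32, u32),
    target: String,
    address_size: u32,
}

impl ModuleHeader {
    /// Start with assumed defaults; each setter below overrides one field.
    pub fn new() -> Self {
        Self {
            version: (8, 0),
            target: "sm_70".to_string(),
            address_size: 64,
        }
    }

    /// PTX ISA version, emitted as `.version major.minor`.
    pub fn version(mut self, major: u32, minor: u32) -> Self {
        self.version = (major, minor);
        self
    }

    /// Target architecture, emitted as `.target sm_XX`.
    pub fn target(mut self, target: &str) -> Self {
        self.target = target.to_string();
        self
    }

    /// Pointer width in bits, emitted as `.address_size N`.
    pub fn address_size(mut self, bits: u32) -> Self {
        self.address_size = bits;
        self
    }

    /// Render the header as PTX text, one directive per line.
    pub fn emit(&self) -> String {
        format!(
            ".version {}.{}\n.target {}\n.address_size {}\n",
            self.version.0, self.version.1, self.target, self.address_size
        )
    }
}
```

Each setter takes `self` by value and returns it, which is what allows the calls to be chained; `emit` borrows, so the same header can be rendered more than once.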
Available Kernels
| Kernel | Description |
|---|---|
| GEMM | Matrix multiplication (naive, tiled, tensor core) |
| Softmax | Numerically stable softmax with warp shuffle |
| LayerNorm | Fused layer normalization |
| Attention | FlashAttention-style tiled attention |
| Quantize | Q4_K dequantization fused with matmul |
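To make "numerically stable softmax" concrete, the sketch below shows the usual max-subtraction trick on the CPU in plain Rust. It is a reference illustration only, not the crate's GPU kernel: the function name is chosen for this example, and the warp-shuffle reductions a GPU version would use for the max and the sum are replaced by ordinary iterator folds.

```rust
/// CPU reference for a numerically stable softmax (illustrative only).
/// Subtracting the maximum before exponentiating keeps every exponent
/// at or below zero, so `exp` cannot overflow for large inputs.
pub fn softmax_stable(logits: &[f32]) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }

    // Reduction 1: maximum of the row (a warp-level reduction on a GPU).
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);

    // Shifted exponentials; the largest element maps to exp(0) = 1.
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();

    // Reduction 2: sum of the exponentials, at least 1 by construction.
    let sum: f32 = exps.iter().sum();

    exps.iter().map(|&e| e / sum).collect()
}
```

Without the shift, an input such as `1000.0` would make `exp` return infinity in `f32` and the result would be NaN; with it, two equal inputs of `1000.0` each come out as `0.5`.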
Usage
use ;
// Create a tiled GEMM kernel
let kernel = tiled;
let ptx = kernel.emit_ptx;
// The PTX can be loaded by CUDA driver API
println!;
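For readers unfamiliar with tiling, the following is a small CPU sketch of the loop structure a tiled GEMM is built around. It is not the generated kernel and does not use the crate; the function name, the row-major layout, and the `tile` parameter are assumptions made for this example. On a GPU, each tile of `a` and `b` would typically be staged in shared memory by a thread block, whereas here the tiling only changes iteration order.

```rust
/// CPU reference for tiled matrix multiplication, C = A * B (illustrative).
/// `a` is m x k, `b` is k x n, both row-major; the result is m x n.
pub fn matmul_tiled(
    a: &[f32],
    b: &[f32],
    m: usize,
    k: usize,
    n: usize,
    tile: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "a must hold m * k elements");
    assert_eq!(b.len(), k * n, "b must hold k * n elements");
    assert!(tile > 0, "tile size must be positive");

    let mut c = vec![0.0f32; m * n];

    // Outer loops walk over tiles of the output and of the shared dimension.
    for i0 in (0..m).step_by(tile) {
        for j0 in (0..n).step_by(tile) {
            for p0 in (0..k).step_by(tile) {
                // Inner loops accumulate one tile's partial products.
                // `.min(..)` handles edge tiles when sizes are not multiples of `tile`.
                for i in i0..(i0 + tile).min(m) {
                    for p in p0..(p0 + tile).min(k) {
                        let a_ip = a[i * k + p];
                        for j in j0..(j0 + tile).min(n) {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}
```

A reference like this is mainly useful for checking GPU output: multiplying `[[1, 2], [3, 4]]` by `[[5, 6], [7, 8]]` gives `[[19, 22], [43, 50]]` regardless of the tile size chosen.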
Modules
- ptx - PTX code generation (builder pattern)
- kernels - Hand-optimized GPU kernels
- driver - CUDA driver API (minimal FFI, optional)
- memory - GPU memory management
- backend - Multi-backend abstraction
Requirements
- Rust 1.70+
- For GPU execution: NVIDIA CUDA driver (optional, only needed to run generated PTX)
License
MIT License - see LICENSE for details.
Part of Trueno
This crate is part of the Trueno high-performance compute library.