# unsloth-rs
Rust implementations of transformer building blocks for LLM inference and fine-tuning.
## Overview
unsloth-rs provides Rust implementations of common transformer operations built on the Candle ML framework:
- Multi-head attention with grouped-query attention (GQA) support
- Rotary position embeddings (RoPE)
- RMS normalization
- SwiGLU activation
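For reference, two of these operations are simple enough to sketch element-wise in plain Rust. The functions below illustrate the math only; they are not the crate's API and ignore the batched tensor layouts the real kernels use.

```rust
/// RMS normalization: scale each element by the reciprocal root-mean-square
/// of the vector, then apply a learned per-channel weight.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// SwiGLU: gate an "up" projection with SiLU, where silu(z) = z * sigmoid(z).
/// `gate` and `up` are the two linear projections of the same input.
fn swiglu(gate: &[f32], up: &[f32]) -> Vec<f32> {
    gate.iter()
        .zip(up)
        .map(|(g, u)| (g / (1.0 + (-g).exp())) * u)
        .collect()
}
```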
## Status

Version 1.0.0 - core functionality is stable. The current kernels are CPU reference implementations; GPU dispatch goes through Candle's CUDA backend.
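The snippet below is a minimal sketch of how a caller can pick the execution device through Candle's standard API; it assumes the crate's ops accept a `candle_core::Device`, which is an assumption rather than a documented guarantee.

```rust
use candle_core::Device;

fn main() -> candle_core::Result<()> {
    // Use Candle's CUDA backend when GPU 0 is available (and the crate was
    // built with the `cuda` feature); otherwise fall back to the CPU path.
    let device = Device::cuda_if_available(0)?;
    println!("running on: {:?}", device);
    Ok(())
}
```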
### Implemented
- ✅ Multi-head attention (CPU reference, Candle CUDA backend)
- ✅ Rotary position embeddings (RoPE)
- ✅ RMS normalization
- ✅ SwiGLU activation
- ✅ Memory estimation utilities
- ✅ Ternary quantization (5-15x compression; see the sketch after this list)
- ✅ Mixed precision training utilities (FP32/FP16/BF16)
- ✅ Benchmarking suite (CPU)
- ✅ 160 passing tests (100% pass rate)
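The compression figure comes from representing each weight with one of three levels plus a per-tensor scale. The sketch below shows one common ternarization scheme (an absolute-mean threshold); the crate's exact scheme is an assumption here.

```rust
/// Illustrative ternary quantization, not the crate's actual implementation:
/// map each weight to {-1, 0, +1} using a threshold derived from the mean
/// absolute value, and keep a single f32 scale for dequantization.
fn ternarize(weights: &[f32]) -> (Vec<i8>, f32) {
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    let threshold = 0.5 * scale;
    let quantized = weights
        .iter()
        .map(|&w| {
            if w > threshold {
                1
            } else if w < -threshold {
                -1
            } else {
                0
            }
        })
        .collect();
    (quantized, scale)
}

/// Approximate reconstruction: w ≈ q * scale.
fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```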
### In Progress
- 🚧 Flash Attention CubeCL GPU kernel (Phase 1 complete, Phase 2 ready for RTX 5080 validation)
- 🚧 Ternary GPU kernels (Phase 2-4 implemented, awaiting GPU profiling)
- 🚧 CI/CD pipeline setup
### Planned
- ⏳ Gradient checkpointing (configuration exists, implementation planned)
- ⏳ GPU performance validation on RTX 5080/3090 Ti
- ⏳ RoPE, RMSNorm, SwiGLU GPU kernels
- ⏳ Advanced sparsity optimizations
- ⏳ Multi-GPU support
## Installation

```toml
[dependencies]
unsloth-rs = "1.0.0"
```
For CUDA support (via Candle's CUDA backend), enable the `cuda` feature:

```toml
[dependencies]
unsloth-rs = { version = "1.0.0", features = ["cuda"] }
```
## Usage

### Attention

The sketch below shows a typical call pattern: build an attention module from a configuration, then run a forward pass over a hidden-state tensor. The module paths, type names, and constructor signature are assumptions, not the crate's documented API.
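```rust
use candle_core::{Device, Result, Tensor};
// Assumed module path and type names; the crate's actual exports may differ.
use unsloth_rs::attention::{AttentionConfig, MultiHeadAttention};

fn main() -> Result<()> {
    // CPU by default; Candle's CUDA backend if built with the `cuda` feature.
    let device = Device::cuda_if_available(0)?;

    // Hypothetical GQA configuration: 32 query heads sharing 8 KV heads.
    let config = AttentionConfig {
        num_heads: 32,
        num_kv_heads: 8,
        head_dim: 128,
        ..Default::default()
    };
    let attention = MultiHeadAttention::new(config, &device)?;

    // Input of shape (batch, seq_len, hidden), hidden = num_heads * head_dim.
    let hidden_states = Tensor::randn(0f32, 1.0, (1, 16, 32 * 128), &device)?;
    let output = attention.forward(&hidden_states)?;
    println!("output shape: {:?}", output.dims());
    Ok(())
}
```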
### Memory Estimation

The crate includes utilities for estimating memory requirements before allocating. The standalone sketch below illustrates the kind of arithmetic involved (a per-layer KV-cache estimate); it is not the crate's actual estimator API.
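```rust
/// Illustrative only — not the crate's estimator API. Bytes needed to cache
/// keys and values for one transformer layer.
fn kv_cache_bytes(
    batch: usize,
    seq_len: usize,
    num_kv_heads: usize,
    head_dim: usize,
    bytes_per_elem: usize, // 2 for f16/bf16, 4 for f32
) -> usize {
    // Keys and values each have shape (batch, num_kv_heads, seq_len, head_dim).
    2 * batch * num_kv_heads * seq_len * head_dim * bytes_per_elem
}

fn main() {
    // Example: batch 1, 4096 tokens, 8 KV heads, head_dim 128, f16 cache.
    let bytes = kv_cache_bytes(1, 4096, 8, 128, 2);
    println!("per-layer KV cache: {:.1} MiB", bytes as f64 / (1024.0 * 1024.0));
}
```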
## Benchmarks
Run benchmarks with:
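```bash
cargo bench                   # CPU benchmarks
cargo bench --features cuda   # include GPU benchmarks
```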
Benchmarks test CPU performance across various configurations. GPU benchmarks require the `cuda` feature.
## Development Roadmap
For detailed development plans and task breakdowns, see:
- ROADMAP.md - Strategic development plan with phases and timelines
- TASKS.md - Actionable task list with priorities and estimates
- SUMMARY.md - Project review summary and execution guide
## Contributing
Contributions are welcome, particularly:
- GPU kernel implementations using CubeCL
- Performance optimizations
- Additional transformer operations
See TASKS.md for specific tasks that need implementation.
## License
Licensed under the MIT License. See LICENSE for details.