# unsloth-rs
Rust implementations of transformer building blocks for LLM inference and fine-tuning.
## What This Crate Provides
This crate provides common transformer operations built on Candle:
- Multi-head attention: Core attention mechanism with grouped-query attention (GQA) support
- Rotary position embeddings (RoPE): Position encoding used in modern LLMs
- RMS normalization: Efficient normalization layer used in LLaMA-style models (see the sketch after this list)
- SwiGLU activation: Gated activation function for transformer MLPs
- Memory estimation utilities: Tools for tracking and estimating memory usage
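For reference, RMS normalization rescales each hidden vector by the root mean square of its elements: `y = x / sqrt(mean(x^2) + eps) * weight`. Below is a minimal sketch of that math on plain `candle_core` tensors; the `rms_norm` function, the shapes, and the epsilon value are illustrative, not this crate's API.

```rust
use candle_core::{DType, Device, Result, Tensor, D};

// y = x / sqrt(mean(x^2) + eps) * weight, reduced over the last dimension.
fn rms_norm(x: &Tensor, weight: &Tensor, eps: f64) -> Result<Tensor> {
    let mean_sq = x.sqr()?.mean_keepdim(D::Minus1)?;
    let normed = x.broadcast_div(&(mean_sq + eps)?.sqrt()?)?;
    normed.broadcast_mul(weight)
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // (batch, seq_len, hidden) activations and a per-channel scale.
    let x = Tensor::randn(0f32, 1f32, (2, 4, 8), &dev)?;
    let w = Tensor::ones(8, DType::F32, &dev)?;
    let y = rms_norm(&x, &w, 1e-6)?;
    println!("output shape: {:?}", y.shape());
    Ok(())
}
```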
## Why This Crate?
This crate offers a Rust-native implementation of transformer components, with type and memory safety guarantees. The implementations are written to be clear and maintainable, serving as reference implementations that can be extended with optimized GPU kernels.
## Current Status
The current kernels are CPU reference implementations, with GPU dispatch
handled by Candle’s CUDA backend. Fused GPU kernels written in CubeCL are
planned for future versions.
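Dispatch follows the `candle_core::Device` passed at construction time (as in the Quick Start below). A minimal sketch using candle's standard device-selection helper; the printout is illustrative:

```rust
use candle_core::Device;

fn main() -> candle_core::Result<()> {
    // Picks CUDA device 0 if candle was built with CUDA support and a GPU
    // is present; otherwise falls back to the CPU.
    let device = Device::cuda_if_available(0)?;
    println!("dispatching to {device:?}");
    Ok(())
}
```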
## Quick Start
```rust
use unsloth_rs::kernels::{FusedAttention, FusedAttentionConfig};
use candle_core::Device;

fn main() -> unsloth_rs::Result<()> {
    let config = FusedAttentionConfig {
        hidden_size: 768,
        num_heads: 12,
        head_dim: 64,
        ..Default::default()
    };
    let attention = FusedAttention::new(config, &Device::Cpu)?;
    Ok(())
}
```
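Note the arithmetic in this configuration: `hidden_size` (768) equals `num_heads * head_dim` (12 × 64), the usual consistency requirement for multi-head attention shapes; whether the crate enforces this invariant at construction time is an assumption here.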
## Re-exports

- `pub use error::Result;`
- `pub use error::UnslothError;`
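A minimal sketch of using the re-exported alias in downstream code, assuming `Result<T>` is the usual shorthand for `std::result::Result<T, UnslothError>`; `build_attention` is a hypothetical function name:

```rust
use unsloth_rs::Result;

// Hypothetical downstream helper: `?` propagates any UnslothError to the
// caller through the re-exported alias.
fn build_attention() -> Result<()> {
    // ... construct FusedAttention and friends here ...
    Ok(())
}
```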