pmetal-lora 0.3.12

pmetal-lora

LoRA and QLoRA training implementations with Metal acceleration.

Overview

This crate provides efficient Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) training for LLMs on Apple Silicon. It includes architecture-specific optimizations and a dynamic model system for seamless multi-architecture support.

Features

  • Standard LoRA: Low-rank adaptation with configurable rank and alpha
  • QLoRA: 4-bit quantized base weights with full-precision adapters
  • Dynamic Architecture: Auto-detect and load any supported model
  • Fused Training: Metal-accelerated forward/backward passes (~2x speedup)
  • Gradient Checkpointing: Memory-efficient training for large models
  • Sequence Packing: Efficient training on variable-length data
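The core idea behind these features can be illustrated without any of this crate's APIs: LoRA augments a frozen weight W with a low-rank update B·A scaled by alpha/r. A minimal self-contained sketch (plain `Vec`s instead of Metal tensors, for illustration only):

```rust
/// Minimal LoRA forward pass: y = W·x + (alpha/r) · B·(A·x).
/// Illustrative only — a real implementation runs on GPU tensors.
fn matvec(m: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
        .collect()
}

fn lora_forward(
    w: &[Vec<f32>], // frozen base weight, d_out x d_in
    a: &[Vec<f32>], // LoRA A, r x d_in (trainable)
    b: &[Vec<f32>], // LoRA B, d_out x r (trainable, initialized to zero)
    x: &[f32],
    alpha: f32,
) -> Vec<f32> {
    let scale = alpha / a.len() as f32; // alpha / r
    let base = matvec(w, x);
    let delta = matvec(b, &matvec(a, x));
    base.iter().zip(&delta).map(|(y, d)| y + scale * d).collect()
}

fn main() {
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]]; // identity base weight
    let a = vec![vec![1.0, 1.0]];                 // rank r = 1
    let b = vec![vec![0.5], vec![0.5]];
    let y = lora_forward(&w, &a, &b, &[1.0, 2.0], 2.0);
    // scale = 2/1; A·x = [3]; B·(A·x) = [1.5, 1.5]; y = base + 2·delta
    println!("{y:?}"); // [4.0, 5.0]
}
```

QLoRA keeps the same adapter math but stores W in 4-bit quantized form, dequantizing on the fly, which is where the memory savings come from.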

Usage

Basic LoRA Training

use pmetal_lora::{DynamicLoraModel, TrainableModel};
use pmetal_core::LoraConfig;

// Configure LoRA
let config = LoraConfig {
    r: 16,
    alpha: 16.0,
    dropout: 0.0,
    ..Default::default()
};

// Load model with LoRA adapters
let mut model = DynamicLoraModel::from_pretrained("path/to/model", config)?;

// Training loop
for batch in dataloader {
    let logits = model.forward(&batch.input_ids, None)?;
    // Compute loss and backprop...
}

// Save adapters
model.save_lora_weights("output/lora_weights.safetensors")?;

Loading Trained Adapters

// Load base model with LoRA structure
let mut model = DynamicLoraModel::from_pretrained("path/to/model", config)?;

// Load trained adapter weights
model.load_lora_weights("output/lora_weights.safetensors")?;

// Run inference
let logits = model.forward(&input_ids, None)?;

Architecture Support

The following architectures are supported via DynamicLoraModel (auto-detection + loading):

| Architecture                | LoRA | QLoRA | Notes                                      |
|-----------------------------|------|-------|--------------------------------------------|
| Llama (2, 3, 3.1, 3.2, 3.3) | Yes  | Yes   | Gradient checkpointing supported           |
| Qwen 2 (2, 2.5)             | Yes  |       | Uses Qwen3 LoRA implementation internally  |
| Qwen 3                      | Yes  | Yes   | Gradient checkpointing supported           |
| Qwen 3.5 (Next)             | Yes  |       | Hybrid GDN + Attention, nested text_config |
| Mistral (7B, Mixtral 8x7B)  | Yes  | Yes   | Sliding window attention                   |
| Gemma (2, 3)                | Yes  | Yes   | GeGLU activation, special RMSNorm          |
| Phi (3, 3.5)                | Yes  |       | Partial RoPE, fused gate_up                |

Architectures not listed (Llama 4, Qwen3MoE, DeepSeek, Phi4, Cohere, Granite, NemotronH, StarCoder2, RecurrentGemma, Jamba) return DynamicLoraError::NotImplemented. The generic_lora module provides reusable LoRA attention and MLP components for building custom LoRA models for these architectures.
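Conceptually, such a reusable component is a linear layer that holds the frozen base weight next to trainable A/B factors, and can fold the adapter back into the base weight for adapter-free inference. A hedged sketch of that shape — the names `LoraLinear` and `merge` are illustrative, not this crate's actual API:

```rust
/// Illustrative generic LoRA linear layer; names do not mirror
/// pmetal-lora's real generic_lora types.
struct LoraLinear {
    w: Vec<Vec<f32>>, // frozen base weight, d_out x d_in
    a: Vec<Vec<f32>>, // trainable LoRA A, r x d_in
    b: Vec<Vec<f32>>, // trainable LoRA B, d_out x r
    scale: f32,       // alpha / r
}

impl LoraLinear {
    /// Fold the low-rank update into the base weight: W <- W + scale·B·A.
    /// After merging, inference costs no extra matmuls.
    fn merge(&mut self) {
        for (i, row) in self.w.iter_mut().enumerate() {
            for (j, wij) in row.iter_mut().enumerate() {
                let delta: f32 = self.b[i]
                    .iter()
                    .enumerate()
                    .map(|(k, bik)| bik * self.a[k][j])
                    .sum();
                *wij += self.scale * delta;
            }
        }
    }
}

fn main() {
    let mut layer = LoraLinear {
        w: vec![vec![1.0, 0.0], vec![0.0, 1.0]],
        a: vec![vec![1.0, 1.0]], // r = 1
        b: vec![vec![0.5], vec![0.5]],
        scale: 2.0,
    };
    layer.merge();
    println!("{:?}", layer.w); // [[2.0, 1.0], [1.0, 2.0]]
}
```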

Configuration

| Parameter      | Description      | Default             |
|----------------|------------------|---------------------|
| r              | LoRA rank        | 8                   |
| alpha          | Scaling factor   | 16.0                |
| dropout        | Dropout rate     | 0.0                 |
| target_modules | Modules to adapt | All attention + MLP |
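To see why a small rank is attractive, compare trainable parameters for a single d×d projection: full fine-tuning trains d² weights, while LoRA trains only r·d (for A) plus d·r (for B). A quick check, using d = 4096 as an example hidden size and the default r = 8:

```rust
fn main() {
    let d: u64 = 4096; // hidden size (example value, not a pmetal default)
    let r: u64 = 8;    // default LoRA rank
    let full = d * d;         // full fine-tuning params per projection
    let lora = r * d + d * r; // LoRA A + B params per projection
    println!("full: {full}, lora: {lora}");
    println!("ratio: {:.2}%", 100.0 * lora as f64 / full as f64);
    // lora is 65_536 vs full's 16_777_216 — about 0.39% of the parameters
}
```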

Modules

| Module          | Description                                                       |
|-----------------|-------------------------------------------------------------------|
| dynamic         | DynamicLoraModel with auto-detection                              |
| llama_lora      | LLaMA-specific LoRA (also covers Granite, Cohere, StarCoder2)     |
| qwen3_lora      | Qwen3-specific LoRA                                               |
| qwen3_next_lora | Qwen 3.5 (Next) hybrid LoRA                                       |
| mistral_lora    | Mistral-specific LoRA                                             |
| gemma_lora      | Gemma-specific LoRA                                               |
| phi_lora        | Phi-specific LoRA                                                 |
| generic_lora    | Generic LoRA for architectures without dedicated implementations  |
| trainable       | TrainableModel trait definition                                   |
| arch_config     | Per-architecture LoRA configuration                               |

Performance

Compared to mlx-lm on identical hardware:

| Metric    | pmetal-lora | mlx-lm |
|-----------|-------------|--------|
| Steps/sec | 1.33        | 0.62   |
| Memory    | ~10 GB      | 19 GB  |

That is roughly 2.1x the training throughput at about half the memory.

License

MIT OR Apache-2.0