pmetal-lora 0.3.12

pmetal-lora

LoRA and QLoRA training implementations with Metal acceleration.

Overview

This crate provides efficient Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) training for LLMs on Apple Silicon. It includes architecture-specific optimizations and a dynamic model system for seamless multi-architecture support.

Features

  • Standard LoRA: Low-rank adaptation with configurable rank and alpha
  • QLoRA: 4-bit quantized base weights with full-precision adapters
  • Dynamic Architecture: Auto-detect and load any supported model
  • Fused Training: Metal-accelerated forward/backward passes (~2x speedup)
  • Gradient Checkpointing: Memory-efficient training for large models
  • Sequence Packing: Efficient training on variable-length data
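The core idea behind these features can be illustrated without any of this crate's APIs: LoRA augments a frozen weight W with a low-rank update B·A scaled by alpha/r. A minimal self-contained sketch (plain `Vec`s instead of Metal tensors, for illustration only):

```rust
/// Minimal LoRA forward pass: y = W·x + (alpha/r) · B·(A·x).
/// Illustrative only — a real implementation runs on GPU tensors.
fn matvec(m: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
        .collect()
}

fn lora_forward(
    w: &[Vec<f32>], // frozen base weight, d_out x d_in
    a: &[Vec<f32>], // LoRA A, r x d_in (trainable)
    b: &[Vec<f32>], // LoRA B, d_out x r (trainable, initialized to zero)
    x: &[f32],
    alpha: f32,
) -> Vec<f32> {
    let scale = alpha / a.len() as f32; // alpha / r
    let base = matvec(w, x);
    let delta = matvec(b, &matvec(a, x));
    base.iter().zip(&delta).map(|(y, d)| y + scale * d).collect()
}

fn main() {
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]]; // identity base weight
    let a = vec![vec![1.0, 1.0]];                 // rank r = 1
    let b = vec![vec![0.5], vec![0.5]];
    let y = lora_forward(&w, &a, &b, &[1.0, 2.0], 2.0);
    // scale = 2/1; A·x = [3]; B·(A·x) = [1.5, 1.5]; y = base + 2·delta
    println!("{y:?}"); // [4.0, 5.0]
}
```

QLoRA keeps the same adapter math but stores W in 4-bit quantized form, dequantizing on the fly, which is where the memory savings come from.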

Usage

Basic LoRA Training

use pmetal_lora::{DynamicLoraModel, TrainableModel};
use pmetal_core::LoraConfig;

// Configure LoRA
let config = LoraConfig {
    r: 16,
    alpha: 16.0,
    dropout: 0.0,
    ..Default::default()
};

// Load model with LoRA adapters
let mut model = DynamicLoraModel::from_pretrained("path/to/model", config)?;

// Training loop
for batch in dataloader {
    let logits = model.forward(&batch.input_ids, None)?;
    // Compute loss and backprop...
}

// Save adapters
model.save_lora_weights("output/lora_weights.safetensors")?;

Loading Trained Adapters

// Load base model with LoRA structure
let mut model = DynamicLoraModel::from_pretrained("path/to/model", config)?;

// Load trained adapter weights
model.load_lora_weights("output/lora_weights.safetensors")?;

// Run inference
let logits = model.forward(&input_ids, None)?;

Architecture Support

The following architectures are supported via DynamicLoraModel (auto-detection + loading):

| Architecture                | LoRA | QLoRA | Notes                                      |
|-----------------------------|------|-------|--------------------------------------------|
| Llama (2, 3, 3.1, 3.2, 3.3) | Yes  | Yes   | Gradient checkpointing supported           |
| Qwen 2 (2, 2.5)             | Yes  |       | Uses Qwen3 LoRA implementation internally  |
| Qwen 3                      | Yes  | Yes   | Gradient checkpointing supported           |
| Qwen 3.5 (Next)             | Yes  |       | Hybrid GDN + Attention, nested text_config |
| Mistral (7B, Mixtral 8x7B)  | Yes  | Yes   | Sliding window attention                   |
| Gemma (2, 3)                | Yes  | Yes   | GeGLU activation, special RMSNorm          |
| Phi (3, 3.5)                | Yes  |       | Partial RoPE, fused gate_up                |

Architectures not listed (Llama 4, Qwen3MoE, DeepSeek, Phi4, Cohere, Granite, NemotronH, StarCoder2, RecurrentGemma, Jamba) return DynamicLoraError::NotImplemented. The generic_lora module provides reusable LoRA attention and MLP components for building custom LoRA models for these architectures.
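Conceptually, such a reusable component is a linear layer that holds the frozen base weight next to trainable A/B factors, and can fold the adapter back into the base weight for adapter-free inference. A hedged sketch of that shape — the names `LoraLinear` and `merge` are illustrative, not this crate's actual API:

```rust
/// Illustrative generic LoRA linear layer; names do not mirror
/// pmetal-lora's real generic_lora types.
struct LoraLinear {
    w: Vec<Vec<f32>>, // frozen base weight, d_out x d_in
    a: Vec<Vec<f32>>, // trainable LoRA A, r x d_in
    b: Vec<Vec<f32>>, // trainable LoRA B, d_out x r
    scale: f32,       // alpha / r
}

impl LoraLinear {
    /// Fold the low-rank update into the base weight: W <- W + scale·B·A.
    /// After merging, inference costs no extra matmuls.
    fn merge(&mut self) {
        for (i, row) in self.w.iter_mut().enumerate() {
            for (j, wij) in row.iter_mut().enumerate() {
                let delta: f32 = self.b[i]
                    .iter()
                    .enumerate()
                    .map(|(k, bik)| bik * self.a[k][j])
                    .sum();
                *wij += self.scale * delta;
            }
        }
    }
}

fn main() {
    let mut layer = LoraLinear {
        w: vec![vec![1.0, 0.0], vec![0.0, 1.0]],
        a: vec![vec![1.0, 1.0]], // r = 1
        b: vec![vec![0.5], vec![0.5]],
        scale: 2.0,
    };
    layer.merge();
    println!("{:?}", layer.w); // [[2.0, 1.0], [1.0, 2.0]]
}
```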

Configuration

| Parameter      | Description      | Default             |
|----------------|------------------|---------------------|
| r              | LoRA rank        | 8                   |
| alpha          | Scaling factor   | 16.0                |
| dropout        | Dropout rate     | 0.0                 |
| target_modules | Modules to adapt | All attention + MLP |
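To see why a small rank is attractive, compare trainable parameters for a single d×d projection: full fine-tuning trains d² weights, while LoRA trains only r·d (for A) plus d·r (for B). A quick check, using d = 4096 as an example hidden size and the default r = 8:

```rust
fn main() {
    let d: u64 = 4096; // hidden size (example value, not a pmetal default)
    let r: u64 = 8;    // default LoRA rank
    let full = d * d;         // full fine-tuning params per projection
    let lora = r * d + d * r; // LoRA A + B params per projection
    println!("full: {full}, lora: {lora}");
    println!("ratio: {:.2}%", 100.0 * lora as f64 / full as f64);
    // lora is 65_536 vs full's 16_777_216 — about 0.39% of the parameters
}
```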

Modules

| Module          | Description                                                       |
|-----------------|-------------------------------------------------------------------|
| dynamic         | DynamicLoraModel with auto-detection                              |
| llama_lora      | LLaMA-specific LoRA (also covers Granite, Cohere, StarCoder2)     |
| qwen3_lora      | Qwen3-specific LoRA                                               |
| qwen3_next_lora | Qwen 3.5 (Next) hybrid LoRA                                       |
| mistral_lora    | Mistral-specific LoRA                                             |
| gemma_lora      | Gemma-specific LoRA                                               |
| phi_lora        | Phi-specific LoRA                                                 |
| generic_lora    | Generic LoRA for architectures without dedicated implementations  |
| trainable       | TrainableModel trait definition                                   |
| arch_config     | Per-architecture LoRA configuration                               |

Performance

Compared to mlx-lm on identical hardware:

| Metric    | pmetal-lora | mlx-lm |
|-----------|-------------|--------|
| Steps/sec | 1.33        | 0.62   |
| Memory    | ~10 GB      | 19 GB  |

That is roughly 2.1x the training throughput at about half the memory.

License

MIT OR Apache-2.0