pmetal 0.3.2


pmetal

Powdered Metal — High-performance LLM fine-tuning framework for Apple Silicon, written in Rust.


This is the umbrella crate that re-exports all PMetal sub-crates behind feature flags. Add a single dependency to access the full framework:

[dependencies]
pmetal = "0.3"   # default features

# or, to enable everything:
# pmetal = { version = "0.3", features = ["full"] }

Quick Start

Fine-tune a model

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = pmetal::easy::finetune("Qwen/Qwen3-0.6B", "data.jsonl")
        .lora(16, 32.0)        // LoRA rank and alpha
        .epochs(3)
        .learning_rate(2e-4)
        .output("./output")
        .run()
        .await?;

    println!("Final loss: {:.4}", result.final_loss);
    Ok(())
}

Run inference

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = pmetal::easy::infer("Qwen/Qwen3-0.6B")
        .lora("./output/lora_weights.safetensors")
        .temperature(0.7)
        .max_tokens(256)
        .generate("What is 2+2?")
        .await?;

    println!("{}", result.text);
    Ok(())
}

Query device info

fn main() {
    println!("{}", pmetal::version::device_info());
}

Feature Flags

| Feature | Crate | Default | Description |
|---|---|---|---|
| `core` | `pmetal-core` | yes | Foundation types, configs, traits |
| `gguf` | `pmetal-gguf` | yes | GGUF format with imatrix quantization |
| `metal` | `pmetal-metal` | yes | Custom Metal GPU kernels + ANE runtime |
| `hub` | `pmetal-hub` | yes | HuggingFace Hub integration |
| `mlx` | `pmetal-mlx` | yes | MLX backend (KV cache, RoPE, ops) |
| `models` | `pmetal-models` | yes | LLM architectures (Llama, Qwen, DeepSeek, ...) |
| `lora` | `pmetal-lora` | yes | LoRA/QLoRA training |
| `trainer` | `pmetal-trainer` | yes | Training loops (SFT, DPO, GRPO, DAPO) |
| `easy` | all training/inference crates | yes | High-level builder API |
| `ane` | ANE integration | yes | Apple Neural Engine direct programming |
| `data` | `pmetal-data` | no | Dataset loading and preprocessing |
| `distill` | `pmetal-distill` | no | Knowledge distillation (cross-vocab) |
| `merge` | `pmetal-merge` | no | Model merging (SLERP, TIES, DARE, ModelStock) |
| `vocoder` | `pmetal-vocoder` | no | BigVGAN neural vocoder |
| `distributed` | `pmetal-distributed` | no | Distributed training (mDNS, Ring All-Reduce) |
| `mhc` | `pmetal-mhc` | no | Manifold-Constrained Hyper-Connections |
| `lora-metal-fused` | | no | Fused Metal kernels for ~2x LoRA speedup |
| `full` | all of the above | no | Everything |
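Because these are ordinary Cargo features, you can also turn off the defaults and opt into a subset. A minimal sketch, assuming the listed features build independently (the exact minimal feature set you need may differ):

```toml
[dependencies]
# Core stack plus LoRA training and model merging only:
pmetal = { version = "0.3", default-features = false, features = ["core", "lora", "merge"] }
```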

Hardware Support

PMetal auto-detects Apple Silicon capabilities and tunes kernel parameters per device:

  • M1–M5 families (Base, Pro, Max, Ultra)
  • NAX (Neural Accelerators in GPU) on M5/Apple10
  • ANE (Apple Neural Engine) with CPU RMSNorm workaround for fp16 stability
  • UltraFusion multi-die topology detection
  • Tier-based tuning: FlashAttention block sizes, GEMM tile sizes, threadgroup sizes, batch multipliers

Examples

# Device info
cargo run -p pmetal --example device_info

# Fine-tuning (easy API)
cargo run -p pmetal --example finetune_easy --features easy -- \
    --model Qwen/Qwen3-0.6B --dataset data.jsonl

# Inference (easy API)
cargo run -p pmetal --example inference_easy --features easy -- \
    --model Qwen/Qwen3-0.6B --prompt "What is 2+2?"

# Manual fine-tuning (lower-level control)
cargo run -p pmetal --example finetune_manual --features data,lora,trainer

Re-exports

All sub-crates are available as modules:

use pmetal::core;       // pmetal-core
use pmetal::metal;      // pmetal-metal
use pmetal::mlx;        // pmetal-mlx
use pmetal::models;     // pmetal-models
use pmetal::lora;       // pmetal-lora
use pmetal::trainer;    // pmetal-trainer
use pmetal::hub;        // pmetal-hub
use pmetal::gguf;       // pmetal-gguf
use pmetal::prelude::*; // commonly used types from all crates

License

Licensed under either the MIT license or the Apache License, Version 2.0, at your option.