# pmetal
**Powdered Metal** — High-performance LLM fine-tuning framework for Apple Silicon, written in Rust.
[](https://crates.io/crates/pmetal)
[](https://docs.rs/pmetal)
[](../../LICENSE)
This is the umbrella crate that re-exports all PMetal sub-crates behind feature flags. Add a single dependency to access the full framework:
```toml
[dependencies]
pmetal = "0.3" # default features
pmetal = { version = "0.3", features = ["full"] } # everything
```
## Quick Start
### Fine-tune a model
```rust,no_run
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let result = pmetal::easy::finetune("Qwen/Qwen3-0.6B", "data.jsonl")
.lora(16, 32.0)
.epochs(3)
.learning_rate(2e-4)
.output("./output")
.run()
.await?;
println!("Final loss: {:.4}", result.final_loss);
Ok(())
}
```
### Run inference
```rust,no_run
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let result = pmetal::easy::infer("Qwen/Qwen3-0.6B")
.lora("./output/lora_weights.safetensors")
.temperature(0.7)
.max_tokens(256)
.generate("What is 2+2?")
.await?;
println!("{}", result.text);
Ok(())
}
```
### Query device info
```rust,no_run
fn main() {
println!("{}", pmetal::version::device_info());
}
```
## Feature Flags
| `core` | `pmetal-core` | yes | Foundation types, configs, traits |
| `gguf` | `pmetal-gguf` | yes | GGUF format with imatrix quantization |
| `metal` | `pmetal-metal` | yes | Custom Metal GPU kernels + ANE runtime |
| `hub` | `pmetal-hub` | yes | HuggingFace Hub integration |
| `mlx` | `pmetal-mlx` | yes | MLX backend (KV cache, RoPE, ops) |
| `models` | `pmetal-models` | yes | LLM architectures (Llama, Qwen, DeepSeek, ...) |
| `lora` | `pmetal-lora` | yes | LoRA/QLoRA training |
| `trainer` | `pmetal-trainer` | yes | Training loops (SFT, DPO, GRPO, DAPO) |
| `easy` | all training/inference | yes | High-level builder API |
| `ane` | ANE integration | yes | Apple Neural Engine direct programming |
| `data` | `pmetal-data` | no | Dataset loading and preprocessing |
| `distill` | `pmetal-distill` | no | Knowledge distillation (cross-vocab) |
| `merge` | `pmetal-merge` | no | Model merging (SLERP, TIES, DARE, ModelStock) |
| `vocoder` | `pmetal-vocoder` | no | BigVGAN neural vocoder |
| `distributed` | `pmetal-distributed` | no | Distributed training (mDNS, Ring All-Reduce) |
| `mhc` | `pmetal-mhc` | no | Manifold-Constrained Hyper-Connections |
| `lora-metal-fused` | — | no | Fused Metal kernels for ~2x LoRA speedup |
| `full` | all of the above | no | Everything |
## Hardware Support
PMetal auto-detects Apple Silicon capabilities and tunes kernel parameters per device:
- **M1–M5** families (Base, Pro, Max, Ultra)
- **NAX** (Neural Accelerators in GPU) on M5/Apple10
- **ANE** (Apple Neural Engine) with CPU RMSNorm workaround for fp16 stability
- **UltraFusion** multi-die topology detection
- **Tier-based tuning**: FlashAttention block sizes, GEMM tile sizes, threadgroup sizes, batch multipliers
## Examples
```sh
# Device info
cargo run -p pmetal --example device_info
# Fine-tuning (easy API)
cargo run -p pmetal --example finetune_easy --features easy -- \
--model Qwen/Qwen3-0.6B --dataset data.jsonl
# Inference (easy API)
cargo run -p pmetal --example inference_easy --features easy -- \
--model Qwen/Qwen3-0.6B --prompt "What is 2+2?"
# Manual fine-tuning (lower-level control)
cargo run -p pmetal --example finetune_manual --features data,lora,trainer
```
## Re-exports
All sub-crates are available as modules:
```rust
use pmetal::core; // pmetal-core
use pmetal::metal; // pmetal-metal
use pmetal::mlx; // pmetal-mlx
use pmetal::models; // pmetal-models
use pmetal::lora; // pmetal-lora
use pmetal::trainer; // pmetal-trainer
use pmetal::hub; // pmetal-hub
use pmetal::gguf; // pmetal-gguf
use pmetal::prelude::*; // commonly used types from all crates
```
## License
Licensed under either of [MIT](../../LICENSE-MIT) or [Apache-2.0](../../LICENSE-APACHE).