# pmetal-py 0.3.11

Python bindings for PMetal — high-performance LLM fine-tuning on Apple Silicon.


## Overview

This crate provides a `cdylib` Python extension module that exposes PMetal's training, inference, and model-management APIs to Python. It is built with PyO3 and installed via maturin.

## Installation

```shell
cd crates/pmetal-py
pip install maturin
maturin develop --release
```

## Quick Start

### Easy API

```python
import pmetal

# Fine-tune with sensible defaults
result = pmetal.finetune(
    "Qwen/Qwen3-0.6B",
    "train.jsonl",
    lora_r=16,
    learning_rate=2e-4,
    epochs=3,
)
print(f"Loss: {result['final_loss']}, Steps: {result['total_steps']}")

# Inference
text = pmetal.infer("Qwen/Qwen3-0.6B", "What is 2+2?")
print(text)
```
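`finetune()` reads a JSONL dataset. The exact schema depends on the detected `DatasetFormat`; as an illustration only, here is a sketch that writes a tiny file in an assumed simple format (one JSON object with a `"text"` field per line — a common fine-tuning convention, not confirmed by this README):

```python
import json

# Write a tiny JSONL training file. Each line is one JSON object;
# the "text" field is an assumed simple-format schema.
examples = [
    {"text": "Q: What is 2+2? A: 4"},
    {"text": "Q: What is the capital of France? A: Paris"},
]
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Read it back to confirm the structure round-trips.
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
print(len(rows))  # 2
```

Check your dataset against the formats listed under the `DatasetFormat` enum below; `Auto` suggests the format is detected at load time.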

### Full Control

```python
import pmetal

# Configure training
lora_config = pmetal.LoraConfig(r=16, alpha=32.0)
training_config = pmetal.TrainingConfig(
    learning_rate=2e-4,
    num_epochs=3,
    batch_size=4,
    max_seq_len=2048,
)

# Create and run trainer
trainer = pmetal.Trainer(
    model_id="Qwen/Qwen3-0.6B",
    lora_config=lora_config,
    training_config=training_config,
    dataset_path="train.jsonl",
)
trainer.add_callback(pmetal.ProgressCallback())
result = trainer.train()

# Load model for inference
model = pmetal.Model.load("Qwen/Qwen3-0.6B")
print(model.generate("Hello world", temperature=0.7))
```

## API Reference

### Module-Level Functions

| Function | Description |
|----------|-------------|
| `finetune(model_id, dataset_path, ...)` | Fine-tune with sensible defaults |
| `infer(model_id, prompt, ...)` | Run inference |
| `download_model(model_id, ...)` | Download from HuggingFace Hub |
| `download_file(url, ...)` | Download a file |

### Classes

| Class | Description |
|-------|-------------|
| `Model` | Model loading and inference (`Model.load()`, `model.generate()`) |
| `Trainer` | Training orchestration (`Trainer(...)`, `trainer.train()`) |
| `Tokenizer` | Tokenization (`Tokenizer.from_file()`) |
| `LoraConfig` | LoRA configuration (`r`, `alpha`, `dropout`, `use_rslora`, `use_dora`) |
| `TrainingConfig` | Training hyperparameters (`learning_rate`, `batch_size`, `num_epochs`, ...) |
| `GenerationConfig` | Generation parameters (`max_tokens`, `temperature`, `top_k`, `top_p`, ...) |
| `DataLoaderConfig` | Data loading parameters (`batch_size`, `max_seq_len`, `shuffle`, ...) |
| `ProgressCallback` | Progress bar callback |
| `LoggingCallback` | Logging callback |
| `MetricsJsonCallback` | JSONL metrics callback |
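`GenerationConfig` exposes `top_k` and `top_p`. For context, here is a minimal pure-Python sketch of standard nucleus (top-p) sampling over a toy distribution — the textbook algorithm, not pmetal's actual implementation:

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize. Standard nucleus sampling."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

# Toy next-token distribution: the low-probability tail is dropped.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xyzzy": 0.05}
filtered = top_p_filter(probs, top_p=0.9)
print(sorted(filtered))  # ['a', 'cat', 'the'] — "xyzzy" is cut
```

Lower `top_p` trims the distribution more aggressively; `top_k` instead keeps a fixed number of highest-probability tokens.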

### Enums

| Enum | Values |
|------|--------|
| `Dtype` | `Float32`, `Float16`, `BFloat16` |
| `Quantization` | `None`, `NF4`, `FP4`, `Int8` |
| `LoraBias` | `None`, `All`, `LoraOnly` |
| `LrSchedulerType` | `Constant`, `Linear`, `Cosine`, `CosineWithRestarts`, `Polynomial` |
| `OptimizerType` | `AdamW`, `Adam8bit`, `ScheduleFree` |
| `DatasetFormat` | `Auto`, `Simple`, `Alpaca`, `ShareGpt`, `OpenAi`, `Reasoning` |
| `ModelArchitecture` | `Llama`, `Qwen2`, `Qwen3`, `Gemma`, `Mistral`, `Phi`, ... |
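`LrSchedulerType` includes `Cosine`. As a reference point, this is the conventional cosine-decay formula — a sketch of the standard schedule (assuming no warmup), not necessarily pmetal's exact implementation:

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-4, min_lr=0.0):
    """Standard cosine decay from base_lr down to min_lr over total_steps."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # 2e-4: full base_lr at step 0
print(cosine_lr(500, 1000))   # 1e-4: halfway through, half of base_lr
print(cosine_lr(1000, 1000))  # 0.0: decayed to min_lr at the end
```

The `base_lr=2e-4` default mirrors the `learning_rate` used in the Quick Start examples above.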

### Modules

| Module | Description |
|--------|-------------|
| `easy` | Top-level `finetune()` and `infer()` functions |
| `model` | `Model` class with `load()` and `generate()` |
| `trainer` | `Trainer` class with `train()` and callbacks |
| `config` | Configuration classes and enums |
| `callbacks` | Training callback classes |
| `tokenizer` | `Tokenizer` class |
| `hub` | HuggingFace Hub download functions |
| `array_bridge` | MLX array to Python conversion |
| `error` | Error handling and conversion |

## License

MIT OR Apache-2.0