pmetal-mlx 0.1.0

MLX backend implementation for PMetal LLM fine-tuning

pmetal-mlx

MLX backend integration with advanced training utilities.

Overview

This crate provides the bridge between PMetal and Apple's MLX framework, along with custom implementations for training utilities not available in the base MLX library.

Features

  • Quantization: NF4, FP4, Int8 implementations
  • Gradient Checkpointing: Memory-efficient training for large models
  • KV Cache: Efficient key-value caching for inference
  • Mixture of Experts: MoE layer implementations
  • NEFTune: Noise injection for improved fine-tuning
  • Sequence Packing: Efficient batching for variable-length sequences
  • Speculative Decoding: Faster inference with draft models
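To make the KV-cache feature above concrete, here is a minimal sketch of per-layer key/value caching using plain `Vec<f32>` buffers. This is an illustration of the idea only; the actual `KVCache` in pmetal-mlx wraps MLX arrays and its API may differ.

```rust
// Sketch of a single layer's KV cache: key/value projections are appended
// token by token so attention at step t can reuse all earlier projections
// instead of recomputing them. (Illustrative; not the pmetal-mlx API.)
struct LayerKvCache {
    head_dim: usize,
    keys: Vec<f32>,   // flattened: seq_len * head_dim values
    values: Vec<f32>, // same layout as `keys`
}

impl LayerKvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append the key/value projections for one new token.
    fn append(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.head_dim);
        assert_eq!(v.len(), self.head_dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);
    }

    /// Number of tokens cached so far.
    fn seq_len(&self) -> usize {
        self.keys.len() / self.head_dim
    }
}

fn main() {
    let mut cache = LayerKvCache::new(4);
    cache.append(&[1.0; 4], &[2.0; 4]);
    cache.append(&[3.0; 4], &[4.0; 4]);
    assert_eq!(cache.seq_len(), 2);
    println!("cached tokens: {}", cache.seq_len());
}
```

The trade-off is the usual one: O(seq_len * head_dim) extra memory per layer in exchange for avoiding quadratic recomputation of past projections during autoregressive decoding.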

Usage

use pmetal_mlx::prelude::*;

// Create a KV cache for inference
let cache = KVCache::new(num_layers, batch_size, max_seq_len, head_dim);

// Use sequence packing for training
let packed = SequencePacker::pack(&sequences, max_length)?;

Modules

Module               Description
kernels              Custom MLX kernels (cross entropy, RMS norm, etc.)
quantization         Weight quantization implementations
gradient_checkpoint  Memory-efficient gradient computation
kv_cache             Key-value cache for efficient inference
moe                  Mixture of Experts support
neftune              NEFTune noise injection
sequence_packing     Efficient sequence batching
speculative          Speculative decoding utilities
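The idea behind the sequence_packing module can be sketched with a greedy first-fit bin-packing pass: variable-length sequences are grouped so each packed batch stays under the length budget, cutting wasted padding. This is a hypothetical illustration of the technique; the real `SequencePacker` may use a different strategy.

```rust
// Greedy first-fit sequence packing sketch (not the pmetal-mlx API):
// place each sequence into the first bin with enough remaining capacity,
// opening a new bin when none fits. Returns sequence indices per bin.
fn pack_first_fit(lengths: &[usize], max_length: usize) -> Vec<Vec<usize>> {
    let mut bins: Vec<(usize, Vec<usize>)> = Vec::new(); // (used, indices)
    for (i, &len) in lengths.iter().enumerate() {
        assert!(len <= max_length, "sequence longer than max_length");
        match bins.iter_mut().find(|(used, _)| used + len <= max_length) {
            Some((used, idxs)) => {
                *used += len;
                idxs.push(i);
            }
            None => bins.push((len, vec![i])),
        }
    }
    bins.into_iter().map(|(_, idxs)| idxs).collect()
}

fn main() {
    // Five sequences of lengths 5, 3, 4, 2, 2 packed into bins of capacity 8.
    let packs = pack_first_fit(&[5, 3, 4, 2, 2], 8);
    assert_eq!(packs, vec![vec![0, 1], vec![2, 3, 4]]);
    println!("{packs:?}"); // → [[0, 1], [2, 3, 4]]
}
```

Here two packed batches carry 16 tokens with zero padding, where naive padding to the longest sequence (length 5) would spend 25 slots on the same data.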

Quantization Formats

Format  Bits  Memory Savings  Quality
NF4     4     75%             High
FP4     4     75%             Medium
Int8    8     50%             Very High
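As a rough illustration of where the savings in the table come from, the sketch below does block-wise 4-bit absmax quantization: each block stores one f32 scale plus a signed 4-bit code per weight, roughly a quarter of the original f32 footprint. Note this uses a uniform grid for simplicity; NF4 proper snaps values to a codebook of normal-distribution quantiles, and none of this is the pmetal-mlx implementation.

```rust
// Simplified 4-bit absmax quantization (uniform grid, codes in -7..=7).
// NF4 instead uses a 16-entry codebook of normal quantiles; this sketch
// only demonstrates the scale-plus-codes storage scheme.
fn quantize_4bit(block: &[f32]) -> (f32, Vec<i8>) {
    let absmax = block.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 7.0 };
    let codes = block.iter().map(|&x| (x / scale).round() as i8).collect();
    (scale, codes)
}

fn dequantize_4bit(scale: f32, codes: &[i8]) -> Vec<f32> {
    codes.iter().map(|&c| f32::from(c) * scale).collect()
}

fn main() {
    let w = [0.9f32, -0.45, 0.1, 0.0];
    let (scale, codes) = quantize_4bit(&w);
    let back = dequantize_4bit(scale, &codes);
    // Round-trip error is bounded by half a quantization step (scale / 2).
    for (orig, deq) in w.iter().zip(&back) {
        assert!((orig - deq).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}, codes = {codes:?}");
}
```

The "75%" column follows directly: 4 bits per weight versus 32 (or half that saving versus 16-bit weights), with the per-block scale adding only a small constant overhead.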

License

MIT OR Apache-2.0