Crate mlmf

MLMF - Machine Learning Model Files

This crate provides a comprehensive toolkit for working with ML model files across formats. MLMF handles loading, saving, conversion, and dynamic mapping of transformer models from SafeTensors, GGUF, ONNX, PyTorch, AWQ, and other formats. It eliminates code duplication across ML projects by providing:

  • Loading: Memory-efficient loading from multiple formats (SafeTensors, GGUF, ONNX, PyTorch, AWQ)
  • Conversion: Direct format conversion with batch processing and progress tracking
  • Saving: Model serialization to SafeTensors, ONNX, and other formats
  • Smart Mapping: ML-powered tensor name mapping with pluggable oracles
  • Architecture Detection: Automatic model architecture inference
  • Device Management: CUDA validation and optimal device selection
  • Progress Reporting: Configurable progress callbacks and monitoring

§Quick Start

use mlmf::{LoadOptions, loader};
use candle_core::{Device, DType};

let device = Device::cuda_if_available(0).unwrap_or(Device::Cpu);
let options = LoadOptions {
    device: device.clone(),
    dtype: DType::F16,
    use_mmap: true,
    validate_cuda: false,
    progress: Some(mlmf::progress::default_progress()),
    smart_mapping_oracle: None,
};

let loaded = loader::load_safetensors("./models/llama-7b", options)?;
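§Saving

A loaded model can be written back out through the saver module (see the saver re-exports below). The snippet is a minimal sketch only: it assumes SaveOptions implements Default and that save_safetensors takes the model, an output path, and options, which may not match the actual signatures.

```rust
use mlmf::saver::{save_safetensors, SaveOptions};

// Serialize the loaded model's tensors to a SafeTensors file.
// SaveOptions::default() is an assumption; configure as your build requires.
save_safetensors(&loaded, "./models/llama-7b-copy.safetensors", SaveOptions::default())?;
```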

§Architecture Detection

MLMF automatically detects the model architecture from tensor names:

use mlmf::name_mapping::{TensorNameMapper, Architecture};

let tensor_names = vec![
    "model.embed_tokens.weight".to_string(),
    "model.layers.0.self_attn.q_proj.weight".to_string(),
];

let mapper = TensorNameMapper::from_tensor_names(&tensor_names)?;
assert_eq!(mapper.architecture(), Architecture::LLaMA);
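§Format Conversion

Direct format conversion goes through the conversion module (see the conversion re-exports below). The call shown is a sketch under assumptions: the exact parameters of convert_model and a Default implementation for ConversionOptions are not confirmed by this page.

```rust
use mlmf::conversion::{convert_model, ConversionFormat, ConversionOptions};

// Convert a GGUF checkpoint to SafeTensors in a single call.
// The argument order and ConversionOptions::default() are assumptions.
let result = convert_model(
    "./models/llama-7b.gguf",
    "./models/llama-7b.safetensors",
    ConversionFormat::SafeTensors,
    ConversionOptions::default(),
)?;
```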

Re-exports§

pub use cache::CacheConfig;
pub use cache::CacheConfigBuilder;
pub use cache::CacheStats;
pub use cache::MemoryPressure;
pub use cache::ModelCache;
pub use cached_loader::global_cached_loader;
pub use cached_loader::load_cached;
pub use cached_loader::load_safetensors_cached;
pub use cached_loader::CachedModelLoader;
pub use checkpoint::Checkpoint;
pub use checkpoint::CheckpointManager;
pub use checkpoint::CheckpointMetadata;
pub use checkpoint::CheckpointSaveOptions;
pub use checkpoint::OptimizerState;
pub use config::HFConfig;
pub use config::ModelConfig;
pub use distributed::DeviceType;
pub use distributed::DistributedConfig;
pub use distributed::DistributedConfigBuilder;
pub use distributed::NodeConfig;
pub use distributed::ShardingStrategy;
pub use distributed_core::ClusterStatus;
pub use distributed_core::InferenceRequest;
pub use distributed_core::InferenceResponse;
pub use distributed_core::SimpleDistributedManager;
pub use error::Error;
pub use error::Result;
pub use loader::LoadOptions;
pub use loader::LoadedModel;
pub use lora::LoRAAdapter;
pub use lora::LoRAConfig;
pub use lora::LoRAModel;
pub use lora::LoRAWeights;
pub use metadata::CalibrationMethod;
pub use metadata::ModelMetadata;
pub use metadata::ModelProvenance;
pub use metadata::ModelQuantizationInfo;
pub use metadata::TensorInfo;
pub use metadata::TensorQuantizationInfo;
pub use metadata::TensorStatistics;
pub use model_card::EvaluationInfo;
pub use model_card::MemoryRequirements;
pub use model_card::ModelCard;
pub use model_card::ModelCardGenerator;
pub use model_card::ModelInfo;
pub use model_card::TechnicalSpecs;
pub use model_card::TrainingInfo;
pub use model_card::UsageInfo;
pub use name_mapping::Architecture;
pub use name_mapping::TensorNameMapper;
pub use saver::save_model;
pub use saver::save_safetensors;
pub use saver::ModelSaver;
pub use saver::SaveOptions;
pub use universal_loader::detect_model_format;
pub use universal_loader::is_supported_model;
pub use universal_loader::load_model;
pub use saver::save_gguf; (crate feature gguf only)
pub use formats::load_onnx; (crate feature onnx only)
pub use formats::ONNXLoadOptions; (crate feature onnx only)
pub use formats::ONNXLoader; (crate feature onnx only)
pub use formats::ONNXModelInfo; (crate feature onnx only)
pub use smart_mapping::ChatBasedOracle;
pub use smart_mapping::MappingContext;
pub use smart_mapping::NameMappingOracle;
pub use smart_mapping::SmartTensorNameMapper;
pub use progress::ProgressEvent; (crate feature progress only)
pub use progress::ProgressFn; (crate feature progress only)
pub use conversion::convert_batch;
pub use conversion::convert_model;
pub use conversion::ConversionFormat;
pub use conversion::ConversionJob;
pub use conversion::ConversionOptions;
pub use conversion::ConversionResult;
pub use multimodal::AttentionStats;
pub use multimodal::CrossModalAttentionConfig;
pub use multimodal::FusionStrategy;
pub use multimodal::Modality;
pub use multimodal::ModalityConfig;
pub use multimodal::ModalityInput;
pub use multimodal::MultiModalConfig;
pub use multimodal::MultiModalInput;
pub use multimodal::MultiModalOutput;
pub use multimodal::MultiModalProcessor;
pub use multimodal::MultiModalStats;
pub use multimodal::PreprocessingConfig;
pub use multimodal_loader::BasicCrossModalLayer;
pub use multimodal_loader::BasicFusionComponent;
pub use multimodal_loader::CrossModalLayer;
pub use multimodal_loader::FusionComponent;
pub use multimodal_loader::MultiModalLoader;
pub use multimodal_loader::MultiModalModel;
pub use multimodal_loader::MultiModalModelStats;
pub use multimodal_processor::BasicMultiModalProcessor;
pub use multimodal_processor::CrossModalAttention;
pub use multimodal_processor::FusionLayer;

Modules§

cache
Advanced Caching and Memory Management
cached_loader
Cached Model Loader Integration
checkpoint
Training checkpoint management for ML models
config
Model configuration parsing and validation
conversion
Model Conversion API
distributed
Distributed model loading and management for scalable inference.
distributed_core
Core distributed functionality implementation.
distributed_loader
Distributed model loader implementation.
error
Error types for MLMF
formats
Format-specific model loaders
loader
High-level model loading API
lora
LoRA (Low-Rank Adaptation) support for ML models
metadata
Model metadata and provenance tracking
mmap_loader
Advanced memory-mapped loading for large ML models
model_card
Model Card Generation
multimodal
Multi-Modal Model Support
multimodal_loader
Multi-Modal Model Loader
multimodal_processor
Multi-Modal Processor Implementation
name_mapping
Tensor name mapping for loading HuggingFace models into custom formats
progress
Progress reporting utilities for model loading operations
saver
Format-agnostic model saving utilities
smart_mapping
Smart tensor name mapping with ML-powered suggestions
universal_loader
Universal model loading with automatic format detection
validation
Validation utilities for device and model configuration

Enums§

DType
The different types of elements allowed in tensors.
Device
Cpu, Cuda, or Metal

Type Aliases§

VarBuilder
A simple VarBuilder, this is less generic than VarBuilderArgs but should cover most common use cases.