MLMF - Machine Learning Model Files
This crate provides a comprehensive toolkit for working with ML model files across formats. MLMF handles loading, saving, conversion, and dynamic mapping of transformer models from SafeTensors, GGUF, ONNX, PyTorch, AWQ, and other formats. It eliminates code duplication across ML projects by providing:
- Loading: Memory-efficient loading from multiple formats (SafeTensors, GGUF, ONNX, PyTorch, AWQ)
- Conversion: Direct format conversion with batch processing and progress tracking
- Saving: Model serialization to SafeTensors, ONNX, and other formats
- Smart Mapping: ML-powered tensor name mapping with pluggable oracles
- Architecture Detection: Automatic model architecture inference
- Device Management: CUDA validation and optimal device selection
- Progress Reporting: Configurable progress callbacks and monitoring
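As a rough sketch of the conversion workflow listed above (the `convert_model` argument order, `ConversionOptions::default()`, and the `ConversionFormat::GGUF` variant name are assumptions here; consult the `conversion` module for the actual API):

```rust
use mlmf::{convert_model, ConversionFormat, ConversionOptions};

fn main() -> mlmf::Result<()> {
    // Hypothetical invocation: the exact fields and signature are illustrative,
    // not taken from the crate's real documentation.
    let options = ConversionOptions::default();
    convert_model(
        "./models/llama-7b",      // source model path
        "./models/llama-7b.gguf", // destination path
        ConversionFormat::GGUF,
        options,
    )?;
    Ok(())
}
```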
§Quick Start

```rust
use mlmf::{loader, LoadOptions};
use candle_core::{DType, Device};

fn main() -> mlmf::Result<()> {
    // Prefer CUDA when available, fall back to CPU.
    let device = Device::cuda_if_available(0).unwrap_or(Device::Cpu);
    let options = LoadOptions {
        device: device.clone(),
        dtype: DType::F16,
        use_mmap: true,
        validate_cuda: false,
        progress: Some(mlmf::progress::default_progress()),
        smart_mapping_oracle: None,
    };
    let loaded = loader::load_safetensors("./models/llama-7b", options)?;
    Ok(())
}
```

§Architecture Detection
MLMF automatically detects the model architecture from tensor names:
```rust
use mlmf::name_mapping::{Architecture, TensorNameMapper};

fn main() -> mlmf::Result<()> {
    let tensor_names = vec![
        "model.embed_tokens.weight".to_string(),
        "model.layers.0.self_attn.q_proj.weight".to_string(),
    ];
    let mapper = TensorNameMapper::from_tensor_names(&tensor_names)?;
    assert_eq!(mapper.architecture(), Architecture::LLaMA);
    Ok(())
}
```

§Re-exports
pub use cache::{CacheConfig, CacheConfigBuilder, CacheStats, MemoryPressure, ModelCache};
pub use cached_loader::{global_cached_loader, load_cached, load_safetensors_cached, CachedModelLoader};
pub use checkpoint::{Checkpoint, CheckpointManager, CheckpointMetadata, CheckpointSaveOptions, OptimizerState};
pub use config::{HFConfig, ModelConfig};
pub use distributed::{DeviceType, DistributedConfig, DistributedConfigBuilder, NodeConfig, ShardingStrategy};
pub use distributed_core::{ClusterStatus, InferenceRequest, InferenceResponse, SimpleDistributedManager};
pub use error::{Error, Result};
pub use loader::{LoadOptions, LoadedModel};
pub use lora::{LoRAAdapter, LoRAConfig, LoRAModel, LoRAWeights};
pub use metadata::{CalibrationMethod, ModelMetadata, ModelProvenance, ModelQuantizationInfo, TensorInfo, TensorQuantizationInfo, TensorStatistics};
pub use model_card::{EvaluationInfo, MemoryRequirements, ModelCard, ModelCardGenerator, ModelInfo, TechnicalSpecs, TrainingInfo, UsageInfo};
pub use name_mapping::{Architecture, TensorNameMapper};
pub use saver::{save_model, save_safetensors, ModelSaver, SaveOptions};
pub use universal_loader::{detect_model_format, is_supported_model, load_model};
pub use saver::save_gguf; (crate feature `gguf` only)
pub use formats::{load_onnx, ONNXLoadOptions, ONNXLoader, ONNXModelInfo}; (crate feature `onnx` only)
pub use smart_mapping::{ChatBasedOracle, MappingContext, NameMappingOracle, SmartTensorNameMapper};
pub use progress::{ProgressEvent, ProgressFn}; (crate feature `progress` only)
pub use conversion::{convert_batch, convert_model, ConversionFormat, ConversionJob, ConversionOptions, ConversionResult};
pub use multimodal::{AttentionStats, CrossModalAttentionConfig, FusionStrategy, Modality, ModalityConfig, ModalityInput, MultiModalConfig, MultiModalInput, MultiModalOutput, MultiModalProcessor, MultiModalStats, PreprocessingConfig};
pub use multimodal_loader::{BasicCrossModalLayer, BasicFusionComponent, CrossModalLayer, FusionComponent, MultiModalLoader, MultiModalModel, MultiModalModelStats};
pub use multimodal_processor::{BasicMultiModalProcessor, CrossModalAttention, FusionLayer};
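The universal-loader re-exports can be combined to probe a file before loading it. A sketch, under the assumptions that `is_supported_model` takes a path and returns a `bool`, and that `detect_model_format` returns a `Debug`-printable format tag:

```rust
use mlmf::{detect_model_format, is_supported_model, load_model, LoadOptions};
use candle_core::{DType, Device};

fn main() -> mlmf::Result<()> {
    let path = "./models/llama-7b.safetensors";
    // Assumed signature: is_supported_model(&str) -> bool.
    if is_supported_model(path) {
        // Assumed to return an identifier for the container format
        // (e.g. SafeTensors, GGUF, ONNX).
        let format = detect_model_format(path)?;
        println!("detected format: {format:?}");
        let options = LoadOptions {
            device: Device::Cpu,
            dtype: DType::F32,
            use_mmap: true,
            validate_cuda: false,
            progress: None,
            smart_mapping_oracle: None,
        };
        let _model = load_model(path, options)?;
    }
    Ok(())
}
```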
§Modules

- cache: Advanced Caching and Memory Management
- cached_loader: Cached Model Loader Integration
- checkpoint: Training checkpoint management for ML models
- config: Model configuration parsing and validation
- conversion: Model Conversion API
- distributed: Distributed model loading and management for scalable inference
- distributed_core: Core distributed functionality implementation
- distributed_loader: Distributed model loader implementation
- error: Error types for MLMF
- formats: Format-specific model loaders
- loader: High-level model loading API
- lora: LoRA (Low-Rank Adaptation) support for ML models
- metadata: Model metadata and provenance tracking
- mmap_loader: Advanced memory-mapped loading for large ML models
- model_card: Model Card Generation
- multimodal: Multi-Modal Model Support
- multimodal_loader: Multi-Modal Model Loader
- multimodal_processor: Multi-Modal Processor Implementation
- name_mapping: Tensor name mapping for loading HuggingFace models into custom formats
- progress: Progress reporting utilities for model loading operations
- saver: Format-agnostic model saving utilities
- smart_mapping: Smart tensor name mapping with ML-powered suggestions
- universal_loader
- validation: Validation utilities for device and model configuration
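As an illustration of how the `saver` module pairs with the loaders (the `SaveOptions::default()` constructor, the `save_safetensors` argument order, and `LoadOptions` implementing `Default` are all assumptions here):

```rust
use mlmf::{loader, save_safetensors, SaveOptions};

fn main() -> mlmf::Result<()> {
    // Hypothetical round trip: load a model, then re-serialize it to SafeTensors.
    // Default::default() for LoadOptions is assumed for brevity.
    let model = loader::load_safetensors("./models/llama-7b", Default::default())?;
    save_safetensors(&model, "./models/llama-7b-copy.safetensors", SaveOptions::default())?;
    Ok(())
}
```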
§Type Aliases

- VarBuilder: A simple VarBuilder; it is less generic than VarBuilderArgs but should cover most common use cases.