Crate ferrum_models

Ferrum model layer

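This crate builds on the core abstractions defined in ferrum-interfaces/ferrum-types to provide model definition parsing, builders, and placeholder weight-loading implementations, so that upper layers keep compiling during the refactoring phase.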

Re-exports

pub use common::DecoderOnlyLLM;
pub use common::LlmRuntimeConfig;
pub use definition::ConfigManager;
pub use definition::ModelDefinition;
pub use executor::BertModelExecutor;
pub use executor::ClipModelExecutor;
pub use executor::LlmExecutor;
pub use executor::StubModelExecutor;
pub use executor::TtsModelExecutor;
pub use executor::WhisperModelExecutor;
pub use hf_download::HfDownloader;
pub use image_processor::ClipImageProcessor;
pub use loader::SafeTensorsLoader;
pub use lora::default_lora_model_id;
pub use lora::load_runtime_lora_adapter;
pub use lora::load_startup_lora_adapter;
pub use lora::load_startup_lora_adapters;
pub use lora::render_lora_model_id;
pub use lora::ActiveLoraAdapter;
pub use lora::LoraAdapterConfig;
pub use lora::RuntimeLoraAdapter;
pub use lora::StartupLoraAdapter;
pub use lora::StartupLoraSpec;
pub use multimodal::BertModelWrapper;
pub use multimodal::ClipModelWrapper;
pub use multimodal::WhisperModelWrapper;
pub use registry::Architecture;
pub use registry::DefaultModelRegistry;
pub use registry::ModelAlias;
pub use registry::ModelDiscoveryEntry;
pub use registry::ModelFormatType;
pub use source::DefaultModelSourceResolver;
pub use source::ModelFormat;
pub use source::ModelSourceConfig;
pub use source::ModelSourceResolver;
pub use source::ResolvedModelSource;
pub use tensor_wrapper::CandleTensorWrapper;
pub use tokenizer::TokenizerFactory;
pub use tokenizer::TokenizerHandle;

Modules

audio_processor
Audio preprocessing for Whisper ASR.
common
Cross-model traits and helpers (Model-as-Code shared infrastructure).
definition
Model definition and configuration parsing
executor
Model executor implementations.
gguf_config
Build a LlamaFamilyConfig from a GGUF file’s metadata.
gguf_engine_loader
GGUF → engine glue. Lets ferrum serve / ferrum bench accept a .gguf path (or an alias resolving to one) and produce a Box<dyn DecoderOnlyLLM> that the existing LlmExecutor + ContinuousBatchEngine can drive.
gguf_runtime
Thin dispatch wrapper around candle-transformers’ quantized GGUF model loaders. Exists so the rest of ferrum can hand a .gguf path to something that produces a Tensor from token ids without caring whether the underlying arch is Qwen3 / Qwen3-MoE / Llama-3.x / Mistral.
hf_download
HuggingFace model downloader with proxy and resume support
image_processor
Image preprocessing for CLIP models.
loader
Weight loading from SafeTensors files
lora
Startup LoRA adapter loader and validator.
mel
Mel spectrogram computation that exactly matches the reference Python whisper implementation.
models
Model-as-Code implementations.
moe
Mixture-of-Experts runtime primitives.
moe_config
Mixture-of-Experts (MoE) configuration types.
multimodal
Legacy architecture wrappers (candle-based, non-decoder model types).
registry
Model registry and alias management
source
Model source resolution and downloading with progress tracking
tensor_wrapper
Candle Tensor wrapper implementing TensorLike
tokenizer
Tokenizer placeholder implementation
weight_format
Dim 3 polymorphism point — weight-format detection for the executor factory.
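The `weight_format` and `loader` modules above have to tell GGUF and SafeTensors weights apart. A minimal sketch of header-based detection, assuming only the published file layouts: GGUF files begin with the 4-byte ASCII magic `GGUF`, and SafeTensors files begin with a little-endian `u64` header length followed by a JSON header. The `DetectedFormat` enum and both functions are hypothetical and are not this crate's API; the real `weight_format` module may also consult file extensions or directory contents.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical format tag (not ferrum's `ModelFormatType`).
#[derive(Debug, PartialEq, Eq)]
enum DetectedFormat {
    Gguf,
    SafeTensors,
    Unknown,
}

/// Classify weights from the first bytes of a file.
fn detect_from_bytes(head: &[u8]) -> DetectedFormat {
    // GGUF: 4-byte ASCII magic "GGUF" at offset 0.
    if head.starts_with(b"GGUF") {
        return DetectedFormat::Gguf;
    }
    // SafeTensors: little-endian u64 header length, then a JSON object.
    if head.len() >= 9 {
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(&head[..8]);
        let header_len = u64::from_le_bytes(len_bytes);
        if header_len > 0 && head[8] == b'{' {
            return DetectedFormat::SafeTensors;
        }
    }
    DetectedFormat::Unknown
}

/// Read at most 9 bytes from `path` and classify them.
fn detect_weight_format(path: &Path) -> io::Result<DetectedFormat> {
    let mut head = Vec::with_capacity(9);
    File::open(path)?.take(9).read_to_end(&mut head)?;
    Ok(detect_from_bytes(&head))
}

fn main() -> io::Result<()> {
    // Write a tiny fake GGUF header to a temp file and classify it.
    let path = std::env::temp_dir().join("ferrum_format_demo.bin");
    std::fs::write(&path, b"GGUF\x03\x00\x00\x00")?;
    println!("{:?}", detect_weight_format(&path)?); // prints "Gguf"
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Sniffing magic bytes rather than trusting the extension lets a renamed or extension-less file still route to the right loader; a production check would also validate the header length against the file size.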

Structs

AttentionConfig
Attention configuration for model architecture
ModelConfig
Model configuration for runtime
ModelInfo
Model information and metadata
RopeScaling
RoPE (Rotary Position Embedding) scaling configuration
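`RopeScaling` configures how rotary position embeddings are stretched for longer contexts. The sketch below assumes the standard RoPE inverse frequencies base^(-2i/d) and shows only linear scaling (position interpolation), where positions are divided by a factor. `RopeScalingKind`, `inv_freqs`, and `apply_rope` are hypothetical names; the real struct may support other schemes (for example dynamic NTK or YaRN) with different fields. This sketch rotates interleaved pairs `(x[2i], x[2i+1])`, whereas Hugging Face-style Llama code pairs `x[i]` with `x[i + d/2]`.

```rust
/// Hypothetical RoPE scaling options (field names are assumptions).
#[derive(Debug, Clone, Copy)]
enum RopeScalingKind {
    None,
    /// Linear position interpolation: effective position = pos / factor.
    Linear { factor: f32 },
}

/// Inverse frequencies base^(-2i/d) for i in 0..d/2, divided by the scaling factor.
fn inv_freqs(head_dim: usize, base: f32, scaling: RopeScalingKind) -> Vec<f32> {
    let factor = match scaling {
        RopeScalingKind::None => 1.0,
        RopeScalingKind::Linear { factor } => factor,
    };
    (0..head_dim / 2)
        .map(|i| 1.0 / base.powf(2.0 * i as f32 / head_dim as f32) / factor)
        .collect()
}

/// Rotate each interleaved pair (x[2i], x[2i+1]) by angle pos * inv_freq[i].
fn apply_rope(x: &mut [f32], pos: usize, inv_freq: &[f32]) {
    debug_assert!(x.len() >= 2 * inv_freq.len());
    for (i, &f) in inv_freq.iter().enumerate() {
        let (sin, cos) = (pos as f32 * f).sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}

fn main() {
    let freqs = inv_freqs(4, 10_000.0, RopeScalingKind::None);
    let mut q = vec![1.0f32, 0.0, 1.0, 0.0];
    apply_rope(&mut q, 3, &freqs);
    println!("{freqs:?} {q:?}");
}
```

Dividing each inverse frequency by the factor is equivalent to dividing the position, so position 2 under a factor of 2 rotates exactly like position 1 without scaling.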

Enums

Activation
Activation function type
ModelType
Model type enumeration
NormType
Normalization type used in the model
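The `Activation` and `NormType` enums select per-layer math. As an illustration only, assuming typical decoder choices (SiLU, tanh-approximated GELU, ReLU; RMSNorm and LayerNorm), match-based dispatch over such enums might look like this. The `Act` and `Norm` enums are hypothetical stand-ins, not the crate's variants, and learned scale/bias weights are omitted.

```rust
/// Hypothetical activation selector.
#[derive(Debug, Clone, Copy)]
enum Act {
    Silu,
    Gelu,
    Relu,
}

impl Act {
    fn apply(self, x: f32) -> f32 {
        match self {
            // SiLU(x) = x * sigmoid(x)
            Act::Silu => x / (1.0 + (-x).exp()),
            // tanh approximation of GELU
            Act::Gelu => {
                let c = (2.0 / std::f32::consts::PI).sqrt();
                0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
            }
            Act::Relu => x.max(0.0),
        }
    }
}

/// Hypothetical normalization selector.
#[derive(Debug, Clone, Copy)]
enum Norm {
    RmsNorm,
    LayerNorm,
}

impl Norm {
    /// Normalize `x` in place (no learned scale or bias, for brevity).
    fn apply(self, x: &mut [f32], eps: f32) {
        let n = x.len() as f32;
        match self {
            Norm::RmsNorm => {
                // Divide by the root-mean-square; no mean subtraction.
                let ms = x.iter().map(|v| v * v).sum::<f32>() / n;
                let inv = 1.0 / (ms + eps).sqrt();
                x.iter_mut().for_each(|v| *v *= inv);
            }
            Norm::LayerNorm => {
                // Subtract the mean, then divide by the standard deviation.
                let mean = x.iter().sum::<f32>() / n;
                let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
                let inv = 1.0 / (var + eps).sqrt();
                x.iter_mut().for_each(|v| *v = (*v - mean) * inv);
            }
        }
    }
}

fn main() {
    for act in [Act::Silu, Act::Gelu, Act::Relu] {
        println!("{act:?}(1.0) = {:.4}", act.apply(1.0));
    }
    let mut h = vec![3.0f32, 4.0];
    Norm::RmsNorm.apply(&mut h, 1e-6);
    let mut g = vec![1.0f32, 2.0, 3.0];
    Norm::LayerNorm.apply(&mut g, 1e-6);
    println!("{h:?} {g:?}");
}
```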

Traits

ModelExecutor
Core model executor trait focusing on tensor operations
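`ModelExecutor` is the trait the engine drives, but its actual signature is not shown on this page. The following is a deliberately simplified, hypothetical stand-in (`ToyExecutor`, `ToyStub`, and `greedy_next` are invented names, and `Vec<f32>` replaces a real tensor type) that illustrates the trait-object pattern: callers hold a `Box<dyn ...>` and stay agnostic of which concrete executor, a stub or a real LLM, produces the logits.

```rust
/// Hypothetical, simplified executor trait; not ferrum's `ModelExecutor`.
trait ToyExecutor {
    fn name(&self) -> &str;
    /// Map a sequence of token ids to next-token logits over the vocabulary.
    fn forward(&self, input_ids: &[u32]) -> Vec<f32>;
}

/// Stub whose logits always favour (last_id + 1) % vocab_size.
/// Assumes vocab_size > 0.
struct ToyStub {
    vocab_size: usize,
}

impl ToyExecutor for ToyStub {
    fn name(&self) -> &str {
        "stub"
    }

    fn forward(&self, input_ids: &[u32]) -> Vec<f32> {
        let mut logits = vec![0.0; self.vocab_size];
        let next = input_ids
            .last()
            .map_or(0, |&t| (t as usize + 1) % self.vocab_size);
        logits[next] = 1.0;
        logits
    }
}

/// Greedy decoding step: pick the index of the largest logit.
fn greedy_next(exec: &dyn ToyExecutor, ids: &[u32]) -> u32 {
    exec.forward(ids)
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i as u32)
        .unwrap_or(0)
}

fn main() {
    let exec: Box<dyn ToyExecutor> = Box::new(ToyStub { vocab_size: 8 });
    let mut ids = vec![5u32];
    for _ in 0..3 {
        let next = greedy_next(&*exec, &ids);
        ids.push(next);
    }
    println!("{} generated {:?}", exec.name(), ids); // stub generated [5, 6, 7, 0]
}
```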

Type Aliases

Result
Result type used throughout Ferrum
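A crate-wide `Result` alias fixes the error type so signatures stay short and `?` propagates errors uniformly. A minimal sketch of the pattern, assuming a hypothetical `ToyError`; the error type behind Ferrum's actual `Result` alias is not documented on this page, and the architecture names in `parse_arch` are illustrative only.

```rust
use std::fmt;

/// Hypothetical error type standing in for Ferrum's real one.
#[derive(Debug)]
enum ToyError {
    UnknownArchitecture(String),
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::UnknownArchitecture(name) => write!(f, "unknown architecture: {name}"),
        }
    }
}

impl std::error::Error for ToyError {}

/// Crate-wide alias: every fallible function returns this.
type Result<T> = std::result::Result<T, ToyError>;

/// Illustrative lookup; not ferrum's registry.
fn parse_arch(name: &str) -> Result<&'static str> {
    match name {
        "LlamaForCausalLM" => Ok("llama"),
        "Qwen3ForCausalLM" => Ok("qwen3"),
        other => Err(ToyError::UnknownArchitecture(other.to_string())),
    }
}

/// `?` forwards the same error type without conversion boilerplate.
fn describe(name: &str) -> Result<String> {
    let arch = parse_arch(name)?;
    Ok(format!("{name} -> {arch}"))
}

fn main() {
    match describe("GPT2LMHeadModel") {
        Ok(s) => println!("{s}"),
        Err(e) => println!("error: {e}"), // error: unknown architecture: GPT2LMHeadModel
    }
}
```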