Crate ferrum_models

Ferrum model layer

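This crate provides model-definition parsing, builders, and placeholder weight-loading implementations built around the core abstractions defined in ferrum-interfaces/ferrum-types, so that upper layers continue to compile during the refactoring phase.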

Re-exports

pub use common::DecoderOnlyLLM;
pub use common::LlmRuntimeConfig;
pub use definition::ConfigManager;
pub use definition::ModelDefinition;
pub use executor::BertModelExecutor;
pub use executor::ClipModelExecutor;
pub use executor::LlmExecutor;
pub use executor::StubModelExecutor;
pub use executor::TtsModelExecutor;
pub use executor::WhisperModelExecutor;
pub use hf_download::HfDownloader;
pub use image_processor::ClipImageProcessor;
pub use loader::SafeTensorsLoader;
pub use lora::default_lora_model_id;
pub use lora::load_runtime_lora_adapter;
pub use lora::load_startup_lora_adapter;
pub use lora::load_startup_lora_adapters;
pub use lora::render_lora_model_id;
pub use lora::ActiveLoraAdapter;
pub use lora::LoraAdapterConfig;
pub use lora::RuntimeLoraAdapter;
pub use lora::StartupLoraAdapter;
pub use lora::StartupLoraSpec;
pub use multimodal::BertModelWrapper;
pub use multimodal::ClipModelWrapper;
pub use multimodal::WhisperModelWrapper;
pub use registry::Architecture;
pub use registry::DefaultModelRegistry;
pub use registry::ModelAlias;
pub use registry::ModelDiscoveryEntry;
pub use registry::ModelFormatType;
pub use source::DefaultModelSourceResolver;
pub use source::ModelFormat;
pub use source::ModelSourceConfig;
pub use source::ModelSourceResolver;
pub use source::ResolvedModelSource;
pub use tensor_wrapper::CandleTensorWrapper;
pub use tokenizer::TokenizerFactory;
pub use tokenizer::TokenizerHandle;

Modules

audio_processor: Audio preprocessing for Whisper ASR.
common: Cross-model traits and helpers (Model-as-Code shared infrastructure).
definition: Model definition and configuration parsing
executor: Model executor implementations.
gguf_config: Build a LlamaFamilyConfig from a GGUF file’s metadata.
gguf_engine_loader: GGUF → engine glue. Lets ferrum serve / ferrum bench accept a .gguf path (or an alias resolving to one) and produce a Box<dyn DecoderOnlyLLM> that the existing LlmExecutor + ContinuousBatchEngine can drive.
gguf_runtime: Thin dispatch wrapper around candle-transformers’ quantized GGUF model loaders. Exists so the rest of ferrum can hand a .gguf path to something that produces a Tensor from token ids without caring whether the underlying arch is Qwen3 / Qwen3-MoE / Llama-3.x / Mistral.
hf_download: HuggingFace model downloader with proxy and resume support
image_processor: Image preprocessing for CLIP models.
loader: Weight loading from SafeTensors files
lora: Startup LoRA adapter loader and validator.
mel: Mel spectrogram computation matching Python whisper exactly.
models: Model-as-Code implementations.
moe: Mixture-of-Experts runtime primitives.
moe_config: Mixture-of-Experts (MoE) configuration types.
multimodal: Legacy architecture wrappers (candle-based, non-decoder model types).
registry: Model registry and alias management
source: Model source resolution and downloading with progress tracking
tensor_wrapper: Candle Tensor wrapper implementing TensorLike
tokenizer: Tokenizer placeholder implementation
weight_format: Dim 3 polymorphism point — weight-format detection for the executor factory.
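
The weight_format module is described only as weight-format detection for the executor factory. As a hedged illustration of what such detection can look like, the sketch below classifies a file by its leading bytes: GGUF files begin with the ASCII magic `GGUF`, and SafeTensors files begin with an 8-byte little-endian header length followed by a JSON header starting with `{`. The names `DetectedFormat`, `detect_from_bytes`, and `detect_weight_format` are hypothetical and are not part of the ferrum_models API; the crate's real detection logic may also consider file extensions or directory layout.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical format tag for this sketch only; the crate's real
/// `ModelFormatType` / `ModelFormat` types may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum DetectedFormat {
    Gguf,
    SafeTensors,
    Unknown,
}

/// Classify weight data from its first bytes.
/// - GGUF: starts with the ASCII magic "GGUF".
/// - SafeTensors: starts with a little-endian u64 header length,
///   followed by a JSON header whose first byte is '{'.
pub fn detect_from_bytes(head: &[u8]) -> DetectedFormat {
    if head.len() >= 4 && &head[..4] == b"GGUF" {
        return DetectedFormat::Gguf;
    }
    if head.len() >= 9 {
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(&head[..8]);
        let header_len = u64::from_le_bytes(len_bytes);
        if header_len > 0 && head[8] == b'{' {
            return DetectedFormat::SafeTensors;
        }
    }
    DetectedFormat::Unknown
}

/// Read at most the first 9 bytes of a file and classify it.
pub fn detect_weight_format(path: &Path) -> io::Result<DetectedFormat> {
    let mut head = Vec::with_capacity(9);
    File::open(path)?.take(9).read_to_end(&mut head)?;
    Ok(detect_from_bytes(&head))
}
```

Checking magic bytes rather than trusting the extension lets a factory reject a mislabeled file before handing it to a format-specific loader.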

Structs

AttentionConfig: Attention configuration for model architecture
ModelConfig: Model configuration for runtime
ModelInfo: Model information and metadata
RopeScaling: RoPE (Rotary Position Embedding) scaling configuration
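
RopeScaling is documented only as a RoPE scaling configuration. For context, here is a minimal sketch of linear scaling (position interpolation), one common RoPE scaling scheme: the rotation angle for pair `i` at position `p` is `(p / factor) * base^(-2i/d)`, where `d` is the head dimension. `LinearRopeScaling`, `inv_freq`, and `apply_rope` are hypothetical names; the crate's `RopeScaling` may support other scaling types and use different field names. The sketch pairs elements interleaved as `(x[2i], x[2i+1])`; Llama-style implementations often pair `x[i]` with `x[i + d/2]` instead.

```rust
/// Hypothetical linear-scaling config for this sketch only; the crate's
/// `RopeScaling` struct may use different fields and scaling types.
#[derive(Debug, Clone, Copy)]
pub struct LinearRopeScaling {
    /// Positions are divided by this factor before computing angles.
    pub factor: f32,
}

/// Inverse frequencies: inv_freq[i] = base^(-2i / d) for i in 0..d/2.
pub fn inv_freq(head_dim: usize, base: f32) -> Vec<f32> {
    (0..head_dim / 2)
        .map(|i| base.powf(-((2 * i) as f32) / head_dim as f32))
        .collect()
}

/// Rotate one head vector `x` in place for token position `pos`.
/// Pairs are interleaved as (x[2i], x[2i + 1]).
pub fn apply_rope(x: &mut [f32], pos: usize, base: f32, scaling: Option<LinearRopeScaling>) {
    let head_dim = x.len();
    assert!(head_dim % 2 == 0, "head_dim must be even");
    // Linear scaling (position interpolation): p' = p / factor.
    let factor = scaling.map_or(1.0, |s| s.factor);
    let p = pos as f32 / factor;
    for (i, f) in inv_freq(head_dim, base).into_iter().enumerate() {
        let (sin, cos) = (p * f).sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // 2D rotation of the pair by angle p * f.
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}
```

With `factor = 2.0`, position 2 is rotated exactly like position 1 without scaling, which is how linear scaling stretches a model's trained context window.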

Enums

Activation: Activation function type
ModelType: Model type enumeration
NormType: Normalization type used in the model
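
The Activation enum's variants are not listed on this page. The sketch below shows the usual shape of such an enum: a variant per function plus a method that applies it element-wise. `ActivationSketch` and its variants are hypothetical; SiLU (`x * sigmoid(x)`) and the tanh approximation of GELU are shown because they are common in decoder-only LLMs, not because this page confirms which variants the crate defines.

```rust
/// Hypothetical activation enum; the real `Activation` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActivationSketch {
    Relu,
    Silu,
    GeluTanh,
}

impl ActivationSketch {
    /// Apply the activation to a single value.
    pub fn apply(self, x: f32) -> f32 {
        match self {
            ActivationSketch::Relu => x.max(0.0),
            // SiLU (swish): x * sigmoid(x) = x / (1 + e^-x)
            ActivationSketch::Silu => x / (1.0 + (-x).exp()),
            // GELU, tanh approximation:
            // 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
            ActivationSketch::GeluTanh => {
                let c = (2.0 / std::f32::consts::PI).sqrt();
                0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
            }
        }
    }

    /// Apply the activation element-wise, in place.
    pub fn apply_slice(self, xs: &mut [f32]) {
        for x in xs.iter_mut() {
            *x = self.apply(*x);
        }
    }
}
```

Keeping the choice in a `Copy` enum lets a model config select the activation at load time while the hot loop stays a simple `match`.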

Traits

ModelExecutor: Core model executor trait focusing on tensor operations

Type Aliases

Result: Result type used throughout Ferrum
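
This page states only that Result is the result type used throughout Ferrum; it does not name the error type. The sketch below shows the common Rust pattern behind such an alias, assuming a hypothetical `SketchError` enum in place of Ferrum's real error type. `parse_hidden_size` is likewise illustrative and not part of the crate.

```rust
use std::fmt;

/// Hypothetical error type standing in for Ferrum's real error type.
#[derive(Debug, PartialEq, Eq)]
pub enum SketchError {
    ModelNotFound(String),
    InvalidConfig(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::ModelNotFound(id) => write!(f, "model not found: {id}"),
            SketchError::InvalidConfig(msg) => write!(f, "invalid config: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Crate-wide alias: callers write `Result<T>` instead of
/// `std::result::Result<T, SketchError>`.
pub type Result<T> = std::result::Result<T, SketchError>;

/// Illustrative function using the alias; `?` propagates `SketchError`.
pub fn parse_hidden_size(raw: &str) -> Result<usize> {
    let n: usize = raw
        .trim()
        .parse()
        .map_err(|_| SketchError::InvalidConfig(format!("bad hidden_size: {raw}")))?;
    if n == 0 {
        return Err(SketchError::InvalidConfig("hidden_size must be > 0".into()));
    }
    Ok(n)
}
```

Fixing the error type in one alias keeps signatures short across the crate and makes `?` work uniformly wherever the shared error type is returned.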