Crate ferrum_quantization

Weight-format abstraction for Ferrum models.

Separates “what is the weight matrix like” (dense f32, GPTQ int4, AWQ, GGUF, …) from “what device does the math” (Backend) and “how does the model wire things together” (model code).

Usage in model code:

let qkv: Box<dyn Linear<B>> = loader.load_linear("model.layers.0.self_attn.qkv_proj")?;
qkv.forward(ctx, &input, &mut out, m);

The Linear trait dispatches to the appropriate backend kernel (B::gemm for Dense, B::gemm_gptq for GPTQ, etc.) without the model having to branch on quantization type.
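The dispatch described above can be sketched with simplified stand-ins. Everything in this block is an assumption made for illustration: the trimmed-down `Backend` and `Linear` traits, `CpuBackend`, the toy `Int8Linear` format (plain per-row int8 scaling, not GPTQ), and the `project` and `demo` helpers. The real `forward` also takes a `ctx` argument, which is omitted here, and the crate's actual signatures may differ.

```rust
// Hypothetical, simplified stand-ins; not the crate's real definitions.

/// Stand-in for a compute backend. Each associated function is one kernel.
trait Backend {
    /// out[m x n] = input[m x k] * weight^T, with weight stored row-major as [n x k].
    fn gemm(input: &[f32], weight: &[f32], out: &mut [f32], m: usize, n: usize, k: usize);
}

struct CpuBackend;

impl Backend for CpuBackend {
    fn gemm(input: &[f32], weight: &[f32], out: &mut [f32], m: usize, n: usize, k: usize) {
        for row in 0..m {
            for col in 0..n {
                let mut acc = 0.0f32;
                for i in 0..k {
                    acc += input[row * k + i] * weight[col * k + i];
                }
                out[row * n + col] = acc;
            }
        }
    }
}

/// Stand-in for the Linear trait: the model only ever calls `forward`.
trait Linear<B: Backend> {
    fn forward(&self, input: &[f32], out: &mut [f32], m: usize);
}

/// Dense f32 weights go straight to the backend's gemm kernel.
struct DenseLinear { weight: Vec<f32>, n: usize, k: usize }

impl<B: Backend> Linear<B> for DenseLinear {
    fn forward(&self, input: &[f32], out: &mut [f32], m: usize) {
        B::gemm(input, &self.weight, out, m, self.n, self.k);
    }
}

/// Toy quantized format: int8 weights with one f32 scale per output row.
struct Int8Linear { qweight: Vec<i8>, scales: Vec<f32>, n: usize, k: usize }

impl<B: Backend> Linear<B> for Int8Linear {
    fn forward(&self, input: &[f32], out: &mut [f32], m: usize) {
        // Dequantize, then reuse gemm. A real backend would expose a fused kernel.
        let mut w = Vec::with_capacity(self.n * self.k);
        for row in 0..self.n {
            for i in 0..self.k {
                w.push(self.qweight[row * self.k + i] as f32 * self.scales[row]);
            }
        }
        B::gemm(input, &w, out, m, self.n, self.k);
    }
}

/// "Model code": identical for every weight format, no branching on quantization.
fn project<B: Backend>(layer: &dyn Linear<B>, input: &[f32], n: usize, m: usize) -> Vec<f32> {
    let mut out = vec![0.0; m * n];
    layer.forward(input, &mut out, m);
    out
}

/// Runs the same input through a dense layer and a quantized layer.
fn demo() -> (Vec<f32>, Vec<f32>) {
    let dense: Box<dyn Linear<CpuBackend>> =
        Box::new(DenseLinear { weight: vec![1.0, 2.0, 3.0, 4.0], n: 2, k: 2 });
    let quant: Box<dyn Linear<CpuBackend>> =
        Box::new(Int8Linear { qweight: vec![2, 4, 12, 16], scales: vec![0.5, 0.25], n: 2, k: 2 });
    let input = [1.0f32, 1.0];
    (project(&*dense, &input, 2, 1), project(&*quant, &input, 2, 1))
}
```

Both layers in `demo` encode the same 2x2 matrix, so `project` returns the same values for each. The caller never learns which storage format sits behind the trait object.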

Re-exports

pub use dense::DenseLinear;
pub use factory::DefaultLinearFactory;
pub use gptq::GptqLinear;
pub use loader::PrefixedLoader;
pub use loader::WeightLoader;
pub use native_safetensors::NativeSafetensorsLoader;
pub use traits::LinearFactory;
pub use config::QuantConfig;
pub use config::QuantMethod;

Modules

config
Quantization configuration parsed from model metadata.
dense
Dense linear projection — the baseline, uses B::gemm directly.
factory
DefaultLinearFactory — materialises dense f32 weights into DenseLinear<B>. Used by any WeightLoader implementation that wants to delegate the “f32 slice → Linear” step without tying itself to a particular backend.
gptq
GPTQ linear projection.
loader
WeightLoader trait — unified interface for loading tensor/linear weights into a specific backend.
native_safetensors
Native safetensors WeightLoader<B> — mmap + safetensors crate, no candle dependency on the LLM hot path.
traits
Re-export of Linear trait (canonical home: ferrum-kernels) plus LinearFactory for weight-loader-side Linear construction.
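
The split between loader and factory described in the module list can be illustrated with a minimal sketch. All shapes here are assumptions: the traits drop the backend type parameter, and `from_f32`, `DefaultFactory`, `InMemoryLoader`, and `demo_factory` are hypothetical names rather than the crate's API. The point is only that the loader knows where tensors live, while the factory decides how an f32 slice becomes a layer.

```rust
use std::collections::HashMap;

// Hypothetical, simplified shapes; the crate's real traits are generic over a backend.

/// A linear projection applied to a single input row: y = W * x.
trait Linear {
    fn forward(&self, input: &[f32]) -> Vec<f32>;
}

/// Turns a raw f32 slice into a boxed layer.
trait LinearFactory {
    fn from_f32(&self, weight: &[f32], out_features: usize, in_features: usize) -> Box<dyn Linear>;
}

/// Row-major [out_features x in_features] dense weights.
struct DenseLinear { weight: Vec<f32>, out_features: usize, in_features: usize }

impl Linear for DenseLinear {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        (0..self.out_features)
            .map(|row| {
                let w = &self.weight[row * self.in_features..(row + 1) * self.in_features];
                w.iter().zip(input).map(|(a, b)| a * b).sum::<f32>()
            })
            .collect()
    }
}

/// Default behaviour: copy the slice into a dense layer.
struct DefaultFactory;

impl LinearFactory for DefaultFactory {
    fn from_f32(&self, weight: &[f32], out_features: usize, in_features: usize) -> Box<dyn Linear> {
        assert_eq!(weight.len(), out_features * in_features, "shape mismatch");
        Box::new(DenseLinear { weight: weight.to_vec(), out_features, in_features })
    }
}

/// A loader that looks tensors up by name and hands construction to the factory.
struct InMemoryLoader<F: LinearFactory> {
    tensors: HashMap<String, (Vec<f32>, usize, usize)>,
    factory: F,
}

impl<F: LinearFactory> InMemoryLoader<F> {
    /// Returns None when no tensor is stored under `name`.
    fn load_linear(&self, name: &str) -> Option<Box<dyn Linear>> {
        let (data, out_f, in_f) = self.tensors.get(name)?;
        Some(self.factory.from_f32(data, *out_f, *in_f))
    }
}

/// Loads a 2x2 layer by name and applies it to one input row.
fn demo_factory() -> Option<Vec<f32>> {
    let mut tensors = HashMap::new();
    tensors.insert("proj".to_string(), (vec![1.0, 2.0, 3.0, 4.0], 2, 2));
    let loader = InMemoryLoader { tensors, factory: DefaultFactory };
    let layer = loader.load_linear("proj")?;
    Some(layer.forward(&[1.0, 2.0]))
}
```

Swapping `DefaultFactory` for another `LinearFactory` implementation would change how layers are built without touching the loader, which is the decoupling the `factory` module description points at.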

Traits

Linear
A weight-bearing linear projection.