ἀνάμνησις — parse any tensor format, recover any precision.
anamnesis is a framework-agnostic Rust library for dequantizing
quantized model weights and parsing tensor archives. It handles
.safetensors (read once, classify, dequantize to BF16),
.npz (bulk extraction at near-I/O speed), and PyTorch .pth
(zero-copy mmap with lossless safetensors conversion, 11–31× faster
than torch.load()).
## Supported Quantization Schemes
| Scheme | Feature gate | Speedup vs PyTorch CPU (AVX2) |
|---|---|---|
| FP8 E4M3 (fine-grained, per-channel, per-tensor) | (always on) | 2.7–9.7× |
| GPTQ (INT4/INT8, group-wise, `g_idx`) | `gptq` | 6.5–12.2× |
| AWQ (INT4, per-group, activation-aware) | `awq` | 4.7–5.7× |
| BitsAndBytes NF4/FP4 (lookup + per-block absmax) | `bnb` | 18–54× |
| BitsAndBytes INT8 (LLM.int8(), per-row absmax) | `bnb` | 1.2× |
All schemes produce bit-exact output (0 ULP difference) against
PyTorch reference implementations, verified on real models.
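For intuition about what the FP8 path decodes, here is a minimal scalar sketch of E4M3 (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7). The function name `e4m3_to_f32` is invented for this example; the library's vectorized kernels are not shown.

```rust
/// Decode one FP8 E4M3 byte to f32. E4M3 has no infinities; only the
/// bit pattern S.1111.111 encodes NaN, which frees the top exponent
/// for finite values up to ±448.
fn e4m3_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((b >> 3) & 0x0F) as i32;
    let man = (b & 0x07) as f32;
    if exp == 0x0F && (b & 0x07) == 0x07 {
        return f32::NAN; // the single NaN pattern per sign
    }
    if exp == 0 {
        // subnormal: no implicit leading 1, fixed exponent 2^(1 - bias)
        sign * (man / 8.0) * 2f32.powi(-6)
    } else {
        // normal: (1 + mantissa/8) * 2^(exp - bias)
        sign * (1.0 + man / 8.0) * 2f32.powi(exp - 7)
    }
}
```

Per-tensor or per-channel dequantization then multiplies each decoded value by the corresponding scale before the BF16 round.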
## NPZ/NPY Parsing

Feature-gated behind `npz`. Custom NPY header parser with bulk
`read_exact` — zero per-element deserialization for little-endian data on
little-endian machines. Supports F16, BF16, F32, F64, all integer types,
and Bool. 3,586 MB/s on a 302 MB file (1.3× raw I/O overhead).
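The NPY prelude that such a parser reads is small: a magic string, a version, and a little-endian header length followed by a Python-dict header. A simplified sketch for version 1.0 (the real parser also handles v2/v3 headers and validates padding; `parse_npy_header` is a name invented here):

```rust
/// Parse the prelude of an NPY v1.0 file: the `\x93NUMPY` magic, two
/// version bytes, then a little-endian u16 header length. Returns the
/// Python-dict header string (descr/fortran_order/shape) and the byte
/// offset where the raw array data begins.
fn parse_npy_header(buf: &[u8]) -> Option<(String, usize)> {
    if buf.len() < 10 || &buf[..6] != b"\x93NUMPY" {
        return None; // too short or wrong magic
    }
    let hlen = u16::from_le_bytes([buf[8], buf[9]]) as usize;
    if buf.len() < 10 + hlen {
        return None; // header length exceeds buffer
    }
    let header = std::str::from_utf8(&buf[10..10 + hlen]).ok()?.to_string();
    Some((header, 10 + hlen))
}
```

Because the data segment after this offset is raw little-endian values, it can be pulled in with one bulk `read_exact` into a pre-sized buffer, which is where the near-I/O throughput comes from.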
## PyTorch .pth Parsing

Feature-gated behind `pth`. Minimal pickle VM (~36 opcodes) with a
security allowlist. Memory-mapped I/O with zero-copy `Cow::Borrowed`
tensor data. Lossless `.pth` → `.safetensors` conversion.
11–31× faster than `torch.load()` on torchvision models.
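The security angle of an allowlisted pickle VM is that `GLOBAL`/`STACK_GLOBAL` opcodes are resolved against a fixed set of permitted targets instead of importing arbitrary code. A sketch of the idea; both the function name and the specific entries are illustrative, not the crate's actual list:

```rust
/// Allowlist check in the spirit of a minimal pickle VM: permit only the
/// GLOBAL targets a .pth checkpoint legitimately references (hypothetical
/// list), rejecting anything that would imply code execution.
fn is_allowed_global(module: &str, name: &str) -> bool {
    matches!(
        (module, name),
        ("collections", "OrderedDict")
            | ("torch._utils", "_rebuild_tensor_v2")
            | ("torch", "FloatStorage")
            | ("torch", "HalfStorage")
            | ("torch", "LongStorage")
    )
}
```

Anything outside the list (e.g. `os.system`, `builtins.eval`) is rejected at parse time rather than being "imported" the way Python's `pickle.load` would.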
## Quick Start

```rust
use anamnesis::{parse, TargetDtype};

let model = parse("model-fp8.safetensors")?;
let info = model.inspect();
println!("{info}");
model.remember("model-bf16.safetensors", TargetDtype::BF16)?;
```

## Architecture
- `parse()` — read a `.safetensors` file into a `ParsedModel`
- `ParsedModel::inspect` — derive format, tensor counts, and size estimates from the header (zero I/O)
- `ParsedModel::remember` — dequantize all quantized tensors to BF16 and write a standard `.safetensors` file
- `parse_npz()` — read an `.npz` archive into a `HashMap<String, NpzTensor>` (requires the `npz` feature)
- `parse_pth()` — parse a PyTorch `.pth` file into a `ParsedPth` with zero-copy `tensors()` (requires the `pth` feature)
- `pth_to_safetensors()` — lossless `.pth` → `.safetensors` conversion (requires the `pth` feature)
The `remember` module contains one submodule per quantization family
(`remember::fp8`, `remember::gptq`, `remember::awq`, `remember::bnb`),
each feature-gated independently.
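To make the `remember::bnb` family concrete, here is a sketch of NF4 dequantization: a 16-entry lookup table (the NormalFloat4 quantiles from the QLoRA paper, as used by bitsandbytes) scaled by one absmax per block. The function name and the high-nibble-first packing order are assumptions for this example, not the crate's documented API:

```rust
/// The 16 NF4 quantile values from the QLoRA paper / bitsandbytes.
const NF4: [f32; 16] = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
];

/// Dequantize packed NF4 data: two 4-bit codes per byte (high nibble
/// first, an assumed packing order), with one absmax scale applied to
/// each `block_size` run of output elements.
fn dequant_nf4(packed: &[u8], absmax: &[f32], block_size: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &byte in packed {
        for code in [byte >> 4, byte & 0x0F] {
            let scale = absmax[out.len() / block_size];
            out.push(NF4[code as usize] * scale);
        }
    }
    out
}
```

The large speedups quoted for the `bnb` path make sense in this light: the inner loop is a table lookup and one multiply, which vectorizes far better than the Python-side reference.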
## Re-exports
```rust
pub use error::AnamnesisError;
pub use error::Result;
pub use inspect::format_bytes;
pub use inspect::InspectInfo;
pub use model::parse;
pub use model::ParsedModel;
pub use model::TargetDtype;
pub use parse::AwqCompanions;
pub use parse::AwqConfig;
pub use parse::Bnb4Companions;
pub use parse::BnbConfig;
pub use parse::Dtype;
pub use parse::GptqCompanions;
pub use parse::GptqConfig;
pub use parse::QuantScheme;
pub use parse::SafetensorsHeader;
pub use parse::TensorEntry;
pub use parse::TensorRole;
pub use remember::dequantize_fp8_to_bf16;
pub use remember::dequantize_per_channel_fp8_to_bf16;
pub use remember::dequantize_per_tensor_fp8_to_bf16;
```