Crate anamnesis

ἀνάμνησις — parse any tensor format, recover any precision.

anamnesis is a framework-agnostic Rust library for dequantizing quantized model weights and parsing tensor archives. It handles .safetensors (read once, classify, dequantize to BF16), .npz (bulk extraction at near-I/O speed), and PyTorch .pth (zero-copy mmap with lossless safetensors conversion, 11–31× faster than torch.load()).

§Supported Quantization Schemes

| Scheme | Feature gate | Speedup vs PyTorch CPU (AVX2) |
|---|---|---|
| FP8 E4M3 (fine-grained, per-channel, per-tensor) | (always on) | 2.7–9.7× |
| GPTQ (INT4/INT8, group-wise, g_idx) | `gptq` | 6.5–12.2× |
| AWQ (INT4, per-group, activation-aware) | `awq` | 4.7–5.7× |
| BitsAndBytes NF4/FP4 (lookup + per-block absmax) | `bnb` | 18–54× |
| BitsAndBytes INT8 (LLM.int8(), per-row absmax) | `bnb` | 1.2× |

All schemes produce bit-exact output (0 ULP difference) against PyTorch reference implementations, verified on real models.
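To make the FP8 path concrete, here is a self-contained sketch of decoding one E4M3 byte (OCP 8-bit format: 1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7). This is illustrative only — the function name is hypothetical and anamnesis's internal decoder is not shown; dequantization additionally multiplies the decoded value by the per-tensor, per-channel, or per-block scale before rounding to BF16:

```rust
// Hypothetical sketch, not anamnesis's API: decode one FP8 E4M3 byte to f32.
fn fp8_e4m3_to_f32(byte: u8) -> f32 {
    let sign = if byte & 0x80 != 0 { -1.0_f32 } else { 1.0 };
    let exp = ((byte >> 3) & 0x0F) as i32;
    let man = (byte & 0x07) as f32;
    if exp == 0x0F && (byte & 0x07) == 0x07 {
        return f32::NAN; // E4M3 trades infinities away; 0x7F / 0xFF encode NaN
    }
    if exp == 0 {
        sign * (man / 8.0) * 2.0_f32.powi(-6) // subnormal: no implicit leading 1
    } else {
        sign * (1.0 + man / 8.0) * 2.0_f32.powi(exp - 7) // normal
    }
}

fn main() {
    assert_eq!(fp8_e4m3_to_f32(0x38), 1.0);   // exp = 7, mantissa = 0
    assert_eq!(fp8_e4m3_to_f32(0x7E), 448.0); // largest finite E4M3 value
    assert!(fp8_e4m3_to_f32(0x7F).is_nan());
}
```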

§NPZ/NPY Parsing

Feature-gated behind npz. Custom NPY header parser with bulk read_exact — zero per-element deserialization for little-endian data on little-endian machines. Supports F16, BF16, F32, F64, all integer types, and Bool. 3,586 MB/s on a 302 MB file (1.3× raw I/O overhead).
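The bulk-read strategy hinges on the NPY header: once the dtype and the data offset are known, the payload can be consumed in a single read. The following is a sketch of a minimal v1.0 header parse under assumed simplifications (the function name and crude dtype extraction are illustrative, not anamnesis's parser):

```rust
// Illustrative NPY v1.0 header parse: returns the dtype descriptor (e.g. "<f4")
// and the byte offset where raw array data begins.
fn parse_npy_header(buf: &[u8]) -> Option<(String, usize)> {
    if buf.len() < 10 || &buf[..6] != b"\x93NUMPY" {
        return None; // wrong magic
    }
    if buf[6] != 1 {
        return None; // versions 2/3 use a u32 header length instead
    }
    let hlen = u16::from_le_bytes([buf[8], buf[9]]) as usize;
    let header = std::str::from_utf8(buf.get(10..10 + hlen)?).ok()?;
    // Crude extraction of the dtype descriptor from the Python-dict header.
    let descr = header.split("'descr':").nth(1)?.split('\'').nth(1)?.to_string();
    Some((descr, 10 + hlen))
}

fn main() {
    // Build a synthetic NPY v1.0 header in memory.
    let mut buf = b"\x93NUMPY\x01\x00".to_vec();
    let dict = "{'descr': '<f4', 'fortran_order': False, 'shape': (2,), }\n";
    buf.extend_from_slice(&(dict.len() as u16).to_le_bytes());
    buf.extend_from_slice(dict.as_bytes());
    let (descr, offset) = parse_npy_header(&buf).unwrap();
    assert_eq!(descr, "<f4");
    assert_eq!(offset, 10 + dict.len());
}
```

With the offset in hand, a `<f4` array on a little-endian machine needs no per-element conversion at all — the bytes after the header are the data.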

§PyTorch .pth Parsing

Feature-gated behind pth. Minimal pickle VM (~36 opcodes) with a security allowlist. Memory-mapped I/O with zero-copy Cow::Borrowed tensor data. Lossless .pth → .safetensors conversion. 11–31× faster than torch.load() on torchvision models.
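The allowlist is what makes running a pickle VM safe: a .pth pickle only needs a handful of constructors, and any other GLOBAL is rejected rather than executed. The sketch below shows the idea with a few module/name pairs that commonly appear in PyTorch checkpoints; the exact set anamnesis permits may differ:

```rust
// Illustrative security allowlist for pickle GLOBAL opcodes; not anamnesis's
// actual set. Unknown globals are refused instead of being instantiated.
fn is_allowed_global(module: &str, name: &str) -> bool {
    matches!(
        (module, name),
        ("collections", "OrderedDict")
            | ("torch._utils", "_rebuild_tensor_v2")
            | ("torch", "FloatStorage")
            | ("torch", "HalfStorage")
            | ("torch", "LongStorage")
    )
}

fn main() {
    assert!(is_allowed_global("torch._utils", "_rebuild_tensor_v2"));
    assert!(!is_allowed_global("os", "system"));     // classic pickle exploit vector
    assert!(!is_allowed_global("builtins", "eval")); // likewise refused
}
```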

§Quick Start

```rust
use anamnesis::{parse, Result, TargetDtype};

fn main() -> Result<()> {
    let model = parse("model-fp8.safetensors")?;
    let info = model.inspect();
    println!("{info}");

    model.remember("model-bf16.safetensors", TargetDtype::BF16)?;
    Ok(())
}
```

§Architecture

  • parse() — read a .safetensors file into a ParsedModel
  • ParsedModel::inspect — derive format, tensor counts, and size estimates from the header (zero I/O)
  • ParsedModel::remember — dequantize all quantized tensors to BF16 and write a standard .safetensors file
  • parse_npz() — read an .npz archive into a HashMap<String, NpzTensor> (requires npz feature)
  • parse_pth() — parse a PyTorch .pth file into a ParsedPth with zero-copy tensors() (requires pth feature)
  • pth_to_safetensors() — lossless .pth → .safetensors conversion (requires pth feature)

The remember module contains one submodule per quantization family (remember::fp8, remember::gptq, remember::awq, remember::bnb), each feature-gated independently.
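Since each family is gated independently, a dependent crate opts in via Cargo features. The feature names below come from the tables above; the version requirement is a placeholder, not a statement about published releases:

```toml
[dependencies]
# Enable only the families you need; FP8 is always compiled in.
anamnesis = { version = "*", features = ["gptq", "awq", "bnb", "npz", "pth"] }
```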

Re-exports§

pub use error::AnamnesisError;
pub use error::Result;
pub use inspect::format_bytes;
pub use inspect::InspectInfo;
pub use model::parse;
pub use model::ParsedModel;
pub use model::TargetDtype;
pub use parse::AwqCompanions;
pub use parse::AwqConfig;
pub use parse::Bnb4Companions;
pub use parse::BnbConfig;
pub use parse::Dtype;
pub use parse::GptqCompanions;
pub use parse::GptqConfig;
pub use parse::QuantScheme;
pub use parse::SafetensorsHeader;
pub use parse::TensorEntry;
pub use parse::TensorRole;
pub use remember::dequantize_fp8_to_bf16;
pub use remember::dequantize_per_channel_fp8_to_bf16;
pub use remember::dequantize_per_tensor_fp8_to_bf16;

Modules§

  • error
  • inspect
  • model — High-level parse-first API.
  • parse
  • remember — Precision recovery (dequantization) — built on parse.