// SPDX-License-Identifier: MIT OR Apache-2.0
//! **ἀνάμνησις** — parse any tensor format, recover any precision.
//!
//! `anamnesis` is a framework-agnostic Rust library for dequantizing
//! quantized model weights and parsing tensor archives. It handles
//! `.safetensors` (read once, classify, dequantize to `BF16`),
//! `.npz` (bulk extraction at near-I/O speed), and `PyTorch` `.pth`
//! (zero-copy mmap with lossless safetensors conversion, 11–31× faster
//! than `torch.load()`).
//!
//! # Supported Quantization Schemes
//!
//! | Scheme | Feature gate | Speedup vs `PyTorch` CPU (AVX2) |
//! |--------|-------------|-------------------------------|
//! | `FP8` `E4M3` (fine-grained, per-channel, per-tensor) | *(always on)* | 2.7–9.7× |
//! | `GPTQ` (`INT4`/`INT8`, group-wise, `g_idx`) | `gptq` | 6.5–12.2× |
//! | `AWQ` (`INT4`, per-group, activation-aware) | `awq` | 4.7–5.7× |
//! | `BitsAndBytes` `NF4`/`FP4` (lookup + per-block absmax) | `bnb` | 18–54× |
//! | `BitsAndBytes` `INT8` (`LLM.int8()`, per-row absmax) | `bnb` | 1.2× |
//!
//! All schemes produce **bit-exact** output (0 ULP difference) against
//! `PyTorch` reference implementations, verified on real models.
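//!
//! As a rough illustration of the "lookup + per-block absmax" pattern in
//! the `NF4`/`FP4` row above, here is a scalar sketch. This is not this
//! crate's implementation: the function name, the high-nibble-first
//! packing order, and the zeroed codebook are all illustrative.
//!
//! ```rust
//! /// Dequantize packed 4-bit codes: two codes per byte, a 16-entry
//! /// codebook lookup, and one `absmax` scale per `block_size` elements.
//! fn dequant_4bit_blockwise(
//!     packed: &[u8],
//!     absmax: &[f32],
//!     codebook: &[f32; 16],
//!     block_size: usize,
//! ) -> Vec<f32> {
//!     let mut out = Vec::with_capacity(packed.len() * 2);
//!     for &byte in packed {
//!         // Assumed packing: first element in the high nibble.
//!         for code in [byte >> 4, byte & 0x0F] {
//!             let block = out.len() / block_size;
//!             out.push(codebook[usize::from(code)] * absmax[block]);
//!         }
//!     }
//!     out
//! }
//!
//! let codebook = [0.0f32; 16]; // a real NF4/FP4 table has 16 fixed values
//! let out = dequant_4bit_blockwise(&[0x00, 0xFF], &[1.0], &codebook, 64);
//! assert_eq!(out.len(), 4);
//! ```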
//!
//! # `NPZ`/`NPY` Parsing
//!
//! Feature-gated behind `npz`. Custom `NPY` header parser with bulk
//! `read_exact` — zero per-element deserialization for little-endian
//! data on little-endian machines. Supports `F16`, `BF16`, `F32`, `F64`,
//! all integer types, and `Bool`. **3,586 MB/s** on a 302 MB file
//! (within 1.3× of raw I/O).
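//!
//! A minimal usage sketch (marked `ignore` because it requires the `npz`
//! feature; the error type behind `?` is an assumption):
//!
//! ```rust,ignore
//! use anamnesis::parse_npz;
//!
//! let tensors = parse_npz("weights.npz")?;
//! println!("{} arrays", tensors.len());
//! for name in tensors.keys() {
//!     println!("{name}");
//! }
//! # Ok::<(), anamnesis::AnamnesisError>(())
//! ```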
//!
//! # `PyTorch` `.pth` Parsing
//!
//! Feature-gated behind `pth`. Minimal pickle VM (~36 opcodes) with
//! security allowlist. Memory-mapped I/O with zero-copy `Cow::Borrowed`
//! tensor data. Lossless `.pth` → `.safetensors` conversion.
//! **11–31× faster** than `torch.load()` on torchvision models.
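//!
//! A minimal sketch under the `pth` feature (marked `ignore` for that
//! reason); the shape of the `tensors()` return value and the argument
//! order of `pth_to_safetensors` are assumptions:
//!
//! ```rust,ignore
//! use anamnesis::{parse_pth, pth_to_safetensors};
//!
//! // Inspect names without copying tensor data out of the mmap.
//! let pth = parse_pth("resnet18.pth")?;
//! for (name, _view) in pth.tensors() {
//!     println!("{name}");
//! }
//!
//! // One-shot lossless conversion.
//! pth_to_safetensors("resnet18.pth", "resnet18.safetensors")?;
//! # Ok::<(), anamnesis::AnamnesisError>(())
//! ```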
//!
//! # Quick Start
//!
//! ```rust,no_run
//! use anamnesis::{parse, TargetDtype};
//!
//! let model = parse("model-fp8.safetensors")?;
//! let info = model.inspect();
//! println!("{info}");
//!
//! model.remember("model-bf16.safetensors", TargetDtype::BF16)?;
//! # Ok::<(), anamnesis::AnamnesisError>(())
//! ```
//!
//! # Architecture
//!
//! - [`parse()`] — read a `.safetensors` file into a [`ParsedModel`]
//! - [`ParsedModel::inspect`] — derive format, tensor counts, and size
//! estimates from the header (zero I/O)
//! - [`ParsedModel::remember`] — dequantize all quantized tensors to `BF16`
//! and write a standard `.safetensors` file
//! - `parse_npz()` — read an `.npz` archive into a `HashMap<String, NpzTensor>`
//! (requires `npz` feature)
//! - `parse_pth()` — parse a `PyTorch` `.pth` file into a `ParsedPth`
//! with zero-copy `tensors()` (requires `pth` feature)
//! - `pth_to_safetensors()` — lossless `.pth` → `.safetensors` conversion
//! (requires `pth` feature)
//!
//! The [`remember`] module contains one submodule per quantization family
//! ([`remember::fp8`], [`remember::gptq`], [`remember::awq`],
//! [`remember::bnb`]), each feature-gated independently.
// `deny` (not `forbid`) allows feature-gated modules to opt in to unsafe
// where required by external APIs (e.g., memmap2 in the `pth` module).
// See CONVENTIONS.md "// SAFETY:" rules for the policy.
#![deny(unsafe_code)]
// Allow unknown lint names so that `#[allow(clippy::newer_lint)]` in test
// modules does not become an error when built with MSRV clippy (which may
// not recognise lints added in later releases). Without this, every new
// clippy lint suppression is a potential MSRV CI break.
#![allow(unknown_lints)]
pub mod remember;

mod error;
mod parse;
#[cfg(feature = "npz")]
mod npz;
#[cfg(feature = "pth")]
mod pth;

pub use error::AnamnesisError;
pub use parse::{ParsedModel, TargetDtype, parse};

#[cfg(feature = "npz")]
pub use npz::{NpzTensor, parse_npz};
#[cfg(feature = "pth")]
pub use pth::{ParsedPth, parse_pth, pth_to_safetensors};

#[cfg(feature = "awq")]
pub use remember::awq::dequantize_awq_to_bf16;
#[cfg(feature = "gptq")]
pub use remember::gptq::dequantize_gptq_to_bf16;