"""snapshot — Pure-Rust engine persistence for OxiLLaMa.

This module provides a ``pickle``-shaped surface for saving and restoring
:class:`~oxillama_py.Engine` state. Unlike ``pickle``, the format is
stable, Pure-Rust, and does not serialize model weights (it stores only KV
cache, sampler config, and a Blake3 fingerprint of the GGUF file).

File format
-----------

``OXISNAP1`` magic + version ``u32`` + oxicode-encoded payload, identical to
the format used by ``oxillama-runtime`` internally.

Known limitations
-----------------

* Only transformer (attention) architectures round-trip; Mamba-2 / Jamba
  KV state is tracked internally but not yet emitted by ``snapshot()``.
* Grammar state resets to initial after restore (grammar *source* is
  preserved, but parser state is not).
* Sampler RNG reflects the config seed, not in-flight generation state.
* Offload policy is reset to ``None`` on restore.
* ``from_hub``-loaded engines snapshot the local HF cache path; restoring on
  a different machine requires the same GGUF to be present at the stored
  path, or an explicit ``model_path=`` override. Hub-aware snapshots are
  tracked as R3 in ``crates/oxillama-py/TODO.md``.
* Token history (``tokens_count``) is always 0 in v0.1.3 because token
  history is not tracked at engine level.
* ``AsyncEngine.restore`` is not available in v0.1.3; restore a snapshot
  synchronously via ``Engine.restore(path)`` and wrap the result in
  ``AsyncEngine`` if async access is needed.
"""
# OxiLlamaError is None when the native extension has not been compiled yet.
# Fall back to Exception so that SnapshotError is always importable and
# the pure-Python layer remains usable without a native build.
# type: ignore[import-untyped]
: =
=
=
# type: ignore[misc]
"""Raised when a snapshot operation fails (convenience alias).

The underlying Rust methods raise either :exc:`~oxillama_py.LoadError`
(model fingerprint mismatch) or :exc:`~oxillama_py.GenerateError`
(malformed / incompatible snapshot). Catch ``SnapshotError`` to handle
both at once::

    try:
        engine = oxillama_py.snapshot.load("snap.oxsn")
    except oxillama_py.snapshot.SnapshotError as exc:
        print(f"restore failed: {exc}")
"""
"""Write *engine* state to *path* atomically.
Equivalent to ``engine.snapshot(path)``. Raises :exc:`SnapshotError`
(or a subclass) if the engine is not loaded or the write fails.
"""
"""Return *engine* state as a :class:`bytes` object.
Equivalent to ``engine.snapshot_bytes()``. Suitable for in-memory
transport (e.g. multiprocessing queues, network sockets).
"""
return
"""Reconstruct an :class:`~oxillama_py.Engine` from *path*.

If *model_path* is ``None`` (default), the model path embedded in the
snapshot is used. Pass an explicit *model_path* to override (useful
when moving a snapshot between machines where the GGUF lives at a
different absolute path).

The GGUF model is re-loaded from disk on every restore.

Raises :exc:`SnapshotError` (specifically :exc:`~oxillama_py.LoadError`
for fingerprint mismatches and :exc:`~oxillama_py.GenerateError` for
corrupted / incompatible snapshots).
"""
return
"""Reconstruct an :class:`~oxillama_py.Engine` from an in-memory *blob*.

*model_path* is required: the embedded path in the snapshot is used as
the loading target, but *model_path* must be provided explicitly when
working with an in-memory blob (since there is no file to re-read from).

Writes the blob to a temporary file and calls :func:`load`. This is the
mirror of :func:`dumps`.
"""
=
return
pass
"""Return metadata from the snapshot at *path* without loading the model.
The returned :class:`~oxillama_py.SnapshotInfo` exposes ``arch_id``,
``model_path``, ``tokenizer_path``, ``max_context_length``,
``num_threads``, ``version``, ``magic``, and ``tokens_count``.
"""
return
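

# --- Illustrative sketch (not part of the original module) ------------------
# The module docstring describes the on-disk layout as the ``OXISNAP1`` magic,
# a ``u32`` version, then an oxicode-encoded payload. The hypothetical helper
# below shows one way that header could be split off in pure Python.
# ASSUMPTIONS (not confirmed by the docstring): the version immediately
# follows the 8-byte magic and is stored little-endian. The names
# ``_SKETCH_MAGIC`` and ``_sketch_split_header`` are invented for this example.
import struct

_SKETCH_MAGIC = b"OXISNAP1"


def _sketch_split_header(blob):
    """Return ``(version, payload)`` for *blob*, or raise ``ValueError``."""
    header_len = len(_SKETCH_MAGIC) + 4  # 8-byte magic + 4-byte u32 version
    if len(blob) < header_len:
        raise ValueError("blob is shorter than the snapshot header")
    if blob[: len(_SKETCH_MAGIC)] != _SKETCH_MAGIC:
        raise ValueError("missing OXISNAP1 magic")
    # "<I" = little-endian unsigned 32-bit integer (assumed byte order).
    (version,) = struct.unpack_from("<I", blob, len(_SKETCH_MAGIC))
    # The payload is returned undecoded; oxicode decoding is out of scope here.
    return version, blob[header_len:]


# Possible use: run the check on the output of ``dumps(engine)`` before
# handing the blob to ``loads``, so a truncated or foreign blob is rejected
# without writing a temporary file.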