1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
//! `memory.h` peak / active / cache counter wrappers. Re-exported by
//! [`crate::memory`]; see the parent module's docs for the unified surface
//! description (counters + wired-limit guard + policies).
use crate;
/// The peak memory (in bytes) the process-global mlx allocator has ever
/// held since process start (or since the last [`reset_peak_memory`]).
///
/// Wraps `mlx_get_peak_memory`. mlx-lm's `mx.get_peak_memory()` (used in
/// `mlx_lm/generate.py` `stream_generate` to populate the
/// `GenerationResponse.peak_memory` field) is the same C++ entry point;
/// mlx-lm divides by `1e9` to report GB — the safe-Rust surface stays in
/// the raw byte count and lets the caller choose the scale.
/// Reset the peak-memory counter to the current active size.
///
/// Wraps `mlx_reset_peak_memory`. mlx-swift exposes the same hook as
/// `GPU.resetPeakMemory()`; mlx-lm uses it indirectly through
/// `mx.reset_peak_memory()` in its perf-eval scripts.
/// The mlx allocator's currently-resident bytes (excluding the recycled
/// cache).
///
/// Wraps `mlx_get_active_memory`. Mirrors mlx-swift's `GPU.activeMemory`
/// and mlx-lm's `mx.get_active_memory()` (debug / diagnostics use only;
/// `GenerationStats` itself reports the peak).
/// The mlx allocator's currently-recycled cache size in bytes.
///
/// Wraps `mlx_get_cache_memory`. Mirrors mlx-swift's `GPU.cacheMemory`
/// (`active + cache` is the allocator's total reservation).