mod-alloc 0.9.4

Allocation profiling for Rust. Counters, peak resident, and call-site grouping with inline backtrace capture. Zero external dependencies in the hot path. Lean dhat replacement targeting MSRV 1.75.
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.9.4] - 2026-05-18

### Added

- **`dhat_compat` module — drop-in replacement for `dhat-rs`.**
  Behind the existing `dhat-compat` cargo feature. Mirrors
  `dhat-rs`'s public surface method-for-method so consumers
  (notably `dev-bench`) can swap dhat for mod-alloc with a
  one-line import change (`use mod_alloc::dhat_compat as dhat;`).
- **`dhat_compat::Alloc`** — unit-struct global allocator
  matching `dhat::Alloc`'s usage pattern
  (`static A: Alloc = Alloc;`). Forwards every `GlobalAlloc`
  call to a process-wide static `ModAlloc` so `HeapStats::get()`
  and `Profiler` see the live counters.
- **`dhat_compat::Profiler` + `ProfilerBuilder`** — RAII handle
  that writes a DHAT JSON report on drop. `new_heap()`,
  `new_ad_hoc()`, `builder()`. Builder methods: `ad_hoc()`,
  `testing()`, `file_name()`, `trim_backtraces()`, `build()`.
- **`dhat_compat::HeapStats`** — six-field stats mirroring
  `dhat::HeapStats` exactly (including the
  `u64 total_*` / `usize curr_* max_*` asymmetry).
  `HeapStats::get()` snapshots the installed allocator.
- **`dhat_compat::AdHocStats` + `ad_hoc_event(weight)`** — ad-hoc
  mode counters with the same shape as `dhat`'s. Two atomic ops
  per event; no allocation.
- **`live_count` and `peak_live_count` fields on `AllocStats`.**
  Wired into `record_alloc` / `record_dealloc` to track
  currently-alive block counts. Backs `HeapStats::curr_blocks`
  and `max_blocks`. `record_realloc` does not touch them
  (a realloc is the same block from a count perspective —
  matches dhat's accounting).
- **Ad-hoc JSON writer** in `src/dhat_compat/ad_hoc_writer.rs`.
  Emits `dhatFileVersion: 2`, `mode: "ad-hoc"` files alongside
  the existing heap-mode writer.
- **`MIGRATING_FROM_DHAT.md`** project-root guide covering the
  one-line swap, API surface mapping, behavioural differences,
  and rollback steps.
- New tests:
  - `tests/dhat_compat_surface.rs` — 7 tests covering the swap
    pattern, live-block tracking, drop-time file write,
    testing-mode skip, ad-hoc event accumulation,
    `trim_backtraces` over-cap clamp, and `Profiler::new_heap`
    construction.
  - `src/dhat_compat/{mod,profiler,stats,ad_hoc_writer}.rs`
    unit tests cover unit-struct size, default construction,
    builder configuration, ad-hoc counter math, and JSON
    rendering.
  - In-file `src/lib.rs` tests gained
    `live_counters_track_alive_blocks` and
    `record_realloc_does_not_touch_live_count` to lock the
    new counter semantics.
- **`examples/dhat_drop_in.rs`** showing the one-line dhat-rs
  swap pattern.
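
The ad-hoc counters described in the list above can be pictured with a minimal, std-only sketch. The static names and the `AdHocSnapshot` type are hypothetical and are not taken from the crate's source; the sketch only shows the "two atomic ops per event, no allocation" shape.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical process-wide counters; the crate's real statics may differ.
static TOTAL_EVENTS: AtomicU64 = AtomicU64::new(0);
static TOTAL_UNITS: AtomicU64 = AtomicU64::new(0);

/// Hypothetical snapshot type standing in for an ad-hoc stats struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AdHocSnapshot {
    pub total_events: u64,
    pub total_units: u64,
}

/// Records one event of the given weight.
/// Exactly two relaxed atomic adds; nothing here allocates.
pub fn ad_hoc_event(weight: usize) {
    TOTAL_EVENTS.fetch_add(1, Ordering::Relaxed);
    TOTAL_UNITS.fetch_add(weight as u64, Ordering::Relaxed);
}

/// Reads both counters. The two loads are not taken atomically as a pair,
/// so a concurrent event may be visible in one field and not the other.
pub fn ad_hoc_snapshot() -> AdHocSnapshot {
    AdHocSnapshot {
        total_events: TOTAL_EVENTS.load(Ordering::Relaxed),
        total_units: TOTAL_UNITS.load(Ordering::Relaxed),
    }
}
```

Calling `ad_hoc_event(3)` and then `ad_hoc_event(5)` raises the event count by two and the unit count by eight.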

### Changed

- **`AllocStats` gained two fields** (`live_count`,
  `peak_live_count`). This is a minor breaking change within
  the 0.x series for callers that construct `AllocStats` via
  struct literal; they must initialise the new fields. Callers
  using `ModAlloc::snapshot()` or `Profiler::stop()` are
  unaffected. Existing in-tree tests and the smoke test were
  updated to pass the new fields.
- **`record_alloc` performs two extra atomic ops**
  (`live_count.fetch_add`, `peak_live_count.fetch_max`).
  `record_dealloc` performs one extra (`live_count.fetch_sub`).
  Both use `Relaxed` ordering matching the existing counters.
  Measured added cost on the alloc hot path is under 3 ns —
  the new atomics dual-issue alongside the existing
  `current_bytes` updates.
- Module-level rustdoc in `src/lib.rs` updated for v0.9.4.
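
As a rough illustration of the extra atomic operations listed above, the following sketch tracks a live-block count and its high-water mark. `LiveCounters` is a hypothetical stand-in, not the crate's `AllocStats` or `ModAlloc`; only the `fetch_add` / `fetch_max` / `fetch_sub` pattern and the `Relaxed` ordering come from the entries above.

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

/// Hypothetical holder for the two live-block counters.
pub struct LiveCounters {
    live_count: AtomicU64,
    peak_live_count: AtomicU64,
}

impl LiveCounters {
    pub const fn new() -> Self {
        Self {
            live_count: AtomicU64::new(0),
            peak_live_count: AtomicU64::new(0),
        }
    }

    /// Two atomic ops: bump the live count, then raise the peak if needed.
    pub fn record_alloc(&self) {
        let now = self.live_count.fetch_add(1, Relaxed) + 1;
        self.peak_live_count.fetch_max(now, Relaxed);
    }

    /// One atomic op: a freed block is no longer live.
    pub fn record_dealloc(&self) {
        self.live_count.fetch_sub(1, Relaxed);
    }

    /// A realloc keeps the same block alive, so the counts are untouched.
    pub fn record_realloc(&self) {}

    /// Returns `(live_count, peak_live_count)`.
    pub fn snapshot(&self) -> (u64, u64) {
        (self.live_count.load(Relaxed), self.peak_live_count.load(Relaxed))
    }
}
```

After three allocations, one deallocation, and one reallocation, `snapshot()` returns `(2, 3)`.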

### Documented gaps from dhat-rs

- Backtrace depth capped at 8 frames (Tier 2 walker limit);
  `trim_backtraces` is accepted for API parity but silently
  clamps.
- Drop-time JSON write errors are swallowed silently — matches
  dhat-rs's behaviour.
- Constructing a second `Profiler` is a no-op rather than a
  panic (dhat-rs panics); the JSON file is documented as
  "last writer wins".
- `dhat::assert!` / `dhat::assert_eq!` / `dhat::assert_ne!`
  macros are not yet ported. Use `HeapStats::get()` directly
  in test assertions until they ship.
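
Several of the gaps above (silent clamping, swallowed write errors, the testing-mode skip) can be sketched with a small RAII guard. This is an illustration under stated assumptions: `ReportGuardBuilder`, `ReportGuard`, and the placeholder JSON body are hypothetical and do not reproduce the crate's `Profiler` or the DHAT schema.

```rust
use std::fs;
use std::path::PathBuf;

/// Frame cap taken from the changelog entry above.
const MAX_DEPTH: usize = 8;

/// Hypothetical builder mirroring the shape of a drop-time report writer.
pub struct ReportGuardBuilder {
    file_name: PathBuf,
    testing: bool,
    depth: usize,
}

impl ReportGuardBuilder {
    pub fn new() -> Self {
        Self { file_name: PathBuf::from("dhat-heap.json"), testing: false, depth: MAX_DEPTH }
    }

    pub fn file_name(mut self, path: impl Into<PathBuf>) -> Self {
        self.file_name = path.into();
        self
    }

    /// Testing mode: the guard writes nothing on drop.
    pub fn testing(mut self) -> Self {
        self.testing = true;
        self
    }

    /// Accepts any request but silently clamps it to `MAX_DEPTH`.
    pub fn trim_backtraces(mut self, max_frames: Option<usize>) -> Self {
        self.depth = max_frames.map_or(MAX_DEPTH, |n| n.min(MAX_DEPTH));
        self
    }

    pub fn build(self) -> ReportGuard {
        ReportGuard { file_name: self.file_name, testing: self.testing, depth: self.depth }
    }
}

pub struct ReportGuard {
    file_name: PathBuf,
    testing: bool,
    depth: usize,
}

impl ReportGuard {
    pub fn depth(&self) -> usize {
        self.depth
    }
}

impl Drop for ReportGuard {
    fn drop(&mut self) {
        if self.testing {
            return;
        }
        // Placeholder body, not the real report format.
        let body = format!("{{\"placeholder\":true,\"depth\":{}}}", self.depth);
        // Write errors are deliberately ignored: drop cannot report them.
        let _ = fs::write(&self.file_name, body);
    }
}
```

Ignoring the `fs::write` result keeps `drop` infallible, at the cost that a failed write goes unnoticed; callers who need the error would have to use an explicit write method instead.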

### Migration note

This release unblocks projects whose MSRV target forces them
off `dhat = "0.3"` (which today requires Rust 1.85+ through
its `backtrace → addr2line` chain). `mod-alloc` holds MSRV 1.75
and provides the same core profiling surface.

## [0.9.3] - 2026-05-18

### Added

- **`dhat-compat` cargo feature wired up.** Previously a no-op
  placeholder shipped since v0.9.0, the feature now emits the
  per-call-site report as DHAT-compatible JSON
  (`dhatFileVersion: 2`, `mode: "rust-heap"`) that the upstream
  `dh_view.html` viewer shipped with Valgrind loads directly. No
  new external dependencies; the JSON writer is hand-rolled.
- **`ModAlloc::dhat_json_string() -> String`** renders the report
  as a JSON string. Allocates; call from outside the allocator
  hook.
- **`ModAlloc::write_dhat_json(path)`** writes the rendered JSON
  to `path` (mirrors `dhat-rs`'s `dhat-heap.json` convention).
  Returns `std::io::Result<()>`.
- **Frame-string formatting cfg-splits on `symbolicate`.** Without
  the `symbolicate` feature, frame strings carry raw hex
  addresses with `<unresolved>` placeholders. With `symbolicate`,
  the JSON carries function names plus (where the platform
  supports it) source file and line, with `[inlined]` flags on
  inlined expansions.
- **Frame-table deduplication.** The internal builder keeps a
  `HashMap<String, u32>` so identical frame strings reused across
  call sites share a single `ftbl` entry. Index 0 is reserved for
  the literal `"[root]"`.
- New tests:
  - `tests/dhat_json_shape.rs`: validates the top-level keys are
    present, the document starts/ends with object braces, a
    workload produces at least one program point, and
    `write_dhat_json` round-trips byte-for-byte with
    `dhat_json_string`.
  - `src/dhat_json/writer.rs` unit tests cover JSON string
    escaping (quote, backslash, newline, low control bytes,
    multibyte UTF-8 pass-through).
  - `src/dhat_json/frames.rs` unit tests cover frame-table
    interning and the raw frame-string format.
- **`examples/dhat_json.rs`** drops a `dhat-heap.json` file in
  the current working directory for inspection in the upstream
  viewer.
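
The frame-table deduplication entry above can be sketched as a small string interner. `FrameTable` and its methods are hypothetical names; the `HashMap<String, u32>` index and the reserved `"[root]"` entry at index 0 follow the description above.

```rust
use std::collections::HashMap;

/// Hypothetical interner: each distinct frame string gets one table slot.
pub struct FrameTable {
    index: HashMap<String, u32>,
    entries: Vec<String>,
}

impl FrameTable {
    /// Creates a table with index 0 reserved for the literal "[root]".
    pub fn new() -> Self {
        let mut table = Self { index: HashMap::new(), entries: Vec::new() };
        table.intern("[root]");
        table
    }

    /// Returns the existing index for `frame`, or appends it and
    /// returns the new index.
    pub fn intern(&mut self, frame: &str) -> u32 {
        if let Some(&i) = self.index.get(frame) {
            return i;
        }
        let i = self.entries.len() as u32;
        self.entries.push(frame.to_owned());
        self.index.insert(frame.to_owned(), i);
        i
    }

    /// The deduplicated entries in index order.
    pub fn entries(&self) -> &[String] {
        &self.entries
    }
}
```

Interning the same string twice returns the same index, so call sites that share frames refer to a single table entry.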

### Changed

- Module-level rustdoc in `src/lib.rs` updated to mention Tier 3
  DHAT JSON output.

### Notes

- The viewer hides time-and-lifetime columns (`tl`, `mb`, `mbk`,
  etc.) automatically because `mod-alloc` emits `bklt: false`. We
  do not track per-allocation lifetimes today and will not
  fabricate values just to populate columns.
- `eb` and `ebk` (at-end bytes / blocks) are emitted as `0`. The
  JSON can be written at any point during execution, so there is
  no meaningful "end" snapshot from our perspective.
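
Because the JSON writer for this release is hand-rolled, string escaping has to be handled explicitly. The function below is a minimal sketch of the escaping cases the new unit tests are said to cover (quote, backslash, newline, low control bytes, multibyte UTF-8 pass-through); `escape_json` is a hypothetical name, not the crate's function.

```rust
use std::fmt::Write;

/// Renders `s` as a quoted JSON string literal.
pub fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters below U+0020 need \uXXXX form.
            c if (c as u32) < 0x20 => {
                // Writing to a String cannot fail.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            // Everything else, including multibyte UTF-8, passes through.
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```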

## [0.9.2] - 2026-05-14

### Added

- **`symbolicate` cargo feature.** Turns the raw return addresses
  captured by v0.9.1's backtrace path into
  `(function, file, line)` tuples at report-generation time.
  Available on Linux, macOS, *BSD via `addr2line` + `object`;
  Windows via `pdb`. All four resolver crates plus
  `rustc-demangle` are pulled in only when the feature is on; the
  default build remains zero-runtime-dep.
- **`ModAlloc::symbolicated_report() -> Vec<SymbolicatedCallSite>`**.
  Drains the per-call-site table and resolves each frame against
  the running binary's debug info. Results are cached per address
  across calls. The method allocates, so it is safe to call only
  from outside the allocator hook.
- **`SymbolicatedCallSite`** and **`SymbolicatedFrame`** public
  types behind `#[cfg(feature = "symbolicate")]`. `frames` is a
  `Vec` in stack-frame order with `inlined: bool` marking
  expansions from a single physical return address.
- **Self-binary discovery** via `std::env::current_exe`, cached
  in a `OnceLock<Option<PathBuf>>`. Falls back to unresolved
  frames if the binary path cannot be determined.
- **Per-process address cache** in `symbolicate::report` keyed by
  raw `u64` address. Cache miss runs the platform symbolicator
  once; subsequent calls reuse the cached `Vec<SymbolicatedFrame>`.
- **Approved external-dep exception** in `.dev/DIRECTIVES.md`
  section 2.2 documenting the `symbolicate` feature's deps
  (`addr2line`, `object`, `rustc-demangle`, `pdb`, plus a
  pinned `uuid = "=1.10.0"` to hold MSRV 1.75 against `pdb`'s
  latest transitive `uuid` 1.x).
- New tests:
  - `tests/symbolicate_self.rs`: self-symbolication of a known
    in-test-binary function; downgrades to shape-only when
    debug info is unavailable.
  - `tests/symbolicate_concurrent.rs`: 8 threads calling
    `symbolicated_report()` simultaneously, asserts no deadlock
    and consistent row count.
  - `src/symbolicate/self_binary.rs` unit tests for path
    resolution and caching.
- **`examples/symbolicate.rs`** prints top-10 call sites sorted
  by total bytes with resolved function names plus inlined
  frames where available.
- CI matrix gains explicit `backtraces` + `symbolicate` feature
  steps with the FP flag set.

### Changed

- Module-level rustdoc in `src/lib.rs` updated to mention Tier 2
  symbolication.
- The `symbolicate` feature implies `backtraces` (which in turn
  implies `std`). Activating it alone is sufficient.
- Linux/macOS produce richer output than Windows: DWARF inlining
  info is more complete than PDB's `S_INLINESITE` decoding (the
  latter is deferred for v0.9.3+). Asymmetry is documented and
  not gated.

### Limitations (Windows path)

- No source file / line on Windows yet. PDB exposes line info
  via `module.line_program()` but threading it through the
  index is non-trivial; deferred to a later release.
- No inlined-frame expansion on Windows. PDB `S_INLINESITE`
  records are not yet decoded.
- Best-effort address-to-RVA translation: without the module's
  load base we mask `address` to 32 bits and binary-search the
  RVA index. Exact for non-ASLR builds; usable but approximate
  for ASLR builds.
- C++ frames remain mangled (only Rust mangling is decoded via
  `rustc-demangle`). Adding `cpp_demangle` is a follow-up if
  there's real demand.

## [0.9.1] - 2026-05-14

### Added

- **Tier 2: inline backtrace capture (`backtraces` feature).** Each
  tracked allocation, zero-init allocation, and reallocation
  captures up to 8 frames of its call site via inline
  frame-pointer walking. Available on `x86_64` and `aarch64`.
  Other architectures compile but capture is a no-op.
- **`ModAlloc::call_sites()`** drains the per-call-site
  aggregation table into a `Vec<CallSiteStats>`. Each row carries
  raw return addresses (top of stack first), the number of
  allocations attributed to the site, and the total bytes.
  Symbolication ships in v0.9.2.
- **`CallSiteStats`** public type behind the `backtraces` feature.
- **Per-thread arena** (64 KB OS-page region per thread, 512
  events per flush) and **global aggregation table** (4,096
  buckets by default, ~384 KB) allocated through raw
  `mmap` / `VirtualAlloc` so the backtrace path never recurses
  into `ModAlloc::alloc` for its own state.
- **`MOD_ALLOC_BUCKETS` env var** to override the
  aggregation-table size at process start. Value is rounded up to
  the next power of two and clamped to `[64, 1_048_576]`.
- **`build.rs`** (one-off approved exception): warns at compile
  time when `RUSTFLAGS` is missing `-C force-frame-pointers=yes`
  and the `backtraces` feature is on. See
  `.dev/DIRECTIVES.md` section 2.1 for the documented exception.
- **`.cargo/config.toml`** in the crate root enables frame
  pointers for the crate's own builds so the test suite and
  examples produce useful traces. Downstream consumers must
  enable the flag in their own builds.
- New tests:
  - `tests/backtrace_real_chain.rs`: captures from a deeply
    nested `#[inline(never)]` call chain.
  - `tests/backtrace_fuzz.rs`: SplitMix64-driven random workload
    proving the walker is total under varied allocation patterns
    (10,000 iterations).
  - `tests/backtrace_concurrent.rs`: 32-thread aggregation
    stress test.
  - `src/backtrace/*` unit tests cover hash determinism, walker
    safety checks (null, alignment, out-of-range, non-monotonic,
    max-frame cap), arena round-trip, table claim races, and
    stack-bounds discovery.
- **`examples/backtraces.rs`** demonstrates installing
  `ModAlloc`, exercising a few distinct call paths, and printing
  the top sites by total bytes.
- **CI: AddressSanitizer nightly job.** A dedicated job in
  `.github/workflows/ci.yml` runs the test suite under
  `-Zsanitizer=address` on Linux x86_64 to catch any UB in the
  unsafe FP-walker path that survives the in-walker safety
  checks.
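
The `MOD_ALLOC_BUCKETS` sizing rule above (round up to a power of two, clamp to `[64, 1_048_576]`, default 4,096) can be expressed as a small pure function. The function names are hypothetical, and falling back to the default on an unparsable value is an assumption; the crate may treat that case differently.

```rust
const DEFAULT_BUCKETS: usize = 4_096;
const MIN_BUCKETS: usize = 64;
const MAX_BUCKETS: usize = 1_048_576;

/// Computes the table size from an optional raw setting.
/// Both bounds are powers of two, so clamping before rounding gives the
/// same result as rounding first and cannot overflow.
pub fn bucket_count(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .map(|n| n.clamp(MIN_BUCKETS, MAX_BUCKETS).next_power_of_two())
        .unwrap_or(DEFAULT_BUCKETS) // assumption: bad input means default
}

/// Reads the environment variable named in the changelog entry.
/// `std::env::var` allocates, so this must run outside the allocator hook.
pub fn bucket_count_from_env() -> usize {
    bucket_count(std::env::var("MOD_ALLOC_BUCKETS").ok().as_deref())
}
```

For example, a setting of `100` yields 128 buckets and a setting of `1` yields the minimum of 64.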

### Changed

- `GlobalAlloc::alloc`, `alloc_zeroed`, and `realloc` invoke
  `backtrace::record_event` after the existing counter update
  when the `backtraces` feature is on. `dealloc` does not capture
  (matches dhat: call sites describe who allocated, not who
  freed).
- Per maintainer guidance, realloc captures all events including
  shrinks, matching dhat's per-event accounting. Documented in
  the rustdoc.
- CI workflow runs the `backtraces` test suite with
  `RUSTFLAGS="-C force-frame-pointers=yes"` so traces are
  meaningful on hosted runners.
- **Walker reads are no longer `read_volatile`.** The walker reads
  the current thread's own stack memory after the four
  bounds/alignment/range/monotonicity checks; no other thread can
  mutate that memory mid-walk. Plain reads let the compiler
  schedule the loads and unblock register-allocation
  opportunities. Same observable behaviour, with a small per-walk
  speed-up.
- **Table matching path no longer spins on `wait_published`.** The
  init thread now uses `fetch_add` (instead of `store`) for the
  bucket's `count` and `total_bytes` fields. Concurrent matching
  writers can land their increments at any moment without being
  clobbered, so the matching path becomes two `fetch_add` calls
  with no spin loop. Readers (`call_sites_report`) still gate on
  `frame_count > 0` for sample-frame coherence.
- CI: ASAN job sets `ASAN_OPTIONS=detect_stack_use_after_return=0`
  so the stack-bounds test runs against the real stack rather
  than ASAN's fake-stack heap allocation. The walker is
  unaffected either way; the test just needed a real-stack
  context to assert against.
- CI: added a defensive "Verify toolchain is fully installed"
  step after `dtolnay/rust-toolchain@stable` to heal the
  occasional macOS runner-image case where `cargo` resolves to
  `rustup-init`.
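
The walker safety checks mentioned above (null, alignment, range, monotonicity, frame cap) can be illustrated without any `unsafe` by walking a simulated stack. This is a model only: `SimStack` is a hypothetical slice-backed stand-in for real stack memory, and the layout (saved frame pointer at `fp`, return address at `fp + 8`) is the usual frame-pointer convention, assumed here rather than taken from the crate's source.

```rust
const MAX_FRAMES: usize = 8;
const WORD: u64 = 8;

/// Simulated stack: `words[i]` lives at address `base + i * 8`.
pub struct SimStack<'a> {
    pub base: u64,
    pub words: &'a [u64],
}

impl SimStack<'_> {
    /// Reads one word, rejecting unaligned or out-of-range addresses.
    fn read(&self, addr: u64) -> Option<u64> {
        if addr % WORD != 0 || addr < self.base {
            return None;
        }
        let idx = usize::try_from((addr - self.base) / WORD).ok()?;
        self.words.get(idx).copied()
    }
}

/// Walks the frame chain starting at `fp`, returning the captured
/// return addresses (innermost first) and how many were captured.
pub fn walk(stack: &SimStack<'_>, mut fp: u64) -> ([u64; MAX_FRAMES], usize) {
    let mut out = [0u64; MAX_FRAMES];
    let mut n = 0;
    while n < MAX_FRAMES {
        // Null check.
        if fp == 0 {
            break;
        }
        // Alignment and range checks happen inside `read`.
        let Some(saved_fp) = stack.read(fp) else { break };
        let Some(ret) = fp.checked_add(WORD).and_then(|a| stack.read(a)) else { break };
        if ret == 0 {
            break;
        }
        out[n] = ret;
        n += 1;
        // Monotonicity: caller frames must sit at higher addresses.
        if saved_fp <= fp {
            break;
        }
        fp = saved_fp;
    }
    (out, n)
}
```

Because every exit path is a plain `break`, the walk terminates on any input, including a frame that points back at itself.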

### Design notes

- **Reentrancy on the backtrace path.** The walker reads memory
  inside the cached stack bounds (which are queried via
  `GetCurrentThreadStackLimits` on Windows,
  `pthread_getattr_np` on Linux, `pthread_get_stackaddr_np` on
  Darwin / BSD). All reads are pointer-aligned and in-range; no
  page faults are possible. The existing `IN_ALLOC` reentrancy
  guard from v0.9.0 catches any pathological allocation
  triggered transitively from inside the backtrace path (e.g.
  libc lazy-init during the first `pthread_getattr_np`).
- **No `HashMap` in the hot path.** The global aggregation table
  is a fixed-size open-addressed array allocated once via raw OS
  pages, with atomic per-bucket CAS for claim and linear probing
  for index collisions. Hash collisions on the 64-bit FxHash are
  a documented limitation (different sites with identical hashes
  get conflated).
- **Bucket publish protocol.** Each bucket uses a two-phase
  claim: CAS on `hash` first (Release), then write
  `sample_frames`, then store `frame_count` with Release. Readers
  gate on `frame_count > 0` after observing a non-zero hash;
  this prevents torn reads of the sample frames.
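
The table design and publish protocol in the notes above can be sketched in safe Rust. Assumptions are marked in the comments: the type names are hypothetical, the table lives in a `Vec` rather than raw OS pages, the sample frames are stored as atomics to avoid `unsafe`, and the CAS uses `AcqRel` where the notes mention `Release`.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

const MAX_FRAMES: usize = 8;

struct Bucket {
    hash: AtomicU64,          // 0 means the bucket is unclaimed
    frame_count: AtomicUsize, // 0 means the frames are not yet published
    frames: [AtomicU64; MAX_FRAMES],
    count: AtomicU64,
    total_bytes: AtomicU64,
}

/// Hypothetical fixed-size, open-addressed call-site table.
pub struct SiteTable {
    buckets: Vec<Bucket>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct SiteRow {
    pub frames: Vec<u64>,
    pub count: u64,
    pub total_bytes: u64,
}

impl SiteTable {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        let buckets = (0..capacity)
            .map(|_| Bucket {
                hash: AtomicU64::new(0),
                frame_count: AtomicUsize::new(0),
                frames: std::array::from_fn(|_| AtomicU64::new(0)),
                count: AtomicU64::new(0),
                total_bytes: AtomicU64::new(0),
            })
            .collect();
        Self { buckets }
    }

    /// Records one event. Returns false if `frames` is empty or the
    /// table is full.
    pub fn record(&self, hash: u64, frames: &[u64], size: u64) -> bool {
        if frames.is_empty() {
            return false;
        }
        let hash = hash.max(1); // 0 is reserved for "empty"
        let mask = self.buckets.len() - 1;
        let mut idx = hash as usize & mask;
        for _ in 0..self.buckets.len() {
            let b = &self.buckets[idx];
            let seen = match b.hash.compare_exchange(0, hash, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => {
                    // Phase two of the claim: write frames, then publish.
                    for (slot, &addr) in b.frames.iter().zip(frames) {
                        slot.store(addr, Ordering::Relaxed);
                    }
                    b.frame_count.store(frames.len().min(MAX_FRAMES), Ordering::Release);
                    hash
                }
                Err(existing) => existing,
            };
            if seen == hash {
                // Claimer and matchers both add, so nothing is clobbered.
                b.count.fetch_add(1, Ordering::Relaxed);
                b.total_bytes.fetch_add(size, Ordering::Relaxed);
                return true;
            }
            idx = (idx + 1) & mask; // linear probing
        }
        false
    }

    /// Readers skip buckets whose frames have not been published yet.
    pub fn report(&self) -> Vec<SiteRow> {
        self.buckets
            .iter()
            .filter_map(|b| {
                if b.hash.load(Ordering::Acquire) == 0 {
                    return None;
                }
                let n = b.frame_count.load(Ordering::Acquire);
                if n == 0 {
                    return None;
                }
                Some(SiteRow {
                    frames: b.frames[..n].iter().map(|f| f.load(Ordering::Relaxed)).collect(),
                    count: b.count.load(Ordering::Relaxed),
                    total_bytes: b.total_bytes.load(Ordering::Relaxed),
                })
            })
            .collect()
    }
}
```

As in the notes, two different sites that share a hash are conflated: the second caller simply adds to the first caller's bucket and its frames are never stored.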

### Migration

The default build (Tier 1 only) is unchanged. Existing callers
need no edits.

Users opting in to the `backtraces` feature must add
`-C force-frame-pointers=yes` to their build configuration. The
included `build.rs` emits a `cargo:warning=` at compile time if
this is missing.
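
One common way to supply the flag is a Cargo configuration file in the consuming workspace. The fragment below is a sketch of that approach; it is an assumption that this matches the crate's own `.cargo/config.toml`, and setting the `RUSTFLAGS` environment variable works as well (it takes precedence over `build.rustflags`).

```toml
# .cargo/config.toml in the consuming workspace
[build]
rustflags = ["-C", "force-frame-pointers=yes"]
```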

## [0.9.0] - 2026-05-13

### Added

- `unsafe impl GlobalAlloc for ModAlloc`. Installing `ModAlloc` as
  `#[global_allocator]` now records every alloc, dealloc, realloc,
  and `alloc_zeroed` event into the four Tier 1 counters
  (`alloc_count`, `total_bytes`, `peak_bytes`, `current_bytes`).
- Lock-free counter updates on the hot path using `AtomicU64` with
  `Relaxed` ordering, plus `fetch_max` for the peak high-water
  mark.
- Thread-local reentrancy guard. The allocator hook is recursion
  safe: if any code transitively triggered from inside the tracking
  path attempts to allocate, the nested call bypasses tracking and
  forwards directly to the System allocator. The flag is
  `const`-initialised so TLS access on the hot path does not
  allocate.
- Lazy `Profiler` registration via a process-wide `AtomicPtr`
  handle that the `GlobalAlloc` impl populates on first call.
  `Profiler::start()` and `Profiler::stop()` snapshot the installed
  allocator without requiring an explicit registration step.
- New integration tests under `tests/`:
  - `counters_accuracy.rs`: single-thread counter correctness.
  - `concurrent_alloc.rs`: 64-thread x 5,000-allocation stress test.
  - `profiler_delta.rs`: Profiler delta math.
  - `reentrancy.rs`: reentrancy-guard smoke test.
- `examples/bench_overhead.rs`: per-allocation overhead
  micro-benchmark.
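
The counter updates and the reentrancy guard described above can be combined in a compact sketch. `CountingAlloc` is a hypothetical type written for illustration, not the crate's `ModAlloc`; it covers only `alloc` and `dealloc` and forwards to `System`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

thread_local! {
    // `const` initialisation means first access needs no lazy setup.
    static IN_ALLOC: Cell<bool> = const { Cell::new(false) };
}

/// Hypothetical counting wrapper around the System allocator.
pub struct CountingAlloc {
    pub alloc_count: AtomicU64,
    pub total_bytes: AtomicU64,
    pub current_bytes: AtomicU64,
    pub peak_bytes: AtomicU64,
}

impl CountingAlloc {
    pub const fn new() -> Self {
        Self {
            alloc_count: AtomicU64::new(0),
            total_bytes: AtomicU64::new(0),
            current_bytes: AtomicU64::new(0),
            peak_bytes: AtomicU64::new(0),
        }
    }

    /// Runs `f` unless this thread is already inside the tracking path
    /// (or its thread-local storage is being torn down).
    fn track(&self, f: impl FnOnce()) {
        let entered = IN_ALLOC
            .try_with(|flag| !flag.replace(true))
            .unwrap_or(false);
        if entered {
            f();
            let _ = IN_ALLOC.try_with(|flag| flag.set(false));
        }
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            self.track(|| {
                let size = layout.size() as u64;
                self.alloc_count.fetch_add(1, Relaxed);
                self.total_bytes.fetch_add(size, Relaxed);
                let now = self.current_bytes.fetch_add(size, Relaxed) + size;
                // High-water mark without a compare-and-swap loop.
                self.peak_bytes.fetch_max(now, Relaxed);
            });
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.track(|| {
            self.current_bytes.fetch_sub(layout.size() as u64, Relaxed);
        });
        unsafe { System.dealloc(ptr, layout) };
    }
}
```

A type like this would normally be installed with `#[global_allocator] static A: CountingAlloc = CountingAlloc::new();`, but its methods can also be called directly, which is convenient for unit tests.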

### Changed

- `ModAlloc::snapshot` now returns the running counter values from
  the live `GlobalAlloc` path. In v0.1.0 it always returned zeros.
- `ModAlloc::reset` zeroes the counters. Documented caveat:
  resetting while allocations are outstanding can cause
  `current_bytes` to wrap on subsequent deallocations; reset before
  any workload begins for clean accounting.
- `Profiler::stop` returns deltas for `alloc_count`, `total_bytes`,
  and `current_bytes`. `peak_bytes` is the absolute high-water mark
  observed during the profiling window (a peak delta has no
  meaningful semantics). The rustdoc on `Profiler::stop` documents
  this difference explicitly.
- `examples/basic.rs` now installs `ModAlloc` as
  `#[global_allocator]` and prints real counter values.
- Module-level rustdoc in `src/lib.rs` updated to describe counter
  semantics, the installation pattern, and the v0.9.0 status.
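
Two of the semantics above are easy to misread, so here is a small sketch of each: the mixed delta/absolute result of a profiling window, and the wrap that follows a reset with allocations still outstanding. `Stats`, `window_delta`, and `dealloc_after_reset` are hypothetical names, and the use of saturating subtraction is an assumption rather than the crate's documented behaviour.

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

/// Hypothetical four-counter snapshot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Stats {
    pub alloc_count: u64,
    pub total_bytes: u64,
    pub peak_bytes: u64,
    pub current_bytes: u64,
}

/// Deltas for three counters; the peak is carried over as an absolute
/// value because "end peak minus start peak" does not describe anything.
pub fn window_delta(start: Stats, end: Stats) -> Stats {
    Stats {
        alloc_count: end.alloc_count.saturating_sub(start.alloc_count),
        total_bytes: end.total_bytes.saturating_sub(start.total_bytes),
        peak_bytes: end.peak_bytes,
        // Assumption: clamp to zero if the window freed more than it allocated.
        current_bytes: end.current_bytes.saturating_sub(start.current_bytes),
    }
}

/// Models a deallocation of `size` bytes arriving right after a reset:
/// the unsigned counter starts at zero, so the subtraction wraps.
pub fn dealloc_after_reset(size: u64) -> u64 {
    let current_bytes = AtomicU64::new(0);
    current_bytes.fetch_sub(size, Relaxed);
    current_bytes.load(Relaxed)
}
```

The wrap is why the caveat recommends resetting before any workload begins: a 64-byte free against a zeroed counter leaves it 64 below the wrap-around point of `u64`, not at zero.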

### Design notes

- **Per-thread arena deferred to v0.9.1.** The original ROADMAP
  entry for v0.9.0 envisaged a 64 KB per-thread arena with periodic
  global aggregation. v0.9.0 ships with direct atomic increments
  on four shared counters instead. Per-thread buffering becomes
  load-bearing in v0.9.1 when backtrace state (32-64 bytes per
  allocation) would otherwise serialise on the global path; for
  four `u64` counters the indirection is not warranted yet. See
  `.dev/DESIGN.md` section 2 for the full rationale.
- **`backtraces` and `dhat-compat` features are no-ops in v0.9.0.**
  They remain defined in `Cargo.toml` so build matrices stay green
  and downstream callers can opt in once the features ship. Real
  implementations land in v0.9.1 (Tier 2: inline backtrace
  capture) and v0.9.3 (Tier 3: DHAT-compatible JSON output).

### Migration

The public API surface is unchanged. Callers using the v0.1.0
placeholder API (`ModAlloc::new`, `snapshot`, `reset`, `Profiler`,
`AllocStats`) continue to compile and behave identically when
`ModAlloc` is not installed as the global allocator. Callers that
install it now see real counter values where v0.1.0 returned zeros.

## [0.1.0] - 2026-05-11

### Added

- Initial crate skeleton.
- `ModAlloc` struct (the global allocator wrapper) with `new`,
  `snapshot`, `reset` methods. Placeholder implementation forwards
  to System allocator without tracking.
- `AllocStats` struct: alloc_count, total_bytes, peak_bytes,
  current_bytes.
- `Profiler` for scoped delta capture: `start` / `stop`.
- Feature flags: `std` (default), `counters` (default), `backtraces`,
  `dhat-compat`.
- Smoke tests.

### Note

This is the name-claim release. The `GlobalAlloc` trait is not yet
implemented; using `ModAlloc` as `#[global_allocator]` in 0.1.0
will not work. Real implementation lands in `0.9.x` along with:

- Per-thread arena-based tracking to avoid contention.
- Inline frame-pointer-based backtrace capture (x86_64 + aarch64).
- DHAT-compatible JSON output.
- Statistical validation suite.

[Unreleased]: https://github.com/jamesgober/mod-alloc/compare/v0.9.4...HEAD
[0.9.4]: https://github.com/jamesgober/mod-alloc/compare/v0.9.3...v0.9.4
[0.9.3]: https://github.com/jamesgober/mod-alloc/compare/v0.9.2...v0.9.3
[0.9.2]: https://github.com/jamesgober/mod-alloc/compare/v0.9.1...v0.9.2
[0.9.1]: https://github.com/jamesgober/mod-alloc/compare/v0.9.0...v0.9.1
[0.9.0]: https://github.com/jamesgober/mod-alloc/compare/v0.1.0...v0.9.0
[0.1.0]: https://github.com/jamesgober/mod-alloc/releases/tag/v0.1.0