mod-alloc 1.0.0

Allocation profiling for Rust. Counters, peak resident, and call-site grouping with inline backtrace capture. Zero external dependencies in the hot path. Lean dhat replacement targeting MSRV 1.75.

What it does

mod-alloc is a #[global_allocator] wrapper that tracks every allocation and answers four questions for the code that runs while it is installed:

  • How many allocations did this code path make?
  • How many total bytes did it allocate?
  • What was the peak resident memory?
  • Which call sites did most of the allocating? (with the backtraces feature)

The allocation hot path is std-only. No backtrace, no libc, no gimli in the alloc path. Inline frame-pointer walking on x86_64 and aarch64 for call-site capture; raw mmap / VirtualAlloc for the per-thread arena and the global aggregation table.

The opt-in symbolicate feature pulls in pure-Rust addr2line + object + pdb + rustc-demangle for offline report generation; the alloc hot path stays untouched.
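
To make the counter tier concrete, here is a minimal std-only sketch of a counting GlobalAlloc wrapper. It is not mod-alloc's implementation: the type name CountingAlloc, its fields, and the snapshot method are hypothetical, and the memory orderings are one plausible choice among several.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counting wrapper around the system allocator
/// (illustrative only; not the crate's actual type).
pub struct CountingAlloc {
    alloc_count: AtomicU64,
    total_bytes: AtomicU64,
    current_bytes: AtomicU64,
    peak_bytes: AtomicU64,
}

impl CountingAlloc {
    pub const fn new() -> Self {
        Self {
            alloc_count: AtomicU64::new(0),
            total_bytes: AtomicU64::new(0),
            current_bytes: AtomicU64::new(0),
            peak_bytes: AtomicU64::new(0),
        }
    }

    /// Returns (alloc_count, total_bytes, current_bytes, peak_bytes).
    pub fn snapshot(&self) -> (u64, u64, u64, u64) {
        (
            self.alloc_count.load(Ordering::Relaxed),
            self.total_bytes.load(Ordering::Relaxed),
            self.current_bytes.load(Ordering::Relaxed),
            self.peak_bytes.load(Ordering::Relaxed),
        )
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let size = layout.size() as u64;
            self.alloc_count.fetch_add(1, Ordering::Relaxed);
            self.total_bytes.fetch_add(size, Ordering::Relaxed);
            // Live bytes after this allocation; fetch_max keeps the high-water mark.
            let live = self.current_bytes.fetch_add(size, Ordering::Relaxed) + size;
            self.peak_bytes.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.current_bytes
            .fetch_sub(layout.size() as u64, Ordering::Relaxed);
    }
}
```

A type like this would be installed with a #[global_allocator] static, as in the Quick start. Every update is a single atomic read-modify-write, so the hot path takes no locks and needs nothing beyond std. The default GlobalAlloc::realloc falls back to alloc plus dealloc, so reallocations are counted without extra code in this sketch.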

Quick start

use mod_alloc::{Profiler, ModAlloc};

#[global_allocator]
static GLOBAL: ModAlloc = ModAlloc::new();

fn main() {
    let p = Profiler::start();

    let v: Vec<u64> = (0..1_000).collect();
    drop(v);

    let stats = p.stop();
    println!("Allocations: {}", stats.alloc_count);
    println!("Total bytes: {}", stats.total_bytes);
    println!("Peak bytes:  {}", stats.peak_bytes);
}

Feature flags

[dependencies]
# Pick one of the following lines; features can also be combined in a single list.
mod-alloc = "1"                                           # Tier 1: counters (default)
mod-alloc = { version = "1", features = ["backtraces"] }  # Tier 2: call-site capture
mod-alloc = { version = "1", features = ["symbolicate"] } # + function/file/line names
mod-alloc = { version = "1", features = ["dhat-compat"] } # DHAT JSON output + dhat-rs drop-in

| Feature | What it adds | Since |
|---|---|---|
| counters | Four lock-free counters via GlobalAlloc (default) | 0.9.0 |
| backtraces | Inline FP walk + per-call-site aggregation (~11 ns overhead) | 0.9.1 |
| symbolicate | Resolve raw addresses to (function, file, line) | 0.9.2 |
| dhat-compat | DHAT JSON output + dhat_compat drop-in surface for dhat-rs | 0.9.3 |

Backtraces

Enabling the backtraces feature requires frame pointers in the caller's build:

# .cargo/config.toml
[build]
rustflags = ["-C", "force-frame-pointers=yes"]

The crate's build.rs emits a cargo:warning= at compile time if RUSTFLAGS does not include this flag. Without frame pointers the walker degrades gracefully: it returns shallow or empty traces but does not crash.
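
A build-script check of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the crate's actual build.rs: the function names are hypothetical, the warning text is invented, and only the short -C spelling of the codegen option is recognised. It relies on Cargo handing build scripts the effective flags in CARGO_ENCODED_RUSTFLAGS, separated by the 0x1f character.

```rust
use std::env;

/// Returns true if the last `force-frame-pointers` codegen option in a
/// 0x1f-separated flag list enables frame pointers.
/// (Hypothetical helper; handles `-C opt` and `-Copt`, not `--codegen`.)
pub fn frame_pointers_forced(encoded_rustflags: &str) -> bool {
    let args: Vec<&str> = encoded_rustflags
        .split('\x1f')
        .filter(|a| !a.is_empty())
        .collect();
    let mut enabled = false;
    let mut i = 0;
    while i < args.len() {
        // `-C` may be followed by the option as a separate argument,
        // or the option may be glued on as `-Cname=value`.
        let opt = if args[i] == "-C" {
            i += 1;
            args.get(i).copied()
        } else {
            args[i].strip_prefix("-C")
        };
        if let Some(rest) = opt.and_then(|o| o.strip_prefix("force-frame-pointers")) {
            // Later flags override earlier ones, so keep the last verdict.
            enabled = matches!(rest, "" | "=yes" | "=y" | "=on" | "=true");
        }
        i += 1;
    }
    enabled
}

/// Intended to be called from a build script's `main`.
pub fn warn_if_frame_pointers_missing() {
    let flags = env::var("CARGO_ENCODED_RUSTFLAGS").unwrap_or_default();
    if !frame_pointers_forced(&flags) {
        // Assumed wording; Cargo surfaces `cargo:warning=` lines to the user.
        println!("cargo:warning=backtraces need -C force-frame-pointers=yes");
    }
}
```

Keeping the parsing in a pure function makes the check easy to test without touching the environment.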

The aggregation-table size is configurable at process start:

MOD_ALLOC_BUCKETS=16384 ./your-binary

Default is 4,096 buckets (~384 KB). Range [64, 1_048_576], rounded up to the next power of two.
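
The sizing rule above can be sketched as a small pure function. This is an illustration, not the crate's parser: the function names are hypothetical, and two behaviours are assumptions the text does not confirm, namely that out-of-range values are clamped into the range and that unparsable values fall back to the default.

```rust
const DEFAULT_BUCKETS: usize = 4_096;
const MIN_BUCKETS: usize = 64;
const MAX_BUCKETS: usize = 1_048_576;

/// Resolves a raw MOD_ALLOC_BUCKETS value to a bucket count.
/// Assumptions: unparsable input -> default; out-of-range input -> clamped.
pub fn bucket_count(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        // Clamp first, then round up; MAX_BUCKETS is itself a power of two,
        // so rounding can never leave the documented range.
        .map(|n| n.clamp(MIN_BUCKETS, MAX_BUCKETS).next_power_of_two())
        .unwrap_or(DEFAULT_BUCKETS)
}

/// Reads the environment variable once, e.g. at process start.
pub fn bucket_count_from_env() -> usize {
    bucket_count(std::env::var("MOD_ALLOC_BUCKETS").ok().as_deref())
}
```

With this rule, a request for 5,000 buckets resolves to 8,192. The quoted default footprint of ~384 KB for 4,096 buckets works out to roughly 96 bytes per bucket.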

Drop-in replacement for dhat-rs

With the dhat-compat feature, mod-alloc exposes a dhat_compat module that mirrors dhat-rs's public surface method-for-method. Migrate by changing a single import line:

use mod_alloc::dhat_compat as dhat;

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    let _profiler = dhat::Profiler::new_heap();
    // ... work ...
    // _profiler drops here → writes dhat-heap.json
}

The full mapping (and documented gaps) lives in MIGRATING_FROM_DHAT.md. The motivating benefit: mod-alloc holds MSRV 1.75 while dhat-rs forces 1.85+ through its backtrace → addr2line chain.

DHAT-compatible JSON output

With the dhat-compat feature enabled, the per-call-site report serialises to the JSON shape understood by the upstream dh_view.html viewer shipped with Valgrind:

GLOBAL.write_dhat_json("dhat-heap.json")?;

Combine with symbolicate for resolved function names plus file and line where available:

mod-alloc = { version = "1", features = ["symbolicate", "dhat-compat"] }

The JSON writer is hand-rolled, with no serde / serde_json dependency. See examples/dhat_json.rs for a complete walk-through.
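
As an illustration of what hand-rolling JSON output involves, the sketch below escapes strings and emits an array of strings using only std. The function names are hypothetical, and the output is generic JSON rather than the DHAT schema; the crate's real writer lives in its source and in examples/dhat_json.rs.

```rust
/// Appends `s` to `out` as a quoted, escaped JSON string.
pub fn write_json_string(out: &mut String, s: &str) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Serialises a slice of strings as a JSON array, e.g. a list of frame names.
pub fn json_string_array(items: &[&str]) -> String {
    let mut out = String::from("[");
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        write_json_string(&mut out, item);
    }
    out.push(']');
    out
}
```

Escaping is the only subtle part of writing JSON by hand; numbers, booleans, and nesting are plain string concatenation.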

Performance

Measured per alloc + dealloc cycle, end to end, on a Windows x86_64 dev host with cargo run --release --example bench_overhead:

| Build | Per alloc + dealloc cycle |
|---|---|
| Tier 1 only (counters, default) | 45.5 ns |
| Tier 1 + Tier 2 (backtraces, v0.9.5) | 56.9 ns |
| Tier 1 + Tier 2 (backtraces, v0.9.4) | ~2,050 ns |
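
For readers who want a comparable number on their own machine, a per-cycle figure can be approximated with a loop like the one below. This is a rough sketch, not the crate's bench_overhead example: the function name is hypothetical, and the loop times whichever global allocator is installed, including loop overhead.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Average nanoseconds per Box alloc + drop cycle over `iters` iterations.
pub fn ns_per_cycle(iters: u32) -> f64 {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for i in 0..iters {
        // black_box discourages the optimiser from eliding the allocation.
        let b = black_box(Box::new(u64::from(i)));
        drop(b);
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}
```

Running it once with and once without the profiling allocator installed, in release mode, gives an estimate of the added overhead per cycle.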

Both tiers clear the REPS section 6 targets (Tier 1 under 50 ns, Tier 2 under 200 ns of additional overhead): Tier 1 measures 45.5 ns, and Tier 2 adds 56.9 - 45.5 ≈ 11.4 ns on top of it. v0.9.5 cut the Tier 2 overhead from ~2,000 ns to ~11 ns by removing the per-thread arena layer and writing each captured event straight to the global aggregation table. The steady-state matching path of table::record is two atomic operations, so the arena's batching no longer paid for itself.
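
The idea of recording straight into a shared table can be sketched with a fixed-size, open-addressed table of atomic buckets. Everything here is an assumption made for illustration: the Table and Bucket types, the use of 0 as the empty-slot marker, linear probing, and the choice of counters are not taken from the crate. In this sketch the matching path is one key load plus two fetch_add calls, which may differ from the crate's own two-operation path.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const BUCKETS: usize = 64; // power of two, so `& (BUCKETS - 1)` acts as modulo

struct Bucket {
    key: AtomicU64, // call-site hash; 0 marks an empty slot
    count: AtomicU64,
    bytes: AtomicU64,
}

/// Hypothetical lock-free aggregation table keyed by a non-zero call-site hash.
pub struct Table {
    // A Vec keeps the sketch short; a real allocator hook cannot allocate
    // here, which is why the source describes raw mmap / VirtualAlloc.
    buckets: Vec<Bucket>,
}

impl Table {
    pub fn new() -> Self {
        let buckets = (0..BUCKETS)
            .map(|_| Bucket {
                key: AtomicU64::new(0),
                count: AtomicU64::new(0),
                bytes: AtomicU64::new(0),
            })
            .collect();
        Self { buckets }
    }

    /// Adds one allocation of `size` bytes to `key`. Returns false if the table is full.
    pub fn record(&self, key: u64, size: u64) -> bool {
        debug_assert!(key != 0, "0 is reserved for empty slots");
        let start = key as usize & (BUCKETS - 1);
        for probe in 0..BUCKETS {
            let b = &self.buckets[(start + probe) & (BUCKETS - 1)];
            let seen = b.key.load(Ordering::Acquire);
            // Either the slot already holds our key, or it is empty and we claim it.
            let ours = seen == key
                || (seen == 0
                    && match b.key.compare_exchange(0, key, Ordering::AcqRel, Ordering::Acquire) {
                        Ok(_) => true,
                        Err(actual) => actual == key, // another thread claimed it for the same key
                    });
            if ours {
                b.count.fetch_add(1, Ordering::Relaxed);
                b.bytes.fetch_add(size, Ordering::Relaxed);
                return true;
            }
        }
        false
    }

    /// Returns (count, bytes) for `key`, if it has been recorded.
    pub fn get(&self, key: u64) -> Option<(u64, u64)> {
        let start = key as usize & (BUCKETS - 1);
        for probe in 0..BUCKETS {
            let b = &self.buckets[(start + probe) & (BUCKETS - 1)];
            match b.key.load(Ordering::Acquire) {
                0 => return None,
                k if k == key => {
                    return Some((b.count.load(Ordering::Relaxed), b.bytes.load(Ordering::Relaxed)))
                }
                _ => {}
            }
        }
        None
    }
}
```

Once a call site has claimed its slot, repeat events from any thread touch only that bucket's atomics, which is the property that makes a per-thread batching layer unnecessary in a design like this.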

Why a new allocation profiler

dhat is the de facto standard for allocation profiling in Rust, but its dependency chain (backtrace 0.3.76 → addr2line 0.25.1) forces consumers to MSRV 1.85+. For projects with a broader MSRV target, that cost is real.

mod-alloc provides the same core capability with inline backtrace capture (frame-pointer-based, x86_64 + aarch64) and no external dependencies in the allocation path. The trade-off is that 1.0 supports fewer architectures; ARM32, RISC-V, and others will be added based on demand.

Status

| Milestone | Version | State |
|---|---|---|
| Name-claim placeholder | v0.1.0 | shipped |
| Real GlobalAlloc + Tier 1 counters | v0.9.0 | shipped |
| Tier 2: inline backtrace capture | v0.9.1 | shipped |
| Symbolication for reports | v0.9.2 | shipped |
| Tier 3: DHAT-compatible JSON output | v0.9.3 | shipped |
| dhat-rs drop-in compat surface | v0.9.4 | shipped |
| dev-bench swap (consumer side) | n/a | shipped as dev-bench v0.9.7 |
| Tier 2 perf optimisation | v0.9.5 | shipped |
| Stable API | v1.0.0 | shipped |

v1.0.0 freezes the public API and the wire format. Breaking changes after 1.0 require a major version bump per Semantic Versioning.

Out of scope

  • Replacing the system allocator. Use mimalloc or jemallocator for that.
  • Use-after-free / double-free detection. Use AddressSanitizer.
  • Source-level instrumentation (build.rs, proc macros). The one build.rs in this crate exists solely to detect missing frame pointers at compile time.

Minimum supported Rust version

1.75, pinned in Cargo.toml and verified by CI on every push.

License

Apache-2.0. See LICENSE.