# Multitude Performance Report

Generated by `scripts/perf_report.rs`:
- `cargo bench --bench criterion_alloc` and `cargo bench --bench criterion_drop`: criterion wall-clock timings.
- `cargo bench --bench gungraun_alloc` and `cargo bench --bench gungraun_drop`: Callgrind event counts (instructions, simulated branch misses and cache hits).

**Workload:** N = 1000 operations per measurement; slice element count = 8.  
Criterion median is reported (default 30 samples, 1 s warm-up, 2 s measurement; override with `--samples` / `--measurement-time` / `--warm-up-time`).  
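Criterion's statistics are more involved than this, but the idea behind the Time column can be sketched with the standard library alone: time a batch of N = 1000 operations per sample, collect 30 samples, and report the median. This is a simplified sketch, not criterion's implementation. The workload closure and the `sample` and `median` helpers are hypothetical, and the sketch assumes each reported time covers one batch of N operations.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Operations per measurement, matching the report's N.
/// Assumption: one reported time covers one batch of N operations.
const N: usize = 1000;

/// Run `op` N times per sample and record each sample's wall-clock duration.
/// Unlike criterion, this does no warm-up, no adaptive iteration counts and
/// no outlier analysis.
fn sample<F: FnMut()>(samples: usize, mut op: F) -> Vec<Duration> {
    (0..samples)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..N {
                op();
            }
            start.elapsed()
        })
        .collect()
}

/// Median of the samples; for an even count, the mean of the two middle values.
fn median(mut xs: Vec<Duration>) -> Duration {
    assert!(!xs.is_empty(), "need at least one sample");
    xs.sort();
    let mid = xs.len() / 2;
    if xs.len() % 2 == 1 {
        xs[mid]
    } else {
        (xs[mid - 1] + xs[mid]) / 2
    }
}

fn main() {
    // Hypothetical stand-in workload: push a u64 into a preallocated Vec,
    // clearing after each batch of N so the Vec never reallocates.
    let mut v: Vec<u64> = Vec::with_capacity(N);
    let times = sample(30, || {
        v.push(black_box(42));
        if v.len() == N {
            v.clear();
        }
    });
    println!("median per batch of {N} ops: {:?}", median(times));
}
```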
Memory accesses = L1 Hits + LL Hits + RAM Hits, i.e. Callgrind's total reads + writes (instruction fetches as well as data reads and writes).  
Bench names are aligned between criterion and gungraun via the `GROUPS` table in `scripts/perf_report.rs`.
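A minimal sketch of how the two harnesses' outputs could be merged into one table row, assuming both are keyed by a shared `group/bench` name. `CallgrindCounts`, `Row`, `join`, and this `GROUPS` layout are hypothetical and are not taken from `scripts/perf_report.rs`. The `mem_accesses` method applies the memory-access formula: in Callgrind's cache model each access is served by exactly one level (L1, LL, or RAM), so summing the three hit counts gives the total number of accesses.

```rust
use std::collections::HashMap;

/// Callgrind counters for one bench (hypothetical struct).
#[derive(Debug, Clone, Copy)]
struct CallgrindCounts {
    instructions: u64,
    branch_misses: u64,
    l1_hits: u64,
    ll_hits: u64,
    ram_hits: u64,
}

impl CallgrindCounts {
    /// "Mem accesses" column: each access hits exactly one level.
    fn mem_accesses(&self) -> u64 {
        self.l1_hits + self.ll_hits + self.ram_hits
    }
}

/// Hypothetical GROUPS layout: group name -> bench names shared by both
/// harnesses. Only one group is shown.
const GROUPS: &[(&str, &[&str])] = &[("arena_creation", &["multitude", "bumpalo_new"])];

/// One row of a report table.
#[derive(Debug, PartialEq)]
struct Row {
    bench: String,
    median_ns: f64,
    instructions: u64,
    branch_misses: u64,
    mem_accesses: u64,
}

/// Join criterion medians and Callgrind counts on the "group/bench" key;
/// benches missing from either side are skipped.
fn join(
    group: &str,
    benches: &[&str],
    criterion: &HashMap<String, f64>,
    callgrind: &HashMap<String, CallgrindCounts>,
) -> Vec<Row> {
    benches
        .iter()
        .filter_map(|bench| {
            let key = format!("{group}/{bench}");
            let median_ns = *criterion.get(&key)?;
            let c = callgrind.get(&key)?;
            Some(Row {
                bench: bench.to_string(),
                median_ns,
                instructions: c.instructions,
                branch_misses: c.branch_misses,
                mem_accesses: c.mem_accesses(),
            })
        })
        .collect()
}

fn main() {
    // Made-up inputs for illustration only; not the report's measurements.
    let mut criterion = HashMap::new();
    criterion.insert("arena_creation/multitude".to_string(), 40.0);
    let mut callgrind = HashMap::new();
    callgrind.insert(
        "arena_creation/multitude".to_string(),
        CallgrindCounts { instructions: 300, branch_misses: 8, l1_hits: 420, ll_hits: 20, ram_hits: 5 },
    );
    for (group, benches) in GROUPS {
        for row in join(group, benches, &criterion, &callgrind) {
            println!("{group}: {row:?}");
        }
    }
}
```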

## `arena_creation`

| Variant | Time (criterion) | Instructions | Branch misses | Mem accesses |
|---|---:|---:|---:|---:|
| `multitude` | 37 ns | 335 | 9 | 495 |
| `bumpalo_new` | 1 ns | 16 | 1 | 26 |

## `alloc_u64`

| Variant | Time (criterion) | Instructions | Branch misses | Mem accesses |
|---|---:|---:|---:|---:|
| `alloc` | 4.02 µs | 16,032 | 7 | 25,055 |
| `alloc_with` | 4.07 µs | 16,031 | 11 | 25,052 |
| `alloc_box` | 8.62 µs | 25,047 | 6 | 41,086 |
| `alloc_box_with` | 8.60 µs | 26,047 | 6 | 42,086 |
| `alloc_uninit_box` | 6.93 µs | 22,047 | 6 | 35,086 |
| `alloc_zeroed_box` | 9.16 µs | 23,047 | 6 | 37,086 |
| `alloc_arc` | 8.49 µs | 25,047 | 8 | 41,086 |
| `alloc_arc_with` | 8.67 µs | 26,047 | 6 | 42,086 |
| `alloc_uninit_arc` | 6.82 µs | 22,047 | 6 | 35,086 |
| `alloc_zeroed_arc` | 9.10 µs | 23,047 | 6 | 37,086 |
| `bumpalo_alloc` | 4.02 µs | 21,025 | 5 | 31,042 |
| `bumpalo_alloc_with` | 4.00 µs | 21,023 | 3 | 31,038 |

## `alloc_str`

| Variant | Time (criterion) | Instructions | Branch misses | Mem accesses |
|---|---:|---:|---:|---:|
| `alloc_str` | 7.20 µs | 52,056 | 15 | 78,103 |
| `alloc_str_box` | 15.24 µs | 62,057 | 13 | 90,106 |
| `alloc_str_arc` | 15.33 µs | 62,057 | 13 | 90,106 |
| `bumpalo_alloc_str` | 6.98 µs | 54,048 | 13 | 81,088 |

## `alloc_slice`

| Variant | Time (criterion) | Instructions | Branch misses | Mem accesses |
|---|---:|---:|---:|---:|
| `alloc_slice_copy` | 41.44 µs | 20,057 | 10 | 32,102 |
| `alloc_slice_clone` | 42.13 µs | 54,054 | 12 | 69,100 |
| `alloc_slice_fill_with` | 44.35 µs | 36,026 | 9 | 65,043 |
| `alloc_slice_fill_iter` | 44.34 µs | 38,029 | 11 | 68,048 |
| `alloc_slice_copy_box` | 51.55 µs | 33,624 | 29 | 55,883 |
| `alloc_slice_clone_box` | 52.56 µs | 78,624 | 36 | 104,883 |
| `alloc_slice_fill_with_box` | 52.95 µs | 51,564 | 33 | 91,778 |
| `alloc_slice_fill_iter_box` | 53.24 µs | 53,564 | 31 | 95,778 |
| `alloc_uninit_slice_box` | 50.50 µs | 25,564 | 31 | 40,778 |
| `alloc_zeroed_slice_box` | 50.73 µs | 30,564 | 32 | 48,778 |
| `alloc_slice_copy_arc` | 51.13 µs | 32,624 | 30 | 53,883 |
| `alloc_slice_clone_arc` | 51.97 µs | 67,624 | 32 | 90,883 |
| `alloc_slice_fill_with_arc` | 53.57 µs | 49,564 | 31 | 87,778 |
| `alloc_slice_fill_iter_arc` | 53.28 µs | 50,564 | 30 | 89,778 |
| `alloc_uninit_slice_arc` | 51.79 µs | 24,564 | 31 | 38,778 |
| `alloc_zeroed_slice_arc` | 50.73 µs | 28,564 | 30 | 44,778 |
| `bumpalo_alloc_slice_copy` | 41.49 µs | 42,042 | 4 | 61,076 |
| `bumpalo_alloc_slice_clone` | 41.31 µs | 63,046 | 9 | 79,083 |
| `bumpalo_alloc_slice_fill_with` | 43.33 µs | 42,023 | 5 | 74,038 |
| `bumpalo_alloc_slice_fill_iter` | 43.38 µs | 42,023 | 5 | 74,038 |

## `string_builder`

| Variant | Time (criterion) | Instructions | Branch misses | Mem accesses |
|---|---:|---:|---:|---:|
| `alloc_string` | 9.58 µs | 42,499 | 45 | 59,285 |
| `alloc_string_with_capacity` | 9.68 µs | 41,915 | 26 | 58,487 |
| `bumpalo_string_new_in` | 9.28 µs | 35,843 | 76 | 50,867 |
| `bumpalo_string_with_capacity_in` | 8.90 µs | 34,708 | 27 | 49,159 |

## `vec_builder`

| Variant | Time (criterion) | Instructions | Branch misses | Mem accesses |
|---|---:|---:|---:|---:|
| `alloc_vec` | 1.23 µs | 12,209 | 45 | 17,775 |
| `alloc_vec_with_capacity` | 1.22 µs | 12,577 | 26 | 18,939 |
| `bumpalo_vec_new_in` | 3.98 µs | 12,281 | 59 | 18,888 |
| `bumpalo_vec_with_capacity_in` | 1.03 µs | 11,069 | 7 | 17,116 |

## `drop`

| Variant | Time (criterion) | Instructions | Branch misses | Mem accesses |
|---|---:|---:|---:|---:|
| `box_u64` | 7.76 µs | 10,746 | 70 | 14,599 |
| `rc_u64` | 7.78 µs | 10,746 | 70 | 14,599 |
| `arc_u64` | 7.75 µs | 10,746 | 70 | 14,599 |
| `box_droppy` | 15.01 µs | 186,587 | 86 | 273,293 |
| `rc_droppy` | 27.14 µs | 218,987 | 92 | 320,869 |
| `arc_droppy` | 27.05 µs | 218,987 | 92 | 320,869 |
| `str_box` | 7.45 µs | 10,746 | 71 | 14,599 |
| `str_rc` | 7.42 µs | 10,746 | 71 | 14,599 |
| `str_arc` | 7.54 µs | 10,746 | 71 | 14,599 |
| `slice_box_u64` | 14.36 µs | 11,520 | 74 | 15,742 |
| `slice_rc_u64` | 14.55 µs | 11,520 | 73 | 15,742 |
| `slice_arc_u64` | 14.42 µs | 11,520 | 73 | 15,742 |
| `slice_box_droppy` | 125.90 µs | 1,480,329 | 1,120 | 2,162,947 |
| `slice_rc_droppy` | 131.59 µs | 1,505,378 | 1,122 | 2,201,008 |
| `slice_arc_droppy` | 129.98 µs | 1,505,378 | 1,122 | 2,201,008 |
| `alloc` | 976 ns | 410 | 16 | 617 |

## Multitude vs Bumpalo Head-to-Head

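Direct comparisons of multitude versus bumpalo on identical workloads. In each row, the multitude variant is the one closest in semantics to bumpalo's plain bump allocation. Δ = (multitude - bumpalo) / bumpalo, so a negative value means multitude took less time or executed fewer instructions.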

| Workload | Multitude time | Bumpalo time | Δ time | Multitude instr | Bumpalo instr | Δ instr |
|---|---:|---:|---:|---:|---:|---:|
| `alloc_u64/alloc` vs `bumpalo_alloc` | 4.02 µs | 4.02 µs | +0.1% | 16,032 | 21,025 | -23.7% |
| `alloc_str/alloc_str` vs `bumpalo_alloc_str` | 7.20 µs | 6.98 µs | +3.1% | 52,056 | 54,048 | -3.7% |
| `alloc_slice/alloc_slice_copy` vs `bumpalo_alloc_slice_copy` | 41.44 µs | 41.49 µs | -0.1% | 20,057 | 42,042 | -52.3% |
| `alloc_slice/alloc_slice_clone` vs `bumpalo_alloc_slice_clone` | 42.13 µs | 41.31 µs | +2.0% | 54,054 | 63,046 | -14.3% |
| `alloc_slice/alloc_slice_fill_with` vs `bumpalo_alloc_slice_fill_with` | 44.35 µs | 43.33 µs | +2.4% | 36,026 | 42,023 | -14.3% |
| `alloc_slice/alloc_slice_fill_iter` vs `bumpalo_alloc_slice_fill_iter` | 44.34 µs | 43.38 µs | +2.2% | 38,029 | 42,023 | -9.5% |
| `string_builder/alloc_string` vs `bumpalo_string_new_in` | 9.58 µs | 9.28 µs | +3.3% | 42,499 | 35,843 | +18.6% |
| `string_builder/alloc_string_with_capacity` vs `bumpalo_string_with_capacity_in` | 9.68 µs | 8.90 µs | +8.8% | 41,915 | 34,708 | +20.8% |
| `vec_builder/alloc_vec` vs `bumpalo_vec_new_in` | 1.23 µs | 3.98 µs | -69.2% | 12,209 | 12,281 | -0.6% |
| `vec_builder/alloc_vec_with_capacity` vs `bumpalo_vec_with_capacity_in` | 1.22 µs | 1.03 µs | +17.6% | 12,577 | 11,069 | +13.6% |
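The Δ columns can be reproduced with a small helper. This sketch assumes Δ = (multitude - bumpalo) / bumpalo × 100, which matches the instruction-count deltas in the table; `delta_pct` and `fmt_delta` are hypothetical names.

```rust
/// Relative difference of multitude vs bumpalo, in percent:
/// (multitude - bumpalo) / bumpalo * 100. Negative means multitude is lower.
fn delta_pct(multitude: f64, bumpalo: f64) -> f64 {
    (multitude - bumpalo) / bumpalo * 100.0
}

/// Format like the table: explicit sign, one decimal place.
fn fmt_delta(d: f64) -> String {
    format!("{d:+.1}%")
}

fn main() {
    // Instruction counts from the `alloc_u64/alloc` vs `bumpalo_alloc` row.
    let d = delta_pct(16_032.0, 21_025.0);
    println!("{}", fmt_delta(d)); // -23.7%
}
```

Applied to the rounded times shown in the table, the same formula can be off by a few tenths of a percent (for `alloc_u64/alloc`, 4.02 µs vs 4.02 µs gives +0.0% while the table shows +0.1%), presumably because Δ time is computed from the unrounded medians.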