delaunay 0.7.2

A d-dimensional Delaunay triangulation library with float coordinate support
# Phase 4 Benchmark Consolidation Plan

## Executive Summary

**Problem:** Phase 4 benchmarking currently uses `triangulation_creation.rs`, which is the
wrong tool: it only measures basic construction and duplicates the CI benchmarks.

**Solution:** Migrate to `large_scale_performance.rs`, which was **specifically designed for
Phase 4 SlotMap evaluation** and measures:

- ✅ Iteration performance (vertex/cell/neighbor traversals)
- ✅ Memory usage patterns (RSS tracking)
- ✅ Query performance (lookups, contains checks)
- ✅ Validation stress-testing (topology checks)
- ✅ 1K-10K scale appropriate for SlotMap comparisons

**Phase 4 Goal:** Enable swapping SlotMap implementations via Cargo feature flags
(SlotMap ↔ DenseSlotMap ↔ HopSlotMap) for benchmarking, targeting 10-15% iteration
performance improvement. The DenseSlotMap backend is gated by the `dense-slotmap` feature.

**Current state (2025-12-13):**

- Cargo feature `dense-slotmap` (DenseSlotMap backend) is enabled by default (`default = ["dense-slotmap"]`)
- SlotMap remains supported via `--no-default-features`
- Local comparison tooling: `uv run compare-storage-backends` (or `just compare-storage`)
- Cluster comparison tooling: `scripts/slurm_storage_comparison.sh` (saves Criterion baselines for `critcmp`)

## Current Benchmark Suite Issues

### Overlaps & Redundancies

1. **Triangulation Creation Overlap:**
   - `ci_performance_suite.rs`, `triangulation_creation.rs`, `microbenchmarks.rs`, and `profiling_suite.rs` all benchmark basic triangulation creation
   - **Problem:** `triangulation_creation.rs` is redundant with `ci_performance_suite.rs` (already used by CI/scripts)

2. **Assign Neighbors Duplication:**
   - Both `assign_neighbors_performance.rs` and `microbenchmarks.rs` test the same operation
   - `assign_neighbors_performance.rs` is more comprehensive (grid, spherical, scaling tests)

3. **Memory Measurement Overlap:**
   - `large_scale_performance.rs` and `profiling_suite.rs` both measure memory
   - Redundant memory bench files were consolidated into `profiling_suite.rs`

### Scripts Integration

- `scripts/benchmark_utils.py` is hardcoded to use **`ci_performance_suite.rs`** for baseline generation and regression testing
- Phase 4 backend comparison tooling exists via:
  - `scripts/compare_storage_backends.py` (`uv run compare-storage-backends`)
  - `scripts/slurm_storage_comparison.sh` (cluster runs; saves Criterion baselines for `critcmp`)

## Implementation Plan

### ✅ 1. Kickoff and Scope Alignment

**Status:** ✅ Completed (2025-10-20)

**Tasks:**

- [x] Confirm Phase 4's primary goal: evaluate SlotMap-backed TDS performance and memory behavior at large scale
- [x] Make `large_scale_performance.rs` the primary Phase 4 benchmark (replacing `triangulation_creation.rs`)
- [x] Adopt one-cycle deprecation policy for redundant benches
- [x] Keep work isolated in feature branches when convenient (process note)

**Notes:**

- Backend selection remains compile-time via Cargo features (no runtime abstraction)
- `dense-slotmap` (DenseSlotMap backend) is now enabled by default (2025-12-13)

---

### ✅ 2. Inventory All Benchmark Files

**Status:** ✅ Completed (2025-10-20)

**Tasks:**

- [x] List current benches: `ls benches/*.rs`
- [x] Record per file: purpose, operations covered, dataset sizes, CLI parameters/env vars, Criterion groups/IDs, output format
- [x] Validate expected bench set (current):
  - `ci_performance_suite.rs`
  - `large_scale_performance.rs` (Phase 4 primary)
  - `profiling_suite.rs`
  - `microbenchmarks.rs`
  - `assign_neighbors_performance.rs`
  - `circumsphere_containment.rs`
  - `triangulation_creation.rs` (deprecated harness)

**Deliverable:** `benches/INVENTORY.md` (temporary, to be folded into `benches/README.md`)

---

### ✅ 3. Map Benchmark Usage in GitHub Actions and Scripts

**Status:** ✅ Completed (2025-10-20)

**Tasks:**

- [x] Scan workflows: `rg -n "bench" .github/workflows/benchmarks.yml .github/workflows/profiling-benchmarks.yml .github/workflows/generate-baseline.yml`
- [x] Scan Python scripts: `rg -n "(cargo bench|criterion|bench.*filter|--bench)" scripts/`
- [x] Scan justfile: `rg -n "(cargo bench|criterion|bench.*filter|--bench)" justfile`
- [x] Review `scripts/benchmark_utils.py` for hardcoded bench names, filters, baselines

**Deliverable:** `benches/USAGE_MAP.md` (temporary) - matrix of: benchmark file ↔ workflow job(s) ↔ script command(s) ↔ just targets

**Known Usage:**

- `ci_performance_suite.rs`: Used by `benchmark_utils.py` for baseline generation (lines 1338, 1345, 1441, 1448)
- `circumsphere_containment.rs`: Used by `PerformanceSummaryGenerator` (line 296)
- `profiling_suite.rs`: Used by `.github/workflows/profiling-benchmarks.yml`

---

### ✅ 4. Document Purpose and Overlap of All Benches

**Status:** ✅ Completed (2025-10-20)

**Current Known Purposes:**

| Benchmark | Purpose | Scale | Operations | Phase 4 Relevant? |
|-----------|---------|-------|------------|-------------------|
| `ci_performance_suite.rs` | CI regression detection | 10-50 pts, 2D-5D | Basic construction | No |
| `triangulation_creation.rs` | Deprecated harness | N/A | Prints deprecation notice | No |
| `large_scale_performance.rs` | **Phase 4 backend eval** | 1K-10K vertices | Construction, memory, iteration, queries, validation | **YES - PRIMARY** |
| `profiling_suite.rs` | Comprehensive profiling | 10³-10⁶ pts | Scaling, memory profiling, query latency, etc. | Partial - too heavy |
| `microbenchmarks.rs` | Core operations | Various | Bowyer-Watson + validation microbenches | Keep |
| `assign_neighbors_performance.rs` | Neighbor assignment | 10-50 pts, 2D-5D | Distributions + scaling | Keep |
| `circumsphere_containment.rs` | Algorithm comparison | Random queries | Circumsphere predicates | Keep |

**Deliverable:** Consolidated section in `benches/README.md` with detailed table

---

### ✅ 5. Design the Consolidation Plan

**Status:** ✅ Completed (2025-10-20)

**Decisions (implemented):**

- [x] Deprecate `triangulation_creation.rs` (keep as a one-cycle deprecation harness)
- [x] Consolidate memory benchmarks into `profiling_suite.rs`
- [x] Deduplicate `microbenchmarks.rs` around `assign_neighbors`
- [x] Codify `large_scale_performance.rs` as the Phase 4 primary benchmark
- [ ] Standardize CLI/env controls across benches (optional; partially addressed via env vars)

**Deliverable:**

- This document (`docs/archive/phase4.md`) serves as the consolidation plan and progress log.

---

### ✅ 6. Deprecate triangulation_creation.rs

**Status:** ✅ Completed (2025-10-20)

**Implementation: Option A (One-Cycle Deprecation)** ✅

- [x] Replace contents with minimal harness that:
  - Prints clear deprecation message pointing to `large_scale_performance.rs` (Phase 4) and `ci_performance_suite.rs` (CI)
  - Exits early to avoid wasting CI time
- [x] Keep file for one release cycle
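
As a rough illustration of Option A, a deprecation harness needs little more than a function that builds the notice and an entry point that prints it and returns. The sketch below is not the actual contents of `triangulation_creation.rs`; the function names (`deprecation_notice`, `run_deprecated_bench`) and the message wording are assumptions.

```rust
/// Builds the notice a deprecated bench prints before exiting.
/// Hypothetical helper: the name and wording are illustrative only.
fn deprecation_notice(deprecated: &str, replacements: &[(&str, &str)]) -> String {
    let mut msg = format!("DEPRECATED: `{deprecated}` no longer runs any benchmarks.\n");
    msg.push_str("Use one of the following instead:\n");
    for (bench, purpose) in replacements {
        // One line per replacement, including the command that runs it.
        msg.push_str(&format!("  - {purpose}: cargo bench --bench {bench}\n"));
    }
    msg
}

/// Entry point that a bench target's `main` would call.
/// Prints the notice and returns immediately, so no CI time is spent.
fn run_deprecated_bench() {
    let notice = deprecation_notice(
        "triangulation_creation",
        &[
            ("large_scale_performance", "Phase 4 evaluation"),
            ("ci_performance_suite", "CI regression checks"),
        ],
    );
    eprint!("{notice}");
}
```

Keeping the message in a separate function makes the wording easy to check in a unit test, while the entry point stays trivial.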

**Option B (Immediate Removal):**

- [ ] Delete file entirely
- [ ] Update all references in workflows, scripts, and justfile

**Tasks:**

- [x] Update `scripts/benchmark_utils.py` to stop referencing `triangulation_creation`
- [x] Update CI workflows if any reference it
- [x] Update `justfile` if any targets reference it

---

### ✅ 7. Consolidate Memory Benchmarks into profiling_suite.rs

**Status:** ✅ Completed (2025-10-20)

**Tasks:**

- [x] Consolidate memory profiling into `profiling_suite.rs` (memory_profiling group)
- [x] Remove redundant memory benchmark files (`memory_scaling.rs`, `triangulation_vs_hull_memory.rs`)
- [x] Update docs and Cargo configuration to reflect the consolidation
- [ ] Optional: add richer metadata export (dataset/dim/seed/units) for automated analysis

**Memory Measurement Strategy:**

```rust
// Current approach in large_scale_performance.rs
use sysinfo::{ProcessRefreshKind, ProcessesToUpdate, RefreshKind, System};

fn get_memory_usage() -> u64 {
    // Returns RSS in KiB
}

// Keep this for Phase 4 SlotMap evaluation
// Optional: Add allocation-counter for detailed tracking
```
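
The idea behind RSS tracking can also be illustrated with the standard library alone. The sketch below is Linux-only and is not how `large_scale_performance.rs` measures memory (that file uses `sysinfo`, as shown above); the function names are hypothetical. It reads the `VmRSS` line from `/proc/self/status`, which the kernel reports in KiB (labelled `kB`).

```rust
use std::fs;

/// Extracts the `VmRSS` value (in KiB) from `/proc/<pid>/status`-style text.
fn parse_vm_rss_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        // The remainder looks like "   12345 kB"; take the numeric field.
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|value| value.parse().ok())
}

/// Resident set size of the current process in KiB, or `None` when
/// `/proc/self/status` is unavailable (for example on macOS or Windows).
fn current_rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kib(&status)
}

/// Runs `work` and reports how much RSS grew (KiB) while it ran.
/// The delta saturates at zero if memory shrank.
fn rss_delta_kib<T>(work: impl FnOnce() -> T) -> Option<(T, u64)> {
    let before = current_rss_kib()?;
    let out = work();
    let after = current_rss_kib()?;
    Some((out, after.saturating_sub(before)))
}
```

A before/after delta like this is coarse: the allocator may keep freed pages, so the number is best read as a trend across backends and not as an exact per-structure footprint.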

---

### ✅ 8. Deduplicate microbenchmarks Around assign_neighbors

**Status:** ✅ Completed (2025-10-20)

**Tasks:**

- [x] Remove `assign_neighbors` duplicates from `microbenchmarks.rs`
- [x] Centralize neighbor-assignment benchmarking in `assign_neighbors_performance.rs`
- [x] Preserve baseline history by keeping benchmark names stable where possible
- [x] Add module docs pointing contributors to the consolidated benchmark

**What Remains in microbenchmarks.rs:**

- Bowyer-Watson triangulation benchmarks (unique)
- `remove_duplicate_cells` benchmarks (unique)
- Validation method benchmarks (unique)
- Incremental construction benchmarks (unique)

---

### ✅ 9. Elevate large_scale_performance.rs for Phase 4

**Status:** ✅ Completed (2025-10-20)

**Current State:**

- ✅ Already designed for Phase 4 SlotMap evaluation (see file comments)
- ✅ Tests iteration: `bench_vertex_iteration`, `bench_cell_iteration`
- ✅ Tests memory: `bench_memory_usage` with RSS tracking
- ✅ Tests queries: `bench_neighbor_queries`
- ✅ Tests validation: `bench_validation` (topology stress-test)
- ✅ Supports 1K-10K scale

**Enhancements Needed:**

- [ ] **Add standardized CLI/env control for datasets:**
  - `BENCH_N`: point count
  - `BENCH_DIM`: dimension (2-5)
  - `BENCH_SEED`: RNG seed
  - `BENCH_DISTRIBUTION`: grid, random, clustered
  - `BENCH_OP_MIX`: operation ratio
  - `BENCH_JSON_OUT`: structured results path

  Already supported:
  - `BENCH_LARGE_SCALE` (toggles larger 4D point counts)
  - `BENCH_SAMPLE_SIZE`, `BENCH_WARMUP_SECS`, `BENCH_MEASUREMENT_TIME` (Criterion tuning)

- [x] **Criterion IDs are stable and baseline-friendly**

  Current scheme: `<category>/<dim>/<n>v` (e.g. `construction/3D/1000v`).

- [ ] **Add cache locality measurement (optional):**

  ```rust
  // Behind feature flag: perf_guard or similar
  // Use perf/callgrind hooks for cache miss analysis
  ```

- [x] **Support SlotMap vs alternative comparison:**

  ```rust
  // Use feature flags + type aliases for zero-cost abstraction:
  #[cfg(feature = "dense-slotmap")]
  type StorageBackend<K, V> = DenseSlotMap<K, V>;
  
  #[cfg(not(feature = "dense-slotmap"))]
  type StorageBackend<K, V> = SlotMap<K, V>;
  
  // Benchmark with:
  // cargo bench --bench large_scale_performance  # default (feature: dense-slotmap)
  // cargo bench --no-default-features --bench large_scale_performance  # SlotMap
  ```
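
The `<category>/<dim>/<n>v` ID scheme noted in the checklist above is easier to keep stable when IDs come from one helper, not from scattered `format!` calls. This is a minimal sketch; the helper names `bench_id` and `parse_bench_id` are hypothetical and not taken from the benchmark source.

```rust
/// Builds an ID following `<category>/<dim>/<n>v`, e.g. `construction/3D/1000v`.
fn bench_id(category: &str, dim: usize, n_vertices: usize) -> String {
    format!("{category}/{dim}D/{n_vertices}v")
}

/// Parses an ID back into `(category, dim, n_vertices)`.
/// Returns `None` when the string does not follow the scheme.
fn parse_bench_id(id: &str) -> Option<(&str, usize, usize)> {
    // Split from the right so a category may itself contain `/`.
    let (rest, n_part) = id.rsplit_once('/')?;
    let (category, dim_part) = rest.rsplit_once('/')?;
    let dim: usize = dim_part.strip_suffix('D')?.parse().ok()?;
    let n: usize = n_part.strip_suffix('v')?.parse().ok()?;
    if category.is_empty() {
        return None;
    }
    Some((category, dim, n))
}
```

A parser like this also gives comparison scripts a way to group results by dimension or size without relying on string matching.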

**Key Metrics for Phase 4:**

1. **Iteration speed**: Full vertex/cell traversals, neighbor walks
2. **Memory usage**: Peak RSS, per-element footprint estimates  
3. **Cache locality**: Traversal patterns (BFS vs random access)
4. **Query performance**: Lookups, contains checks, incident-entity queries

---

### ✅ 10. Phase 4 Storage Backend Comparison Tooling

**Status:** ✅ Completed (2025-12-13)

Instead of adding Phase 4 baseline JSON subcommands to `benchmark_utils.py`, Phase 4 backend
comparison is handled via Criterion baselines and dedicated scripts.

**Tooling:**

- `scripts/compare_storage_backends.py` (`uv run compare-storage-backends`)
  - Runs `cargo bench` twice:
    - DenseSlotMap (feature: `dense-slotmap`; default)
    - SlotMap (`--no-default-features`)
  - Generates a markdown report (default: `artifacts/storage_comparison.md`)

- `scripts/slurm_storage_comparison.sh`
  - Runs both backends on a Slurm cluster
  - Saves Criterion baselines (`slotmap`, `denseslotmap`) and supports `critcmp`
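
At its core, a backend comparison report is a per-benchmark relative change. The Rust sketch below is not a port of `compare_storage_backends.py` (which is a Python script); the struct, function names, and table layout are assumptions, used only to illustrate how two timing sets could be turned into a markdown table.

```rust
/// Mean time for one benchmark ID under both backends (nanoseconds).
struct Comparison {
    id: String,
    slotmap_ns: f64,
    dense_ns: f64,
}

impl Comparison {
    /// Percentage change of DenseSlotMap relative to SlotMap.
    /// Negative values mean DenseSlotMap was faster.
    fn change_pct(&self) -> f64 {
        (self.dense_ns - self.slotmap_ns) / self.slotmap_ns * 100.0
    }
}

/// Renders the comparisons as a markdown table, one row per benchmark ID.
fn render_report(rows: &[Comparison]) -> String {
    let mut out = String::from(
        "| Benchmark | SlotMap (ns) | DenseSlotMap (ns) | Change |\n\
         |-----------|--------------|-------------------|--------|\n",
    );
    for row in rows {
        out.push_str(&format!(
            "| `{}` | {:.0} | {:.0} | {:+.1}% |\n",
            row.id,
            row.slotmap_ns,
            row.dense_ns,
            row.change_pct()
        ));
    }
    out
}
```

Reporting a signed percentage keeps the direction of the change unambiguous, which matters when checking results against the iteration improvement target.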

**Commands:**

```bash
# Local comparison report
just compare-storage

# Large scale comparison (sets BENCH_LARGE_SCALE=1)
just compare-storage-large

# Direct invocation
uv run compare-storage-backends --bench large_scale_performance
```

**Tasks:**

- [x] Local backend comparison + markdown report (`compare_storage_backends.py`)
- [x] Cluster backend comparison script saving baselines (`slurm_storage_comparison.sh`)
- [x] Documentation updated for `dense-slotmap` default (DenseSlotMap) + SlotMap via `--no-default-features`
- [ ] Optional: add Phase 4 baseline JSON generation to `benchmark_utils.py` for CI-style regression testing

---

### ✅ 11. Update GitHub Actions Workflows

**Status:** ✅ Completed (2025-10-20)

**benchmarks.yml (Performance Regression Testing):**

- [x] Replace any `triangulation_creation` references with `ci_performance_suite.rs` or `large_scale_performance.rs`
- [ ] Add optional Phase 4 job that runs reduced-size `large_scale_performance.rs` smoke test
- [x] Keep runtime reasonable for CI runs (use dev settings/timeouts where appropriate)

**profiling-benchmarks.yml (Comprehensive Profiling):**

- [x] Point memory jobs to `profiling_suite.rs` memory groups (after consolidation)
- [x] Gate heavy runs by workflow_dispatch labels or schedules
- [ ] Add Phase 4-specific profiling job (optional, manual trigger)

**generate-baseline.yml (Baseline Generation):**

- [ ] Add job to generate and upload `phase4_baseline.json` via new script command
- [ ] Store artifact with 90-day retention (same as release baselines)
- [ ] Trigger on: manual, monthly schedule, release tags

**Example Phase 4 Job:**

```yaml
phase4-smoke-test:
  name: Phase 4 SlotMap Smoke Test
  runs-on: macos-15
  timeout-minutes: 30
  
  steps:
    - uses: actions/checkout@v5
    
    - name: Install Rust toolchain
      uses: actions-rust-lang/setup-rust-toolchain@v1
    
    - name: Run Phase 4 benchmarks (reduced scale)
      # Criterion takes a single filter (a regex), so the three targets are combined.
      run: |
        cargo bench --bench large_scale_performance -- \
          --sample-size 10 \
          "(construction|queries/neighbors|iteration/vertices)/3D/1000"
```

---

### ✅ 12. Documentation and Changelog Updates

**Status:** ✅ Completed (2025-10-20)

**benches/README.md:**

- [x] Add categorization section:
  - **CI Benchmarks**: `ci_performance_suite.rs` (fast, regression detection)
  - **Profiling Benchmarks**: `profiling_suite.rs` (comprehensive, 1-2 hours)
  - **Phase 4 Benchmarks**: `large_scale_performance.rs` (SlotMap evaluation)
  - **Algorithm Comparison**: `circumsphere_containment.rs`
  - **Specialized**: `assign_neighbors_performance.rs`
  - **Deprecated**: `triangulation_creation.rs` (use `ci_performance_suite.rs` or `large_scale_performance.rs`)

- [x] Add "When to use which" guidance:

  ```markdown
  ## Benchmark Selection Guide
  
  | Use Case | Benchmark | Command |
  |----------|-----------|---------|
  | Quick CI regression check | `ci_performance_suite.rs` | `just bench` or `cargo bench --bench ci_performance_suite` |
  | Phase 4 SlotMap evaluation | `large_scale_performance.rs` | `cargo bench --bench large_scale_performance` |
  | Deep profiling (1-2 hours) | `profiling_suite.rs` | `cargo bench --bench profiling_suite` |
  | Memory analysis | `profiling_suite.rs` (memory groups) | `cargo bench --bench profiling_suite -- memory` |
  | Algorithm comparison | `circumsphere_containment.rs` | `cargo bench --bench circumsphere_containment` |
  ```

- [x] Explicitly document `large_scale_performance.rs` as Phase 4 primary
- [x] Add deprecation notice for `triangulation_creation.rs`

**docs/code_organization.md:**

- [x] Update benchmark section to reflect new layout
- [x] Add Phase 4 benchmark responsibilities
- [x] Document memory benchmark consolidation

**CHANGELOG.md:**

- Note: `CHANGELOG.md` is auto-generated from git history in this repo.
- Do not edit it manually; ensure the relevant commits exist and run `just changelog` before release.

---

### ☐ 13. Add Missing Coverage (Time Permitting)

**Status:** Not Started

**Priority 1 (High Value):**

- [ ] **Convex hull timing benchmarks**
  - Add to `profiling_suite.rs` or separate file
  - Cover varied distributions (random, grid, clustered) and dimensions (2D-5D)
  - Currently, convex hull coverage is limited to memory benchmarks in `profiling_suite.rs` (`memory_profiling` group)

**Priority 2 (Moderate Value):**

- [ ] **Serialization/deserialization benchmarks**
  - Add Criterion benches for Serde (bincode, JSON)
  - Vary triangulation sizes (1K, 10K, 100K vertices)
  - Measure throughput (MB/s) and time per operation
  
- [ ] **f32 vs f64 coordinate type comparison**
  - Matrix: dimensions (2D-5D) × sizes (1K, 10K) × distributions (random, grid)
  - Report relative speed and memory deltas
  - Use const generic benchmarks or feature flags
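
One way the "const generic benchmarks" option could look is a single workload that is generic over both the coordinate type and the dimension, so the same body is instantiated for every cell of the matrix. The `Coord` trait and `workload` function below are hypothetical stand-ins, not part of the `delaunay` API.

```rust
use std::ops::{Add, Mul, Sub};

/// Minimal coordinate abstraction implemented for `f32` and `f64`.
trait Coord: Copy + Add<Output = Self> + Sub<Output = Self> + Mul<Output = Self> {
    const ZERO: Self;
}
impl Coord for f32 {
    const ZERO: Self = 0.0;
}
impl Coord for f64 {
    const ZERO: Self = 0.0;
}

/// Sum of squared distances from every point to the first point.
/// Stands in for whatever operation a benchmark would actually time.
fn workload<T: Coord, const D: usize>(points: &[[T; D]]) -> T {
    let Some(origin) = points.first() else {
        return T::ZERO;
    };
    points.iter().fold(T::ZERO, |acc, p| {
        let mut d2 = T::ZERO;
        for i in 0..D {
            let d = p[i] - origin[i];
            d2 = d2 + d * d;
        }
        acc + d2
    })
}
```

With this shape, a benchmark registers `workload::<f32, 3>` and `workload::<f64, 3>` side by side, and the two share every line of code except the scalar type.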

**Priority 3 (Nice to Have):**

- [ ] **Point location strategies** (if multiple implementations exist)
- [ ] **Incremental vs batch insertion** behavior analysis
- [ ] **Parallel construction benchmarks** (if parallelization exists/planned)

---

### ✅ 14. Quality Gates, Validation, and CI Safety

**Status:** ✅ Completed (2025-12-13)

**Pre-commit Checks (for all changes):**

```bash
# Format and lint
just fmt
just clippy
just doc-check

# Python quality
just python-lint

# Documentation quality
just markdown-lint
just spell-check

# Configuration validation
just validate-json
just validate-toml
```

**Benchmark-Specific Checks:**

```bash
# Verify benchmarks compile after Rust edits
just bench-compile

# Run Python script tests
uv run pytest

# Quick smoke test of benchmarks (reduced iterations)
cargo bench --bench ci_performance_suite -- --test
cargo bench --bench large_scale_performance -- --test
```

**PR Strategy:**

- [ ] **Split PRs**: docs/Python-only vs Rust changes
  - Docs-only PRs skip expensive CI benchmark runs
  - Rust PRs trigger full benchmark suite
- [ ] Use `[skip ci]` in commit messages for documentation-only changes
- [ ] Create feature branch for each major change
- [ ] Run quality gates before pushing

**Tasks:**

- [x] Run all quality gates on changed files (`just ci`)
- [x] Verify benchmark compilation with `just bench-compile`
- [x] Run Python tests with `uv run pytest`
- [x] Verify SlotMap builds/tests with `cargo test --no-default-features`
- [ ] Test smoke runs of modified benchmarks

---

### ◔ 15. Acceptance Criteria and Sign-Off

**Status:** ◔ In Progress (updated 2025-12-13)

**Benchmark Files:**

- [x] No broken references to removed/deprecated benches in workflows, scripts, or justfile
- [x] `large_scale_performance.rs` covers construction, memory, iteration, queries, and validation (2D–5D)
- [ ] Cache locality measurement (optional; not implemented)
- [x] Stable Criterion IDs (scheme: `<category>/<dim>/<n>v`)
- [ ] JSON output schema beyond Criterion (optional; not implemented)

**Backend Comparison Tooling:**

- [x] Local comparison report generator: `uv run compare-storage-backends`
- [x] Cluster comparison script: `scripts/slurm_storage_comparison.sh` (saves baselines for `critcmp`)
- [ ] Optional: CI-style Phase 4 baseline JSON generation (deferred)

**Build System:**

- [x] Compare storage backends: `just compare-storage`, `just compare-storage-large`
- [x] Run large-scale benchmark directly: `cargo bench --bench large_scale_performance`

**Documentation:**

- [x] `benches/README.md` guides contributors and documents Phase 4 benchmarks
- [x] `docs/code_organization.md` reflects benchmark layout and memory consolidation
- [ ] Changelog entry is generated (do not edit `CHANGELOG.md` directly)

**Quality Gates:**

- [x] `just ci` passes locally
- [x] `cargo test --no-default-features` passes
- [x] `uv run pytest` passes

---

## Phase 4 SlotMap Evaluation Metrics

### Key Performance Indicators

Once the benchmark consolidation is complete, Phase 4 will evaluate these metrics:

1. **Iteration Performance** (10-15% improvement target with DenseSlotMap; feature: `dense-slotmap`)
   - Full vertex traversal time
   - Full cell traversal time
   - Neighbor-following traversal patterns
   - Filtered iteration (predicates)

2. **Memory Efficiency** (50% reduction target from Phase 3)
   - Peak RSS during construction
   - Per-vertex memory footprint
   - Per-cell memory footprint
   - Memory fragmentation analysis

3. **Cache Locality** (5-10% cache miss reduction target)
   - Sequential access patterns
   - Random access patterns
   - BFS traversal over adjacency graph
   - Optional: perf/cachegrind integration

4. **Query Performance** (maintain or improve)
   - Key lookup time
   - Contains-key checks
   - Neighbor queries
   - Incident-entity queries
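
To make the access patterns under Cache Locality concrete, the sketch below produces a BFS visit order over a neighbor graph and scores it with a crude locality proxy. The adjacency representation (a `Vec<Vec<usize>>` of cell indices) and the cost function are assumptions for illustration; the real TDS stores neighbors under SlotMap keys, and actual cache behavior would need `perf` or cachegrind to measure.

```rust
use std::collections::VecDeque;

/// Visit order of a breadth-first traversal starting at `start`.
/// `neighbors[i]` lists the cells adjacent to cell `i`; every listed
/// index must be in range.
fn bfs_order(neighbors: &[Vec<usize>], start: usize) -> Vec<usize> {
    let mut visited = vec![false; neighbors.len()];
    let mut order = Vec::with_capacity(neighbors.len());
    let mut queue = VecDeque::new();
    if start < neighbors.len() {
        visited[start] = true;
        queue.push_back(start);
    }
    while let Some(cell) = queue.pop_front() {
        order.push(cell);
        for &next in &neighbors[cell] {
            if !visited[next] {
                visited[next] = true;
                queue.push_back(next);
            }
        }
    }
    order
}

/// Sum of absolute index jumps between consecutive visits: a rough
/// stand-in for locality (a sequential pass over `n` cells scores `n - 1`).
fn index_jump_cost(order: &[usize]) -> usize {
    order.windows(2).map(|w| w[0].abs_diff(w[1])).sum()
}
```

Comparing the score of a BFS order with that of a sequential pass gives a quick sanity check that a traversal benchmark really exercises non-contiguous access.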

### Collection Backend Comparison

| Backend | Iteration | Memory | Insertion | Removal | Best For |
|---------|-----------|--------|-----------|---------|----------|
| DenseSlotMap (`dense-slotmap`, default) | **Excellent** | Dense/contiguous | O(1) amortized | O(1) with moves | Stable/iteration |
| SlotMap (optional) | Good | Sparse | O(1) amortized | O(1) | Dynamic changes |
| HopSlotMap (future) | Good | Hop-optimized | O(1) | O(1) | Large scale |
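
The Iteration and Removal columns come down to what a full traversal has to touch. The sketch below is a deliberately simplified model and not the `slotmap` crate's implementation: `SparseStore` keeps values in place and leaves holes on removal, while `DenseStore` moves the last value into the hole so values stay contiguous. All type and method names are hypothetical, and the key indirection that keeps DenseSlotMap keys valid after a move is omitted.

```rust
/// Values stay at their slot; removal leaves a hole that iteration must skip.
struct SparseStore<T> {
    slots: Vec<Option<T>>,
}

/// Values are packed; removal moves the last value into the freed position.
struct DenseStore<T> {
    values: Vec<T>,
}

impl<T> SparseStore<T> {
    fn remove(&mut self, slot: usize) -> Option<T> {
        self.slots.get_mut(slot)?.take()
    }
    /// Number of slots a full iteration inspects (holes included).
    fn slots_scanned(&self) -> usize {
        self.slots.len()
    }
    fn iter(&self) -> impl Iterator<Item = &T> {
        self.slots.iter().flatten()
    }
}

impl<T> DenseStore<T> {
    fn remove(&mut self, index: usize) -> Option<T> {
        (index < self.values.len()).then(|| self.values.swap_remove(index))
    }
    /// Number of slots a full iteration inspects (always the live count).
    fn slots_scanned(&self) -> usize {
        self.values.len()
    }
    fn iter(&self) -> impl Iterator<Item = &T> {
        self.values.iter()
    }
}
```

In this model the dense store never scans a dead slot, at the price of reordering values on removal, which mirrors the "O(1) with moves" entry in the table.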

### Success Criteria

- [ ] `dense-slotmap` (DenseSlotMap) implementation shows 10-15% iteration improvement
- [ ] No regression in other operations (insertion, removal, lookup)
- [ ] Memory usage comparable or better than current SlotMap
- [ ] 100% API compatibility maintained via trait abstraction
- [ ] Easy benchmarking via a compile-time backend swap (Cargo feature + type alias)

---

## References

- **Phase 4 Roadmap:** `docs/archive/OPTIMIZATION_ROADMAP.md` (see Phase 4 section)
- **Large Scale Benchmark:** `benches/large_scale_performance.rs` (Phase 4 evaluation comments on lines 16-23, 57-58)
- **Current Benchmark Suite:** `benches/README.md`
- **Benchmark Tooling:** `scripts/benchmark_utils.py`
- **CI Workflows:** `.github/workflows/benchmarks.yml`, `.github/workflows/profiling-benchmarks.yml`

---

## Progress Tracking

**Last Updated:** 2025-12-13

**Overall Status:** ✅ Benchmark consolidation complete; ✅ `dense-slotmap` (DenseSlotMap) is default

**Completed Steps:**

- ✅ Step 1: Kickoff and scope alignment
- ✅ Step 2: Inventory all benchmark files
- ✅ Step 3: Map benchmark usage in workflows/scripts
- ✅ Step 4: Document purpose and overlap
- ✅ Step 5: Consolidation plan (this document)
- ✅ Step 6: Deprecate `triangulation_creation.rs`
- ✅ Step 7: Consolidate memory benchmarks
- ✅ Step 8: Deduplicate `assign_neighbors` benchmarks
- ✅ Step 9: Elevate `large_scale_performance.rs` for Phase 4
- ✅ Step 10: Backend comparison tooling (local + Slurm)
- ✅ Step 12: Documentation updates
- ✅ Step 14: Quality gates and validation

**Next Steps (optional):**

1. ☐ Add Phase 4 smoke test job in CI for `large_scale_performance.rs` (reduced scale)
2. ☐ Add dataset CLI/env controls (`BENCH_N`, `BENCH_DIM`, `BENCH_SEED`, distributions)
3. ☐ Add cache locality measurement (optional)
4. ✅ Archived under `docs/archive/phase4.md`
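
Item 2 is not implemented yet. As a sketch of what it might involve, the proposed variables could be read once into a config struct with defaults. Everything below is hypothetical: the struct, the defaults (1000 points, 3D, seed 42), and the fallback behavior for malformed values are assumptions, not decisions recorded in this plan.

```rust
use std::env;

#[derive(Debug, PartialEq)]
struct DatasetConfig {
    n: usize,
    dim: usize,
    seed: u64,
}

/// Parses an optional raw value, falling back to `default` when the
/// variable is unset or malformed.
fn parse_or<T: std::str::FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

impl DatasetConfig {
    /// Builds a config from a lookup function so it can be tested
    /// without touching the process environment.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        let dim: usize = parse_or(lookup("BENCH_DIM").as_deref(), 3);
        DatasetConfig {
            n: parse_or(lookup("BENCH_N").as_deref(), 1000),
            // The plan lists dimensions 2-5; clamp anything outside that range.
            dim: dim.clamp(2, 5),
            seed: parse_or(lookup("BENCH_SEED").as_deref(), 42),
        }
    }

    /// Reads the real process environment.
    fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Routing all reads through one lookup function keeps the parsing rules in a single place and lets tests supply values without mutating global state.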

---

## Notes

### Session 2025-10-20

**Completed:**

- ✅ Steps 2-4: Full inventory, usage mapping, and documentation
- ✅ Updated `benches/README.md` with:
  - Comprehensive "Benchmark Suite Overview" table
  - "Benchmark Selection Guide" with use cases
  - Phase 4 section explicitly documenting `large_scale_performance.rs` as primary
  - Deprecation notice for `triangulation_creation.rs`
- ✅ Fixed markdown linting issues (line length)
- ✅ Added profiling tool terms to spell check: `dhat`, `callgrind`, `cachegrind`

**Key Findings:**

- **Confirmed:** `triangulation_creation.rs` has ZERO usage (no workflows, scripts, justfile)
- **Confirmed:** `large_scale_performance.rs` already designed for Phase 4 (iteration, memory, queries)
- **Documented overlaps:**
  1. `triangulation_creation.rs` - 100% redundant with `ci_performance_suite.rs`
  2. `microbenchmarks.rs` - Has duplicate `assign_neighbors` tests
  3. Memory benchmarks overlap: `memory_scaling.rs`, `triangulation_vs_hull_memory.rs`, `profiling_suite.rs`

**Decisions Made:**

- Use `large_scale_performance.rs` as Phase 4 primary (not `triangulation_creation.rs`)
- Deprecate `triangulation_creation.rs` using Option A (one-cycle deprecation with notice)
- Consolidate memory benchmarks into `profiling_suite.rs`

**Implementation Details:**

- **Step 6 Deprecation:**
  - Replaced `triangulation_creation.rs` with minimal deprecation harness
  - Prints clear deprecation notice and migration guidance
  - Directs users to `ci_performance_suite.rs` (CI) and `large_scale_performance.rs` (Phase 4)
  - File compiles, passes clippy and fmt checks
  - Ready for removal in next major release

- **Step 7 Consolidation:**
  - Deleted `memory_scaling.rs` and `triangulation_vs_hull_memory.rs` (zero external usage)
  - Removed benchmark entries from `Cargo.toml`
  - Updated `benches/README.md` to remove deleted benchmarks from table
  - Memory profiling consolidated in `profiling_suite.rs` (already comprehensive)
  - Phase 4 memory evaluation uses `large_scale_performance.rs`
  - Benchmarks compile successfully, all quality checks pass

- **Step 8 Deduplication:**
  - Removed duplicate `assign_neighbors` benchmarks from `microbenchmarks.rs` (2D-5D)
  - Removed functions: `benchmark_assign_neighbors_2d/3d/4d/5d` and legacy wrapper
  - Removed from criterion_group targets (4 dimensional + 1 legacy = 5 functions)
  - Updated module doc to direct users to `assign_neighbors_performance.rs`
  - Comprehensive `assign_neighbors` testing now centralized with distributions (random, grid, spherical) and scaling
  - File compiles, passes fmt, clippy, and spell check

- **Step 9 Phase 4 Elevation:**
  - Added 5D benchmark suite with small point counts [500, 1K] (runtime ~30-60 min)
  - Added configurable scaling for 4D via `BENCH_LARGE_SCALE` env var
  - Default runtime: ~2-3 hours (2D/3D/4D/5D, suitable for local development)
  - Large scale: ~4-6 hours (4D@10K, requires compute cluster)
  - Point count strategy: 1K-10K (2D/3D), 1K-3K (4D default), 500-1K (5D)
  - Complete dimensional coverage: 2D, 3D, 4D, 5D
  - All Phase 4 metrics covered: construction, memory, iteration, queries, validation

- **Step 10 Justfile Targets (historical):**
  - The `bench-phase4*` targets were later removed (2025-12-13).
  - Use `cargo bench --bench large_scale_performance` and `just compare-storage*` instead.

- **Step 11 GitHub Actions Workflows:**
  - Updated profiling-benchmarks.yml: memory_scaling → profiling_suite (memory_profiling)
  - Set development mode for tag pushes to keep runtime reasonable (~1-2 hours vs 4-6 hours)
  - Full production profiling only for manual dispatch or scheduled monthly runs
  - Verified benchmarks.yml and generate-baseline.yml use benchmark-utils (no changes needed)
  - All workflows now reference correct benchmark files after consolidation

- **Step 12 Documentation Updates:**
  - Updated `docs/code_organization.md`: removed deleted benchmarks, added `large_scale_performance.rs`, marked `triangulation_creation.rs` as deprecated
  - Added `phase4.md` to documentation tree
  - `benches/README.md` already updated in earlier steps
  - CHANGELOG.md update deferred to next release
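
The point-count strategy in the Step 9 notes above can be summarized as a small lookup keyed on dimension and the `BENCH_LARGE_SCALE` toggle. This sketch encodes only the range endpoints quoted in those notes. The function names, the rule that the value `1` enables the toggle (the value the `compare-storage-large` recipe is described as setting), and the assumption that large-scale mode extends 4D up to 10K are illustrative; the real benchmark may use intermediate sizes.

```rust
/// Returns true when the raw `BENCH_LARGE_SCALE` value is "1".
/// How other values are treated is an assumption of this sketch.
fn large_scale_enabled(raw: Option<&str>) -> bool {
    matches!(raw.map(str::trim), Some("1"))
}

/// Inclusive (min, max) vertex counts for a dimension, or `None`
/// for dimensions outside the 2D-5D coverage.
fn point_count_range(dim: usize, large_scale: bool) -> Option<(usize, usize)> {
    match (dim, large_scale) {
        (2 | 3, _) => Some((1_000, 10_000)),
        (4, false) => Some((1_000, 3_000)),
        // Assumed: large-scale mode raises the 4D ceiling to 10K.
        (4, true) => Some((1_000, 10_000)),
        (5, _) => Some((500, 1_000)),
        _ => None,
    }
}
```

Expressing the strategy as data makes the runtime trade-off explicit: only the 4D ceiling changes between a local run and a cluster run.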

**Blockers:**

- None

**Next Session:**

- Continue with step 10: Add Phase 4 tooling to scripts/justfile
- Then step 11: Update GitHub Actions workflows
- Then step 12: Final documentation and changelog

---

### General Guidelines

- Keep this document updated as work progresses
- Check off items in the TODO list as they're completed
- Add any blockers or issues encountered in session notes
- Reference this document when picking up work after breaks