peakbagger 0.0.0

Bag sequential local elevation maxima
peakbagger-0.0.0 is not a library.

Peakbagger

TODO

  • Benchmark performance
  • Profile benchmark
  • Optimize performance (7.91s → 3.35s, 58% faster)
  • Handle bad SRTM data

Optimizations

Replace HashMap with FxHashMap in TileCache

Profile showed 27.5% of CPU in HashMap hashing (hash_one 20.4% + Hasher::write 7.1%). TileCoord is 4 bytes (two i16s) but HashMap uses SipHash. Switched to FxHashMap from rustc-hash which uses a fast multiply-xor hash.

  • Baseline: 7.91s
  • After: 5.16s
  • Improvement: 35%

Cache current tile pointer in find_next_peak

The ring search visits many consecutive cells in the same tile, but each cache.elevation() call did a hash lookup. Added a CachedTile that stores a raw pointer to the current tile's elevation data and skips the hash lookup when the tile coordinate hasn't changed. The pointer is stable because Vec heap data doesn't move when the HashMap resizes.

  • Baseline: 5.16s
  • After: 4.61s
  • Improvement: 11%

Bulk memcpy + vectorized byte-swap in Tile::from_bytes

The per-element loop doing i16::from_be_bytes was not vectorized by LLVM. Replaced with copy_nonoverlapping followed by an in-place i16::from_be loop, which LLVM vectorizes into SIMD byte-swap instructions.

  • Baseline: 4.61s
  • After: 3.61s
  • Improvement: 22%

Reorder ring_cells for spatial coherence

The iteration interleaved top and bottom ring edges, alternating between opposite ends on every cell. This defeated the CachedTile optimization, forcing a hash lookup on nearly every cell. Reordering to iterate each edge contiguously keeps consecutive cells in the same tile.

  • Baseline: 3.61s
  • After: 3.30s
  • Improvement: 9%

Skip tiles below min_elevation threshold

Added max_elevation field to Tile. When the search enters a new tile, if max_elevation ≤ min_elevation, all cells in that tile are skipped without reading elevation data, avoiding cache misses entirely.

  • Benchmark (35km searches): no significant change (tiles nearby have qualifying peaks)
  • Bag 55 (207km search, min 2550m): ~5s → 2.7s (46% faster) — most tiles in the search area are below 2550m and get skipped

Fast path in resolve_global for same-tile cells

div_euclid / rem_euclid compile to idiv instructions (~30 cycles each). Added a fast path: if global row/col are in [0, stride), the cell is in the center tile and no division is needed. Uses unsigned comparison trick to check both bounds in one instruction.

  • Baseline: 3.35s
  • After: 3.27s
  • Improvement: 2%

Reuse ring_cells Vec across rings (reverted)

Attempted to reuse a single Vec across ring iterations instead of allocating a new one per ring. This caused an 8% regression, likely because Vec::with_capacity with exact size produces better-fitting allocations than reusing a growing buffer.