Peakbagger
TODO
- Benchmark performance
- Profile benchmark
- Optimize performance (7.91s → 3.35s, 58% faster)
- Handle bad SRTM data
Optimizations
Replace HashMap with FxHashMap in TileCache
Profile showed 27.5% of CPU in HashMap hashing (hash_one 20.4% +
Hasher::write 7.1%). TileCoord is 4 bytes (two i16s) but HashMap uses
SipHash. Switched to FxHashMap from rustc-hash which uses a fast
multiply-xor hash.
- Baseline: 7.91s
- After: 5.16s
- Improvement: 35%
Cache current tile pointer in find_next_peak
The ring search visits many consecutive cells in the same tile, but each
cache.elevation() call did a hash lookup. Added a CachedTile that
stores a raw pointer to the current tile's elevation data and skips the
hash lookup when the tile coordinate hasn't changed. The pointer is
stable because Vec heap data doesn't move when the HashMap resizes.
- Baseline: 5.16s
- After: 4.61s
- Improvement: 11%
Bulk memcpy + vectorized byte-swap in Tile::from_bytes
The per-element loop doing i16::from_be_bytes was not vectorized by
LLVM. Replaced with copy_nonoverlapping followed by an in-place
i16::from_be loop, which LLVM vectorizes into SIMD byte-swap
instructions.
- Baseline: 4.61s
- After: 3.61s
- Improvement: 22%
Reorder ring_cells for spatial coherence
The iteration interleaved top and bottom ring edges, alternating between
opposite ends on every cell. This defeated the CachedTile optimization,
forcing a hash lookup on nearly every cell. Reordering to iterate each
edge contiguously keeps consecutive cells in the same tile.
- Baseline: 3.61s
- After: 3.30s
- Improvement: 9%
Skip tiles below min_elevation threshold
Added max_elevation field to Tile. When the search enters a new
tile, if max_elevation ≤ min_elevation, all cells in that tile are
skipped without reading elevation data, avoiding cache misses entirely.
- Benchmark (35km searches): no significant change (tiles nearby have qualifying peaks)
- Bag 55 (207km search, min 2550m): ~5s → 2.7s (46% faster) — most tiles in the search area are below 2550m and get skipped
Fast path in resolve_global for same-tile cells
div_euclid / rem_euclid compile to idiv instructions (~30
cycles each). Added a fast path: if global row/col are in [0, stride),
the cell is in the center tile and no division is needed. Uses unsigned
comparison trick to check both bounds in one instruction.
- Baseline: 3.35s
- After: 3.27s
- Improvement: 2%
Reuse ring_cells Vec across rings (reverted)
Attempted to reuse a single Vec across ring iterations instead of
allocating a new one per ring. This caused an 8% regression, likely
because Vec::with_capacity with exact size produces better-fitting
allocations than reusing a growing buffer.