# map2fig v0.6.0 Release Notes
**Release Date:** February 17, 2026
## Overview
This release focuses on performance optimization, benchmarking infrastructure, and bottleneck analysis. It achieves **16.8% performance improvement** on large files (3.1GB: 14.1s → 11.7s) while establishing comprehensive benchmarking suite to prevent performance regressions and guide future optimization.
## Major Changes & Improvements
### 🚀 Performance Optimization (16.8% improvement on large files)
**File Size Improvements**:
```
6MB | 369.4ms | 316.9ms | 14.1% ✅
24MB | 513.9ms | 500.2ms | 2.7%
72MB | 523.0ms | 498.6ms | 4.7%
192MB | 800.1ms | 841.4ms | —
576MB | 845.0ms | 806.0ms | 4.7%
3.1GB | 14118ms | 11709ms | 16.8% ✅
```
**Optimizations Applied**:
1. **Coordinate Lookup Caching (LRU)** - 14.1% on small files, 1.1% on large files
- Cache: 10K entries per function (pix2ang_ring, pix2ang_nest, ang2pix_ring, ang2pix_nest)
- Memory overhead: ~320KB total
- Thread-safe with parking_lot RwLock
2. **Column Cache Bug Fix** - Prevents pathological 2-3GB+ file handling
- Issue: Files >1GB were cached and never deleted, causing memory exhaustion
- Fix: Skip caching for files >1GB, avoiding 45GB+ memory usage on huge maps
- Impact: Large file stability improved
3. **SIMD Operations** - Ready-to-use SIMD traits via `wide` crate
- Provides 8× vectorized operations (sin, cos, atan2, sqrt, etc.)
- Fallback on stable Rust via delegation to wide crate
- Available for future scaling optimizations
### 📊 Comprehensive Benchmarking Infrastructure ✅
**New Benchmarking System**:
```
Hyperfine | End-to-end (6 files) | benches/hyperfine_benchmarks.sh
Criterion | Micro-benchmarks | benches/criterion_benchmarks.rs
Divan | Cycle-accurate | benches/divan_benchmarks.rs
Python | Detailed pipeline timing | benches/detailed_profile.py
Script | Unified runner | benches/run_benchmarks.sh
CI/CD | Regression detection | .github/workflows/benchmarks.yml
```
**Benchmark Results (Large File - 3.1GB, nside=8192)**:
```
FITS Reading | 10.9s | 81% | Bottleneck
Mollweide Proj | 1.3s | ~10% | Secondary
Cairo Render | ~0.3s | ~2% | Minimal
Total Time | 11.7s | 100% | Optimized
```
**Key Metrics**:
- Variance on 11.7s baseline: ±0.1s (0.87% - excellent stability)
- Sample size: 5 runs per benchmark + 1 warmup
- Confidence level: 95% CI
- Benchmarks: 6-file suite covering 6MB to 3.1GB
**Documentation**: `docs/current/BENCHMARKING_SETUP.md` - Complete setup and usage guide
### 🔍 Bottleneck Identification
**Current State**:
1. FITS Reading: **81% of load time** (10.9s of 13.4s)
- Limited parallelization (files read sequentially on disk)
- Streaming reader could help for streaming FITS files
- Currently limited by I/O patterns and memory bandwidth
2. Mollweide Projection: **~10% of total time**
- Vectorization potential: 3-4× with SIMD (Tier 2)
- GPU acceleration: 2.5-2.8× (Tier 3)
- Requires algorithmic changes
3. Scaling Operations: **<1% of total time**
- SIMD implemented but shows minimal improvement
- No further optimization worthwhile
**Impact Analysis**:
- Optimizing projection (Tier 2 SIMD): Limited ROI due to FITS bottleneck
- Optimizing scaling (Tier 3): Negligible ROI (<0.5% total improvement)
- Further CPU optimizations without addressing FITS reading will have minimal impact
### 🛠️ Code Quality Improvements
- ✅ All clippy warnings fixed (feature gate, unused imports, unit_arg)
- ✅ Stable Rust compatibility (removed nightly feature attempts)
- ✅ Benchmark code cleanup (proper black_box usage)
- ✅ Documentation updated (SIMD module, benchmarking setup)
- ✅ Test suite: 171/173 passing (2 ignored Hammer tests - expected)
## Backward Compatibility
✅ **Fully backward compatible** with v0.5.0
- No API changes
- No data format changes
- Performance improvements are transparent
## Known Limitations
1. **Large File Performance**: 3.1GB files process in ~11.7s
- Bottleneck: Sequential disk I/O, not removable without architectural change
- Workaround: Use smaller FITS subsets if needed
2. **Hammer Projection**: 2 roundtrip tests skipped
- Known limitation: Inverse transform not implemented
- Will be addressed in v0.7.0
## Testing & Validation
- ✅ All unit tests passing (171/173, 2 intentionally ignored)
- ✅ Integration tests verified
- ✅ Performance regression suite configured
- ✅ 6-file benchmark suite established
- ✅ CI/CD pipeline with benchmarks enabled
## Performance Roadmap
### Tier 2: SIMD Vectorization (3-4× potential)
- Expected: 15-25% real-world improvement on large files
- Effort: 4-6 hours
- Status: Blocked by FITS reading bottleneck
### Tier 3: GPU Acceleration (2.5-2.8× potential)
- Expected: 30-50% improvement with proper pipelining
- Effort: 40+ hours design and implementation
- Status: Future consideration
### Streaming FITS Reader
- Expected: 20-30% improvement if overlapped with rendering
- Effort: 8-12 hours
- Status: Recommended next step to unlock further optimizations
## Commits & Transactions
- **3 commits** in Feb 17, 2026 session
- 113+ files reorganized into docs/ structure
- Benchmarking infrastructure fully integrated
- All changes tested and verified
## Upgrading from v0.5.0
No changes required. Simply update Cargo.toml:
```toml
[dependencies]
map2fig = "0.6.0"
```
All features and APIs are identical to v0.5.0. Performance improvements are automatic.
## Acknowledgments
Performance optimization guided by:
- Statistical benchmarking with Hyperfine/Criterion
- CPU profiling (perf) to identify bottlenecks
- Systematic hypothesis testing and validation
- Comprehensive documentation of findings
## Next Steps
1. **Monitor performance**: Use benchmarking suite to track regressions
2. **Investigate streaming**: Evaluate benefit of overlapping I/O and rendering
3. **Plan GPU work**: Design CUDA kernel for Mollweide projection
4. **Community feedback**: Happy to discuss optimization strategies
---
**Previous Release**: [v0.5.0 Release Notes](https://github.com/dncnwtts/map2fig/releases/tag/v0.5.0)
**Full Documentation**: See [docs/](docs/) for detailed guides and benchmarking results
- `clap` 4.5.x - Command-line argument parsing
## Breaking Changes
None. This is a fully backward-compatible release.
## Installation
```bash
cargo install map2fig@0.5.0
```
Or build from source:
```bash
git clone https://github.com/dncnwtts/map2fig.git
cd map2fig && git checkout v0.5.0
cargo build --release
```
## Usage Examples
```bash
# Basic Mollweide projection
map2fig -f cosmoglobe.fits -o map.pdf
# With custom scaling and colormap
map2fig -f data.fits --log --min 1e-6 --max 1e-3 -c plasma --gamma 0.8
# Hammer projection with symlog scaling
map2fig -f data.fits --hammer --symlog
# Gnomonic projection with native resolution
map2fig -f data.fits --gnomonic --native --width 2000
```
## Documentation
See [docs/optimization/](docs/optimization/) for detailed performance analysis including:
- Tier 1-3 implementation details
- Benchmarking methodology and results
- Performance ceiling analysis
- Memory profiling data
## Contributors
- Duncan Watts (@dncnwtts) - Core development and optimization
## License
MIT
---
**Next Steps:** The next major optimization target is tile-based parallelization for multi-core speedup on 4+ core systems. GPU acceleration via CUDA is also planned for maximum speedup potential.