# Performance Guide
Comprehensive guide to ruviz performance characteristics, optimization strategies, and best practices based on verified benchmarking results.
## Verified Performance Metrics
**Based on Week 6 comprehensive benchmarking** (2025-10-07)
- **Hardware**: 16-core system
- **Benchmark Tool**: Criterion 0.5
- **Rust Edition**: 2024
### Rendering Performance Summary
| Line plot | 100K points | **34.6ms** | 2.9x faster ✅ |
| Histogram | 1M points | **87ms** | 5.7x faster ✅ |
| Box plot | 100K points | **28ms** | 7.1x faster ✅ |
| Multi-series | 50K total | **28.7ms** | 5.2x faster ✅ |
**Throughput**: 3.17 million elements/second
### Auto-Optimization Performance
Backend selection decision time: **< 142µs** (worst case)
| 100 points | 218 nanoseconds | ✅ Excellent |
| 1K points | 1.5 microseconds | ✅ Excellent |
| 100K points | 142 microseconds | ✅ Excellent |
**Conclusion**: Zero user-facing overhead for auto-optimization.
### Small Dataset Performance
**Note**: 1K points take ~250ms on first plot (cold start)
**Time Breakdown**:
- Font initialization: 50-100ms
- Canvas setup: 20-50ms
- Rendering pipeline: 50-100ms
- File I/O: 30-50ms
- **Actual plotting: < 10ms** ✅
**Key Insight**: This is not algorithmic slowness - it's fixed overhead. The system excels at large datasets where this overhead is amortized.
## Performance Best Practices
### 1. Choose the Right Output Path
For large datasets, start with the output path you actually need:
```rust
use ruviz::prelude::*;
// PNG export: save() already applies its own large-dataset thresholds
Plot::new()
.line(&x, &y)
.save("plot.png")?;
```
**Why?**
- Decision time: < 142µs (negligible)
- `save()` already switches to its large-dataset export paths internally
- `render()` is the in-memory path that can use the parallel renderer
**Stored backend metadata (`auto_optimize()`)**:
- < 1K points → Skia
- 1,000 to 99,999 points → Parallel when built with the `parallel` feature, otherwise Skia
- 100,000+ points → GPU when built with the `gpu` feature, otherwise DataShader
### 2. Choose the Right Plot Type
| Trends | Line plot | Fast, clear visualization |
| Distribution | Histogram | Optimized binning algorithm |
| Statistics | Box plot | Efficient quartile calculation |
| Comparison | Bar chart | Simple, fast rendering |
| Correlation | Scatter plot | Good for < 50K points |
**Example**: For 1M data points showing distribution:
```rust
// Good - histogram with automatic binning
Plot::new()
.histogram(&data, None) // 87ms for 1M points ✅
.save("distribution.png")?;
// Slower - scatter plot for same purpose
Plot::new()
.scatter(&x, &y) // Would be much slower
.save("distribution.png")?;
```
### 3. Understand Cold Start Overhead
**First plot in application**: ~250ms
- Font initialization: one-time cost
- Canvas setup: required for quality output
- Pipeline init: necessary setup
- **This is normal and expected!**
**Subsequent plots**: Much faster (fonts cached)
**Recommendation**:
- For applications creating multiple plots, cold start amortizes across all plots
- For single-plot scripts, 250ms is acceptable for publication-quality output
- Future batch API will reduce to < 10ms per plot after warmup
### 4. Large Dataset Optimization
For datasets with 100,000 points or more:
```rust
// Large PNG export
let x: Vec<f64> = (0..100_000).map(|i| i as f64).collect();
Plot::new()
.line(&x, &y)
.save("large_plot.png")?;
```
**Expected performance**:
- 100K points: ~35ms
- 1M points: ~87ms (histogram)
**Tips**:
- Consider DataShader for > 1M points
- Use histogram for distributions (faster than scatter)
- Use `render()` for in-memory images and `save()` for PNG export
### 5. Memory Efficiency
ruviz memory usage: typically < 2x data size
**Example**:
- 100K points (1.6MB data) → < 3.2MB peak memory
- Efficient buffer pooling
- Automatic cleanup on Plot drop
**For memory-constrained environments**:
```rust
// Process data in batches if needed
for batch in data.chunks(50_000) {
Plot::new()
.line(batch_x, batch_y)
.save(&format!("batch_{}.png", i))?;
}
```
### 6. Multi-Series Plots
Multi-series plotting scales linearly:
```rust
// Efficient multi-series rendering
let plot = Plot::new()
.line(&x, &y1) // Series 1
.line(&x, &y2) // Series 2
.line(&x, &y3) // Series 3
.line(&x, &y4) // Series 4
.line(&x, &y5); // Series 5
plot.save("multi_series.png")?;
```
**Performance**: 5 series × 10K points = 28.7ms total
**Tips**:
- Chain series methods before calling `.save()`
- Use distinct colors/styles for clarity
- Consider subplots for > 5 series
### 7. DPI and Output Quality
Higher DPI = larger files and longer rendering:
| 96 | 1.0x (baseline) | Screen display |
| 150 | ~2.5x | Presentations |
| 300 | ~5.2x | IEEE publications |
| 600 | ~17.9x | High-res printing |
```rust
// For publications, use 300 DPI
Plot::new()
.line(&x, &y)
.dpi(300) // IEEE standard
.save("publication_plot.png")?;
```
**Rendering time impact**: Minimal (< 10% increase for 2x DPI)
## Benchmarking Your Application
To benchmark ruviz in your application:
```rust
use std::time::Instant;
use ruviz::prelude::*;
fn main() -> Result<()> {
let x: Vec<f64> = (0..100_000).map(|i| i as f64).collect();
let y: Vec<f64> = x.iter().map(|v| v.sin()).collect();
let start = Instant::now();
Plot::new()
.line(&x, &y)
.save("benchmark_plot.png")?;
let duration = start.elapsed();
println!("Plot rendered in {:?}", duration);
println!("Throughput: {:.2} Melem/s",
(x.len() as f64 / duration.as_secs_f64()) / 1_000_000.0);
Ok(())
}
```
**Expected results** (100K points):
- Cold start: ~250-280ms
- Warm (2nd+ plot): ~30-40ms
- Throughput: 2.5-3.5 Melem/s
## Performance Debugging
If performance is slower than expected:
### Step 1: Verify the Path You Are Measuring
```rust
// In-memory rendering path
let _image = Plot::new()
.line(&x, &y)
.render()?;
```
### Step 2: Measure with Warmup
```rust
// Warmup run (ignore timing)
Plot::new().line(&x_small, &y_small).save("warmup.png")?;
// Actual measurement
let start = Instant::now();
Plot::new().line(&x, &y).save("actual.png")?;
let duration = start.elapsed();
println!("Warm performance: {:?}", duration);
```
### Step 3: Check Dataset Size
```rust
println!("Dataset size: {} points", x.len());
println!("Expected time: {} ms",
if x.len() < 1_000 { 250 }
else if x.len() < 100_000 { 50 }
else { 35 });
```
### Step 4: Profile I/O
```rust
// Measure file write separately
let image = Plot::new().line(&x, &y).render()?;
let write_start = Instant::now();
image::save_buffer("plot.png", &image.pixels,
image.width, image.height,
image::ColorType::Rgba8)?;
let write_time = write_start.elapsed();
println!("I/O time: {:?}", write_time);
```
### Step 5: Report Issue
If performance is still unexpected, report with:
- System specifications (CPU cores, RAM)
- Dataset size and characteristics
- Timing measurements
- Whether you used `render()` or `save()`, and whether `.gpu(true)` was enabled
- Operating system
## Performance Expectations by Use Case
### Real-Time Visualization
**Goal**: < 50ms updates for interactive feel
**Strategy**:
```rust
// Cache Plot configuration, update data only
let mut current_data = initial_data;
loop {
update_data(&mut current_data);
let start = Instant::now();
Plot::new()
.line(&x, ¤t_data)
.save("realtime_plot.png")?;
if start.elapsed().as_millis() > 50 {
eprintln!("Warning: frame took {:?}", start.elapsed());
}
std::thread::sleep(Duration::from_millis(100));
}
```
**Achievable**: 100K points in ~35ms ✅
### Batch Processing
**Goal**: Process thousands of plots efficiently
**Strategy**:
```rust
// Single warmup, then batch processing
Plot::new().line(&[1.0], &[1.0]).save("warmup.png")?;
for (i, dataset) in datasets.iter().enumerate() {
Plot::new()
.line(&dataset.x, &dataset.y)
.save(&format!("plot_{:04}.png", i))?;
}
```
**Expected**: ~30-40ms per plot after warmup
### Scientific Publishing
**Goal**: High-quality output, correctness > speed
**Strategy**:
```rust
Plot::new()
.line(&x, &y)
.title("Figure 1: Experimental Results")
.xlabel("Time (s)")
.ylabel("Amplitude (V)")
.dpi(300) // IEEE standard
.save("figure1.png")?;
```
**Expected**: ~250-300ms (acceptable for quality)
### Web Services
**Goal**: Responsive plot generation for users
**Strategy**:
```rust
use ruviz::{core::plot::Image, prelude::*};
// Async endpoint example
async fn generate_plot(data: Vec<(f64, f64)>) -> Result<Image> {
let (x, y): (Vec<_>, Vec<_>) = data.into_iter().unzip();
Plot::new()
.line(&x, &y)
.render()
}
```
**Expected**: 30-50ms response time for 10K points, plus any external PNG encoding you add on top of `Image { width, height, pixels }`
## Future Performance Work
Planned optimizations (future releases):
### Batch Rendering API (v0.2)
```rust
// Future API - amortized overhead
let renderer = BatchRenderer::new()?; // One-time setup
for dataset in datasets {
renderer.plot_line(&dataset.x, &dataset.y,
&format!("plot_{}.png", dataset.id))?;
}
// Expected: < 10ms per plot after warmup
```
### Static Font Cache (v0.2)
Reduce cold start by 50-100ms:
- Persistent font system across plots
- System-wide font cache
- One-time initialization
### SIMD Marker Rendering (v0.3)
Faster scatter plots:
- Vectorized marker drawing
- 2-3x improvement for scatter plots
- Platform-optimized (SSE, AVX, NEON)
### GPU Acceleration (experimental today)
Available today behind the `gpu` feature and explicit `.gpu(true)`:
- GPU-assisted PNG export in `save()` for sufficiently large datasets
- Automatic CPU fallback if GPU initialization fails
- `render()` continues to use the in-memory CPU/DataShader/parallel paths today
Future GPU work:
- Broader plot-type coverage
- Lower-latency interactive workflows
- Browser/WebGPU deployment
## Comparison with Other Libraries
**Note**: Benchmarks are approximate and workload-dependent. Always benchmark your specific use case.
### 100K Point Line Plot
| **ruviz** | **34.6ms** | Rust | Auto-optimized |
| matplotlib | ~500ms | Python | Multiple |
| plotters | ~100ms | Rust | Native CPU |
| plotly | ~200ms | Python | Browser (D3.js) |
| Chart.js | ~150ms | JavaScript | Browser (Canvas) |
**ruviz advantages**:
- Pure Rust performance
- Parallel rendering for large datasets
- Zero-overhead auto-optimization
- Publication-quality output
### 1M Point Histogram
| **ruviz** | **87ms** | Rust | Includes binning |
| matplotlib | ~1.2s | Python | Standard binning |
| plotly | ~800ms | Python | With Dash |
| ggplot2 | ~600ms | R | Statistical focus |
**ruviz advantages**:
- 5.7x faster than target
- Efficient statistical computation
- Optimized binning algorithm
## Summary
### Performance Highlights
✅ **Excellent large dataset performance** (2.9-5.7x faster than targets)
✅ **Zero-overhead auto-optimization** (< 142µs decision time)
✅ **3.17 Melem/s throughput** (sustained across benchmarks)
✅ **Sub-100ms for 100K points** (34.6ms actual)
✅ **Production-ready performance** (verified with comprehensive benchmarks)
### Optimization Guidelines
1. **Use `render()` for in-memory images and `save()` for PNG export**
2. **Accept 250ms cold start as normal** (amortizes for multiple plots)
3. **Choose appropriate plot types** (histogram > scatter for distributions)
4. **Use 300 DPI for publications** (minimal performance impact)
5. **Profile before optimizing** (measure actual performance first)
### When to Use ruviz
**Ideal for**:
- Scientific data visualization
- Real-time monitoring (100K points in 35ms)
- Batch plot generation
- Publication-quality output
- Performance-critical applications
**Consider alternatives if**:
- Need production-grade interactive plots
- Require extensive plot customization (ruviz prioritizes performance)
- Working with web-only applications (WASM support planned)
## Additional Resources
- **Benchmark Results**: See `docs/BENCHMARK_RESULTS.md` for detailed metrics
- **Troubleshooting**: See `docs/TROUBLESHOOTING.md` for common issues
- **Optimization Findings**: See `docs/OPTIMIZATION_FINDINGS.md` for architectural analysis
- **GitHub Issues**: https://github.com/Ameyanagi/ruviz/issues for performance questions
---
**Last Updated**: 2025-10-07 (Week 8: Quality Polish)
**Benchmark Version**: Week 6 comprehensive validation
**Status**: Production-ready, verified performance