fastalloc 1.5.0

# Performance Tuning Guide

This guide helps you get the best performance from fastalloc.

## Choosing the Right Pool Type

### FixedPool
**Use when:**
- You know the maximum number of objects needed
- You want the absolute fastest allocation/deallocation
- You need predictable, zero-fragmentation behavior
- Memory constraints are known upfront

**Performance:**
- Allocation: 10-20ns
- Deallocation: 5-10ns
- Zero fragmentation
- Best cache locality

```rust
let pool = FixedPool::<Entity>::new(10000).unwrap();
```

### GrowingPool
**Use when:**
- Object count varies significantly
- You want to start small and grow on demand
- You need to balance memory usage vs. performance
- You have a reasonable maximum capacity

**Performance:**
- Allocation: 15-50ns (spikes during growth)
- Deallocation: 10-15ns
- Minimal fragmentation with good strategies
- Slightly reduced cache locality

```rust
let config = PoolConfig::builder()
    .capacity(100)
    .max_capacity(Some(10000))
    .growth_strategy(GrowthStrategy::Exponential { factor: 2.0 })
    .build()
    .unwrap();
let pool = GrowingPool::with_config(config).unwrap();
```

### ThreadLocalPool
**Use when:**
- Each thread needs its own pool
- You want zero synchronization overhead
- Objects don't need to cross thread boundaries
- Single-threaded performance is critical

**Performance:**
- Same as FixedPool (10-20ns allocation)
- Zero lock contention
- Best for thread-affine workloads

```rust
let pool = ThreadLocalPool::<Data>::new(1000).unwrap();
```

### ThreadSafePool
**Use when:**
- Multiple threads need to share a pool
- Allocation rate per thread is moderate
- You need simple concurrent access

**Performance:**
- Allocation: 50-100ns (with moderate contention)
- Higher under contention
- Use parking_lot feature for 20-30% improvement

```rust
let pool = Arc::new(ThreadSafePool::<Item>::new(5000).unwrap());
```

## Capacity Planning

### Initial Capacity
Choose based on typical workload:

```rust
// Gaming: spawn waves
let pool = FixedPool::new(100).unwrap(); // per-wave capacity

// Server: concurrent connections
let pool = ThreadSafePool::new(1000).unwrap(); // expected load

// Data processing: batch size
let pool = FixedPool::new(batch_size * 2).unwrap(); // double-buffering
```

### Growth Strategy

**Linear:** Predictable memory growth
```rust
GrowthStrategy::Linear { amount: 100 }
// Good for: steady load increases
```

**Exponential:** Fast adaptation to demand
```rust
GrowthStrategy::Exponential { factor: 2.0 }
// Good for: bursty workloads, unknown peaks
```

**Custom:** Application-specific logic
```rust
GrowthStrategy::Custom {
    compute: Box::new(|current| {
        // Grow by 50%, minimum 10
        std::cmp::max(10, current / 2)
    })
}
```

## Memory Alignment

### Cache-Line Alignment
Prevent false sharing in concurrent scenarios:

```rust
PoolConfig::builder()
    .capacity(1000)
    .alignment(64) // Typical cache line size
    .build()
    .unwrap();
```

**When to use:**
- Thread-safe pools with high contention
- Objects accessed by multiple threads
- Performance-critical data structures

### SIMD Alignment
For vectorized operations:

```rust
PoolConfig::builder()
    .capacity(1000)
    .alignment(32) // AVX alignment
    .build()
    .unwrap();
```

## Initialization Strategies

### Lazy (Default)
Initialize objects when first allocated:

```rust
// Implicit - default behavior
let pool = FixedPool::new(1000).unwrap();
```

**Pros:**
- Fast pool creation
- No wasted initialization

**Cons:**
- First allocation slightly slower

### Eager
Pre-initialize all objects:

```rust
PoolConfig::builder()
    .capacity(1000)
    .pre_initialize(true)
    .initializer(|| MyType::default())
    .build()
    .unwrap();
```

**Pros:**
- Consistent allocation speed
- Front-load initialization cost

**Cons:**
- Slower pool creation
- Uses more memory upfront

### Custom with Reset
Reuse and reset objects:

```rust
PoolConfig::builder()
    .capacity(1000)
    .reset_fn(
        || Vec::with_capacity(1024),
        |v| {
            v.clear();
            // Keep capacity for reuse
        }
    )
    .build()
    .unwrap();
```

**Best for:**
- Expensive-to-create objects
- Objects with reusable resources
- Minimizing allocations within allocations

## Benchmarking Your Workload

### Use Criterion
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use fastalloc::FixedPool;

fn benchmark_allocation(c: &mut Criterion) {
    let pool = FixedPool::<MyType>::new(1000).unwrap();
    
    c.bench_function("my_allocation", |b| {
        b.iter(|| {
            let handle = pool.allocate(black_box(MyType::new())).unwrap();
            black_box(&handle);
        });
    });
}

criterion_group!(benches, benchmark_allocation);
criterion_main!(benches);
```

### Measure Pool Overhead
Compare against standard allocation:

```rust
// Pool version
let pool = FixedPool::new(1000).unwrap();
let start = Instant::now();
for i in 0..1000 {
    let h = pool.allocate(i).unwrap();
    black_box(&h);
}
let pool_time = start.elapsed();

// Box version
let start = Instant::now();
for i in 0..1000 {
    let b = Box::new(i);
    black_box(&b);
}
let box_time = start.elapsed();

println!("Pool: {:?}, Box: {:?}, Speedup: {:.2}x",
         pool_time, box_time, box_time.as_nanos() as f64 / pool_time.as_nanos() as f64);
```

## Common Performance Issues

### Issue: Allocation Failures
```rust
// Bad: pool too small, constant failures
let pool = FixedPool::new(10).unwrap();
for i in 0..100 {
    let _ = pool.allocate(i); // Fails after 10
}

// Good: size appropriately
let pool = FixedPool::new(100).unwrap();
```

### Issue: Excessive Locking
```rust
// Bad: shared pool with high contention
let pool = Arc::new(ThreadSafePool::new(100).unwrap());
// Many threads competing

// Good: thread-local pools
thread_local! {
    static POOL: ThreadLocalPool<T> = ThreadLocalPool::new(100).unwrap();
}
```

### Issue: Frequent Growth
```rust
// Bad: starts too small, grows constantly
GrowthStrategy::Linear { amount: 1 }

// Good: reasonable initial size and growth
PoolConfig::builder()
    .capacity(1000)
    .growth_strategy(GrowthStrategy::Exponential { factor: 1.5 })
```

### Issue: Memory Waste
```rust
// Bad: huge pool, rarely used
let pool = FixedPool::new(1_000_000).unwrap();

// Good: growing pool with reasonable max
PoolConfig::builder()
    .capacity(100)
    .max_capacity(Some(10_000))
    .growth_strategy(GrowthStrategy::Exponential { factor: 2.0 })
```

## Statistics-Guided Optimization

Enable the `stats` feature:

```rust
#[cfg(feature = "stats")]
{
    let stats = pool.statistics();
    
    println!("Peak usage: {} / {}", stats.peak_usage, stats.capacity);
    println!("Utilization: {:.1}%", stats.utilization_rate());
    println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
    
    if stats.peak_usage < stats.capacity * 0.5 {
        println!("Consider reducing capacity");
    }
    
    if stats.hit_rate() < 0.95 {
        println!("Pool exhaustion detected, increase capacity");
    }
}
```

## Platform-Specific Tips

### Linux
- Use `parking_lot` feature for faster mutexes
- Consider huge pages for very large pools

### Windows
- Default std::Mutex is reasonably fast
- Monitor pool usage with performance counters

### Embedded / no_std
- Use FixedPool exclusively
- Disable all optional features
- Pre-initialize if possible to avoid runtime allocation

## Quick Wins Checklist

- [ ] Choose FixedPool if capacity is known
- [ ] Use ThreadLocalPool for single-threaded scenarios
- [ ] Enable `parking_lot` feature for thread-safe pools
- [ ] Set initial capacity based on typical usage
- [ ] Use exponential growth for unknown workloads
- [ ] Add cache-line alignment for shared data
- [ ] Implement reset functions for expensive types
- [ ] Measure with `stats` feature in development
- [ ] Benchmark your specific workload
- [ ] Profile to verify pool is the bottleneck

## Expected Performance Targets

### Fixed Pool
- Allocation: < 20ns
- Deallocation: < 10ns
- Memory overhead: < 2%

### Growing Pool  
- Allocation: < 50ns (steady state)
- Growth: < 100μs (for reasonable sizes)
- Memory overhead: < 5%

### Thread-Safe Pool
- Allocation: < 100ns (low contention)
- < 500ns (high contention)

If you're not meeting these targets, profile and investigate!