# framealloc Advanced Guide
A deep dive into framealloc internals and advanced techniques, aimed at users with roughly 20-100 hours of experience with the library.
## Table of Contents
1. [Internal Architecture](#internal-architecture)
2. [Custom Allocators](#custom-allocators)
3. [Memory Layout Optimization](#memory-layout-optimization)
4. [Advanced Threading](#advanced-threading)
5. [Instrumentation and Debugging](#instrumentation-and-debugging)
6. [Integration Patterns](#integration-patterns)
7. [Performance Profiling](#performance-profiling)
## Internal Architecture
### Allocator Hierarchy
framealloc uses a three-tier architecture:
```
SmartAlloc
├── GlobalState (shared, atomic)
│   ├── PoolManager (thread-safe pools)
│   ├── BudgetManager (limits and tracking)
│   └── Statistics (global metrics)
└── ThreadLocalState (per-thread, lock-free)
    ├── FrameArena (bump allocator)
    ├── LocalPool (thread-local cache)
    └── FrameMetrics (per-thread stats)
```
### Frame Arena Internals
The frame arena is a bump allocator with chunked growth:
```rust
struct FrameArena {
    chunks: Vec<Chunk>,
    current: *mut u8,
    end: *mut u8,
    total_allocated: usize,
}

struct Chunk {
    ptr: NonNull<u8>,
    size: usize,
    // Backing allocation (pool or system)
}
```
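The bump step itself comes down to rounding an offset up to the requested alignment and then advancing it. The following is a minimal sketch, assuming a single fixed-size chunk and tracking offsets instead of raw pointers; `BumpChunk` and its methods are hypothetical names for illustration, not part of framealloc:

```rust
/// Hypothetical single-chunk bump allocator that works on offsets.
struct BumpChunk {
    capacity: usize,
    offset: usize,
}

impl BumpChunk {
    fn new(capacity: usize) -> Self {
        Self { capacity, offset: 0 }
    }

    /// Reserves `size` bytes aligned to `align` (which must be a power of two).
    /// Returns the start offset, or `None` if the chunk cannot fit the request.
    fn bump(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round the current offset up to the next multiple of `align`.
        let start = self.offset.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.capacity {
            return None; // a chunked arena would switch to a new chunk here
        }
        self.offset = end;
        Some(start)
    }

    /// Releases every allocation at once, as happens at frame end.
    fn reset(&mut self) {
        self.offset = 0;
    }
}
```

Individual allocations are never freed; the only way to reclaim space is `reset`, which is what makes the allocation path so cheap.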
Growth strategy:
1. Start with a 64KB chunk
2. Double the chunk size on each growth, up to 1MB
3. Use fixed 1MB chunks thereafter
4. Return chunks to the pool at frame end
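The first three steps can be written as a small sizing function. This is a sketch of the policy as described, not the library's code; `INITIAL_CHUNK`, `MAX_CHUNK`, and `chunk_size` are hypothetical names:

```rust
const INITIAL_CHUNK: usize = 64 * 1024; // 64KB
const MAX_CHUNK: usize = 1024 * 1024; // 1MB

/// Size of the `index`-th chunk (0-based) when sizes double until they hit the cap.
fn chunk_size(index: u32) -> usize {
    // 64KB * 2^4 = 1MB, so the cap is reached after four doublings.
    const DOUBLINGS: u32 = (MAX_CHUNK / INITIAL_CHUNK).trailing_zeros();
    if index >= DOUBLINGS {
        MAX_CHUNK
    } else {
        INITIAL_CHUNK << index
    }
}
```

Under this policy the first six chunks are 64KB, 128KB, 256KB, 512KB, 1MB, and 1MB, so a frame that needs about 3MB touches six chunks.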
### Pool Management
Pools use size classes and per-thread caches:
```rust
struct PoolManager {
    // Global pools for fallback
    global_pools: [Mutex<FreeList>; NUM_SIZE_CLASSES],
    // Per-thread caches
    thread_caches: ThreadLocal<Cache>,
}

struct Cache {
    size_class: usize,
    local: Vec<NonNull<u8>>,
    limit: usize,
}
```
Size classes are powers of two from 8 bytes to 4KB.
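With power-of-two classes, mapping a request size to its class is a round-up followed by a bit count. The sketch below assumes the range is inclusive at both ends, which gives ten classes (8, 16, ..., 4096); `size_class` and `class_size` are hypothetical helpers, and the value of `NUM_SIZE_CLASSES` is derived from that assumption rather than taken from the library:

```rust
const MIN_CLASS: usize = 8;
const MAX_CLASS: usize = 4096;
const NUM_SIZE_CLASSES: usize = 10; // 2^3 through 2^12

/// Returns the pool index for a request, or `None` if it is too large for any pool.
fn size_class(size: usize) -> Option<usize> {
    if size > MAX_CLASS {
        return None; // larger requests bypass the pools
    }
    // Requests below 8 bytes share the smallest class.
    let rounded = size.max(MIN_CLASS).next_power_of_two();
    Some((rounded.trailing_zeros() - MIN_CLASS.trailing_zeros()) as usize)
}

/// Block size served by the class at `index`.
fn class_size(index: usize) -> usize {
    MIN_CLASS << index
}
```

The cost of this scheme is internal fragmentation: a 9-byte request lands in the 16-byte class, so up to about half of a block can go unused.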
## Custom Allocators
### Implementing a Custom Backend
```rust
use std::alloc::Layout;
use std::ptr::NonNull;

use framealloc::{AllocatorBackend, AllocationResult};

struct CustomBackend {
    // Your custom state
}

impl AllocatorBackend for CustomBackend {
    fn allocate(&mut self, layout: Layout) -> AllocationResult {
        // Implement your allocation strategy
        if layout.size() <= 4096 {
            // Use custom allocator; `alloc_small` stands in for your own routine
            let ptr: NonNull<u8> = self.alloc_small(layout);
            AllocationResult::Ok(ptr)
        } else {
            // Fall back to system
            AllocationResult::Fallback
        }
    }

    fn deallocate(&mut self, ptr: NonNull<u8>, layout: Layout) {
        // Implement deallocation
    }
}

// Use with SmartAlloc (the `CustomBackend::new` constructor is not shown)
let config = AllocConfig::default()
    .with_backend(Box::new(CustomBackend::new()));
let alloc = SmartAlloc::new(config);
```
### Custom Memory Sources
```rust
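// Assumes the `memmap2` crate, where anonymous mappings are created through `MmapMut`
use memmap2::MmapMut;
use std::ptr::NonNull;

struct MmapBackend {
    mappings: Vec<MmapMut>,
}

impl MmapBackend {
    fn new() -> Self {
        Self {
            mappings: Vec::new(),
        }
    }

    // `?` assumes `Error: From<std::io::Error>`
    fn reserve(&mut self, size: usize) -> Result<NonNull<u8>, Error> {
        let mut mmap = MmapMut::map_anon(size)?;
        let ptr = mmap.as_mut_ptr();
        // Keep the mapping alive; moving the handle does not move the mapped pages
        self.mappings.push(mmap);
        Ok(NonNull::new(ptr).expect("mapping should not start at null"))
    }
}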
```
### Arena Customization
```rust
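use std::alloc::{handle_alloc_error, GlobalAlloc, Layout, System};
use std::ptr::NonNull;

struct CustomArena {
    allocator: System,
    chunk_size: usize,
    alignment: usize,
}

impl CustomArena {
    fn new(chunk_size: usize, alignment: usize) -> Self {
        Self {
            allocator: System,
            chunk_size,
            alignment,
        }
    }

    // `?` assumes `Error: From<std::alloc::LayoutError>`
    fn allocate_chunk(&mut self) -> Result<NonNull<u8>, Error> {
        let layout = Layout::from_size_align(self.chunk_size, self.alignment)?;
        // SAFETY: `chunk_size` must be non-zero; zero-sized layouts are
        // undefined behavior for `GlobalAlloc::alloc`
        let ptr = unsafe { self.allocator.alloc(layout) };
        // `alloc` reports failure with a null pointer, not a `Result`
        match NonNull::new(ptr) {
            Some(ptr) => Ok(ptr),
            None => handle_alloc_error(layout),
        }
    }
}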
```
## Memory Layout Optimization
### Cache-Line Alignment
```rust
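#[repr(align(64))] // Typical cache line size
struct CacheAligned<T> {
    // `repr(align(64))` already rounds the struct size up to a multiple of 64,
    // so an explicit padding field would only waste another cache line
    data: T,
}

// Usage in frame allocation
alloc.begin_frame();
let aligned_data = alloc.frame_alloc::<CacheAligned<ParticleData>>();
alloc.end_frame();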
```
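The effect of `#[repr(align(64))]` on layout can be checked directly with `size_of` and `align_of`. In this sketch the wrapper carries no explicit padding field, and `ParticleData` is a hypothetical 24-byte payload chosen for illustration:

```rust
use std::mem::{align_of, size_of};

#[repr(align(64))]
struct CacheAligned<T> {
    data: T,
}

/// Hypothetical payload: two 3-component vectors, 24 bytes in total.
struct ParticleData {
    position: [f32; 3],
    velocity: [f32; 3],
}

/// Size of `T` once wrapped, always a multiple of 64.
fn padded_size<T>() -> usize {
    size_of::<CacheAligned<T>>()
}

/// Alignment of the wrapper, 64 regardless of `T`'s own (smaller) alignment.
fn wrapper_alignment<T>() -> usize {
    align_of::<CacheAligned<T>>()
}
```

Here `padded_size::<ParticleData>()` is 64 and `padded_size::<[u8; 65]>()` is 128: the compiler pads the wrapper up to the next multiple of its alignment, so two wrapped values never share a cache line.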
### Structure of Arrays (SoA)
```rust
// Instead of Array of Structures (AoS):
struct Particle {
    position: Vector3,
    velocity: Vector3,
    color: Color,
    mass: f32,
}

// Use Structure of Arrays (SoA):
struct Particles {
    positions: Vec<Vector3>,
    velocities: Vec<Vector3>,
    colors: Vec<Color>,
    masses: Vec<f32>,
}

impl Particles {
    fn new(count: usize, alloc: &SmartAlloc) -> Self {
        Self {
            positions: alloc.frame_vec_with_capacity(count),
            velocities: alloc.frame_vec_with_capacity(count),
            colors: alloc.frame_vec_with_capacity(count),
            masses: alloc.frame_vec_with_capacity(count),
        }
    }
}
```
### Batch Processing
```rust
struct BatchProcessor<T> {
    batch_size: usize,
    batches: Vec<Batch<T>>,
}

struct Batch<T> {
    items: *mut T,
    count: usize,
}

impl<T> BatchProcessor<T> {
    fn process_all<F>(&mut self, mut f: F)
    where
        F: FnMut(&mut [T]),
    {
        for batch in &mut self.batches {
            // SAFETY: each batch must point to `count` initialized, properly
            // aligned values of `T` that nothing else accesses during the call
            unsafe {
                f(std::slice::from_raw_parts_mut(batch.items, batch.count));
            }
        }
    }
}
```
## Advanced Threading
### Work-Stealing Queue
```rust
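use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

struct WorkStealingQueue<T> {
    // Private to the owning thread; needs no synchronization
    local: VecDeque<T>,
    // Shared with other workers, which take items from it via `steal`
    stolen: Arc<Mutex<VecDeque<T>>>,
}

impl<T> WorkStealingQueue<T> {
    fn push_local(&mut self, item: T) {
        self.local.push_back(item);
    }

    fn pop_local(&mut self) -> Option<T> {
        self.local.pop_front()
    }

    // Moves one local item into the shared queue so that `steal` has work to take
    fn publish(&mut self) {
        if let Some(item) = self.local.pop_back() {
            self.stolen.lock().unwrap().push_back(item);
        }
    }

    fn steal(&self) -> Option<T> {
        let mut stolen = self.stolen.lock().unwrap();
        stolen.pop_front()
    }
}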
```
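For comparison, the classic work-stealing discipline has the owner push and pop at one end of a deque while thieves take from the other end. The following is a simplified, mutex-based sketch of that discipline, not a lock-free deque and not framealloc code; `SharedDeque` is a hypothetical name:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// One deque shared between its owner and any number of thieves.
struct SharedDeque<T> {
    inner: Arc<Mutex<VecDeque<T>>>,
}

// Implemented by hand so that cloning a handle does not require `T: Clone`.
impl<T> Clone for SharedDeque<T> {
    fn clone(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }
}

impl<T> SharedDeque<T> {
    fn new() -> Self {
        Self { inner: Arc::new(Mutex::new(VecDeque::new())) }
    }

    /// Owner side: newest work goes on the back.
    fn push(&self, item: T) {
        self.inner.lock().unwrap().push_back(item);
    }

    /// Owner side: take the newest item (LIFO), which is likely still cache-hot.
    fn pop(&self) -> Option<T> {
        self.inner.lock().unwrap().pop_back()
    }

    /// Thief side: take the oldest item (FIFO), away from the owner's end.
    fn steal(&self) -> Option<T> {
        self.inner.lock().unwrap().pop_front()
    }
}
```

A worker would hold one `SharedDeque` of its own plus cloned handles to its peers' deques, calling `pop` on its own and `steal` on the others when it runs dry.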
### NUMA-Aware Allocation
```rust
struct NumaAwareAlloc {
    nodes: Vec<SmartAlloc>,
    current_node: usize,
}

impl NumaAwareAlloc {
    fn new() -> Result<Self, Error> {
        let mut nodes = Vec::new();
        // One allocator per NUMA node, not per CPU. `num_configured_nodes` is
        // assumed to mirror libnuma's `numa_num_configured_nodes()`.
        for node in 0..numa::num_configured_nodes()? {
            let config = AllocConfig::default()
                .with_numa_node(node);
            nodes.push(SmartAlloc::new(config));
        }
        Ok(Self {
            nodes,
            current_node: 0,
        })
    }

    fn allocate_on_node<T>(&mut self, node: usize, value: T) -> FrameBox<T> {
        self.nodes[node].frame_box(value)
    }
}
```
### Lock-Free Statistics
```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct LockFreeStats {
    allocations: AtomicU64,
    deallocations: AtomicU64,
    bytes_allocated: AtomicU64,
    peak_usage: AtomicU64,
}

impl LockFreeStats {
    fn record_allocation(&self, size: usize) {
        let size = size as u64;
        self.allocations.fetch_add(1, Ordering::Relaxed);
        // `fetch_add` returns the previous value, so adding `size` gives the
        // total as of this allocation without a separate, racy load
        let current = self.bytes_allocated.fetch_add(size, Ordering::Relaxed) + size;
        // Atomically raise the peak if this allocation exceeded it
        self.peak_usage.fetch_max(current, Ordering::Relaxed);
    }
}
```
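Peak tracking is only meaningful if the running total also goes down on deallocation. The sketch below pairs `fetch_add`/`fetch_sub` on a live-byte counter with `AtomicU64::fetch_max` on the peak; `PeakTracker` and its method names are hypothetical and not part of framealloc:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct PeakTracker {
    live_bytes: AtomicU64,
    peak_bytes: AtomicU64,
}

impl PeakTracker {
    fn on_alloc(&self, size: usize) {
        let size = size as u64;
        // Previous value plus `size` is the live total right after this allocation.
        let now = self.live_bytes.fetch_add(size, Ordering::Relaxed) + size;
        // Raises the stored peak only if `now` is larger; no retry loop needed.
        self.peak_bytes.fetch_max(now, Ordering::Relaxed);
    }

    fn on_free(&self, size: usize) {
        self.live_bytes.fetch_sub(size as u64, Ordering::Relaxed);
    }

    fn live(&self) -> u64 {
        self.live_bytes.load(Ordering::Relaxed)
    }

    fn peak(&self) -> u64 {
        self.peak_bytes.load(Ordering::Relaxed)
    }
}
```

`Relaxed` ordering is enough here because the counters are independent statistics and do not guard access to any other memory.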
## Instrumentation and Debugging
### Memory Tracing
```rust
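use std::backtrace::Backtrace;
use std::collections::HashMap;
use std::thread::{self, ThreadId};
use std::time::{Duration, Instant};

#[cfg(feature = "debug")]
struct MemoryTracer {
    allocations: HashMap<*const u8, AllocationInfo>,
    stack_traces: bool,
}

#[cfg(feature = "debug")]
#[derive(Debug)]
struct AllocationInfo {
    size: usize,
    // Captured only when `stack_traces` is enabled
    backtrace: Option<Backtrace>,
    thread: ThreadId,
    timestamp: Instant,
}

#[cfg(feature = "debug")]
impl MemoryTracer {
    fn trace_allocation(&mut self, ptr: *const u8, size: usize) {
        // Always record the allocation; only the backtrace is optional
        let info = AllocationInfo {
            size,
            backtrace: self.stack_traces.then(Backtrace::force_capture),
            thread: thread::current().id(),
            timestamp: Instant::now(),
        };
        self.allocations.insert(ptr, info);
    }

    // Without this, every allocation older than the threshold would look leaked
    fn trace_deallocation(&mut self, ptr: *const u8) {
        self.allocations.remove(&ptr);
    }

    // Heuristic: reports allocations that are still live after 60 seconds
    fn find_leaks(&self) -> Vec<&AllocationInfo> {
        self.allocations.values()
            .filter(|info| info.timestamp.elapsed() > Duration::from_secs(60))
            .collect()
    }
}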
```
### Memory Poisoning
```rust
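use std::mem::MaybeUninit;

const POISON: u64 = 0xDEADBEEFCAFEBABE;

#[cfg(feature = "debug")]
struct PoisonedMemory<T> {
    data: MaybeUninit<T>,
    poison: u64,
}

#[cfg(feature = "debug")]
impl<T> PoisonedMemory<T> {
    fn new(value: T) -> Self {
        Self {
            data: MaybeUninit::new(value),
            poison: POISON,
        }
    }

    fn get(&self) -> &T {
        assert_eq!(self.poison, POISON);
        // SAFETY: `data` is initialized in `new` and stays valid until drop
        unsafe { self.data.assume_init_ref() }
    }

    fn get_mut(&mut self) -> &mut T {
        assert_eq!(self.poison, POISON);
        // SAFETY: as in `get`
        unsafe { self.data.assume_init_mut() }
    }
}

#[cfg(feature = "debug")]
impl<T> Drop for PoisonedMemory<T> {
    fn drop(&mut self) {
        // `MaybeUninit` never drops its contents, so do it here, then clear
        // the marker so a later access through a stale pointer fails the check
        unsafe {
            self.data.assume_init_drop();
            std::ptr::write_volatile(&mut self.poison, 0);
        }
    }
}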
```
### Allocation Guard
```rust
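use std::mem::size_of;
use std::ptr;

struct AllocationGuard<'a, T> {
    data: *mut T,
    allocator: &'a SmartAlloc,
    magic: u64,
}

// Arbitrary 64-bit canary pattern
const MAGIC: u64 = 0xF4A3_EA11_0CA7_1011;

impl<'a, T> AllocationGuard<'a, T> {
    /// # Safety
    /// `data` must sit inside an allocation that reserves 8 writable bytes
    /// directly before it and 8 more directly after its `size_of::<T>()` bytes.
    unsafe fn new(allocator: &'a SmartAlloc, data: *mut T) -> Self {
        // Write canaries. Offsets are in bytes and the writes are unaligned,
        // because the canary slots are independent of `T`'s size and alignment.
        unsafe {
            let bytes = data as *mut u8;
            ptr::write_unaligned(bytes.sub(size_of::<u64>()) as *mut u64, MAGIC);
            ptr::write_unaligned(bytes.add(size_of::<T>()) as *mut u64, MAGIC);
        }
        Self {
            data,
            allocator,
            magic: MAGIC,
        }
    }
}

impl<'a, T> Drop for AllocationGuard<'a, T> {
    fn drop(&mut self) {
        // Check canaries
        unsafe {
            let bytes = self.data as *const u8;
            let start_canary = ptr::read_unaligned(bytes.sub(size_of::<u64>()) as *const u64);
            let end_canary = ptr::read_unaligned(bytes.add(size_of::<T>()) as *const u64);
            assert_eq!(start_canary, MAGIC, "Buffer underflow detected");
            assert_eq!(end_canary, MAGIC, "Buffer overflow detected");
        }
    }
}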
```
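The canary technique can also be demonstrated without raw pointers by placing the payload between two markers inside one owned byte buffer. This is a safe, self-contained sketch; `GuardedBuffer`, `CANARY`, and the method names are hypothetical, and `raw_mut` exists only so a test can simulate an overrun:

```rust
const CANARY: u64 = 0xDEAD_C0DE_CAFE_F00D;
const CANARY_LEN: usize = std::mem::size_of::<u64>();

/// Layout: [canary | payload | canary], all inside one `Vec<u8>`.
struct GuardedBuffer {
    bytes: Vec<u8>,
    payload_len: usize,
}

impl GuardedBuffer {
    fn new(payload_len: usize) -> Self {
        let mut bytes = vec![0u8; payload_len + 2 * CANARY_LEN];
        bytes[..CANARY_LEN].copy_from_slice(&CANARY.to_ne_bytes());
        bytes[CANARY_LEN + payload_len..].copy_from_slice(&CANARY.to_ne_bytes());
        Self { bytes, payload_len }
    }

    /// The region callers are allowed to write to.
    fn payload_mut(&mut self) -> &mut [u8] {
        let end = CANARY_LEN + self.payload_len;
        &mut self.bytes[CANARY_LEN..end]
    }

    /// The whole region including both canaries (for simulating corruption).
    fn raw_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }

    /// Verifies both canaries and reports which side was overwritten.
    fn check(&self) -> Result<(), &'static str> {
        let canary = CANARY.to_ne_bytes();
        let head = &self.bytes[..CANARY_LEN];
        let tail = &self.bytes[CANARY_LEN + self.payload_len..];
        if head != &canary[..] {
            return Err("buffer underflow detected");
        }
        if tail != &canary[..] {
            return Err("buffer overflow detected");
        }
        Ok(())
    }
}
```

Canaries only catch writes that actually touch the marker bytes, so a stray write that jumps past them goes unnoticed; they complement tools such as sanitizers rather than replace them.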
## Integration Patterns
### ECS Integration
```rust
trait FrameallocComponent {
    fn allocate_frame(alloc: &SmartAlloc) -> Self;
}

#[derive(Component)]
struct Transform {
    position: Vector3,
    rotation: Quaternion,
}

impl FrameallocComponent for Transform {
    fn allocate_frame(alloc: &SmartAlloc) -> Self {
        Self {
            position: *alloc.frame_alloc(),
            rotation: *alloc.frame_alloc(),
        }
    }
}

// Usage in system
fn update_transforms_system(
    mut query: Query<&mut Transform>,
    alloc: Res<SmartAlloc>,
) {
    alloc.begin_frame();
    // The binding must be `mut` to assign through the query item
    for mut transform in &mut query {
        *transform = Transform::allocate_frame(&alloc);
    }
    alloc.end_frame();
}
```
### Renderer Integration
```rust
struct FrameRenderer {
    command_buffer: FrameBox<CommandBuffer>,
    uniform_buffers: HashMap<String, FrameBox<UniformBuffer>>,
    vertex_buffers: Vec<FrameBox<VertexBuffer>>,
}

impl FrameRenderer {
    fn new(alloc: &SmartAlloc) -> Self {
        Self {
            command_buffer: alloc.frame_box(CommandBuffer::new()),
            uniform_buffers: HashMap::new(),
            vertex_buffers: Vec::new(),
        }
    }

    fn begin_frame(&mut self, alloc: &SmartAlloc) {
        alloc.begin_frame();
        // Reset for new frame
        self.command_buffer = alloc.frame_box(CommandBuffer::new());
        self.uniform_buffers.clear();
        self.vertex_buffers.clear();
    }

    fn end_frame(self, alloc: &SmartAlloc) {
        // Submit all commands
        self.command_buffer.submit();
        alloc.end_frame();
        // Everything automatically freed
    }
}
```
### Physics Integration
```rust
struct PhysicsFrame {
    contacts: FrameBox<[Contact]>,
    manifolds: FrameBox<[Manifold]>,
    impulses: FrameBox<[Impulse]>,
    query_results: FrameBox<[RaycastHit]>,
}

impl PhysicsFrame {
    fn new(alloc: &SmartAlloc, max_contacts: usize) -> Self {
        Self {
            contacts: alloc.frame_slice(max_contacts),
            manifolds: alloc.frame_slice(max_contacts / 2),
            impulses: alloc.frame_slice(max_contacts),
            query_results: alloc.frame_slice(1000),
        }
    }
}
```
## Performance Profiling
### Custom Metrics
```rust
struct PerformanceMetrics {
    frame_times: VecDeque<Duration>,
    allocation_sizes: VecDeque<usize>,
    allocation_counts: VecDeque<usize>,
    peak_memory: usize,
}

impl PerformanceMetrics {
    fn record_frame(&mut self, frame_time: Duration, alloc_stats: &FrameStats) {
        self.frame_times.push_back(frame_time);
        self.allocation_sizes.push_back(alloc_stats.bytes_allocated);
        self.allocation_counts.push_back(alloc_stats.allocation_count);
        // Keep only the last 60 frames (about one second at 60 FPS)
        if self.frame_times.len() > 60 {
            self.frame_times.pop_front();
            self.allocation_sizes.pop_front();
            self.allocation_counts.pop_front();
        }
        self.peak_memory = self.peak_memory.max(alloc_stats.bytes_allocated);
    }

    fn generate_report(&self) -> PerformanceReport {
        // Avoid dividing by zero before the first frame is recorded
        let frames = self.frame_times.len().max(1);
        PerformanceReport {
            avg_frame_time: self.frame_times.iter().sum::<Duration>() / frames as u32,
            avg_allocations: self.allocation_counts.iter().sum::<usize>() / frames,
            peak_memory: self.peak_memory,
            allocation_efficiency: self.calculate_efficiency(),
        }
    }
}
```
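The rolling-window part of this pattern can be isolated into a small, testable type. This is a minimal sketch using only the standard library; `FrameWindow` is a hypothetical name, and the window is measured in frames rather than seconds:

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Keeps the most recent `capacity` frame times.
struct FrameWindow {
    capacity: usize,
    frame_times: VecDeque<Duration>,
}

impl FrameWindow {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one frame");
        Self {
            capacity,
            frame_times: VecDeque::with_capacity(capacity),
        }
    }

    /// Adds a sample, evicting the oldest one once the window is full.
    fn record(&mut self, frame_time: Duration) {
        if self.frame_times.len() == self.capacity {
            self.frame_times.pop_front();
        }
        self.frame_times.push_back(frame_time);
    }

    /// Mean frame time over the window, or `None` before the first sample.
    fn average(&self) -> Option<Duration> {
        if self.frame_times.is_empty() {
            return None;
        }
        let total: Duration = self.frame_times.iter().sum();
        Some(total / self.frame_times.len() as u32)
    }
}
```

Returning `Option` makes the empty case explicit at the call site instead of reporting a misleading zero average.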
### Hot Path Analysis
```rust
#[cfg(feature = "profiling")]
struct HotPathAnalyzer {
allocation_sites: HashMap<*mut u8, AllocationSite>,
call_stack: Vec<*mut u8>,
}
#[derive(Debug)]
struct AllocationSite {
address: *mut u8,
size: usize,
call_stack: Vec<usize>,
frequency: usize,
}
impl HotPathAnalyzer {
fn record_allocation(&mut self, ptr: *mut u8, size: usize) {
let site = AllocationSite {
address: ptr,
size,
call_stack: self.capture_call_stack(),
frequency: 0,
};
self.allocation_sites.insert(ptr, site);
}
fn find_hot_spots(&self, threshold: f64) -> Vec<&AllocationSite> {
let total: usize = self.allocation_sites.values()
.map(|s| s.frequency)
.sum();
self.allocation_sites.values()
.filter(|s| s.frequency as f64 / total as f64 > threshold)
.collect()
}
}
```
### Memory Bandwidth Analysis
```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

struct BandwidthAnalyzer {
    reads: AtomicU64,
    writes: AtomicU64,
    start_time: Instant,
}

impl BandwidthAnalyzer {
    fn record_read(&self, bytes: usize) {
        self.reads.fetch_add(bytes as u64, Ordering::Relaxed);
    }

    fn record_write(&self, bytes: usize) {
        self.writes.fetch_add(bytes as u64, Ordering::Relaxed);
    }

    // Returns (read, write) throughput in MiB/s since `start_time`
    fn get_bandwidth(&self) -> (f64, f64) {
        // Clamp so a call immediately after construction cannot divide by zero
        let elapsed = self.start_time.elapsed().as_secs_f64().max(f64::EPSILON);
        let read_mb = self.reads.load(Ordering::Relaxed) as f64 / (1024.0 * 1024.0);
        let write_mb = self.writes.load(Ordering::Relaxed) as f64 / (1024.0 * 1024.0);
        (read_mb / elapsed, write_mb / elapsed)
    }
}
```
## Best Practices for Advanced Users
1. **Profile before optimizing** - Use built-in metrics
2. **Consider NUMA topology** - Allocate close to where used
3. **Align to cache lines** - For hot data structures
4. **Use batch APIs** - For many small allocations
5. **Monitor fragmentation** - Especially with pools
6. **Implement custom backends** - For special hardware
7. **Use debug features** - During development
8. **Measure real impact** - Don't optimize prematurely
## Further Reading
- [Performance Guide](performance.md) - Detailed optimization
- [Technical Documentation](../TECHNICAL.md) - Implementation details
- [Source Code](https://github.com/YelenaTor/framealloc) - Reference implementation