zeropool 0.3.1

High-performance buffer pool with constant-time allocation, thread-safe operations, and 5x speedup over bytes crate

ZeroPool

A high-performance buffer pool for Rust - Performance First


Why ZeroPool?

ZeroPool is a high-performance buffer pool that prioritizes speed above all else. Unlike traditional buffer pools that trade performance for security features, ZeroPool delivers maximum performance:

  • Performance-first: No memory zeroing by default (1000-10000x faster than zeroing pools)
  • Safe Rust: No unsafe memory operations, only safe abstractions
  • High performance: Thread-local caching and smart allocation strategies minimize overhead
  • Auto-configured: Adapts to your CPU topology for optimal multi-threaded performance

Perfect for high-throughput applications where raw speed is the primary requirement.

Quick Start

use zeropool::BufferPool;
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let pool = BufferPool::new();

    // Get a buffer (high-performance, not zeroed by default)
    let mut buffer = pool.get(1024 * 1024); // 1MB

    // Use it for I/O or data processing ("data.bin" is a placeholder path)
    let mut file = File::open("data.bin")?;
    file.read(&mut buffer)?;

    // Zero manually if needed for security
    buffer.fill(0);

    // Buffer is automatically returned to the pool when dropped
    Ok(())
}

Key Features

High Performance ⚡

  • Extreme speed: 1000-10000x faster than zeroing buffer pools
  • Thread-local caching: Lock-free fast path for 60-110ns allocation latency
  • Smart sharding: Minimal contention with power-of-2 shard count
  • Auto-configured: CPU-aware defaults (4-128 shards, 2-8 TLS cache size)
  • Configurable eviction: Choose between LIFO or CLOCK-Pro algorithms

Simple API 🎯

  • Just get() and drop(): Buffers automatically return to the pool
  • Builder pattern: Easy customization when needed
  • Type-safe: Leverages Rust's ownership for automatic resource management

Architecture

Thread 1     Thread 2     Thread N
┌─────────┐  ┌─────────┐  ┌─────────┐
│ TLS (4) │  │ TLS (4) │  │ TLS (4) │  ← Lock-free (60-110ns)
└────┬────┘  └────┬────┘  └────┬────┘
     └──────────┬─────────────┘
                ↓
       ┌────────────────┐
       │ Sharded Pool   │              ← Thread affinity
       │ [0][1]...[N]   │              ← Minimal contention
       └────────────────┘

Fast path: Thread-local cache (lock-free, ~60-110ns)
Slow path: Thread-affinity shard selection (better cache locality)
Optimization: Power-of-2 shard count enables bitwise AND instead of modulo
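
The power-of-2 optimization can be sketched in a few lines; `shard_index` and the hash values below are illustrative, not zeropool's internals:

```rust
// Illustrative sketch: with a power-of-2 shard count, reduction of a
// hash to a shard index is a bitwise AND instead of an integer
// division (modulo), which is cheaper on the hot path.
fn shard_index(thread_hash: u64, num_shards: u64) -> u64 {
    debug_assert!(num_shards.is_power_of_two());
    thread_hash & (num_shards - 1) // equivalent to thread_hash % num_shards
}

fn main() {
    let num_shards = 16; // power of 2
    for hash in [0u64, 5, 16, 21, 0xDEADBEEF] {
        // The AND-mask result always matches the modulo result.
        assert_eq!(shard_index(hash, num_shards), hash % num_shards);
    }
}
```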

Performance

Cache Behavior Benchmarks

Pattern               Metric              Result
Ping-pong (LIFO)      Time per operation  3.56 µs
Ping-pong (ClockPro)  Time per operation  3.68 µs
Hot/cold buffers      Time per operation  1.05 µs
Multi-size workload   Time per operation  6.2 µs
TLS cache (2 bufs)    Allocation latency  60.5 ns
TLS cache (4 bufs)    Allocation latency  108 ns
TLS cache (8 bufs)    Allocation latency  288 ns
Eviction pressure     Time per operation  400 ns

Multi-threaded Scaling

Threads  Time per 1000 ops  Notes
1        44.7 µs            Single-threaded baseline
4        141 µs             Good scaling with TLS cache
8        282 µs             Near-linear scaling
16       605 µs             Still scales well at high concurrency

Performance Characteristics

  • Extreme speed: 1000-10000x faster than zeroing buffer pools
  • Constant latency: 60-110ns for TLS cache hits regardless of buffer size
  • Lock-free fast path: Thread-local cache eliminates contention
  • Scales well: Near-linear scaling up to 16+ threads

Run yourself:

cargo bench

Test System

  • CPU: Intel i9-10900K @ 3.7GHz (10 cores, 20 threads, 5.3GHz turbo)
  • RAM: 32GB DDR4
  • OS: Linux 6.17.0

Configuration

use zeropool::BufferPool;

let pool = BufferPool::builder()
    .tls_cache_size(8)               // Buffers per thread
    .min_buffer_size(512 * 1024)     // Keep buffers ≥ 512KB
    .max_buffers_per_shard(32)       // Max pooled buffers
    .num_shards(16)                  // Override auto-detection
    .build();

Defaults (auto-configured based on CPU count):

  • Shards: 4-128 (power-of-2, ~1 shard per 2 cores)
  • TLS cache: 2-8 buffers per thread
  • Min buffer size: 1MB
  • Max per shard: 16-64 buffers
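
A hypothetical reconstruction of the shard-count heuristic from these numbers; zeropool's actual auto-configuration is internal and may differ in detail:

```rust
// Hypothetical sketch of the documented "~1 shard per 2 cores,
// power-of-2, clamped to 4-128" default. Not zeropool's real code.
fn default_num_shards(cpu_count: usize) -> usize {
    (cpu_count / 2).next_power_of_two().clamp(4, 128)
}

fn main() {
    assert_eq!(default_num_shards(4), 4);     // small systems hit the 4-shard floor
    assert_eq!(default_num_shards(16), 8);    // ~1 shard per 2 cores
    assert_eq!(default_num_shards(64), 32);
    assert_eq!(default_num_shards(512), 128); // very large systems hit the ceiling
}
```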

Memory Pinning

Lock buffer memory in RAM to prevent swapping (performance optimization):

use zeropool::BufferPool;

let pool = BufferPool::builder()
    .pinned_memory(true)
    .build();

Useful for high-performance computing or real-time systems. May require elevated privileges on some systems. Falls back gracefully if pinning fails.

Eviction Policy

Choose between simple LIFO or intelligent CLOCK-Pro buffer eviction:

use zeropool::{BufferPool, EvictionPolicy};

let pool = BufferPool::builder()
    .eviction_policy(EvictionPolicy::ClockPro)  // Better cache locality (default)
    .build();

let pool_lifo = BufferPool::builder()
    .eviction_policy(EvictionPolicy::Lifo)     // Simple, lowest overhead
    .build();

CLOCK-Pro (default): Uses access counters to favor recently-used buffers, preventing cache thrashing in mixed-size workloads. ~8 bytes overhead per buffer.

LIFO: Simple last-in-first-out eviction. Minimal memory overhead, best for uniform buffer sizes.
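
The access-counter idea can be illustrated with a minimal second-chance ("CLOCK"-style) loop; `ClockCache` below is a teaching sketch, not zeropool's CLOCK-Pro implementation:

```rust
// Minimal second-chance eviction sketch: each slot carries a
// "recently used" bit; the hand clears bits as it sweeps and evicts
// the first cold slot, so hot buffers survive. Illustrative only.
struct ClockCache {
    slots: Vec<(String, bool)>, // (buffer id, referenced bit)
    hand: usize,
}

impl ClockCache {
    fn new() -> Self {
        Self { slots: Vec::new(), hand: 0 }
    }

    // In this sketch buffers are inserted cold; a touch marks them hot.
    fn insert(&mut self, id: &str) {
        self.slots.push((id.to_string(), false));
    }

    fn touch(&mut self, id: &str) {
        if let Some(slot) = self.slots.iter_mut().find(|(k, _)| k == id) {
            slot.1 = true; // mark as recently used
        }
    }

    // Sweep the hand until a cold slot is found; hot slots get one
    // "second chance" (their bit is cleared, they are skipped once).
    fn evict(&mut self) -> String {
        loop {
            let i = self.hand % self.slots.len();
            if self.slots[i].1 {
                self.slots[i].1 = false;
                self.hand += 1;
            } else {
                self.hand = i;
                return self.slots.remove(i).0;
            }
        }
    }
}

fn main() {
    let mut cache = ClockCache::new();
    for id in ["a", "b", "c"] {
        cache.insert(id);
    }
    cache.touch("a"); // "a" is hot, "b" and "c" are cold
    assert_eq!(cache.evict(), "b"); // hand skips hot "a", evicts first cold slot
    assert_eq!(cache.evict(), "c"); // "a" survives both evictions
}
```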

How It Works

Thread-local caching (lock-free)

  • Lock-free access to recently used buffers
  • No atomic operations on fast path (60-110ns latency)
  • Zero cache-line bouncing
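
A minimal sketch of the idea, assuming a plain `Vec` of recycled buffers in thread-local storage (not zeropool's actual code):

```rust
use std::cell::RefCell;

// Sketch of a thread-local buffer cache: because the cache lives in
// TLS, get/put need no locks or atomics, and no cache lines are
// shared between threads. Illustrative only.
thread_local! {
    static TLS_CACHE: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

const TLS_CAPACITY: usize = 4;

fn get_buffer(size: usize) -> Vec<u8> {
    TLS_CACHE.with(|c| {
        let mut cache = c.borrow_mut();
        // Reuse the first cached buffer that is large enough.
        if let Some(pos) = cache.iter().position(|b| b.capacity() >= size) {
            let mut buf = cache.swap_remove(pos);
            buf.resize(size, 0); // note: a real performance-first pool may skip zeroing
            return buf;
        }
        vec![0u8; size] // cache miss: fall back to a fresh allocation
    })
}

fn put_buffer(buf: Vec<u8>) {
    TLS_CACHE.with(|c| {
        let mut cache = c.borrow_mut();
        if cache.len() < TLS_CAPACITY {
            cache.push(buf); // kept for reuse; otherwise dropped (freed)
        }
    })
}

fn main() {
    let buf = get_buffer(1024);
    let ptr = buf.as_ptr();
    put_buffer(buf);
    // The next same-size request is served from the thread-local cache:
    // same heap allocation, no lock taken.
    let reused = get_buffer(1024);
    assert_eq!(reused.as_ptr(), ptr);
}
```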

Thread-local shard affinity

  • Each thread consistently uses the same shard (cache locality)
  • shard = hash(thread_id) & (num_shards - 1) (no modulo)
  • Minimal lock contention + better CPU cache utilization
  • Auto-scales with CPU count

First-fit allocation

  • O(1) instead of O(n) best-fit
  • Perfect for predictable I/O buffer sizes
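
First-fit vs. best-fit over a small free list: with uniform buffer sizes, first-fit typically succeeds on the first probe, which is where the effectively O(1) behavior comes from. The helpers below are illustrative, not zeropool's API:

```rust
// First-fit takes the first buffer large enough and can stop early;
// best-fit must scan the whole list to find the tightest match.
fn first_fit(free: &[usize], want: usize) -> Option<usize> {
    free.iter().position(|&cap| cap >= want)
}

fn best_fit(free: &[usize], want: usize) -> Option<usize> {
    free.iter()
        .enumerate()
        .filter(|&(_, &cap)| cap >= want)
        .min_by_key(|&(_, &cap)| cap)
        .map(|(i, _)| i)
}

fn main() {
    let free = [4096, 1024, 2048]; // cached buffer capacities
    assert_eq!(first_fit(&free, 1000), Some(0)); // first large-enough slot
    assert_eq!(best_fit(&free, 1000), Some(1));  // tightest slot, full scan
    assert_eq!(first_fit(&free, 8192), None);    // nothing fits
}
```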

Performance-first memory management

  • Buffers are not zeroed by default for maximum performance (1000-10000x faster)
  • Users can manually zero buffers if information leakage prevention is required
  • Safe for performance-critical workloads where security is handled at higher layers

Thread Safety

BufferPool is Clone and thread-safe:

let pool = BufferPool::new();

let handles: Vec<_> = (0..4)
    .map(|_| {
        let pool = pool.clone();
        std::thread::spawn(move || {
            let _buf = pool.get(1024);
            // Each thread gets its own TLS cache
            // Buffer is automatically returned when dropped
        })
    })
    .collect();

for handle in handles {
    handle.join().unwrap();
}

Use Cases

High-Performance Applications

  • Data processing: ETL pipelines, log processing, analytics
  • Network servers: HTTP, gRPC, WebSocket servers with high throughput
  • File I/O: Async file loading with io_uring, tokio, async-std
  • LLM inference: Fast checkpoint loading and model serving
  • Real-time systems: Low-latency buffer management
  • Big data: High-throughput data streaming and processing

Real-World Example

Before ZeroPool, loading GPT-2 checkpoints took 200ms, with roughly 70% of that time spent on buffer allocation. With ZeroPool, the same load takes 53ms (3.8x faster), with no zeroing overhead.

System Scaling

ZeroPool automatically adapts to your system:

System        Cores  TLS Cache  Shards  Buffers/Shard  Total Capacity
Embedded      4      4          4       16             64 (~64MB)
Laptop        8      6          8       16             128 (~128MB)
Workstation   16     6          8       32             256 (~256MB)
Small Server  32     8          16      64             1024 (~1GB)
Large Server  64     8          32      64             2048 (~2GB)
Supercompute  128    8          64      64             4096 (~4GB)
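
The "Total Capacity" column above is simply shards × buffers-per-shard, valued at the 1MB minimum buffer size:

```rust
// Illustrative arithmetic for the scaling table: total pooled capacity
// is shards * buffers-per-shard * minimum buffer size.
fn total_capacity_bytes(shards: usize, per_shard: usize, min_buf: usize) -> usize {
    shards * per_shard * min_buf
}

fn main() {
    const MB: usize = 1024 * 1024;
    assert_eq!(total_capacity_bytes(4, 16, MB), 64 * MB);    // embedded: ~64MB
    assert_eq!(total_capacity_bytes(16, 64, MB), 1024 * MB); // small server: ~1GB
    assert_eq!(total_capacity_bytes(64, 64, MB), 4096 * MB); // supercompute: ~4GB
}
```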

Comparison with Alternatives

Feature            ZeroPool                   bytes::BytesMut  Lifeguard        Sharded-Slab
Memory zeroing     ❌ No (performance-first)  ❌ No            ❌ No            ❌ No
Safe Rust          ✅ 100%                    ⚠️ Some unsafe   ⚠️ Some unsafe   ⚠️ Heavy unsafe
Thread-safe        ✅ Yes                     ❌ No            ⚠️ Limited       ✅ Yes
Lock-free path     ✅ TLS cache               ❌ No            ❌ No            ⚠️ Partial
Auto-configured    ✅ CPU-aware               ❌ Manual        ❌ Manual        ❌ Manual
Performance focus  ✅ Primary                 ❌ No            ❌ No            ❌ No

ZeroPool aims to be the fastest buffer pool available, designed purely for maximum performance while remaining in safe Rust.

License

Dual licensed under Apache-2.0 or MIT.

Contributing

PRs welcome! Please include benchmarks for performance changes and ensure all tests pass:

cargo test
cargo bench
cargo fmt
cargo clippy

Changelog

See CHANGELOG.md for version history.

Credits

Built with ❤️ for the Rust community. Inspired by the need for high-performance buffer management in production systems.