# buffer_sv2

`buffer_sv2` handles memory management for Stratum V2 (Sv2) roles. It provides a memory-efficient buffer pool that minimizes allocations and deallocations for high-throughput message frame processing. Memory allocation overhead is reduced by reusing large buffers, which improves performance and reduces latency. The buffer pool tracks the usage of memory slices, using shared-state tracking to safely manage memory across multiple threads.
## Main Components

- **`Buffer` trait**: An interface for working with memory buffers, implemented by both `BufferPool` and `BufferFromSystemMemory`. It includes a `Write` trait that replaces `std::io::Write` in `no_std` environments.
- **`BufferPool`**: A thread-safe pool of reusable memory buffers for high-throughput applications.
- **`BufferFromSystemMemory`**: Manages a dynamically growing buffer in system memory for applications where performance is not a concern.
- **`Slice`**: A contiguous block of memory, either preallocated or dynamically allocated.
## Usage

To include this crate in your project, run:
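```sh
cargo add buffer_sv2
```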
This crate can be built with the following feature flags:

- `debug`: Provides additional tracking for debugging memory management issues.
- `fuzz`: Enables support for fuzz testing.
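For example, to enable the `debug` feature when adding the crate:

```sh
cargo add buffer_sv2 --features debug
```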
## Unsafe Code

There are four instances of unsafe code:

- `buffer_pool/mod.rs`: `fn get_writable_(&mut self, len: usize, shared_state: u8, without_check: bool) -> &mut [u8] { .. }` in the `impl<T: Buffer> BufferPool<T>`
- `slice.rs`:
  - `unsafe impl Send for Slice {}`
  - `fn as_mut(&mut self) -> &mut [u8] { .. }` in the `impl AsMut<[u8]> for Slice`
  - `fn as_ref(&self) -> &[u8] { .. }` in the `impl AsRef<[u8]> for Slice`
## Examples

This crate provides three examples demonstrating how memory is managed:

- **Basic Usage**: Creates a buffer pool, writes to it, and retrieves the data from it (see the sketch below).
- **Buffer Pool Exhaustion**: Demonstrates how data is added to a buffer pool and how it falls back to direct heap allocation once the buffer pool's capacity has been exhausted.
- **Variable-Sized Messages**: Writes messages of variable sizes to the buffer pool.
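A minimal sketch of the basic-usage flow, assuming the `BufferPool::new(capacity)` constructor and the `Buffer` trait methods described below:

```rust
use buffer_sv2::{Buffer, BufferPool};

fn main() {
    // Preallocate a pool; the capacity in bytes is illustrative.
    let mut pool = BufferPool::new(32);

    // Reserve writable space and copy a message into it.
    let msg = b"hello";
    pool.get_writable(msg.len()).copy_from_slice(msg);

    // Take ownership of the filled bytes as a `Slice`.
    let slice = pool.get_data_owned();
    assert_eq!(slice.as_ref(), &msg[..]);
}
```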
## `Buffer` Trait

The `Buffer` trait is designed to work with the `codec_sv2` decoders, which operate by:

- Filling a buffer with the size of the protocol header being decoded.
- Parsing the filled bytes to compute the message length.
- Filling a buffer with the size of the message.
- Using the header and message to construct a `framing_sv2::framing::Frame`.
To fill the buffer, the `codec_sv2` decoder must pass a reference to the buffer to a filler. To construct a `Frame`, the decoder must pass ownership of the buffer to the `Frame`.
```rust
fn get_writable(&mut self, len: usize) -> &mut [u8];
```
This `get_writable` method returns a mutable reference into the buffer, starting at the current length and spanning `len` bytes, and sets the buffer length to the previous length plus `len`.
```rust
fn get_data_owned(&mut self) -> Self::Slice;
```
This `get_data_owned` method returns a `Slice` that implements `AsMut<[u8]>` and `Send`.
The `Buffer` trait is implemented for `BufferFromSystemMemory` and `BufferPool`. It includes a `Write` trait to replace `std::io::Write` in `no_std` environments.
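The decoder flow described above can be sketched as follows, assuming the API shown; the header layout and the 1-byte length field are invented placeholders, not the real Sv2 framing:

```rust
use buffer_sv2::{Buffer, BufferPool};

const HEADER_LEN: usize = 6;

fn main() {
    let mut buf = BufferPool::new(1024);

    // 1. Fill the buffer with the protocol header (dummy bytes here,
    //    with a fake 1-byte length field at offset 3).
    buf.get_writable(HEADER_LEN).copy_from_slice(&[0, 0, 0, 3, 0, 0]);

    // 2. Parse the filled bytes to compute the message length.
    let header = buf.get_data_owned();
    let msg_len = header.as_ref()[3] as usize;

    // 3. Fill the buffer with the message payload.
    buf.get_writable(msg_len).copy_from_slice(&[1, 2, 3]);

    // 4. Take ownership of the payload, e.g. to build a frame from
    //    the header and message.
    let payload = buf.get_data_owned();
    assert_eq!(payload.as_ref(), &[1u8, 2, 3][..]);
}
```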
## `BufferFromSystemMemory`

`BufferFromSystemMemory` is a simple implementation of the `Buffer` trait. Each time a new buffer is needed, it creates a new `Vec<u8>`:

- `get_writable(..)` returns mutable references to the inner vector.
- `get_data_owned(..)` returns the inner vector.
## `BufferPool`

While `BufferFromSystemMemory` is sufficient for many cases, `BufferPool` offers a more efficient solution for high-performance applications, such as proxies and pools with thousands of connections.

When created, `BufferPool` preallocates a user-defined capacity of bytes in the heap using a `Vec<u8>`. When `get_data_owned(..)` is called, it creates a `Slice` that contains a view into the preallocated memory. `BufferPool` guarantees that slices never overlap and maintains unique ownership of each `Slice`.
`Slice` implements the `Drop` trait, allowing the view into the preallocated memory to be reused once the `Slice` is dropped.
## Buffer Management and Allocation

`BufferPool` is useful for working with sequentially processed buffers, such as filling a buffer, retrieving it, and then reusing it as needed. `BufferPool` optimizes for memory reuse by providing preallocated memory that can be used in one of three modes:

- **Back Mode**: The default mode, where allocations start from the back of the buffer.
- **Front Mode**: Used when slots at the back are full but memory can still be reused by moving to the front.
- **Alloc Mode**: Falls back to system memory allocation (`BufferFromSystemMemory`) when both the back and front sections are full, providing additional capacity but with reduced performance.
`BufferPool` can only become fragmented between the front and back sections, and between the back section and the end of the buffer.
## Fragmentation, Overflow, and Optimization

`BufferPool` can allocate a maximum of 8 `Slice`s (as it uses an `AtomicU8` to track used and freed slots) and up to the defined capacity in bytes. If all 8 slots are taken, or if there is no more space in the preallocated memory, `BufferPool` falls back to `BufferFromSystemMemory`, as sketched below.
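A sketch of this fallback behavior, assuming the API used in the earlier examples and a capacity chosen so that the pool exhausts its 8 slots:

```rust
use buffer_sv2::{Buffer, BufferPool};

fn main() {
    // Room for eight 4-byte slices in the preallocated pool.
    let mut pool = BufferPool::new(8 * 4);

    let mut slices = Vec::new();
    for i in 0u8..12 {
        pool.get_writable(4).copy_from_slice(&[i; 4]);
        // The first 8 slices live in the preallocated memory; the rest
        // are transparently backed by heap allocations (alloc mode).
        slices.push(pool.get_data_owned());
    }

    // All data remains intact, wherever it was allocated.
    for (i, s) in slices.iter().enumerate() {
        assert_eq!(s.as_ref(), &[i as u8; 4][..]);
    }
}
```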
Typically, `BufferPool` is used to process messages sequentially (decode, respond, decode). It is optimized to check for freed slots starting from the beginning and to reuse them before allocating further. It also efficiently handles the scenarios where all slices are dropped or where the last slice is released, reducing memory fragmentation.

The following cases illustrate typical memory-usage patterns within `BufferPool`:
- Slots fill from back to front, switching as each area reaches capacity.
- Pool resets upon full usage, then reuses back slots.
- After filling the back, front slots are used when they become available.
Below is a graphical representation of the most optimized cases. A number means the slot is taken; a minus symbol (`-`) means the slot is free. There are 8 slots.
Case 1: Buffer pool exhaustion

```
-------- BACK MODE
1------- BACK MODE
12------ BACK MODE
123----- BACK MODE
1234---- BACK MODE
12345--- BACK MODE
123456-- BACK MODE
1234567- BACK MODE
12345678 BACK MODE (buffer is now full)
12345678 ALLOC MODE (new bytes being allocated in a new space in the heap)
12345678 ALLOC MODE (new bytes being allocated in a new space in the heap)
..... and so on
```
Case 2: Buffer pool reset to remain in back mode

```
-------- BACK MODE
1------- BACK MODE
12------ BACK MODE
123----- BACK MODE
1234---- BACK MODE
12345--- BACK MODE
123456-- BACK MODE
1234567- BACK MODE
12345678 BACK MODE (buffer is now full)
-------- RESET
9------- BACK MODE
9a------ BACK MODE
```
Case 3: Buffer pool switches from back to front to back modes

```
-------- BACK MODE
1------- BACK MODE
12------ BACK MODE
123----- BACK MODE
1234---- BACK MODE
12345--- BACK MODE
123456-- BACK MODE
1234567- BACK MODE
12345678 BACK MODE (buffer is now full)
--345678 Consume first two data bytes from the buffer
-9345678 SWITCH TO FRONT MODE
a9345678 FRONT MODE (buffer is now full)
a93456-- Consume last two data bytes from the buffer
a93456b- SWITCH TO BACK MODE
a93456bc BACK MODE (buffer is now full)
```
## Benchmarks and Performance

To run the benchmarks, execute:

```sh
cargo bench --features criterion
```
### Benchmark Comparisons

`BufferPool` is benchmarked against `BufferFromSystemMemory` and two additional structures for reference: `PPool` (a hashmap-based pool) and `MaxEfficeincy` (a highly optimized but unrealistic control implementation, written only so that the benchmarks do not panic and the compiler does not complain). `BufferPool` generally provides better performance and lower latency than `PPool` and `BufferFromSystemMemory`.

Note: Both `PPool` and `MaxEfficeincy` are completely broken and are only useful as references for the benchmarks.
The full results are recorded in `BENCHES.md`.

### Benchmarks

`BufferPool` always outperforms `PPool` (the hashmap-based pool) and the solution without a pool.
Executed for 2,000 samples:

```
* single thread with `BufferPool`: ---------------------------------- 7.5006 ms
* single thread with `BufferFromSystemMemory`: ---------------------- 10.274 ms
* single thread with `PPoll`: --------------------------------------- 32.593 ms
* single thread with `MaxEfficeincy`: ------------------------------- 1.2618 ms
* multi-thread with `BufferPool`: ---------------------------------- 34.660 ms
* multi-thread with `BufferFromSystemMemory`: ---------------------- 142.23 ms
* multi-thread with `PPoll`: --------------------------------------- 49.790 ms
* multi-thread with `MaxEfficeincy`: ------------------------------- 18.201 ms
* multi-thread 2 with `BufferPool`: ---------------------------------- 80.869 ms
* multi-thread 2 with `BufferFromSystemMemory`: ---------------------- 192.24 ms
* multi-thread 2 with `PPoll`: --------------------------------------- 101.75 ms
* multi-thread 2 with `MaxEfficeincy`: ------------------------------- 66.972 ms
```
### Single-Thread Benchmarks

If the buffer is not sent to another context, `BufferPool` is 1.4 times faster than no pool, 4.3 times faster than `PPool`, and 5.7 times slower than the max-efficiency control.

Average times for 1,000 operations:

- `BufferPool`: 7.5 ms
- `BufferFromSystemMemory`: 10.27 ms
- `PPool`: 32.59 ms
- `MaxEfficeincy`: 1.26 ms
The benchmarked loop, in pseudocode:

```
for 0..1000:
    add random bytes to the buffer
    get the buffer
    add random bytes to the buffer
    get the buffer
    drop the 2 buffers
```
### Multi-Threaded Benchmarks (most similar to the actual use case)

If the buffer is sent to other contexts, `BufferPool` is about 4 times faster than no pool, 1.4 times faster than `PPool`, and 1.8 times slower than the max-efficiency control.

Average times for 1,000 operations:

- `BufferPool`: 34.66 ms
- `BufferFromSystemMemory`: 142.23 ms
- `PPool`: 49.79 ms
- `MaxEfficeincy`: 18.20 ms
The benchmarked loop, in pseudocode:

```
for 0..1000:
    add random bytes to the buffer
    get the buffer
    send the buffer to another thread -> wait 1 ms and then drop it
    add random bytes to the buffer
    get the buffer
    send the buffer to another thread -> wait 1 ms and then drop it
```
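A minimal sketch of this cross-thread pattern, assuming the API used in the earlier examples; the byte contents and the 1 ms delay are placeholders, and this is not the actual benchmark code:

```rust
use buffer_sv2::{Buffer, BufferPool};
use std::{thread, time::Duration};

fn main() {
    let mut pool = BufferPool::new(1024);
    let mut handles = Vec::new();
    for _ in 0..1000 {
        pool.get_writable(8).copy_from_slice(&[0u8; 8]);
        let slice = pool.get_data_owned(); // `Slice` implements `Send`
        handles.push(thread::spawn(move || {
            // Simulate work in another context, then release the slice
            // so its view into the pool can be reused.
            thread::sleep(Duration::from_millis(1));
            drop(slice);
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
}
```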
### Multi-Threaded Benchmarks 2

The benchmarked loop, in pseudocode:

```
for 0..1000:
    add random bytes to the buffer
    get the buffer
    send the buffer to another thread -> wait 1 ms and then drop it
    add random bytes to the buffer
    get the buffer
    send the buffer to another thread -> wait 1 ms and then drop it
    wait for the 2 buffers to be dropped
```
## Fuzz Testing

Install `cargo-fuzz` with:
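```sh
cargo install cargo-fuzz
```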
Then run the fuzz tests (assuming the two targets described below are named `faster` and `slower`):
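```sh
cargo fuzz run faster -- -rss_limit_mb=5000000000
cargo fuzz run slower -- -rss_limit_mb=5000000000
```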
The tests must be run with `-rss_limit_mb=5000000000`, as they check `BufferPool` with capacities from `0` to `2^32`.
`BufferPool` is fuzz-tested to ensure memory reliability across different scenarios, including delayed memory release and cross-thread access. The tests check whether slices created by `BufferPool` still contain the same bytes they contained at creation time, after a random amount of time has passed and after they have been sent to other threads.
Two main fuzz tests are provided:

- **Faster**: Maps a smaller input space to test the most likely inputs.
- **Slower**: Uses a bigger input space to explore "all" the edge cases; it also forces the buffer to be sent across different cores.

Both tests have been run for several hours without crashes.