Crate scratchpad [−] [src]
Stack-like dynamic memory pool with double-ended allocation support.
Table of Contents
- Overview
- Crate Features
- Examples
- Memory Management
- Data Alignment
- Memory Overhead
- Limitations
- Mutability Notes
Overview
Scratchpad
provides a method for quick and safe dynamic allocations of
arbitrary types without relying on the global heap (e.g. using Box
or
Vec
). Allocations are made from a fixed-size region of memory in a
stack-like fashion using two separate stacks (one for each end of the
allocation buffer) to allow different types of allocations with possibly
overlapping lifecycles to be made from each end.
General Usage
Groups of allocations are partitioned using Marker
objects. Markers
can be set at the front or back of a
scratchpad. Allocations can be made from either a front or back marker as
long as it is the most-recently created active marker of that type (e.g.
creating and allocating from a back marker still allows you to allocate
from the most recent front marker, but creating a new front marker blocks
allocations from previous front markers until the newer front marker is
dropped).
No memory is actually freed until the marker is dropped (although an
item's Drop
implementation is still called if necessary when the
allocation is dropped). Markers can be dropped in any order, but the
memory is not made available for reuse until any subsequently set markers
of the same type (front versus back) have also been dropped. This behavior
allows allocations and frees to be performed relatively quickly.
Additional Functionality
Aside from general memory management routines, the crate also provides a few additional features:
- Generalized slice support. Slices can be moved, cloned, or copied
into an allocation using
Marker::allocate_slice()
,Marker::allocate_slice_clone()
, orMarker::allocate_slice_copy()
, and existing allocations can be converted into slice allocations using theIntoSliceAllocation
trait. Allocations ofstr
slices are supported as well. - Extending and concatenating allocations. Under certain conditions,
allocations can be added to or combined:
- If two existing allocations from a given scratchpad are adjacent in
memory, they can be combined into a single slice allocation using
Allocation::concat()
. An unsafe option,Allocation::concat_unchecked()
, is also available that circumvents runtime validity checks when performance is a concern. MarkerFront::append()
andMarkerBack::prepend()
allow for extending allocations at the end of their respective stacks with new data. Variants of these methods also exist for cloning (append_clone()
,prepend_clone()
) and copying (append_copy()
,prepend_copy()
) the source values without moving them into an allocation.Marker::concat_slices_clone()
andMarker::concat_slices_copy()
generate a single allocation from multiple concatenated slices in a single operation.
- If two existing allocations from a given scratchpad are adjacent in
memory, they can be combined into a single slice allocation using
Use Cases
Some cases for which Scratchpad
can be useful include:
- Short-term dynamic allocations. A function may need to allocate a
variable amount of memory for a relatively short amount of time. Rust
currently does not provide anything analogous to the
alloca()
function in C (which also has its own pitfalls), and using the global heap can be slow and introduce fragmentation if other allocations are made before the short-term allocation is freed. Additionally, if each thread has its own scratchpad, overhead incurred from synchronization between threads can be eliminated. - Grouped allocation lifecycles. Some applications may incorporate some cycle in which they allocate an arbitrary amount of data that can be freed all at once after some period of time. One such case is with video games, where static data for a level may be allocated and persist only for the duration in which the level is active. Such allocations can be made by setting a marker for the level and dropping it when the level is unloaded. Other allocations with overlapping lifecycles, such as those persisting across levels, or temporary allocations needed while loading into one end of the scratchpad, can also be set at the other end of the same scratchpad at the same time.
Crate Features
The following optional features can be set when building this crate:
std
: AllowsBox
to be used as the storage type for allocations and marker tracking. Enabled by default; can be disabled to build the crate with#![no_std]
.unstable
: Enables unstable toolchain features (requires a nightly compiler). Disabled by default. Enabling this feature includes:ByteData
trait implementations foru128
/i128
.- Declaration of the function
Scratchpad::new()
asconst
. - Support for using
Box
as the storage type for allocations and marker tracking, regardless of whether thestd
feature is enabled (alloc
library is used ifstd
is disabled).
Examples
Applications can create per-thread scratchpads for temporary allocations using thread-local variables, reducing fragmentation of the global memory heap and improving performance. The following example shows how one might use temporary storage to interface with a third-party library within some abstraction layer:
#[macro_use] extern crate scratchpad; use scratchpad::{CacheAligned, Scratchpad}; use scratchpad::uninitialized_boxed_slice_for_bytes; use std::mem::{size_of, uninitialized}; use std::os::raw::c_void; /// Thread-local scratchpad size, in bytes (1 MB). const THREAD_LOCAL_SCRATCHPAD_SIZE: usize = 1024 * 1024; /// Maximum thread-local scratchpad allocation marker count. const THREAD_LOCAL_SCRATCHPAD_MAX_MARKERS: usize = 8; /// Buffer type for thread-local scratchpad allocations. By using /// `CacheAligned`, we avoid false sharing of cache lines between threads. type ThreadLocalScratchBuffer = Box<[CacheAligned]>; /// Buffer type for tracking of thread-local scratchpad markers (each /// marker requires a `usize` value). type ThreadLocalScratchTracking = array_type_for_markers!( CacheAligned, THREAD_LOCAL_SCRATCHPAD_MAX_MARKERS, ); thread_local! { /// Thread-local scratchpad. The initial contents of the allocation /// buffer and marker tracking buffer are ignored, so we can create /// them as uninitialized. pub static THREAD_LOCAL_SCRATCHPAD: Scratchpad< ThreadLocalScratchBuffer, ThreadLocalScratchTracking, > = unsafe { Scratchpad::new( uninitialized_boxed_slice_for_bytes(THREAD_LOCAL_SCRATCHPAD_SIZE), uninitialized(), ) }; } /// Rust bindings for part of some fictional third-party API written in /// C/C++. pub mod libfoo { use std::os::raw::c_void; #[repr(C)] pub enum SequenceType { Integer, Float, Double, } #[derive(Debug, PartialEq)] #[repr(C)] pub enum SequenceResult { Red, Green, Blue, } #[repr(C)] pub struct SequenceParameters { pub data_type: SequenceType, pub data_count: usize, pub data: *const c_void, } pub unsafe extern "C" fn process_sequences( sequence_count: usize, sequences: *const SequenceParameters, ) -> SequenceResult { // ... } } /// Our abstraction of `libfoo::process_sequences()`, in which we only /// ever work with `f32` data. pub fn process_float_sequences<I, E, S>( sequences: I, ) -> Result<libfoo::SequenceResult, scratchpad::Error<()>> where I: IntoIterator<Item = S, IntoIter = E>, E: ExactSizeIterator<Item = S>, S: AsRef<[f32]>, { THREAD_LOCAL_SCRATCHPAD.with(|scratchpad| { // We need an array of `libfoo::SequenceParameters` structs for // the call to `libfoo::process_sequences()`. let mut sequences = sequences.into_iter(); let sequences_len = sequences.len(); let marker = scratchpad.mark_front()?; let foo_sequences = marker.allocate_array_with( sequences_len, |index| { let data = sequences.next().unwrap(); let data_ref = data.as_ref(); libfoo::SequenceParameters { data_type: libfoo::SequenceType::Float, data_count: data_ref.len(), data: data_ref.as_ptr() as *const c_void, } } )?; Ok(unsafe { libfoo::process_sequences( sequences_len, foo_sequences.as_ptr(), ) }) // The marker is dropped as it goes out of scope, freeing the // allocated memory. }) } fn main() { let sequence_a = [2.22f32, 9.99f32, -1234.56f32]; let sequence_b = [-1.0f32, 8.8f32, 27.0f32, 0.03f32]; let sequence_c = [88.0f32]; let sequences = [&sequence_a[..], &sequence_b[..], &sequence_c[..]]; assert_eq!( process_float_sequences(&sequences).unwrap(), libfoo::SequenceResult::Red, ); }
Memory Management
Buffer Types
The backing data structures used for allocation storage and marker
tracking are declared in the generic parameters of the Scratchpad
type. This allows a fair degree of control over the memory usage of the
scratchpad itself.
The allocation storage type is provided by the BufferT
generic
parameter, which is constrained by the Buffer
trait. Allocation
storage must be a contiguous region of memory, such as a static array,
boxed slice, or mutable slice reference. Array element types are
constrained to types that implement the ByteData
trait; by default,
this only includes basic integer types and the CacheAligned
struct
provided by this crate.
The marker tracking buffer type is provided by the TrackingT
generic
parameter, which is constrained by the Tracking
trait. This type must
allow storage and retrieval of a fixed number of usize
values. Unlike
Buffer
, there is no restriction on how the values are stored so long
as they can be set and subsequently retrieved by index. Any type
implementing Buffer
also implements Tracking
by default.
Macros and Functions for Handling Buffers
Writing out the math needed to determine the number of elements needed for an array or boxed slice of a specific byte capacity or marker tracking capacity can be tedious and error-prone. This crate provides some macros and functions to help reduce the amount of work needed for declaring buffers based on their byte capacity or marker capacity:
array_type_for_bytes!()
andarray_type_for_markers!()
can be used to declare static array types based on their capacity.cache_aligned_zeroed_for_bytes!()
andcache_aligned_zeroed_for_markers!()
provide shorthand for creating static arrays ofCacheAligned
elements with their contents zeroed out. The expansion of this macro is a constant expression.uninitialized_boxed_slice_for_bytes()
anduninitialized_boxed_slice_for_markers()
can be used to allocate memory for a boxed slice of a given capacity without initializing its contents.
Some lower level macros and functions are also available if needed:
array_len_for_bytes!()
andarray_len_for_markers!()
return the number of elements needed for a static array with a given capacity. The results are constant expressions that can be evaluated at compile-time.cache_aligned_zeroed!()
provides shorthand for creating aCacheAligned
value with its contents zeroed out.uninitialized_boxed_slice()
allocates a boxed slice of a given number of elements without initializing its contents.
Data Alignment
Scratchpad
properly handles the alignment requirements of allocated
objects by padding the offset of the allocation within the allocation
buffer. This can result in some amount of wasted space if mixing
allocations of types that have different alignment requirements. This
waste can be minimized if necessary by grouping allocations based on their
alignment requirements or by using separate scratchpads for different
alignments.
Buffer
and Tracking
Alignments
The alignment of the allocation buffer and marker tracking themselves are
determined by the element type of the corresponding slice or array used,
as specified by the generic parameters of the Scratchpad
type and the
parameters of Scratchpad::new()
. If the alignment of the buffer itself
doesn't match the alignment needed for the first allocation made, the
allocation will be offset from the start of the buffer as needed,
resulting in wasted space.
To avoid this, use an element type for the buffer's slice or array that
provides at least the same alignment as that which will be needed for
allocations. For most types, a slice or array of u64
elements should
provide sufficient alignment to avoid any initial wasted space.
Allocations that require non-standard alignments may require defining a
custom ByteData
type with an appropriate #[repr(align)]
attribute to
avoid any wasted space at the start of the allocation buffer. For example,
a Scratchpad
guaranteeing a minimum initial alignment of 16 bytes for
SIMD allocations can be created as follows:
use std::mem::uninitialized; use scratchpad::{ByteData, Scratchpad}; #[repr(align(16))] struct Aligned16([u32; 4]); unsafe impl ByteData for Aligned16 {} let scratchpad = Scratchpad::<[Aligned16; 1024], [usize; 4]>::new( unsafe { uninitialized() }, unsafe { uninitialized() }, );
Alignment of Allocated Arrays
When allocating a dynamically sized array from a Marker
(i.e. using one of the allocate_array*()
methods), the array is only
guaranteed to be aligned based on the requirements of the element type.
This means that, for example, it is unsafe to use an array of u8
values
as a buffer in which f32
values will be written to or read from
directly. It is strongly recommended that you only use arrays allocated
from a marker as the element type specified, or that the array is
allocated using an element type whose alignment is at least as large as
the data it will contain.
Cache Alignment
Applications may prefer to keep data aligned to cache lines to avoid
performance issues (e.g. multiple cache line loads for data crossing cache
line boundaries, false sharing). The crate provides a simple data type,
CacheAligned
, that can be used as the backing type for both allocation
buffers and marker tracking. Simply providing an array or slice of
CacheAligned
objects instead of a built-in integer type will help
ensure cache alignment of the buffer. The size of a single
CacheAligned
element will always match its alignment.
This crate uses 64 bytes as the assumed cache line alignment, regardless
of the build target. While the actual cache line alignment can vary
between processors, 64 bytes is generally assumed to be a "safe" target.
This value is exported in the CACHE_ALIGNMENT
constant for
applications that wish to reference it.
Memory Overhead
Creating a marker requires a single usize
value (4 bytes on platforms
using 32-bit pointers, 8 bytes if using 64-bit pointers) within the
scratchpad's tracking buffer. When using a slice or array for marker
tracking, this memory is allocated up-front, so the footprint of the
Scratchpad
does not change after the scratchpad is created.
Each Marker
instance itself contains a reference back to its
Scratchpad
and its index within the scratchpad. This comes out to 8
bytes on platforms with 32-bit pointers and 16 bytes on platforms with
64-bit pointers.
Individual allocations have effectively no overhead. Each Allocation
instance itself only contains a pointer to its data, whose size can vary
depending on whether a fat pointer is necessary (such as for dynamically
sized arrays allocated using one of the allocate_array*()
marker
methods).
Limitations
- Due to a lack of support in Rust for generically implementing traits for
any size of a static array of a given type, the
Buffer
trait (and, by association,Tracking
trait) is only implemented for a limited number of array sizes. Mutable slice references and boxed slices do not have this restriction, so they can be used for unsupported array sizes. - Using large static arrays as buffers can cause the program stack to
overflow while creating a scratchpad, particularly in debug builds.
Using
Scratchpad::static_new()
instead ofScratchpad::new()
can help avoid such issues, and using either boxed slices or slice references of externally owned arrays for storage can help avoid such issues entirely.
Mutability Notes
Scratchpad
uses internal mutability when allocating and freeing
memory, with allocation methods operating on immutable Scratchpad
and
Marker
references. This is necessary to cleanly allow for multiple
concurrent allocations, as Rust's mutable borrowing restrictions would
otherwise prevent such behavior. Allocation and free operations do not
have any side effect on other existing allocations, so there are no
special considerations necessary by the user.
Macros
array_len_for_bytes |
Returns the minimum number of elements of a given type necessary for storage of a given byte count. The actual supported byte count may be larger due to padding. |
array_len_for_markers |
Returns the minimum number of elements of a given type necessary for tracking of at least the specified number of allocation markers. The actual supported marker count may be larger due to padding. |
array_type_for_bytes |
Declares a static array of the specified element type that is large enough for storage of a given byte count. The actual supported byte count may be larger due to padding. |
array_type_for_markers |
Declares a static array of the specified element type that is large enough for storage of at least the specified number of allocation markers. The actual supported marker count may be larger due to padding. |
cache_aligned_zeroed |
Creates a |
cache_aligned_zeroed_for_bytes |
Creates an array of zeroed-out |
cache_aligned_zeroed_for_markers |
Creates an array of zeroed-out |
Structs
Allocation |
Scratchpad |
CacheAligned | |
Error |
The error type for scratchpad operations. |
MarkerBack |
|
MarkerFront |
|
Scratchpad |
Stack-like dynamic memory pool with double-ended allocation support. |
Enums
ErrorKind |
Categories of errors during scratchpad operations. |
Constants
CACHE_ALIGNMENT |
Assumed cache line byte alignment. |
Traits
Buffer |
Trait for |
ByteData |
Trait for types that can be safely used as the backing data type for storage of arbitrary data. |
ConcatenateSlice |
Extension of the [ |
IntoSliceAllocation |
Trait for safely converting an |
Marker |
|
OwnedSlice |
Trait for types containing ranges of values that can be safely moved out while consuming the original container type. |
SliceLike |
Trait for dynamically sized types that wrap some slice type. |
StaticBuffer |
|
Tracking |
Trait for |
Functions
size_of |
Returns the size of a type in bytes. |
uninitialized_boxed_slice⚠ |
Returns a boxed slice of a given length whose data is uninitialized. |
uninitialized_boxed_slice_for_bytes⚠ |
Returns a boxed slice of uninitialized data large enough to hold at least the specified number of bytes. |
uninitialized_boxed_slice_for_markers⚠ |
Returns a boxed slice of uninitialized data large enough for tracking of at least the specified number of allocation markers. |