Simple allocator for embedded systems
This crate provides a single type called `Allocator`. This type implements the `core::alloc::GlobalAlloc` trait, which is required to use the `alloc` crate on `#![no_std]` targets. The allocator provided in this crate is relatively simple but reliable: its design is simple, so that errors in the implementation are unlikely. Furthermore, the crate is tested rigorously (see below).
§Usage
The usage is simple: just copy and paste the following code snippet into your binary crate and adjust the number of bytes of the heap as necessary (here 4 KiB):

```rust
#[global_allocator]
static ALLOCATOR: emballoc::Allocator<4096> = emballoc::Allocator::new();

extern crate alloc;
```
Afterwards you don't need to interact with the crate or the variable `ALLOCATOR` anymore. Now you can just use `alloc::vec::Vec` or even `alloc::collections::BTreeMap`, i.e. every fancy collection which is normally provided by `std`.
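Once the global allocator is registered, the `alloc` collections work out of the box. The following sketch shows this as a plain `std` program for brevity (so it runs on a host machine); on a `no_std` target the same collection code compiles once the snippet above is in place:

```rust
// Illustrative sketch: `alloc` collections in use. On a host this goes
// through the default system allocator; on a `no_std` target with the
// emballoc snippet above, the same calls go through `ALLOCATOR`.
extern crate alloc;

use alloc::collections::BTreeMap;
use alloc::vec::Vec;

fn main() {
    // A growable vector, heap-allocated through the global allocator.
    let mut numbers: Vec<u32> = Vec::new();
    numbers.extend([3, 1, 2]);
    numbers.sort();
    assert_eq!(numbers, [1, 2, 3]);

    // An ordered map, also served by the global allocator.
    let mut map = BTreeMap::new();
    map.insert("answer", 42);
    assert_eq!(map.get("answer"), Some(&42));

    println!("alloc collections work: {:?}", numbers);
}
```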
Note that the usable dynamic memory is less than the total heap size `N` due to the management of the individual allocations. Each allocation requires a constant memory overhead of 4 bytes¹. This implies that more allocations result in less usable space in the heap. The minimal buffer size is 8, which allows exactly one allocation of size up to 4 at a time. Adjust the size as necessary, e.g. by doing a worst-case calculation and potentially adding some backup space of 10% (for example).
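Such a worst-case calculation can be sketched as follows. The 4-byte header and 4-byte alignment are documented above; the helper name and the 10% headroom figure are illustrative assumptions, not crate API:

```rust
// Back-of-the-envelope heap sizing (hypothetical helper, not crate API).
const HEADER_BYTES: usize = 4;

/// Worst-case heap bytes needed for `count` simultaneous allocations of
/// at most `max_size` bytes each, with each payload rounded up to the
/// 4-byte alignment used by the allocator.
fn worst_case_heap(count: usize, max_size: usize) -> usize {
    let payload = (max_size + 3) & !3; // round up to a multiple of 4
    count * (HEADER_BYTES + payload)
}

fn main() {
    // Example: 16 simultaneous allocations of up to 24 bytes each.
    let base = worst_case_heap(16, 24);
    assert_eq!(base, 448); // 16 * (4 + 24)

    // Add ~10% backup space and round up to the next multiple of 4.
    let with_headroom = (base + base / 10 + 3) & !3;
    println!("e.g. emballoc::Allocator<{}>", with_headroom);
}
```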
The allocator itself is thread-safe, as there is no potentially unsafe `Cell<T>` action done in this crate. Instead it uses the popular `spin` crate to guard the internal data structures with a simple lock. While this keeps the implementation simple, it is not good for performance on multi-threaded systems: there is a global lock for each memory allocation, so only one thread can do a memory allocation at a time. However, this isn't really a problem in embedded systems anyway, as there is only one thread possible without a Real Time Operating System (RTOS) (which in turn often provides a memory allocator as well, so that one can be used). Therefore this crate is sound and safe to use in multi-threaded contexts, but the performance is not as good as it might be with a lock-free implementation (converting this crate to such a lock-free version would be possible, but would likely complicate the algorithm and thus is incompatible with the goal to be "simple").
A general problem with non-lock-free allocators is the following: they can cause deadlocks even in single-threaded environments if there are interrupts that allocate memory. An interrupt acts as a kind of second thread that can preempt the first one at any time, even if an allocation is currently in progress. In such a case the interrupt would wait on the lock and never obtain it, since the main program is interrupted and thus cannot release the lock. Therefore it is advised to never use any allocations (or, to the same extent, deallocations) in an interrupt handler. Performance-wise this shouldn't be done anyway.
§Advanced embedded features
Note to users with things like MPUs, MMUs, etc.: your device might support features like memory remapping or memory protection with read/write/execute rights. This crate doesn't use those features at all! If that is desired, you should take the address of the buffer and use it along with the known size `N` to protect the heap memory. To users with a fully working MMU: it is recommended to use an allocator that actually supports paging, etc. This crate might still be helpful, e.g. before setting up the MMU.
§Testing
As mentioned before: an allocator is a very critical part of the overall system; a misbehaving allocator can break the whole program. Therefore this crate is tested extensively²:

- there are unit tests and integration tests
- the unit tests are run under `miri` to detect undefined behavior, both on a little-endian and a big-endian system
- a real-world test using the allocator in the popular `ripgrep` program was done (see here)
§Implementation
The allocator does a linear scan for free blocks. The basic algorithm is as follows:
1. We start with an empty buffer.

   ```text
   xxxx 0000 0000 0000 0000 0000 0000 0000
   ^--- ^---------------------------------
   FREE size = 28
   ```

   There is a single entry, which spans all the remaining buffer bytes (after the entry itself, which is always 4 bytes).
2. A block of 8 is allocated.

   ```text
   xxxx 0000 0000 yyyy 0000 0000 0000 0000
   ^--- ^-------- ^--- ^------------------
   USED size = 8  FREE size = 16
   ```

   Now the only free block (the FREE block of step 1) is split into two. There is now a used block with a total size of 12 bytes: 4 bytes for the header and 8 bytes for the content. The remaining buffer space is occupied by the FREE element. Note that the total amount of "usable" space (the memory without the headers) shrunk from 28 to 24 (16 + 8) bytes, since there is now an additional header.
3. Another block of 4 is allocated.

   ```text
   xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
   ^--- ^-------- ^--- ^--- ^--- ^--------
   USED size = 8  USED      FREE size = 8
                  size = 4
   ```

   The same thing as in step 2 happens. Now there are two used blocks and a single free block with a size of 8.
4. A request for a block of 16 comes in. There is not enough free memory for that request, therefore the allocation fails.
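Steps 1 to 4 can be simulated with a simplified block model. This is an illustrative sketch of the first-fit scheme described here, not the crate's actual implementation (which works on a raw byte buffer rather than a `Vec` of structs):

```rust
// Simplified model: the heap is a sequence of blocks, each with a 4-byte
// header recording "used?" and the payload size. Allocation does a linear
// first-fit scan and splits a free block when it is large enough.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Block {
    used: bool,
    size: usize, // payload bytes, always a multiple of 4
}

const HEADER: usize = 4;

fn allocate(blocks: &mut Vec<Block>, request: usize) -> Option<usize> {
    let size = (request + 3) & !3; // round up for alignment
    for i in 0..blocks.len() {
        if !blocks[i].used && blocks[i].size >= size {
            let leftover = blocks[i].size - size;
            blocks[i] = Block { used: true, size };
            if leftover >= HEADER {
                // Split: the remainder can hold another header.
                blocks.insert(i + 1, Block { used: false, size: leftover - HEADER });
            } else {
                // Absorb unusable tail bytes into this block (step 5).
                blocks[i].size += leftover;
            }
            return Some(i);
        }
    }
    None // no sufficiently large free block: allocation fails
}

fn main() {
    // Step 1: a 32-byte buffer holds one free block of 28 payload bytes.
    let mut heap = vec![Block { used: false, size: 28 }];
    // Step 2: allocate 8 bytes; the free block is split.
    assert_eq!(allocate(&mut heap, 8), Some(0));
    assert_eq!(heap[1], Block { used: false, size: 16 });
    // Step 3: allocate 4 more bytes.
    assert_eq!(allocate(&mut heap, 4), Some(1));
    assert_eq!(heap[2], Block { used: false, size: 8 });
    // Step 4: a request for 16 bytes fails.
    assert_eq!(allocate(&mut heap, 16), None);
    // Step 5: a request for 5 bytes is rounded to 8 and uses up the
    // last free block entirely.
    assert_eq!(allocate(&mut heap, 5), Some(2));
    assert!(heap.iter().all(|b| b.used));
}
```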
5. A block of 5 is allocated.

   ```text
   xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
   ^--- ^-------- ^--- ^--- ^--- ^-----!!!
   USED size = 8  USED      USED size = 8
                  size = 4
   ```

   There is not enough space at the end of the memory buffer, therefore the current entry is enlarged to fill the remaining space. This "wastes" 3 bytes, but those would not be usable anyway.

   To prevent alignment issues, block sizes are always rounded up to a multiple of 4 as well, which has the same effect (this implies that the aforementioned special handling of the remaining bytes is not necessary, but care has to be taken to handle 0-sized "free" blocks correctly).
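The rounding rule from this step can be sketched as a one-liner (a hypothetical helper, shown here only to make the bit trick explicit):

```rust
// Round a requested size up to the next multiple of 4, keeping every
// header 4-byte aligned. Adding 3 and clearing the two low bits is the
// classic power-of-two round-up trick.
fn round_up_to_4(size: usize) -> usize {
    (size + 3) & !3
}

fn main() {
    assert_eq!(round_up_to_4(1), 4);
    assert_eq!(round_up_to_4(4), 4);
    assert_eq!(round_up_to_4(5), 8); // the "block of 5" from this step
    assert_eq!(round_up_to_4(0), 0); // 0-sized "free" blocks stay 0
    println!("rounding ok");
}
```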
6. A request for a block of 1 comes in. There is no free memory at all and hence not enough free memory for that request. Therefore the allocation fails.
7. The third allocation (block size 5) is freed.

   ```text
   xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
   ^--- ^-------- ^--- ^--- ^--- ^--------
   USED size = 8  USED      FREE size = 8
                  size = 4
   ```

   The picture of step 3 is restored.
8. The first allocation (block size 8) is freed.

   ```text
   xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
   ^--- ^-------- ^--- ^--- ^--- ^--------
   FREE size = 8  USED      FREE size = 8
                  size = 4
   ```

   Now there are two free blocks and one used block. Note that there is fragmentation: a request for 12 bytes could not be fulfilled, since there is no contiguous free memory of that size.
9. Another block of 8 is allocated.

   ```text
   xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
   ^--- ^-------- ^--- ^--- ^--- ^--------
   USED size = 8  USED      FREE size = 8
                  size = 4
   ```

   Nothing special here, except that the allocator could choose between the two free blocks of 8. Here the first one was chosen (arbitrarily).
10. The second allocation (block size 4) is freed.

    ```text
    xxxx 0000 0000 yyyy 0000 0000 0000 0000
    ^--- ^-------- ^--- ^------------------
    USED size = 8  FREE size = 16
    ```

    The block is simply replaced by a FREE block, but there is a caveat: the two adjacent free blocks have to be joined into a single big FREE block in order to prevent further fragmentation, so that they form one contiguous block with a single header.

    This joining is easy: the middle block of step 9 just has to look at the next header (the position of that block is known from its own size) and check whether it is free. If so, the new block is adjusted to have a size of `self.size + 4 + other.size`. This effectively erases the right free block.
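The joining rule can be sketched as a small arithmetic helper (hypothetical names; the formula `self.size + 4 + other.size` is the one stated above):

```rust
// Forward coalescing: when a freed block's right neighbour is also free,
// the two payloads and the right block's 4-byte header fuse into one
// free block.
const HEADER: usize = 4;

fn merged_size(left_payload: usize, right_payload: usize) -> usize {
    left_payload + HEADER + right_payload // i.e. self.size + 4 + other.size
}

fn main() {
    // Step 10: freeing the 4-byte block next to a free 8-byte block
    // yields a single free block of 4 + 4 + 8 = 16 payload bytes.
    assert_eq!(merged_size(4, 8), 16);
    println!("coalesced into {} payload bytes", merged_size(4, 8));
}
```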
11. A new block of 8 is allocated. Afterwards the first block is freed.

    ```text
    xxxx 0000 0000 yyyy 0000 0000 0000 0000
    ^--- ^-------- ^--- ^-------- ^--- ^---
    FREE size = 8  USED size = 8  FREE
                                  size = 4
    ```

    This is just an intermediate step without any issues.
12. The remaining used block is freed.

    ```text
    xxxx 0000 0000 yyyy 0000 0000 0000 0000
    ^--- ^-------- ^--- ^------------------
    FREE size = 8  FREE size = 16
    ```

    Now there are two(!) free blocks, since the joining described in step 10 only happens to the right side of the freed block. Since the left block has an unknown size, it is not possible to find its header (except by linearly scanning the memory from the beginning). Therefore it is easier to just live with that fragmentation.

    Something interesting here is that one could check for such conditions from time to time and fix them during that scan. Done this way, there is no constant time penalty on every deallocation. Furthermore it lets the user decide whether that feature is necessary or not.
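Such an occasional clean-up scan can be sketched on the same simplified block model as before (illustrative only, not part of this crate's API):

```rust
// One pass over the block list that fuses every run of adjacent free
// blocks, reclaiming the 4-byte header between each pair.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Block {
    used: bool,
    size: usize, // payload bytes
}

const HEADER: usize = 4;

fn coalesce(blocks: &mut Vec<Block>) {
    let mut i = 0;
    while i + 1 < blocks.len() {
        if !blocks[i].used && !blocks[i + 1].used {
            // Absorb the right neighbour, header included.
            blocks[i].size += HEADER + blocks[i + 1].size;
            blocks.remove(i + 1);
        } else {
            i += 1;
        }
    }
}

fn main() {
    // The fragmented end state of step 12: FREE 8 followed by FREE 16.
    let mut heap = vec![
        Block { used: false, size: 8 },
        Block { used: false, size: 16 },
    ];
    coalesce(&mut heap);
    // One free block of 8 + 4 + 16 = 28 bytes: the picture of step 1.
    assert_eq!(heap, vec![Block { used: false, size: 28 }]);
}
```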
¹ This value is critical for worst-case calculations and is therefore part of the stability guarantees of this crate. Changing it will be a breaking change and thus requires a major version bump.

² The test coverage of different metrics and tools is over 96% (a self-set lower limit for this crate). Typically, code coverage tends to vary between coverage measurement kinds (e.g. line/region coverage). One can refer to the latest codecov.io measurement or to the more precise measurements obtained by the Rust instrumented-coverage feature. Those can be found in the CI logs (see the `coverage` job and there look at "Record testing coverage"). There is a table showing the percentages of the different metrics per file.
§Structs
- `Allocator` — The memory allocator for embedded systems.