emballoc 0.3.0

Simple but reliable memory allocator for embedded Rust and #![no_std]
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
//! Simple allocator for embedded systems
//!
//! This crate provides a single type called [`Allocator`]. This type implements
//! the [`core::alloc::GlobalAlloc`]-trait, which is required to use the
//! [`alloc`-crate][alloc] on `#![no_std]`-targets. The allocator provided in
//! this crate is relatively simple, but reliable: its design is simple, so that
//! errors in the implementation are unlikely. Furthermore the crate is tested
//! rigorously (see below).
//!
//! # Usage
//! The usage is simple: just copy and paste the following code snipped into
//! your binary crate and potentially adjust the number of bytes of the heap
//! (here 4K):
//! ```
//! #[global_allocator]
//! static ALLOCATOR: emballoc::Allocator<4096> = emballoc::Allocator::new();
//!
//! extern crate alloc;
//! ```
//! Afterwards you don't need to interact with the crate or the variable
//! `ALLOCATOR` anymore. Now you can just `use alloc::vec::Vec` or even
//! `use alloc::collections::BTreeMap`, i.e. every fancy collection which is
//! normally provided by the `std`.
//!
//! Note, that the usable dynamic memory is less than the total heap size `N`
//! due to the management of the individual allocations. Each allocation will
//! require a constant memory overhead of _4 bytes_[^note-header-size]. This
//! implies, that more allocations will result in less usable space in the heap.
//! The minimal buffer size is `8`, which would allow exactly one allocation of
//! size up to 4 at a time. Adjust the size as necessary, e.g. by doing a worst
//! case calculation and potentially adding some backup space of 10% (for
//! example).
//!
//! [^note-header-size]: this value is critical for worst-case calculations and
//! therefore part of the stability guarantees of this crate. Changing it will
//! be a breaking change and thus requires a major version bump.
//!
//! The allocator itself is thread-safe, as there is no potentially unsafe
//! [`Cell<T>`]-action done in this crate. Instead it uses the popular [`spin`]
//! crate to use a simple lock on the internal data structures. While this is
//! simple for the implementation, this is not good for performance on
//! multi-threaded systems: you have a global lock for each memory allocation,
//! so only on thread can do a memory allocation at a time. However, this isn't
//! really a problem in embedded systems anyway, as there is only on thread
//! possible without Real Time Operating Systems (RTOS) (which in turn often
//! provide a memory allocator as well, so that one can be used). Therefore this
//! crate is __sound__ and safe to use in multi-threaded contexts, but the
//! performance is not as good as it might be on a lock-free implementation
//! (converting this crate to such a lock free version would be possible, but
//! likely complicate the algorithm and thus is incompatible with the goal to be
//! "simple").
//!
//! A general problem with non-lock-free allocators is the following: it can
//! cause deadlocks even in single-threaded environments if there are interrupts
//! that will _allocate memory_. The interrupt is kind of a second thread, that
//! preempts the first one at any time, even if an allocation is currently in
//! progress. In such a case the interrupt would wait on the lock and never
//! obtain it, since the main program is interrupted and thus cannot release the
//! lock. Therefore it is advised to never use any allocations (or deallocations
//! to the same extend) in an interrupt handler. Performance-wise this shouldn't
//! be done anyway.
//!
//! # Advanced embedded features
//! Note to users with things like `MPU`s, `MMU`s, etc.: your device might
//! support things like memory remapping or memory protection with setting
//! read/write/execution rights. This crate _doesn't use_ those features at all!
//! If that is desired, you should take the address of the buffer and use that
//! along with the known size `N` to protect the heap memory. To users with a
//! fully-working MMU: it is recommended, that you use an allocator, that
//! actually supports paging, etc. This crate might still be helpful, e.g.
//! before setting up the MMU.
//!
//! # Testing
//! As mentioned before: an allocator is a very critical part in the overall
//! system; a misbehaving allocator can break the whole program. Therefore this
//! create is tested extensively[^note-testing]:
//! - there are unit tests and integration tests
//! - the unit tests are run under `miri` to detect undefined behavior, both on
//!   a little-endian and big-endian system
//! - a real-world test using the allocator in the popular `ripgrep` program was
//!   done (see [here][gist_hosted-test])
//!
//! [^note-testing]: The test coverage of different metrics and tools is over
//! 96% (self-set lower limit for this crate). Typically, code coverage tends to
//! vary between coverage measurement kinds (e.g. line/region coverage). One can
//! refer to the latest [codecov.io][codecov] measurement or refer to the more
//! precise measurements obtained by the Rust _instrumented coverage_ feature.
//! Those can be found in the [CI-logs] (see the `coverage` job and there look
//! at "Record testing coverage"). There is a table showing the percentages of
//! different metrics per file.
//!
//! # Implementation
//! This algorithm does a linear scan for free blocks. The basic algorithm is as
//! follows:
//! 1.  We start with an empty buffer.
//!     ```text
//!     xxxx 0000 0000 0000 0000 0000 0000 0000
//!     ^--- ^---------------------------------
//!     FREE size = 28
//!     ```
//!     There is a single entry, which spans all the remaining buffer bytes
//!     (after the entry itself, which is always 4 bytes).
//! 2.  A block of 8 is allocated.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 0000 0000 0000
//!     ^--- ^-------- ^--- ^------------------
//!     USED size = 8  FREE size = 16
//!     ```
//!     Now the only free block (the FREE block of step 1) is split into two.
//!     There is now a used block with a total size of 12 bytes, 4 bytes for the
//!     header and 8 bytes for the content. The remaining buffer space is
//!     occupied by the FREE-element. Note, that the total number of "usable"
//!     space (the memory without the headers) shrunk from 28 to 24 (16 + 8)
//!     bytes, since there is now an additional header.
//! 3.  Another block of 4 is allocated.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
//!     ^--- ^-------- ^--- ^--- ^--- ^--------
//!     USED size = 8  USED size FREE size = 8
//!     ```
//!     The same thing as in step 2 happens. Now there are two used blocks and
//!     a single free block with a size of 8.
//! 4.  A request for a block of 16 comes in. There is not enough free memory
//!     for that request. Therefore the allocation fails.
//! 5.  A block of 5 is allocated.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
//!     ^--- ^-------- ^--- ^--- ^--- ^-----!!!
//!     USED size = 8  USED size USED size = 8
//!     ```
//!     There is not enough space at the end of the memory buffer, therefore the
//!     current entry is enlarged to fill the remaining space. This "wastes" 3
//!     bytes, but those would not be usable anyway.
//!
//!     To prevent alignment issues, the blocks are always rounded up to a
//!     multiple of 4 as well, which has the same result (this implies, that the
//!     aforementioned special handling of the remaining bytes is not necessary,
//!     care has to be taken to handle 0-sized "free" blocks correctly).
//! 6.  A request for a block of 1 comes in. There is no free memory at all and
//!     hence not enough free memory for that request. Therefore the allocation
//!     fails.
//! 7.  The third allocation (block size 5) is freed.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
//!     ^--- ^-------- ^--- ^--- ^--- ^--------
//!     USED size = 8  USED size FREE size = 8
//!     ```
//!     The picture of step 3 is restored.
//! 8.  The first allocation (block size 8) is freed.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
//!     ^--- ^-------- ^--- ^--- ^--- ^--------
//!     FREE size = 8  USED size FREE size = 8
//!     ```
//!     Now there are two free blocks and a usable block. Note, that there is
//!     fragmentation, so a request for 12 bytes could not be fulfilled, since
//!     there is no contiguous memory of that size.
//! 9.  Another block of 8 is allocated.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 zzzz 0000 0000
//!     ^--- ^-------- ^--- ^--- ^--- ^--------
//!     USED size = 8  USED size FREE size = 8
//!     ```
//!     Nothing special here, except that the allocator could choose between the
//!     two blocks of 8. Here the first one was chosen (arbitrarily).
//! 10. The second allocation (block size 4) is freed.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 0000 0000 0000
//!     ^--- ^-------- ^--- ^------------------
//!     USED size = 8  FREE size = 16
//!     ```
//!     The block is simply replaced by a FREE block, but there is a caveat: the
//!     two adjacent blocks have to be connected to a single big FREE-block in
//!     order to prevent more fragmentation. They are one continuous block with
//!     a single header.
//!
//!     This connection is easy, since the middle block of step 9 just has to
//!     look for the next header (the position of that block is known by its
//!     size) and check, whether it is free. If so, the new block gets adjusted
//!     to have a size of `self.size + 4 + other.size`. This effectively erases
//!     the right free block.
//! 11. A new block of 8 is allocated. Afterwards the first block is freed.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 0000 0000 0000
//!     ^--- ^-------- ^--- ^-------- ^--- ^---
//!     FREE size = 8  USED size = 8  FREE size
//!     ```
//!     This is just an intermediate step without any issues.
//! 12. The remaining used block is freed.
//!     ```text
//!     xxxx 0000 0000 yyyy 0000 0000 0000 0000
//!     ^--- ^-------- ^--- ^------------------
//!     FREE size = 8  FREE size = 16
//!     ```
//!     Now there are two(!) free blocks, since the concatenation described in
//!     step 10 does only happen to the right side of the freed block. Since the
//!     left block has an unknown size, it is not possible to find the header
//!     (except for linearly scanning the memory from the beginning). Therefore
//!     it is easier to just live with that fragmentation.
//!
//!     Something interesting here is, that one could check for such conditions
//!     from time to time and fix them during that scan. Doing it this way does
//!     not come with a constant time penalty when deallocating. Furthermore it
//!     lets the user decide, whether that feature is necessary or not.
//!
//! [alloc]: https://doc.rust-lang.org/alloc/index.html
//! [gist_hosted-test]: https://gist.github.com/jfrimmel/61943f9879adfbe760a78efa17a0ecaa
//! [`Cell<T>`]: core::cell::Cell
//! [codecov]: https://codecov.io/gh/jfrimmel/emballoc
//! [ci-logs]: https://app.circleci.com/pipelines/github/jfrimmel/emballoc
#![cfg_attr(not(test), no_std)]
#![warn(unsafe_op_in_unsafe_fn)]
#![warn(clippy::undocumented_unsafe_blocks)]

mod raw_allocator;
use raw_allocator::RawAllocator;

use core::alloc::{GlobalAlloc, Layout};
use core::ptr;

/// The memory allocator for embedded systems.
///
/// This is the core type of this crate: it is an allocator with a predefined
/// heap size. Therefore the heap memory usage is statically limited to an upper
/// value, which also helps to prevent issues with heap/stack-smashes, as the
/// heap is counted to the static memory (e.g. `.data`/`.bss`-sections). Such a
/// smash might still happen though, if the stack pointer grows into the heap,
/// but the heap cannot grow into the stack pointer.
///
/// Its usage is simple: just copy and paste the following in the binary crate
/// you're developing. The memory size of the heap is 4096 bytes (4K) in this
/// example. Adjust that value to your needs.
/// ```no_run
/// #[global_allocator]
/// static ALLOCATOR: emballoc::Allocator<4096> = emballoc::Allocator::new();
/// ```
/// Also please refer to the [crate-level](crate)-documentation for
/// recommendations on the buffer size and general usage.
pub struct Allocator<const N: usize> {
    /// The internal raw allocator.
    ///
    /// The raw allocator handles allocations of contiguous byte slices without
    /// needing to worry about alignment. The raw allocator is protected by a
    /// `spin::Mutex` to make it usable with shared references (requirement of
    /// [`GlobalAlloc`]).
    raw: spin::Mutex<RawAllocator<N>>,
}
impl<const N: usize> Allocator<N> {
    /// Create a new [`Allocator`] with exactly `N` bytes heap space.
    ///
    /// Note, that the usable size is less than the heap size, since there is
    /// some bookkeeping-overhead stored in that area as well. Please see the
    /// [crate-level](crate)-documentation for recommendations on the buffer
    /// size and general usage.
    ///
    /// This function is a `const fn`, therefore you can call it directly when
    /// creating the allocator.
    ///
    /// # Example
    /// Normally one would use a static variable to assign the result of this
    /// function to. When using this, the size is specified in the type and
    /// does not need to be repeated in the call to this function.
    /// ```
    /// static ALLOCATOR: emballoc::Allocator<4096> = emballoc::Allocator::new();
    /// ```
    /// Note: don't use a `const`, as this would freshly instantiate the
    /// allocator every time used and dropping the allocation state.
    ///
    /// If the allocator is used locally, then the size has to be specified like
    /// this:
    /// ```
    /// let allocator = emballoc::Allocator::<64>::new(); // note: smaller size!
    /// ```
    /// Be careful to use a smaller size, since this allocator will be created
    /// on the stack! Therefore it is possible to easily blow up the stack, so
    /// this usage is discouraged and only should be done in special cases.
    ///
    /// # Panics
    /// This function will panic, if the supplied buffer size, i.e. `N`, is less
    /// than `8` or not divisible by `4`.
    /// ```should_panic
    /// emballoc::Allocator::<63>::new(); // not divisible by 4
    /// ```
    /// ```should_panic
    /// emballoc::Allocator::<4>::new(); // less than 8
    /// ```
    #[must_use = "assign the allocator to a static variable and apply the `#[global_allocator]`-attribute to make it the global allocator"]
    #[allow(clippy::new_without_default)] // this could be added, but not now
    pub const fn new() -> Self {
        let raw = spin::Mutex::new(RawAllocator::new());
        Self { raw }
    }

    /// Align a given pointer to the specified alignment.
    ///
    /// # Safety
    /// This function requires `align` to be a power of two and requires the
    /// `ptr` to point to a memory region, that is large enough, so that the
    /// aligned pointer is still in that memory region.
    unsafe fn align_to(ptr: *mut u8, align: usize) -> *mut u8 {
        let addr = ptr as usize;
        let mismatch = addr & (align - 1);
        let offset = if mismatch == 0 { 0 } else { align - mismatch };
        // SAFETY: "in-bound"-requirement is part of the safety-contract of this
        // function, therefore the caller is responsible for it
        unsafe { ptr.add(offset) }
    }
}
// SAFETY: the safety contracts of global allocator is a bit lengthy, but in
// short: the implementation does not panic (at least on purpose, if it would,
// there is a bug) and it actually adheres to the layout requirements (ensured
// by tests).
unsafe impl<const N: usize> GlobalAlloc for Allocator<N> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let align = layout.align();
        // the raw allocator always returns 4-byte-aligned slices, therefore
        // smaller alignments are always fulfilled. Larger alignments are a bit
        // more tricky, since this requires over-allocation and adjusting the
        // pointer accordingly. The over-allocation is rather conservative and
        // uses a worst case estimation, therefore it allocates `align` bytes
        // more, ensuring there is enough memory.
        let size = if align > 4 {
            layout.size() + align
        } else {
            layout.size()
        };

        // allocate a memory block and return the sufficiently aligned pointer
        // into that memory block.
        self.raw
            .lock()
            .alloc(size)
            .map_or(ptr::null_mut(), |memory| {
                // SAFETY: `align` is a power of two as by the contract of
                // `Layout`. Furthermore the memory slice is enlarged (see
                // above), so that the aligned pointer will still be in the same
                // allocation.
                unsafe { Self::align_to(ptr::addr_of_mut!(*memory).cast(), align) }
            })
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
        // alignment is irrelevant here, as `RawAllocator::free` can handle any
        // pointer in an entry's memory, so simply forward the pointer. The
        // `free()`-method might detect errors, but those cannot lead to panics
        // (by contract of `GlobalAlloc`). Therefore there are two choices:
        // 1. abort the process
        // 2. ignore the error
        // Since there is no process and there is no stable way to abort the
        // program on `core` the only viable option is option #1: do nothing.
        let _maybe_error = self.raw.lock().free(ptr.cast()).ok();
        // errors are ignored
    }
}

// include the readme in doc-tests. Credits to https://blog.guillaume-gomez.fr/articles/2020-03-07+cfg%28doctest%29+is+stable+and+you+should+use+it
#[cfg(doctest)]
mod extra_doctests {
    /// Helper macro to pass a "dynamic"/included string to the `extern`-block
    macro_rules! doc_check {
        ($x:expr) => {
            #[doc = $x]
            extern "C" {}
        };
    }
    // Check the code snippets in the Readme.
    doc_check!(include_str!("../README.md"));
}

#[cfg(test)]
mod tests {
    use crate::Allocator;
    use core::alloc::{GlobalAlloc, Layout};
    use core::ptr;

    #[test]
    fn alignment_of_align_to() {
        // create buffer memory for proper indexing. One could use random
        // integers and cast them to pointers, but this would violate the strict
        // provenance rules and `miri` would detect that. Therefore this uses a
        // valid and suitable aligned buffer and uses pointers into that buffer.
        #[repr(align(16))]
        struct Align([u8; 16]);
        let mut just_a_buffer_to_get_a_valid_address = Align([0_u8; 16]);
        let base: *mut u8 = ptr::addr_of_mut!(just_a_buffer_to_get_a_valid_address.0).cast();

        // create some pointers to the buffer with some offsets
        let ptr_0x10 = base;
        let ptr_0x11 = base.wrapping_add(1);
        let ptr_0x14 = base.wrapping_add(4);
        let ptr_0x1c = base.wrapping_add(0xc);
        let ptr_0x20 = base.wrapping_add(0x10);

        // the actual test for the alignment of `align_to()`
        assert_eq!(unsafe { Allocator::<8>::align_to(ptr_0x11, 4) }, ptr_0x14);
        assert_eq!(unsafe { Allocator::<8>::align_to(ptr_0x10, 4) }, ptr_0x10);

        assert_eq!(unsafe { Allocator::<8>::align_to(ptr_0x11, 1) }, ptr_0x11);

        assert_eq!(unsafe { Allocator::<8>::align_to(ptr_0x1c, 16) }, ptr_0x20);
    }

    // the following tests ensure, that a pointer with the requested alignment
    // is returned

    /// Assert the given alignment of pointers.
    macro_rules! assert_alignment {
        ($ptr:expr, $align:expr) => {{
            assert_eq!(($ptr as usize) % $align, 0, "Alignment not fulfilled");
        }};
    }

    #[test]
    fn small_alignments() {
        let allocator = Allocator::<128>::new();

        let ptr = unsafe { allocator.alloc(Layout::from_size_align(8, 2).unwrap()) };
        assert_alignment!(ptr, 1);

        let ptr = unsafe { allocator.alloc(Layout::from_size_align(4, 4).unwrap()) };
        assert_alignment!(ptr, 4);
    }

    #[test]
    fn medium_alignments() {
        let allocator = Allocator::<128>::new();

        let ptr = unsafe { allocator.alloc(Layout::from_size_align(4, 8).unwrap()) };
        assert_alignment!(ptr, 8);

        let ptr = unsafe { allocator.alloc(Layout::from_size_align(4, 32).unwrap()) };
        assert_alignment!(ptr, 32);
    }

    #[cfg(not(miri))] // too slow
    #[test]
    fn huge_alignment() {
        // in static memory to prevent stack overflow
        const FOUR_MEG: usize = 4 * 1024 * 1024;

        static ALLOCATOR: Allocator<{ 10 * 1024 * 1024 }> = Allocator::new();
        let ptr = unsafe { ALLOCATOR.alloc(Layout::from_size_align(4, FOUR_MEG).unwrap()) };

        assert_alignment!(ptr, FOUR_MEG);
    }

    #[test]
    fn allocation_failure() {
        let allocator = Allocator::<128>::new();

        // try an allocation, that exceeds the total memory size
        let ptr = unsafe { allocator.alloc(Layout::from_size_align(129, 1).unwrap()) };
        assert_eq!(ptr, ptr::null_mut());
    }

    #[test]
    fn allocation_failure_due_to_alignment() {
        let allocator = Allocator::<128>::new();

        // try an allocation, that exceeds the total memory size
        let ptr = unsafe { allocator.alloc(Layout::from_size_align(8, 128).unwrap()) };
        assert_eq!(ptr, ptr::null_mut());
    }

    #[test]
    fn example_usage() {
        // do some example allocations. There is an intermediate deallocation,
        // different allocation/deallocation-orders, different alignments and
        // different sizes.
        static ALLOCATOR: Allocator<4096> = Allocator::new();

        unsafe {
            let layout1 = Layout::new::<u32>();
            let ptr1 = ALLOCATOR.alloc(layout1);
            assert_ne!(ptr1, ptr::null_mut());

            let layout2 = Layout::new::<f64>();
            let ptr2 = ALLOCATOR.alloc(layout2);
            assert_ne!(ptr2, ptr::null_mut());

            let layout3 = Layout::new::<[u16; 12]>();
            let ptr3 = ALLOCATOR.alloc(layout3);
            assert_ne!(ptr3, ptr::null_mut());

            ALLOCATOR.dealloc(ptr2, layout2);

            let layout4 = Layout::new::<[u128; 3]>();
            let ptr4 = ALLOCATOR.alloc(layout4);
            assert_ne!(ptr4, ptr::null_mut());

            let layout5 = Layout::new::<f32>();
            let ptr5 = ALLOCATOR.alloc(layout5);
            assert_ne!(ptr5, ptr::null_mut());

            ALLOCATOR.dealloc(ptr3, layout3);
            ALLOCATOR.dealloc(ptr4, layout4);
            ALLOCATOR.dealloc(ptr5, layout5);
            ALLOCATOR.dealloc(ptr1, layout1);
        }
    }
}