Space-Efficient Extensible Arrays
Overview
This Rust crate implements a growable array as described in section 3 of the paper Immediate-Access Indexing Using Space-Efficient Extensible Arrays by Alistair Moffat and Joel Mackenzie, published in 2022.
- ACM ISBN 979-8-4007-0021-7/22/12
- https://doi.org/10.1145/3572960.3572984
This data structure supports push and pop operations and does not support inserts or removes at other locations within the array. One exception is the swap/remove operation which will retrieve a value from a specified index, overwrite that slot with the value at the end of the array, decrement the count, and return the retrieved value.
This is part of a collection of similar data structures:
- Optimal Arrays (Brodnik et. al.)
- Similar O(√N) space overhead and similar O(1) running time for most operations
- Faster than Extensible Arrays for repeated pushes but slower for ordered access
- Optimal Arrays (Tarjan and Zwick)
- O(rN^1/r) space overhead and similar O(1) running time for most operations
- Copying during grow/shrink adds significant time to overall performance
- Segment Array
- Grows geometrically like
Vec, hence O(N) space overhead - Faster than Extensible Arrays for repeated pushes and ordered access
- Grows geometrically like
Background
The intent of this data structure is to minimize the memory overhead and thus efficiently utilize a significant portion of the available RAM of a system. From section 2 of the paper:
More generally, the old and new segments must be assumed to be disjoint, and at the critical transition moment both must be assumed to be present simultaneously. That is, the momentary overhead space cost is 𝑘𝑛 elements, which is a substantial imposition, and might mean that memory utilization is restricted to less than 50% of available capacity. Note that this accounting in regard to the Geometric approach also relies on memory segments being reusable after they have been released back into the memory pool. That is not in any way guaranteed, and in the absence of a defragmentation operation it might be that a Geometric extensible array leaves behind a trail of empty-but-unusable space, further worsening the effective space overhead ratio.
The specification for the data structure is contained in section 3 of the paper.
Memory Usage
Compared to the Vec type in the Rust standard library, this data structure will have substantially less unused space, on the order of O(√N). The dope vector contributes to the overhead of this data structure, and that is on the order of O(√N). Based on the current implementation of Vec, as much as 50% of the space may be unused since it has a growth factor of 2. The Segment Array has the same growth factor as Vec and potentially the same proportion of unused space (its dope vector is a fixed size).
Examples
A simple example copied from the unit tests.
let inputs = ;
let mut arr: = new;
for item in inputs
for in arr.iter.enumerate
Supported Rust Versions
The Rust edition is set to 2024 and hence version 1.85.0 is the minimum supported version.
Troubleshooting
Memory Leaks
Finding memory leaks with Address Sanitizer is fairly easy and seems to work best on Linux. The shell script below gives a quick demonstration of running one of the examples with ASAN analysis enabled.
#!/bin/sh