compact_strings
A more compact but limited representation of a list of strings or bytestrings.
This crate is not affiliated with compact_str and does not perform the same optimizations.
About
Vec<u8>, which also backs String, are 3 pointer-widths each.
They consist of a pointer to the start of the data, a length denoting how many elements
are currently in the Vec, and a capacity denoting the number of elements the
allocation pointed to can hold.
The capacity may not be needed when stored in a list structure, especially when the (byte)strings are immutable. The pointer also causes indirection to memory unlikely to be cache-local.
This crate instead stores lists of (byte)strings as two vectors:
- Metadata - which holds the starting indexes of the (byte)strings
- Data - which holds the actual bytes of the (byte)strings
This means that we pay an upfront cost of 6 pointer-widths compared to just 3, but save a pointer-width for every string after the third, in addition to (hopefully) better cache-friendliness.
Unfortunately, this structure makes mutating (byte)strings stored in the data vector
extremely difficult without shifting the rest of the data around.
This could be worked around with a limited API for mutation, but the cost of
moving the rest of the bytes will be much higher than with a Vec<String>.
See benchmarks for more details.
Benchmarks
Some benchmarks of operations expected to perform vastly differently from their
Vec equivalents have been benchmarked, you can view them here