# Small-Map
An inline, SIMD-accelerated hashmap designed for small amounts of data.
## Usage
```rust
use small_map::FxSmallMap;

// Don't worry about the 32 here: it is the inline capacity, not a hard limit.
// If the map grows beyond 32 entries, it automatically switches to a heap impl.
type MySmallMap<K, V> = FxSmallMap<32, K, V>;

let mut map = MySmallMap::new();
map.insert(1, "hello");
assert_eq!(map.get(&1), Some(&"hello"));
```
This map is best suited for short-lived maps with small keys and values, such as HTTP or RPC headers. You can use it like any other hashmap without worrying about its size.
## Choosing N
The choice of N involves a trade-off between inline storage benefits and struct size overhead:
`Struct size = N * (1 + sizeof(K) + sizeof(V)) + 2 * sizeof(usize) + sizeof(Hasher)`
### Considerations
- Key/Value size: Larger K/V types increase struct size proportionally with N
- Pass frequency: If the map is frequently passed by value (moved/copied), a large struct hurts performance
- Expected element count: N should cover most use cases to avoid heap fallback
Tip: Choose N as a multiple of SIMD_WIDTH (16 on x86_64, 8 on aarch64) for optimal SIMD utilization. When uncertain, start with N = 16 or N = 32.
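As a rough illustration, the struct-size formula above can be evaluated for candidate values of N. This is a sketch: the `estimated_size` helper and the zero-sized-hasher assumption are ours, not part of the crate's API.

```rust
use std::mem::size_of;

// Hypothetical helper evaluating the struct-size formula above.
// `hasher_size` depends on the hasher; an Fx-style BuildHasher is
// typically zero-sized, but verify with size_of for your hasher type.
const fn estimated_size(n: usize, key: usize, value: usize, hasher_size: usize) -> usize {
    n * (1 + key + value) + 2 * size_of::<usize>() + hasher_size
}

fn main() {
    // N = 16 with u64 keys and values on a 64-bit target:
    // 16 * (1 + 8 + 8) + 2 * 8 + 0 = 288 bytes held inline.
    println!("{}", estimated_size(16, size_of::<u64>(), size_of::<u64>(), 0));
}
```

If the estimate looks large relative to how often the map is moved or copied, prefer a smaller N or boxed values.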
## Performance
The performance of SmallMap depends on the capacity, the key/value size, and the operation pattern.
It is recommended to set the size to 16 (or 32), because with SSE2 the map can probe 16 slots in a single instruction. SmallMap is only recommended for small keys and values, such as numbers and short strings.
> [!NOTE]
> The lower the time (ns), the better the performance. You can find the benchmark source code in the `benches` folder.

Benchmark with u8 key/value on AMD Ryzen 9 7950X.
## How it Works
Like SmallVec, for HashMap with a small amount of data, inlining the data can avoid heap allocation overhead and improve performance.
SmallMap stores key-value pairs in a contiguous inline array, with an additional control byte array storing the hash fingerprint for SIMD-accelerated lookup.
### Lookup Strategy
The lookup strategy is determined at compile time based on N and at runtime based on the current length:

- Linear Search (no hash computation): when `N < SIMD_WIDTH` or `len < SIMD_WIDTH`
  - Simply iterates through the array comparing keys directly
  - Avoids hash computation overhead for small maps
  - SIMD_WIDTH is 16 on x86_64 (SSE2) and 8 on aarch64 (NEON)
- SIMD Search (with h2 hash): when `N >= SIMD_WIDTH` and `len >= SIMD_WIDTH`
  - Computes h2 (an 8-bit hash fingerprint) from the key's hash
  - Uses SIMD to compare h2 against 16 or 8 control bytes in parallel
  - Only performs full key comparison on h2 matches
- Heap Fallback: when `len > N`
  - Data is moved to a hashbrown-based heap HashMap
  - Performance is then equivalent to hashbrown
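To make the SIMD path concrete, here is a scalar sketch of the control-byte scan. The names `h2` and `match_group` are illustrative, and which 7 bits form the fingerprint is an assumption; the real implementation performs the comparison across all SIMD_WIDTH bytes in a single SIMD instruction rather than a loop.

```rust
// Assumption: the fingerprint is the low 7 bits of the key's hash,
// leaving the top bit free for empty/deleted markers.
fn h2(hash: u64) -> u8 {
    (hash & 0x7f) as u8
}

// Scalar emulation of the SIMD group scan (compare-equal + movemask):
// returns a bitmask with one bit set per control byte equal to `fingerprint`.
fn match_group(ctrl: &[u8], fingerprint: u8) -> u32 {
    let mut mask = 0u32;
    for (i, &c) in ctrl.iter().enumerate() {
        if c == fingerprint {
            mask |= 1 << i;
        }
    }
    mask
}

fn main() {
    let ctrl = [0x11, 0x2a, 0x11, 0x05];
    // Slots 0 and 2 hold the fingerprint 0x11.
    println!("{:#b}", match_group(&ctrl, 0x11));
}
```

Only the slots whose bits are set in the mask need a full key comparison, which is why false positives from the 8-bit fingerprint are cheap.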
### Memory Layout
```text
┌─────────────────────────────────────┐
│ AlignedGroups (N bytes)             │ ← h2 control bytes for SIMD
├─────────────────────────────────────┤
│ len: usize                          │
├─────────────────────────────────────┤
│ data: [(K, V); N]                   │ ← contiguous key-value pairs
└─────────────────────────────────────┘
```
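The layout above corresponds roughly to a struct like the following. The field and type names here are illustrative, not the crate's actual definitions, and the real crate handles SIMD alignment of the control bytes differently.

```rust
use std::mem::size_of;

// Illustrative sketch of the inline representation described above.
#[repr(C)]
struct Inline<const N: usize, K, V> {
    ctrl: [u8; N],     // h2 control bytes, scanned with SIMD
    len: usize,        // number of occupied slots
    data: [(K, V); N], // contiguous key-value pairs
}

fn main() {
    // With N = 16 and u8 keys/values on a 64-bit target:
    // 16 (ctrl) + 8 (len) + 32 (data) = 56 bytes, no padding needed.
    println!("{}", size_of::<Inline<16, u8, u8>>());
}
```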
### Complexity
Let W = SIMD_WIDTH (16 on x86_64, 8 on aarch64)
| Operation | len ≤ W | W < len ≤ N | len > N |
|---|---|---|---|
| `get` | O(len) | O(len/W) | O(1)* |
| `insert` | O(len) | O(len/W) | O(1)* |
| `insert_unique_unchecked` | O(1) | O(1) | O(1)* |
| `remove` | O(len) | O(len/W) | O(1)* |
| `iter` | O(len) | O(len) | O(capacity) |

\* amortized
## Credit
This project heavily references and borrows code from hashbrown, which is a very elegant and efficient implementation.