Small-Map


An inline SIMD-accelerated hashmap designed for small amounts of data.

Usage

use small_map::FxSmallMap;
// Don't worry about the 32 here: it's the inline capacity, not a hard limit.
// If the map grows beyond 32 entries, it automatically switches to the heap implementation.
type MySmallMap<K, V> = FxSmallMap<32, K, V>;

let mut map = MySmallMap::new();
map.insert(1_u8, 2_u8);
assert_eq!(*map.get(&1).unwrap(), 2);

This map is best suited to short-lived data with small keys and values, for example HTTP or RPC headers.

You can use it like any other hashmap without worrying about how many entries it will hold.
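
For example, a minimal sketch using only the insert/get API shown above; the inline capacity of 4 is deliberately tiny to trigger the heap fallback:

use small_map::FxSmallMap;

// Inline capacity of 4, chosen deliberately small for the demonstration.
let mut map = FxSmallMap::<4, u32, u32>::new();
// Inserting more than 4 entries silently moves the data to the heap
// implementation; lookups keep working exactly the same.
for i in 0..10_u32 {
    map.insert(i, i * 2);
}
assert_eq!(*map.get(&9).unwrap(), 18);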

Choosing N

The choice of N involves a trade-off between inline storage benefits and struct size overhead:

Struct size = N * (1 + sizeof(K) + sizeof(V)) + 2 * sizeof(usize) + sizeof(Hasher)

Considerations

  1. Key/Value size: Larger K/V types increase struct size proportionally with N
  2. Pass frequency: If the map is frequently passed by value (moved/copied), a large struct hurts performance
  3. Expected element count: N should cover most use cases to avoid heap fallback

Tip: Choose N as a multiple of SIMD_WIDTH (16 on x86_64, 8 on aarch64) for optimal SIMD utilization. When uncertain, start with N = 16 or N = 32.
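
Because padding and the hasher's size vary by platform, one way to sanity-check the estimate is to print the actual struct size with std::mem::size_of. A minimal sketch; the exact numbers depend on your target and chosen hasher:

use small_map::FxSmallMap;

fn main() {
    // Measure the inline struct size for a few candidate values of N.
    println!("N=16, u8 keys/values:  {} bytes", std::mem::size_of::<FxSmallMap<16, u8, u8>>());
    println!("N=32, u8 keys/values:  {} bytes", std::mem::size_of::<FxSmallMap<32, u8, u8>>());
    println!("N=16, u64 keys/values: {} bytes", std::mem::size_of::<FxSmallMap<16, u64, u64>>());
}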

Performance

The performance of SmallMap depends on the inline capacity, the key/value size, and the access pattern.

It is recommended to set the inline size to 16 (or 32), because with SSE2 a single instruction can probe 16 items at once. SmallMap is only recommended for small keys and values, such as numbers and strings.

Benchmark charts: Map Performance (4-32), Map Performance (4-64), All Maps Performance (4-128).

[!NOTE] The lower the time (ns), the better the performance. You can find the benchmark source code in the benches folder.

Benchmarked with u8 keys/values on an AMD Ryzen 9 7950X.

How it Works

Like SmallVec, SmallMap inlines its data when it holds only a small amount, which avoids heap allocation overhead and improves performance.

SmallMap stores key-value pairs in a contiguous inline array, with an additional control byte array storing the hash fingerprint for SIMD-accelerated lookup.

Lookup Strategy

The lookup strategy is determined at compile time based on N and at runtime based on the current length (a sketch of this dispatch follows the list):

  1. Linear Search (no hash computation): When N < SIMD_WIDTH or len < SIMD_WIDTH

    • Simply iterates through the array comparing keys directly
    • Avoids hash computation overhead for small maps
    • SIMD_WIDTH is 16 on x86_64 (SSE2) and 8 on aarch64 (NEON)
  2. SIMD Search (with h2 hash): When N >= SIMD_WIDTH and len >= SIMD_WIDTH

    • Computes h2 (8-bit hash fingerprint) from the key's hash
    • Uses SIMD to compare h2 against 16 or 8 control bytes in parallel
    • Only performs key comparison on h2 matches
  3. Heap Fallback: When len > N

    • Data is moved to a hashbrown-based heap HashMap
    • Performance is equivalent to hashbrown
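
A minimal sketch of this dispatch, with a plain scalar loop standing in for the SIMD probe; the names SIMD_WIDTH, h2, control, and data are illustrative assumptions, not the crate's internals:

use std::hash::{Hash, Hasher};

const SIMD_WIDTH: usize = 16; // 16 on x86_64 (SSE2), 8 on aarch64 (NEON)

// 8-bit fingerprint derived from the key's full hash (hypothetical helper).
fn h2<K: Hash>(key: &K) -> u8 {
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() >> 56) as u8
}

// `control` holds one fingerprint per occupied slot of `data`; N is the
// compile-time inline capacity.
fn get<'a, K: Hash + Eq, V, const N: usize>(
    control: &[u8; N],
    data: &'a [(K, V)],
    key: &K,
) -> Option<&'a V> {
    if N < SIMD_WIDTH || data.len() < SIMD_WIDTH {
        // 1. Linear search: compare keys directly, skipping hash computation.
        return data.iter().find(|(k, _)| k == key).map(|(_, v)| v);
    }
    // 2. SIMD search: the real implementation compares `fp` against
    //    SIMD_WIDTH control bytes per instruction; only fingerprint matches
    //    fall through to the full key comparison.
    let fp = h2(key);
    for (i, (k, v)) in data.iter().enumerate() {
        if control[i] == fp && k == key {
            return Some(v);
        }
    }
    None
    // 3. The heap fallback (len > N) lives in a separate hashbrown-backed map.
}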

Memory Layout

┌─────────────────────────────────────┐
│  AlignedGroups (N bytes)            │  ← h2 control bytes for SIMD
├─────────────────────────────────────┤
│  len: usize                         │
├─────────────────────────────────────┤
│  data: [(K, V); N]                  │  ← contiguous key-value pairs
└─────────────────────────────────────┘
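
A rough Rust analogue of the diagram above; the field names are illustrative only, and the real crate also stores a hasher and a heap fallback variant:

use std::mem::MaybeUninit;

// Illustrative only: a struct mirroring the diagram, not the crate's
// actual definition.
struct InlineStorage<K, V, const N: usize> {
    groups: [u8; N],                // h2 control bytes, scanned with SIMD
    len: usize,                     // number of occupied slots
    data: [MaybeUninit<(K, V)>; N], // contiguous key-value pairs
}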

Complexity

Let W = SIMD_WIDTH (16 on x86_64, 8 on aarch64)

Operation                  len ≤ W    W < len ≤ N    len > N
get                        O(len)     O(len/W)       O(1)*
insert                     O(len)     O(len/W)       O(1)*
insert_unique_unchecked    O(1)       O(1)           O(1)*
remove                     O(len)     O(len/W)       O(1)*
iter                       O(len)     O(len)         O(capacity)

* amortized

Credit

This project heavily references and adapts code from Hashbrown, which is a very elegant and efficient implementation.