Crate starshard

Crate starshard 

Source
Expand description

Starshard: a high-performance, lazily sharded concurrent HashMap.

§Features

  • async: enables AsyncShardedHashMap backed by tokio::sync::RwLock.
  • rayon: enables parallel snapshot flattening inside iteration for large maps (sync + async).
  • serde: (sync) serialize/deserialize via a stable map snapshot; async map via snapshot helper.

§Serde Semantics

ShardedHashMap:

  • Serialized form: { shard_count: usize, entries: Vec<(K,V)> }.
  • Hasher state is not preserved; deserialization rebuilds with S::default().
  • Requires K: Eq + Hash + Clone + Send + Sync + Serialize + Deserialize, V: Clone + Send + Sync + Serialize + Deserialize, S: BuildHasher + Clone + Send + Sync + Default.

AsyncShardedHashMap:

  • No direct Serialize/Deserialize (locks need await).
  • Use async_snapshot_serializable().await to obtain a snapshot wrapper implementing Serialize.
  • To rebuild: create a new async map, then bulk insert entries.

§Design Goals

  1. Minimize contention via sharding (coarse dynamic set of RwLocks).
  2. Lazy shard materialization to reduce cold-start memory (slots are None until touched).
  3. O(1) (amortized) length via atomic counter (fast cloning, no full scan).
  4. Parallel iteration using rayon if enabled (snapshots each shard then flattens).
  5. Async version mirrors sync semantics; attempts optimistic try_read first to reduce await points.
  6. Predictable memory layout leveraging hashbrown::HashMap and user-supplied hasher.

§Consistency Model

  • Per-shard operations are linearizable with respect to that shard.
  • Global iteration is snapshot-per-shard at the moment each shard lock is taken: You may see entries inserted/removed concurrently in other shards.
  • len() is eventually consistent only in the trivial sense of atomic monotonic increments/decrements: It reflects completed inserts/removes; in-flight operations not yet applied are invisible.

§Thread / Task Safety

  • Each shard guarded by a single RwLock (Std or Tokio).
  • No nested acquisition of multiple shard locks (avoids lock order deadlocks).
  • Atomic length update only after a structural insert/delete succeeds.
  • Clone bounds on K,V needed for iteration snapshot flattening.

§Performance Notes (Indicative, not guaranteed)

  • Read-heavy sync workloads: sharding reduces write interference vs a single map + RwLock.
  • rayon speeds large aggregate scans (e.g. metrics dump, checkpoint) 3-4x on >100k elements.
  • Lazy shards: memory roughly proportional to number of distinct shard indices used.

§Hasher Choice

  • Default FxBuildHasher: speed oriented (non-cryptographic).
  • For DoS / adversarial key defense use: std::collections::hash_map::RandomState. Example:
    use starshard::ShardedHashMap;
    use std::collections::hash_map::RandomState;
    let map: ShardedHashMap<String, u64, RandomState> =
        ShardedHashMap::with_shards_and_hasher(128, RandomState::default());

§Limitations

  • No dynamic shard rebalancing / rehash across shards yet.
  • No eviction / TTL (can be layered externally).
  • Iteration allocates temporary vectors proportional to initialized shards (to snapshot).
  • Not lock-free; large writer pressure can still cause convoying on hot shards.

§Future Extension Ideas

  • Optional background shard growth / rebalancing.
  • Configurable eviction: LRU per shard / clock / segmented queue.
  • Metrics hooks (pre/post op).
  • Batched mutation (multi-insert with single lock acquisition per target shard).
  • Optional copy-on-write snapshots for near-zero iteration locking windows.

§Examples

Sync (default features):

use starshard::ShardedHashMap;
use rustc_hash::FxBuildHasher;

let map: ShardedHashMap<String, i32, FxBuildHasher> = ShardedHashMap::new(64);
map.insert("a".into(), 1);
assert_eq!(map.get(&"a".into()), Some(1));
assert_eq!(map.len(), 1);

Async (enable async feature):

#[cfg(feature = "async")]
#[tokio::main]
async fn main() {
    use starshard::AsyncShardedHashMap;
    let map: AsyncShardedHashMap<String, i32> = AsyncShardedHashMap::new(64);
    map.insert("k".into(), 7).await;
    assert_eq!(map.get(&"k".into()).await, Some(7));
}

Parallel iteration (enable rayon):

use starshard::ShardedHashMap;
let map: ShardedHashMap<String, u32> = ShardedHashMap::new(32);
for i in 0..10_000 {
    map.insert(format!("k{i}"), i);
}
let count = map.iter().count(); // internally parallel if `rayon` feature active
assert_eq!(count, 10_000);

Async + Rayon (enable async,rayon):

#[cfg(all(feature="async", feature="rayon"))]
#[tokio::main]
async fn main() {
    use starshard::AsyncShardedHashMap;
    let m: AsyncShardedHashMap<u32, u32> = AsyncShardedHashMap::new(64);
    for i in 0..1000 { m.insert(i, i*i).await; }
    let items = m.iter().await; // flattens in parallel internally
    assert_eq!(items.len(), 1000);
}

Custom hasher (RandomState):

use starshard::ShardedHashMap;
use std::collections::hash_map::RandomState;
let secure: ShardedHashMap<String, i64, RandomState> =
    ShardedHashMap::with_shards_and_hasher(64, RandomState::default());
secure.insert("x".into(), 1);

Structs§

AsyncShardedHashMapasync
Asynchronous sharded concurrent HashMap (Tokio RwLock).
AsyncShardedHashMapSnapshotasync and serde
A serializable snapshot of an AsyncShardedHashMap.
ShardedHashMap
Sharded concurrent HashMap (synchronous).

Constants§

DEFAULT_SHARDS
Default shard count (power-of-two not required; hashing modulo used).