# FlashMap
GPU-native concurrent hash map for bulk operations. Designed for workloads where you need to look up, insert, or remove millions of keys in a single call — not one at a time.
## Why FlashMap?
Traditional hash maps (std HashMap, DashMap) process keys sequentially or with CPU thread-level parallelism. FlashMap launches a CUDA kernel where each GPU thread handles one key — achieving massive parallelism on modern GPUs.
| Operation | DashMap (16-core) | FlashMap (H100) |
|---|---|---|
| 1M inserts | ~500ms | ~60ms |
| 1M lookups | ~300ms | ~35ms |
## Quick Start

```rust
use flashmap::FlashMap;

// Create a map with capacity for 1M entries
let mut map: FlashMap<u64, u64> = FlashMap::with_capacity(1_000_000).unwrap();

// Insert 100K key-value pairs in one GPU kernel launch
let pairs: Vec<(u64, u64)> = generate_pairs(100_000);
map.bulk_insert(&pairs).unwrap();

// Look up all keys at once
let keys: Vec<u64> = pairs.iter().map(|(k, _)| *k).collect();
let results: Vec<Option<u64>> = map.bulk_get(&keys).unwrap();

// Remove a batch of keys
map.bulk_remove(&keys).unwrap();
```
Features
cuda— GPU backend via CUDA (requires NVIDIA GPU + CUDA 12+ toolkit)cpu-fallback— CPU backend for development/testing (enabled by default)
# CPU only (default, works everywhere)
= "0.1"
# GPU acceleration
= { = "0.1", = ["cuda"] }
# Both (tries GPU first, falls back to CPU)
= { = "0.1", = ["cuda", "cpu-fallback"] }
## Applications

### Blockchain State Storage

Store account state (pubkey → account data) with bulk commit after block execution. FlashMap's fixed-size key/value constraint maps directly to blockchain account models where keys are 32-byte public keys and values are fixed-size account structs.

```rust
use flashmap::FlashMap;

// 32-byte public key → fixed-size account struct (value size illustrative)
type Pubkey = [u8; 32];
type AccountData = [u8; 128];

let mut state: FlashMap<Pubkey, AccountData> =
    FlashMap::with_capacity(2_000_000).unwrap();

// After executing a block of 100K transactions,
// commit all state changes in one GPU call
let changes: Vec<(Pubkey, AccountData)> = execute_block(&block);
state.bulk_insert(&changes).unwrap();
```
### High-Frequency Trading / Order Books

Batch-update order book entries. Price levels and order quantities change in bursts — FlashMap processes an entire tick's worth of updates in a single kernel launch instead of one-by-one mutex-locked inserts.

```rust
use flashmap::FlashMap;

// OrderId (u64) → OrderEntry (price + qty + side + timestamp)
let mut book: FlashMap<u64, OrderEntry> =
    FlashMap::with_capacity(4_000_000).unwrap();

// Process all order updates from a single market data tick
let updates: Vec<(u64, OrderEntry)> = parse_tick_updates(&tick);
book.bulk_insert(&updates).unwrap();

// Cancel a batch of orders
let cancels: Vec<u64> = parse_cancellations(&tick);
book.bulk_remove(&cancels).unwrap();
```
### Network Packet Deduplication

Deduplicate packets by hash in high-throughput network pipelines. At 100Gbps+, per-packet hash table lookups on CPU become a bottleneck — FlashMap processes an entire batch of packet hashes on GPU.

```rust
use flashmap::FlashMap;

type PacketHash = [u8; 32]; // e.g. SHA-256 of the payload
type Marker = u64;          // timestamp or sequence number

let mut seen: FlashMap<PacketHash, Marker> =
    FlashMap::with_capacity(16_000_000).unwrap();

// Check which packets in this batch are duplicates
let hashes: Vec<PacketHash> = batch.iter().map(|p| p.hash()).collect();
let results = seen.bulk_get(&hashes).unwrap();

// Insert new (non-duplicate) packets
let new_packets: Vec<(PacketHash, Marker)> = hashes.iter()
    .zip(results.iter())
    .filter(|(_, hit)| hit.is_none())
    .map(|(h, _)| (*h, timestamp()))
    .collect();
seen.bulk_insert(&new_packets).unwrap();
```
### GPU-Accelerated Databases

Use as a GPU-resident index for in-memory databases. Traditional B-tree or hash indexes live in CPU memory — FlashMap keeps the index on GPU, eliminating PCIe round-trips for query-heavy workloads.

```rust
use flashmap::FlashMap;

type RowId = u64;
type IndexKey = u64; // hashed column value

let mut index: FlashMap<IndexKey, RowId> =
    FlashMap::with_capacity(32_000_000).unwrap();

// Bulk-load index from table scan
let entries: Vec<(IndexKey, RowId)> = table.iter()
    .map(|row| (hash_column(row), row.id))
    .collect();
index.bulk_insert(&entries).unwrap();

// Batch point lookups (e.g., JOIN probe side)
let probe_keys: Vec<IndexKey> = probe_table.iter()
    .map(|row| hash_column(row))
    .collect();
let matches = index.bulk_get(&probe_keys).unwrap();
```
### Genomics / Bioinformatics

k-mer counting and sequence matching. Genomic analysis involves billions of short DNA subsequences (k-mers) that need to be counted or looked up — a naturally batch-parallel workload.

```rust
use flashmap::FlashMap;

// 32-mer encoded as 8 bytes (2 bits per nucleotide)
type Kmer = u64;
type Count = u64;

let mut kmer_counts: FlashMap<Kmer, Count> =
    FlashMap::with_capacity(1_000_000_000).unwrap();

// Insert k-mers from a batch of sequence reads
let kmers: Vec<(Kmer, Count)> = extract_kmers(&reads);
kmer_counts.bulk_insert(&kmers).unwrap();

// Query which k-mers from a target sequence exist
let query_kmers: Vec<Kmer> = extract_query_kmers(&target);
let hits = kmer_counts.bulk_get(&query_kmers).unwrap();
```
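The 2-bit packing that fits a 32-mer into a single `u64` can be sketched as follows. This is a hypothetical helper, not part of FlashMap, and the base-to-bits assignment (`A=00, C=01, G=10, T=11`) is illustrative:

```rust
// Hypothetical helper: pack a DNA sequence of up to 32 bases into one u64
// at 2 bits per nucleotide. Returns None for oversized or ambiguous input.
fn encode_kmer(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None; // would not fit in 64 bits
    }
    let mut kmer = 0u64;
    for &base in seq {
        let bits = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None, // ambiguous base such as 'N'
        };
        kmer = (kmer << 2) | bits;
    }
    Some(kmer)
}
```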
## Design

### Architecture

```text
              FlashMap<K, V>
                    │
         ┌──────────┴──────────┐
         │                     │
    GPU Backend           CPU Backend
   (cuda feature)    (cpu-fallback feature)
         │
   ┌─────┼──────┐
   │     │      │
 d_keys d_flags d_values   ← SoA layout on GPU
  [u8]   [u32]   [u8]
```
**SoA (Struct of Arrays):** Keys, flags, and values are stored in separate contiguous GPU buffers. This gives coalesced memory access when all threads read flags simultaneously, then keys, then values — instead of strided access through interleaved AoS records.
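A minimal CPU-side sketch of this layout (the `SoaTable` name and field shapes are illustrative, not FlashMap's actual internals):

```rust
// Illustrative SoA layout: one contiguous buffer per field, indexed by slot.
// Threads reading flags[i] for consecutive i touch one contiguous region,
// unlike interleaved AoS records where each access strides past key+value.
struct SoaTable {
    flags: Vec<u32>, // slot state, e.g. 0 = empty, 1 = occupied
    keys: Vec<u8>,   // capacity * key_size bytes
    values: Vec<u8>, // capacity * value_size bytes
}

impl SoaTable {
    fn new(capacity: usize, key_size: usize, value_size: usize) -> Self {
        SoaTable {
            flags: vec![0; capacity],
            keys: vec![0; capacity * key_size],
            values: vec![0; capacity * value_size],
        }
    }
}
```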
**Linear probing:** Open addressing with power-of-2 capacity and bitmask modulo. Cache-friendly on both CPU and GPU.
**Identity hash (default):** Interprets the first 8 bytes of the key as a u64. Zero compute overhead for keys that are already well-distributed (SHA-256 digests, Ed25519 public keys, UUIDs).

**MurmurHash3:** Available via the builder for keys with poor distribution (sequential integers, low-entropy prefixes).
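A sketch of the identity strategy (illustrative, not the crate's exact implementation): the first 8 key bytes are reinterpreted as a `u64`, so already-uniform keys cost nothing to hash, while low-entropy keys would cluster and want MurmurHash3 instead.

```rust
// Read the first 8 bytes of a key as a little-endian u64; shorter keys are
// zero-padded. Hypothetical helper for illustration.
fn identity_hash(key_bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    let n = key_bytes.len().min(8);
    buf[..n].copy_from_slice(&key_bytes[..n]);
    u64::from_le_bytes(buf)
}
```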
### Constraints

- **Fixed-size keys and values:** Both `K` and `V` must implement `bytemuck::Pod` (plain old data: `Copy`, fixed layout, any bit pattern valid). No `String`, `Vec`, or heap-allocated types.
- **Bulk-only API:** No single-key `get`/`insert`/`remove`. Wrap the key in a 1-element slice if needed.
- **50% max load factor:** The table must have at least 2x the capacity of your data. This keeps probe chains short for GPU performance.
- **No duplicate keys per batch:** If the same key appears twice in a single `bulk_insert` call, behavior is non-deterministic on GPU (one will win).
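Given the 50% load factor and power-of-2 capacity, a sizing helper might look like this (hypothetical; how `with_capacity` itself rounds its argument is up to the crate):

```rust
// Smallest power-of-2 capacity that keeps n entries at or below 50% load.
fn required_capacity(n_entries: usize) -> usize {
    (2 * n_entries).next_power_of_two()
}
```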
### Configuration

```rust
use flashmap::{FlashMap, HashStrategy};

let map: FlashMap<u64, u64> = FlashMap::builder()
    .hash_strategy(HashStrategy::MurmurHash3) // default: HashStrategy::Identity
    .device_id(0)                             // default: 0 (first GPU)
    .force_cpu(false)                         // set true to skip GPU even if available
    .build(1_000_000)
    .unwrap();
```
## Benchmarks

Run on your hardware:

```sh
# CPU fallback
cargo bench

# GPU (requires CUDA 12+)
cargo bench --features cuda
```
## License
Dual-licensed under MIT or Apache-2.0.