Lock-free message queue implementation.
This module provides the core message queue abstraction used for communication between host and GPU kernels. The queue uses a ring buffer design with atomic operations for lock-free access.
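The ring-buffer idea can be illustrated with a minimal sketch. Everything here is hypothetical: the `RingSketch` type, its `try_push`/`try_pop` methods, and the choice of `u64` payloads (which keeps the example in safe Rust) are assumptions for illustration and are not this module's actual `SpscQueue`. It assumes exactly one producer thread and one consumer thread, and a power-of-two capacity.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical illustration only; not the crate's queue type.
pub struct RingSketch {
    slots: Vec<AtomicU64>,
    mask: usize,
    head: AtomicUsize, // next slot to write; advanced only by the producer
    tail: AtomicUsize, // next slot to read; advanced only by the consumer
}

impl RingSketch {
    /// `capacity` must be a power of two so that `index & mask` wraps correctly.
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        Self {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            mask: capacity - 1,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side. Returns `false` when the ring is full.
    pub fn try_push(&self, value: u64) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        // Acquire pairs with the consumer's Release store to `tail`,
        // so the slot being reused has already been read.
        let tail = self.tail.load(Ordering::Acquire);
        if head.wrapping_sub(tail) == self.slots.len() {
            return false;
        }
        self.slots[head & self.mask].store(value, Ordering::Relaxed);
        // Release publishes the slot write before the new `head` becomes visible.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side. Returns `None` when the ring is empty.
    pub fn try_pop(&self) -> Option<u64> {
        let tail = self.tail.load(Ordering::Relaxed);
        // Acquire pairs with the producer's Release store to `head`.
        let head = self.head.load(Ordering::Acquire);
        if tail == head {
            return None;
        }
        let value = self.slots[tail & self.mask].load(Ordering::Relaxed);
        // Release hands the slot back to the producer.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

Neither side takes a lock: each index has a single writer, and the Acquire/Release pairs are what order the slot accesses between the two threads.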
§Cache-line padding
SPSC queue throughput under concurrent producer/consumer load is
dominated by cache-line bouncing between cores. When head and
tail live on the same cache line, every producer store to head
invalidates the consumer's cached copy of the line that also
holds tail, and vice versa, so every operation incurs a forced
cache-coherence round trip. [CachePadded] places each hot field
in its own 128-byte-aligned slot (wide enough to cover both x86
spatial prefetching pairs and the 128-byte L2 lines of NVIDIA
Hopper-era GPUs), so producer and consumer do not contend at
line granularity.
128 bytes is a conservative choice: AMD Zen 4 / Intel Sapphire
Rapids use 64-byte lines but prefetch in pairs (the "destructive
interference pair"), so false sharing can effectively span 128
bytes. crossbeam-utils targets the same case by aligning its
CachePadded to 128 bytes on x86_64.
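A minimal sketch of the padding technique described above, assuming a hand-rolled wrapper. The names `Padded`, `Indices`, and `field_distance` are hypothetical and chosen for illustration; the module's own [CachePadded] may be defined differently.

```rust
use std::sync::atomic::AtomicUsize;

/// Hypothetical wrapper: forces 128-byte alignment, which also rounds the
/// type's size up to a multiple of 128 bytes.
#[repr(align(128))]
pub struct Padded<T>(pub T);

/// Hypothetical index block: each hot field sits in its own 128-byte slot.
pub struct Indices {
    pub head: Padded<AtomicUsize>, // written by the producer
    pub tail: Padded<AtomicUsize>, // written by the consumer
}

/// Distance in bytes between the two fields, for checking the layout.
pub fn field_distance(idx: &Indices) -> usize {
    let head_addr = &idx.head as *const Padded<AtomicUsize> as usize;
    let tail_addr = &idx.tail as *const Padded<AtomicUsize> as usize;
    head_addr.abs_diff(tail_addr)
}
```

Because alignment and size are both 128 bytes, the two indices can never share a 64-byte line or an adjacent-line pair, at the cost of 256 bytes for two machine words.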
Structs§
- BoundedQueue - Bounded queue with blocking operations.
- MpscQueue - Multi-producer single-consumer lock-free queue.
- PartitionedQueue - A partitioned queue for reduced contention with multiple producers.
- PartitionedQueueStats - Statistics for a partitioned queue.
- QueueFactory - Factory for creating appropriately-sized message queues.
- QueueMetrics - Comprehensive queue metrics snapshot.
- QueueMonitor - Monitor for queue health and utilization.
- QueueStats - Statistics for a message queue.
- SpscQueue - Single-producer single-consumer lock-free ring buffer.
Enums§
- QueueHealth - Queue health status from monitoring.
- QueueTier - Queue capacity tiers for dynamic queue allocation.
Traits§
- MessageQueue - Trait for message queue implementations.