Module queue

Lock-free message queue implementation.

This module provides the core message queue abstraction used for communication between host and GPU kernels. The queue uses a ring buffer design with atomic operations for lock-free access.
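To illustrate the ring-buffer-plus-atomics design, here is a minimal single-producer single-consumer sketch that uses only std. It is not this crate's SpscQueue. The names spsc, Producer, Consumer and Ring are hypothetical, the capacity is assumed to be a power of two, and cache-line padding is omitted for brevity. Following the convention used in the cache-line discussion, the producer advances head and the consumer advances tail. Splitting the two ends into non-Clone handles lets the type system enforce the single-producer, single-consumer rule.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical shared ring state (not the crate's SpscQueue). `head` and `tail`
// are free-running counters; the slot index is `counter & mask`.
struct Ring<T> {
    slots: Box<[UnsafeCell<MaybeUninit<T>>]>,
    mask: usize,
    head: AtomicUsize, // next slot to write (producer only)
    tail: AtomicUsize, // next slot to read (consumer only)
}

// Safety: a slot is only touched by one side at a time; ownership is handed
// over through the Release store / Acquire load pairs on `head` and `tail`.
unsafe impl<T: Send> Sync for Ring<T> {}

pub struct Producer<T> { ring: Arc<Ring<T>> }
pub struct Consumer<T> { ring: Arc<Ring<T>> }

/// Creates a bounded SPSC channel. `capacity` must be a non-zero power of two.
pub fn spsc<T>(capacity: usize) -> (Producer<T>, Consumer<T>) {
    assert!(capacity.is_power_of_two(), "capacity must be a power of two");
    let slots = (0..capacity)
        .map(|_| UnsafeCell::new(MaybeUninit::uninit()))
        .collect::<Vec<_>>()
        .into_boxed_slice();
    let ring = Arc::new(Ring {
        slots,
        mask: capacity - 1,
        head: AtomicUsize::new(0),
        tail: AtomicUsize::new(0),
    });
    (Producer { ring: Arc::clone(&ring) }, Consumer { ring })
}

impl<T> Producer<T> {
    /// Returns `Err(value)` when the ring is full.
    pub fn try_push(&mut self, value: T) -> Result<(), T> {
        let r = &*self.ring;
        let head = r.head.load(Ordering::Relaxed); // only this side writes head
        let tail = r.tail.load(Ordering::Acquire); // observe slots the consumer freed
        if head.wrapping_sub(tail) == r.slots.len() {
            return Err(value);
        }
        unsafe { (*r.slots[head & r.mask].get()).write(value) };
        r.head.store(head.wrapping_add(1), Ordering::Release); // publish the slot
        Ok(())
    }
}

impl<T> Consumer<T> {
    /// Returns `None` when the ring is empty.
    pub fn try_pop(&mut self) -> Option<T> {
        let r = &*self.ring;
        let tail = r.tail.load(Ordering::Relaxed); // only this side writes tail
        let head = r.head.load(Ordering::Acquire); // observe slots the producer filled
        if tail == head {
            return None;
        }
        let value = unsafe { (*r.slots[tail & r.mask].get()).assume_init_read() };
        r.tail.store(tail.wrapping_add(1), Ordering::Release); // free the slot
        Some(value)
    }
}

impl<T> Drop for Ring<T> {
    fn drop(&mut self) {
        // Drop any messages still in flight when both handles are gone.
        let (head, tail) = (*self.head.get_mut(), *self.tail.get_mut());
        let mut i = tail;
        while i != head {
            let idx = i & self.mask;
            unsafe { self.slots[idx].get_mut().assume_init_drop() };
            i = i.wrapping_add(1);
        }
    }
}
```

The Release store on one index paired with the Acquire load of it on the other side is what transfers each slot between threads without a lock.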

Cache-line padding

SPSC queue throughput under concurrent producer/consumer load is dominated by cache-line bouncing between cores. When head and tail live on the same cache line, every producer store to head invalidates the consumer's cached copy of tail, and vice versa, so each operation forces a cache-coherence round-trip. CachePadded places each hot field on its own 128-byte line, wide enough to cover both the adjacent-line pairs fetched by x86 spatial prefetchers and the 128-byte L2 lines of NVIDIA Hopper-class GPUs, so the producer and consumer do not contend at cache-line granularity.
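A minimal sketch of such a wrapper, assuming CachePadded is essentially an alignment wrapper (its actual definition is not shown on this page). `#[repr(align(128))]` also rounds the type's size up to a multiple of 128, so two padded fields in one struct cannot share a line. The Indices struct is hypothetical.

```rust
use std::ops::Deref;
use std::sync::atomic::AtomicUsize;

/// Hypothetical stand-in for CachePadded: 128-byte alignment forces the size
/// up to 128 bytes as well, so neighbours always start on a fresh line.
#[repr(align(128))]
pub struct CachePadded<T>(pub T);

impl<T> Deref for CachePadded<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

/// Hypothetical queue indices: head and tail each get their own 128-byte line.
pub struct Indices {
    pub head: CachePadded<AtomicUsize>,
    pub tail: CachePadded<AtomicUsize>,
}

impl Indices {
    pub fn new() -> Self {
        Indices {
            head: CachePadded(AtomicUsize::new(0)),
            tail: CachePadded(AtomicUsize::new(0)),
        }
    }
}
```

Because of the Deref impl, callers still write `indices.head.load(...)` as if the padding were not there.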

128 bytes is a conservative choice. AMD Zen 4 and Intel Sapphire Rapids use 64-byte lines but prefetch them in adjacent pairs, so two fields only 64 bytes apart can still interfere. crossbeam-utils' CachePadded uses 128 bytes of alignment on x86-64 for the same reason.
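To make the pair argument concrete, the hypothetical helper below treats two addresses as interfering when they fall in the same 128-byte-aligned block. With only 64-byte alignment, head and tail can land in the same pair; with 128-byte alignment they cannot. Pad64, Pad128, the index structs, and the outer 128-byte-aligned wrapper (used only to make the 64-byte layout deterministic) are illustrative assumptions.

```rust
use std::sync::atomic::AtomicUsize;

/// True when two addresses fall in the same 128-byte-aligned block, i.e. the
/// same pair of 64-byte lines an adjacent-line prefetcher fetches together.
pub fn share_prefetch_pair(a: usize, b: usize) -> bool {
    a / 128 == b / 128
}

#[derive(Default)]
#[repr(align(64))]
pub struct Pad64(pub AtomicUsize);

#[derive(Default)]
#[repr(align(128))]
pub struct Pad128(pub AtomicUsize);

// `repr(C)` keeps declaration order so the field offsets are predictable.
#[derive(Default)]
#[repr(C)]
pub struct Indices64 {
    pub head: Pad64,
    pub tail: Pad64,
}

#[derive(Default)]
#[repr(C)]
pub struct Indices128 {
    pub head: Pad128,
    pub tail: Pad128,
}

// Forces a 128-byte-aligned start: `head` at offset 0 and `tail` at offset 64,
// so both fields sit inside a single 128-byte pair.
#[derive(Default)]
#[repr(C, align(128))]
pub struct AlignedIndices64(pub Indices64);

/// Returns the addresses of two fields for inspection.
pub fn addrs<H, T>(head: &H, tail: &T) -> (usize, usize) {
    (head as *const H as usize, tail as *const T as usize)
}
```

The 64-byte case only collides when the struct happens to start on a 128-byte boundary, which is exactly why 128-byte padding is the safe default.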

Structs

BoundedQueue
Bounded queue with blocking operations.
MpscQueue
Multi-producer single-consumer lock-free queue.
PartitionedQueue
A partitioned queue for reduced contention with multiple producers.
PartitionedQueueStats
Statistics for a partitioned queue.
QueueFactory
Factory for creating appropriately-sized message queues.
QueueMetrics
Comprehensive queue metrics snapshot.
QueueMonitor
Monitor for queue health and utilization.
QueueStats
Statistics for a message queue.
SpscQueue
Single-producer single-consumer lock-free ring buffer.
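The idea behind a partitioned queue can be sketched as follows: each producer id maps to one partition, so producers with different ids usually contend on different locks, while the consumer visits partitions in rotation. This is a hypothetical illustration, not PartitionedQueue's implementation; Mutex<VecDeque> stands in for whatever per-partition queue the crate actually uses, and partition_lens is a simplified cousin of what PartitionedQueueStats might report.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical partitioned queue: one sub-queue per partition.
pub struct Partitioned<T> {
    parts: Vec<Mutex<VecDeque<T>>>,
    next: AtomicUsize, // rotating start position for the consumer's scan
}

impl<T> Partitioned<T> {
    pub fn new(partitions: usize) -> Self {
        assert!(partitions > 0, "need at least one partition");
        Partitioned {
            parts: (0..partitions).map(|_| Mutex::new(VecDeque::new())).collect(),
            next: AtomicUsize::new(0),
        }
    }

    /// Producers are mapped to partitions by id, spreading lock contention.
    pub fn push(&self, producer_id: usize, value: T) {
        let idx = producer_id % self.parts.len();
        self.parts[idx].lock().unwrap().push_back(value);
    }

    /// Each call starts scanning at the next partition in rotation, so no
    /// single partition is always checked first.
    pub fn pop(&self) -> Option<T> {
        let n = self.parts.len();
        let start = self.next.fetch_add(1, Ordering::Relaxed) % n;
        for i in 0..n {
            let idx = (start + i) % n;
            if let Some(v) = self.parts[idx].lock().unwrap().pop_front() {
                return Some(v);
            }
        }
        None
    }

    /// Current length of every partition.
    pub fn partition_lens(&self) -> Vec<usize> {
        self.parts.iter().map(|p| p.lock().unwrap().len()).collect()
    }
}
```

Ordering is preserved within a partition but not across partitions, which is the usual trade-off for reduced contention.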

Enums

QueueHealth
Queue health status from monitoring.
QueueTier
Queue capacity tiers for dynamic queue allocation.
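A hypothetical sketch of how capacity tiers and a health classification might fit together. The variant names, the 256/1024/4096 capacities, and the 75%/95% utilization thresholds are illustrative assumptions, not the values QueueTier and QueueHealth actually use.

```rust
/// Hypothetical capacity tiers (stand-in for QueueTier).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Small,
    Medium,
    Large,
}

impl Tier {
    /// Assumed slot count per tier.
    pub fn capacity(self) -> usize {
        match self {
            Tier::Small => 256,
            Tier::Medium => 1024,
            Tier::Large => 4096,
        }
    }

    /// Picks the smallest tier that holds the expected backlog, else Large.
    pub fn for_backlog(expected: usize) -> Tier {
        [Tier::Small, Tier::Medium, Tier::Large]
            .iter()
            .copied()
            .find(|t| t.capacity() >= expected)
            .unwrap_or(Tier::Large)
    }
}

/// Hypothetical health levels (stand-in for QueueHealth).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Warning,
    Critical,
}

/// Classifies a queue by utilization (len / capacity) with assumed thresholds.
pub fn classify(len: usize, capacity: usize) -> Health {
    assert!(capacity > 0, "capacity must be non-zero");
    let utilization = len as f64 / capacity as f64;
    if utilization >= 0.95 {
        Health::Critical
    } else if utilization >= 0.75 {
        Health::Warning
    } else {
        Health::Healthy
    }
}
```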

Traits

MessageQueue
Trait for message queue implementations.
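A minimal sketch of what a queue trait of this kind might look like, so that generic code such as a monitor or factory can work with any implementation. The trait name Queue, its method names, the default is_full, and the mutex-backed LockedQueue are assumptions for illustration; MessageQueue's real signatures may differ.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical queue trait; method names are illustrative only.
pub trait Queue<T> {
    fn try_enqueue(&self, value: T) -> Result<(), T>;
    fn try_dequeue(&self) -> Option<T>;
    fn len(&self) -> usize;
    fn capacity(&self) -> usize;

    /// Default method shared by every implementation.
    fn is_full(&self) -> bool {
        self.len() >= self.capacity()
    }
}

/// Simple bounded, mutex-backed implementation used only to exercise the trait.
pub struct LockedQueue<T> {
    inner: Mutex<VecDeque<T>>,
    cap: usize,
}

impl<T> LockedQueue<T> {
    pub fn new(cap: usize) -> Self {
        LockedQueue { inner: Mutex::new(VecDeque::with_capacity(cap)), cap }
    }
}

impl<T> Queue<T> for LockedQueue<T> {
    fn try_enqueue(&self, value: T) -> Result<(), T> {
        let mut q = self.inner.lock().unwrap();
        if q.len() >= self.cap {
            return Err(value); // hand the value back instead of blocking
        }
        q.push_back(value);
        Ok(())
    }

    fn try_dequeue(&self) -> Option<T> {
        self.inner.lock().unwrap().pop_front()
    }

    fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }

    fn capacity(&self) -> usize {
        self.cap
    }
}

/// Generic helper written against the trait; `?Sized` also admits `&dyn Queue<T>`.
pub fn drain<T, Q: Queue<T> + ?Sized>(q: &Q) -> Vec<T> {
    let mut out = Vec::new();
    while let Some(v) = q.try_dequeue() {
        out.push(v);
    }
    out
}
```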