Xutex — High‑Performance Hybrid Mutex
Xutex is a high-performance mutex that seamlessly bridges synchronous and asynchronous Rust code with a single type and unified internal representation. Designed for extremely low-latency lock acquisition under minimal contention, it achieves near-zero overhead on the fast path while remaining runtime-agnostic.
Key Features
- ⚡ Blazing-fast async performance: Up to 50× faster than standard sync mutexes in single-threaded async runtimes, and 3–5× faster in multi-threaded async runtimes under extreme contention.
- 🔄 Hybrid API: Use the same lock in both sync and async contexts
- ⚡ 8-byte lock state: Single `AtomicPtr` on 64-bit platforms (guarded data stored separately)
- 🚀 Zero-allocation fast path: Lock acquisition requires no heap allocation when uncontended
- ♻️ Smart allocation reuse: Object pooling minimizes allocations under contention
- 🎯 Runtime-agnostic: Works with Tokio, async-std, monoio, or any executor using `std::task::Waker`
- 🔒 Lock-free fast path: Single CAS operation for uncontended acquisition
- 📦 Minimal footprint: Compact state representation with lazy queue allocation
- 🛡️ No-std compatible: Fully compatible with `no_std` environments, relying only on `core` and `alloc`
Installation
```toml
[dependencies]
xutex = "0.2"
```
Or via cargo:
```sh
# with std
cargo add xutex

# for no-std environments
cargo add xutex --no-default-features
```
Quick Start
Synchronous Usage
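A minimal blocking example (a sketch; it assumes `Mutex` is exported at the crate root, as the snippets below do):

```rust
use xutex::Mutex;

fn main() {
    let mutex = Mutex::new(0);
    {
        // `lock()` blocks the current thread until the lock is available.
        let mut guard = mutex.lock();
        *guard += 1;
    } // Guard dropped here; the lock is released.
    assert_eq!(*mutex.lock(), 1);
}
```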
Asynchronous Usage
```rust
use xutex::AsyncMutex;

async fn example() {
    let mutex = AsyncMutex::new(0);
    // `lock()` returns a Future; awaiting it yields the guard.
    let mut guard = mutex.lock().await;
    *guard += 1;
}
```
Hybrid Usage
Convert seamlessly between sync and async:
```rust
use xutex::Mutex;

fn sync_side(mutex: &Mutex<u32>) {
    // Blocking acquisition from regular threads.
    *mutex.lock() += 1;
}

async fn async_side(mutex: &Mutex<u32>) {
    // Asynchronous acquisition of the very same lock.
    *mutex.lock_async().await += 1;
}
```
Performance Characteristics
Why It's Fast
- Atomic state machine: The entire lock state is encoded in a single pointer:
  - `UNLOCKED` (null): Lock is free
  - `LOCKED` (sentinel): Lock held, no waiters
  - `UPDATING`: Queue initialization in progress
  - `QUEUE_PTR`: Lock held with waiting tasks/threads
- Lock-free fast path: Uncontended acquisition uses a single `compare_exchange` (see the sketch after this list)
- Lazy queue allocation: Wait queue created only when contention occurs
- Pointer tagging: LSB tagging prevents race conditions during queue modifications
- Stack-allocated waiters: `Signal` nodes live on the stack, forming an intrusive linked list
- Optimized memory ordering: Careful use of `Acquire`/`Release` semantics
- Adaptive backoff: Exponential backoff reduces cache thrashing under contention
- Minimal heap allocation: At most one allocation per contended lock via pooled queue reuse; additional waiters require zero allocations
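A sketch of that uncontended fast path, for illustration only (the `RawState` type, sentinel constants, and method names below are hypothetical, not the crate's internals):

```rust
use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical sentinels mirroring the states above: null means unlocked,
// a non-null sentinel means "locked, no waiters".
const UNLOCKED: *mut u8 = ptr::null_mut();
const LOCKED: *mut u8 = 1 as *mut u8;

struct RawState {
    queue: AtomicPtr<u8>,
}

impl RawState {
    // Uncontended acquisition: a single compare_exchange from UNLOCKED to LOCKED.
    fn try_lock_fast(&self) -> bool {
        self.queue
            .compare_exchange(UNLOCKED, LOCKED, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    // Uncontended release: move LOCKED back to UNLOCKED. A failure here means
    // waiters were enqueued in the meantime and a slow path must hand off the lock.
    fn try_unlock_fast(&self) -> bool {
        self.queue
            .compare_exchange(LOCKED, UNLOCKED, Ordering::Release, Ordering::Relaxed)
            .is_ok()
    }
}
```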
Benchmarks
Run benchmarks on your machine:
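For example, via the standard Cargo benchmark harness:

```sh
cargo bench
```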
Expected Performance (varies by hardware):
- Uncontended: ~1-3ns per lock/unlock cycle (single CAS operation)
- High contention: 2-3× faster than `tokio::sync::Mutex` in async contexts
- Sync contexts: Performance comparable to `std::sync::Mutex` with minimal overhead from queue pointer checks under high contention; matches `parking_lot` performance in low-contention scenarios
Design Deep Dive
Architecture
```
┌─────────────────────────────────────────────┐
│          Mutex<T> / AsyncMutex<T>           │
│  ┌───────────────────────────────────────┐  │
│  │           MutexInternal<T>            │  │
│  │  • queue: AtomicPtr<QueueStructure>   │  │
│  │  • inner: UnsafeCell<T>               │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
                       │
                       ├─ UNLOCKED (null) ───────────────► Lock available
                       │
                       ├─ LOCKED (sentinel) ─────────────► Lock held, no waiters
                       │
                       └─ Queue pointer ─────────────────► Lock held, waiters queued
                                        │
                                        ▼
                              ┌─────────────────┐
                              │   SignalQueue   │
                              │  (linked list)  │
                              └─────────────────┘
                                        │
                                        ▼
                              ┌─────────────────┐     ┌─────────────────┐
                              │     Signal      │────►│     Signal      │────► ...
                              │     • waker     │     │     • waker     │
                              │     • value     │     │     • value     │
                              └─────────────────┘     └─────────────────┘
```
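In code, the top of the diagram corresponds to roughly the following layout (a sketch derived from the diagram; names follow the diagram, not necessarily the actual source):

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::AtomicPtr;

// Wait queue, allocated lazily on first contention and pooled for reuse.
struct QueueStructure { /* intrusive list of Signal nodes */ }

struct MutexInternal<T> {
    // The entire lock state: null = UNLOCKED, a sentinel = LOCKED with no
    // waiters, otherwise a (possibly tagged) pointer to the wait queue.
    queue: AtomicPtr<QueueStructure>,
    // The protected data, accessed only while the lock is held.
    inner: UnsafeCell<T>,
}
```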
Signal States
Each waiter tracks its state through atomic transitions:
- `SIGNAL_UNINIT` (0): Initial state
- `SIGNAL_INIT_WAITING` (1): Enqueued and waiting
- `SIGNAL_SIGNALED` (2): Lock granted
- `SIGNAL_RETURNED` (!0): Guard has been returned
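For illustration, the handoff from the unlocking side to a waiter could look roughly like this (a hypothetical `grant_lock` helper, not the crate's API):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};
use core::task::Waker;

// Mirrors the waiter states listed above.
const SIGNAL_INIT_WAITING: usize = 1;
const SIGNAL_SIGNALED: usize = 2;

// The unlocking side hands the lock directly to a waiter: it moves the
// waiter's state from "waiting" to "signaled", then wakes its task or thread.
fn grant_lock(state: &AtomicUsize, waker: &Waker) {
    if state
        .compare_exchange(
            SIGNAL_INIT_WAITING,
            SIGNAL_SIGNALED,
            Ordering::Release,
            Ordering::Relaxed,
        )
        .is_ok()
    {
        waker.wake_by_ref();
    }
}
```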
Thread Safety
- Public API: 100% safe Rust
- Internal implementation: Carefully controlled `unsafe` blocks for:
  - Queue manipulation (pointer tagging prevents use-after-free)
  - Guard creation (guaranteed by state machine)
  - Memory ordering (documented and audited)
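As an illustration of the pointer tagging mentioned above (hypothetical helpers, not the crate's code): the queue is a heap allocation with alignment greater than one, so the lowest bit of its pointer is always zero and can be borrowed as a "queue is being modified" flag.

```rust
// Lowest pointer bit used as a "queue busy" marker.
const TAG: usize = 0b1;

fn tag<T>(ptr: *mut T) -> *mut T {
    (ptr as usize | TAG) as *mut T
}

fn untag<T>(ptr: *mut T) -> *mut T {
    (ptr as usize & !TAG) as *mut T
}

fn is_tagged<T>(ptr: *mut T) -> bool {
    ptr as usize & TAG != 0
}
```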
API Reference
Mutex<T>
| Method | Description |
|---|---|
| `new(data: T)` | Create a new synchronous mutex |
| `lock()` | Acquire the lock (blocks current thread) |
| `try_lock()` | Attempt non-blocking acquisition |
| `lock_async()` | Acquire asynchronously (returns `Future`) |
| `as_async()` | View as `&AsyncMutex<T>` |
| `to_async()` | Convert to `AsyncMutex<T>` |
| `to_async_arc()` | Convert `Arc<Mutex<T>>` to `Arc<AsyncMutex<T>>` |
AsyncMutex<T>
| Method | Description |
|---|---|
| `new(data: T)` | Create a new asynchronous mutex |
| `lock()` | Acquire the lock (returns `Future`) |
| `try_lock()` | Attempt non-blocking acquisition |
| `lock_sync()` | Acquire synchronously (blocks current thread) |
| `as_sync()` | View as `&Mutex<T>` |
| `to_sync()` | Convert to `Mutex<T>` |
| `to_sync_arc()` | Convert `Arc<AsyncMutex<T>>` to `Arc<Mutex<T>>` |
MutexGuard<'a, T>
Implements `Deref<Target = T>` and `DerefMut` for transparent access to the protected data. Automatically releases the lock on drop.
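For example, an async-first lock can still be taken from plain blocking code such as a worker thread running outside any executor (a sketch using the methods above; the crate-root import path is an assumption):

```rust
use xutex::AsyncMutex;

// A worker thread outside the async runtime blocks on the lock
// instead of awaiting it.
fn blocking_worker(counter: &AsyncMutex<u64>) {
    let mut guard = counter.lock_sync();
    *guard += 1;
} // Guard dropped here; the lock is released for async callers too.
```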
Use Cases
✅ Ideal For
- High-frequency, low-contention async locks
- Hybrid applications mixing sync and async code
- Performance-critical sections with short critical regions
- Runtime-agnostic async libraries
- Situations requiring zero-allocation fast paths
⚠️ Not Ideal For
- Predominantly synchronous workloads: In pure sync environments without async interaction, `std::sync::Mutex` may offer slightly better performance due to lower abstraction overhead
- Read-heavy workloads: If your use case involves frequent reads with infrequent writes, consider using `RwLock` implementations (e.g., `std::sync::RwLock` or `tokio::sync::RwLock`) that allow multiple concurrent readers
- Mutex poison state: Cases where `std::sync::Mutex` poisoning semantics are required
Caveats
- 8-byte claim: Refers to lock metadata only on 64-bit platforms; guarded data `T` stored separately
- No poisoning: Unlike `std::sync::Mutex`, panics don't poison the lock
- Sync overhead: Slight performance cost vs `std::sync::Mutex` in pure-sync scenarios (~1-5%)
Testing
Run the test suite:
```sh
# Standard tests
cargo test

# With Miri (undefined behavior detection)
cargo miri test

# Benchmarks
cargo bench
```
TODO
- Implement an `RwLock` variant with shared/exclusive locking
- Explore a lock-free linked list implementation for improved wait queue performance
Contributing
Contributions are welcome! Please:
- Run `cargo +nightly fmt` and `cargo clippy` before submitting
- Add tests for new functionality
- Update documentation as needed
- Verify `cargo miri test` passes
- Note: This library is `no_std` compatible; use `core` and `alloc` instead of `std`. Ensure `cargo test` and `cargo test --no-default-features` run without warnings.
License
Licensed under the MIT License.
Author: Khashayar Fereidani
Repository: github.com/fereidani/xutex