# RKVS - Rust Key-Value Storage
RKVS is a high-performance, in-memory, asynchronous key-value storage library for Rust. It is designed for concurrent applications and provides a thread-safe API built on Tokio.
Key features include:
- Namespaces: Isolate data into separate key-value stores, each with its own configuration for limits and behavior.
- Automatic Sharding: Keys are automatically distributed across internal shards using jump consistent hashing for improved concurrency under load.
- Concurrent Access: Optimized for high-throughput scenarios with support for multiple concurrent readers and writers, using `RwLock` for efficient read-heavy workloads.
- Batch Operations: Perform atomic `set`, `get`, and `delete` operations on multiple items with "all-or-nothing" or "best-effort" semantics.
- Optional Persistence: Save and load snapshots of the entire database or individual namespaces to disk.
- Rich API: Includes convenience methods like `consume` (atomic get-and-delete) and `update` (fail if a key does not exist).
- Configurable Autosave: Configure automatic background saving for the entire storage manager or for individual namespaces.
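The namespace, sharding, and `RwLock` design described above can be sketched with standard-library primitives. This is an illustrative model only, not RKVS's actual internals (RKVS is async and uses jump consistent hashing, while this sketch uses a simple hash-modulo for brevity):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Illustrative sharded map: each key is hashed to one of N shards, and
/// each shard is guarded by its own RwLock, so readers and writers on
/// different shards never contend with each other.
struct ShardedMap {
    shards: Vec<RwLock<HashMap<String, Vec<u8>>>>,
}

impl ShardedMap {
    fn new(num_shards: usize) -> Self {
        Self {
            shards: (0..num_shards).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Pick a shard for a key (RKVS uses jump consistent hashing instead).
    fn shard_for(&self, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.shards.len()
    }

    fn set(&self, key: &str, value: Vec<u8>) {
        let shard = &self.shards[self.shard_for(key)];
        shard.write().unwrap().insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let shard = &self.shards[self.shard_for(key)];
        shard.read().unwrap().get(key).cloned()
    }
}

fn main() {
    let map = ShardedMap::new(4);
    map.set("alpha", b"1".to_vec());
    assert_eq!(map.get("alpha"), Some(b"1".to_vec()));
    assert_eq!(map.get("missing"), None);
    println!("ok");
}
```

Because each shard owns an independent lock, a write to one shard blocks only the keys that hash to it; reads elsewhere proceed concurrently.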
## Basic Usage
RKVS is designed to be straightforward to use. Here's a quick overview of its core capabilities, including initialization, namespace management, single-key operations, batch operations, sharding, and persistence.
```rust
// Imports are illustrative; the original example's items were lost in formatting.
use std::time::Duration;
use temp_dir::TempDir; // for a temporary persistence path

#[tokio::main]
async fn main() {
    // example body elided in the original
}
```
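As a model of the batch semantics mentioned above, the following sketch contrasts "all-or-nothing" and "best-effort" modes on a plain `HashMap`. The `BatchMode` enum and `update_multiple` function are hypothetical illustrations, not RKVS's API:

```rust
use std::collections::HashMap;

/// Hypothetical model of the two batch modes: AllOrNothing validates every
/// key up front and applies nothing on failure; BestEffort applies whatever
/// it can and skips the rest.
#[derive(Clone, Copy)]
enum BatchMode {
    AllOrNothing,
    BestEffort,
}

/// Update existing keys only; returns how many entries were applied.
fn update_multiple(
    store: &mut HashMap<String, String>,
    items: &[(String, String)],
    mode: BatchMode,
) -> Result<usize, String> {
    if let BatchMode::AllOrNothing = mode {
        // Validate first so a failure leaves the store untouched.
        for (k, _) in items {
            if !store.contains_key(k) {
                return Err(format!("missing key: {k}"));
            }
        }
    }
    let mut applied = 0;
    for (k, v) in items {
        if store.contains_key(k) {
            store.insert(k.clone(), v.clone());
            applied += 1;
        }
    }
    Ok(applied)
}

fn main() {
    let mut store = HashMap::from([("a".to_string(), "1".to_string())]);
    let items = vec![
        ("a".to_string(), "2".to_string()),
        ("b".to_string(), "3".to_string()), // "b" does not exist
    ];
    // AllOrNothing: rejected as a whole, nothing changes.
    assert!(update_multiple(&mut store, &items, BatchMode::AllOrNothing).is_err());
    assert_eq!(store["a"], "1");
    // BestEffort: "a" is updated, "b" is skipped.
    assert_eq!(update_multiple(&mut store, &items, BatchMode::BestEffort), Ok(1));
    assert_eq!(store["a"], "2");
}
```

The up-front validation pass is also why, as noted in the performance section below, all-or-nothing batches carry slightly more overhead than best-effort ones.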
## Performance Overview
RKVS is designed for high-performance, in-memory key-value storage. Our benchmarks aim to illustrate its capabilities across various workloads and configurations; the latest test results are available here. While exact numbers will vary based on hardware and specific test conditions, the general trends observed are:
- Sequential Operations (e.g., `get`, `set`, `delete`):
  - Individual operations exhibit very low latency, typically in the single-digit to low double-digit microsecond range.
  - Latency scales gracefully with increasing namespace size (from 1k to 1M keys), showing that the underlying data structures maintain efficiency even with large datasets.
  - `exists` operations are generally the fastest, followed by `get`, `set` (update), `update`, `consume`, `set` (insert), and `delete`.
- Sharding Overhead:
  - Sharding effectively distributes load, leading to improved overall throughput and often reduced average latency for individual operations as the number of shards increases, up to an optimal point.
  - The `jump_consistent_hash` algorithm ensures a relatively even distribution of keys across shards, minimizing hot spots and maximizing the benefits of concurrency. The deviation from perfect distribution remains low across various shard counts.
- Concurrent Workloads (Mixed Read/Write):
  - RKVS demonstrates strong performance under concurrent access, leveraging `RwLock` for efficient read-heavy scenarios and effective sharding for write-heavy or balanced workloads.
  - Throughput (operations per second) increases significantly with higher concurrency levels and appropriate shard counts.
  - Average latency per operation remains stable or decreases for read-heavy workloads, and scales predictably for write-heavy workloads as concurrency and sharding are optimized.
- Batch Operations (e.g., `set_multiple`, `get_multiple`):
  - Batch operations provide a substantial performance improvement by amortizing overhead across multiple key-value pairs.
  - The average latency per item in a batch is significantly lower than performing individual operations, making batching highly recommended for bulk data manipulation.
  - `BestEffort` mode typically offers slightly lower latency than `AllOrNothing` due to reduced validation and rollback overhead, but `AllOrNothing` provides stronger transactional guarantees.
- Concurrent Batch Operations:
  - Combining batching with concurrency yields very high throughput for bulk data operations under load.
  - Latency per item remains low, even as multiple concurrent tasks perform batch operations, showcasing the efficiency of RKVS's concurrent design for large-scale data processing.
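The jump consistent hashing mentioned above is the published algorithm of Lamping and Veach. A standalone sketch (independent of RKVS's own implementation) illustrates its even-distribution property:

```rust
/// Jump consistent hash: maps a 64-bit key to one of `num_buckets` shards
/// (num_buckets must be >= 1) with near-uniform balance, and moves only a
/// minimal fraction of keys when the bucket count changes.
fn jump_consistent_hash(mut key: u64, num_buckets: u32) -> u32 {
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < num_buckets as i64 {
        b = j;
        // Advance a 64-bit LCG seeded by the key, then jump forward.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = (((b + 1) as f64) * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

fn main() {
    // Hash 1000 sequential keys into 8 shards and count the occupancy.
    let buckets = 8u32;
    let mut counts = vec![0u32; buckets as usize];
    for k in 0..1000u64 {
        counts[jump_consistent_hash(k, buckets) as usize] += 1;
    }
    // Every shard receives a share of the keys; no shard is left empty.
    assert!(counts.iter().all(|&c| c > 0));
    println!("shard counts: {:?}", counts);
}
```

Because the function's output depends only on the key and the bucket count, every node computes the same placement without coordination, which is what keeps per-shard key counts close to uniform across shard counts.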
## Benchmarks
RKVS includes a comprehensive suite of benchmarks to measure performance across various workloads. The results are saved as JSON files, and a Python script is provided to generate plots from these results.
### Running the Benchmarks
The benchmarks are located in the `benches/` directory and can be run using `cargo bench`. Each benchmark focuses on a different aspect of the system. The `-- --nocapture` flag is recommended to see live progress and results in the console.
- Sequential Operations: Measures latency for individual `get`, `set`, `delete`, etc., operations on namespaces of different sizes.
- Concurrent Workloads: Measures latency and throughput for mixed read/write workloads at different concurrency levels and shard counts.
- Batch Operations: Measures latency for batch `set_multiple`, `get_multiple`, and `delete_multiple` operations.
- Concurrent Batch Operations: Measures latency for concurrent batch operations.
- Sharding Overhead: Measures the latency overhead of sharding for `get` and `set` operations as the number of shards increases.
Running a benchmark will produce a `.json` result file in the `assets/benchmarks/` directory.
## Migration Guide

### Upgrading from v0.1.0 to v0.2.0
The main breaking change is the switch from hash-based namespace identifiers to string-based namespace IDs:
Before (v0.1.0):

```rust
let ns_hash = storage.create_namespace(/* ... */).await?; // returned a hash
let namespace = storage.namespace(ns_hash).await?; // ns_hash was [u8; 32]
```
After (v0.2.0):

```rust
storage.create_namespace(/* ... */).await?; // now returns Result<(), _>
storage.namespace(namespace_id).await?; // namespace_id is String
```
Key Changes:
- `create_namespace()` now returns `Ok(())` or `Err(...)` instead of `[u8; 32]`
- The `namespace()` method now takes `&str` instead of `[u8; 32]`
- All other methods (`delete_namespace`, `get_namespace_stats`, etc.) now use `&str` for namespace identification
- No more manual hash conversion needed
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Changelog

### v0.3.0 (Latest)
- Core Feature: Implemented automatic sharding for namespaces.
- Persistence: Implemented automated background persistence for both the entire storage manager and individual namespaces.
- Batch Operations: Reworked batch operations for `set`, `get`, `delete`, and `consume` with `AllOrNothing` and `BestEffort` modes.
- Benchmarking: Reworked the benchmarking suite to cover new features and provide more detailed performance metrics.
- Benchmarking: Added Python scripts for generating benchmark plots from results.
- API Consistency: Addressed several API inconsistencies for a more uniform user experience.
- Serialization: Improved serialization mechanisms for better performance and reliability during persistence.
- Data Structure: Reworked the base data structure for improved efficiency and concurrency.
- Documentation: Updated `README.md` with a comprehensive project summary and a detailed "Basic Usage" section.
### v0.2.0
- Breaking Change: Updated API to use string-based namespace IDs instead of hash values
- Performance: Switched from `Mutex` to `RwLock` for better concurrent read performance; removed redundant hashing and data duplication
- API Improvements: Simplified namespace ID handling - no more manual hash conversion needed
- Documentation: Updated all examples and documentation to reflect new API
- Concurrency: Improved read performance with multiple concurrent readers support
### v0.1.0
- Initial release
- Namespace-based storage
- Async operations
- Batch processing
- File persistence
- Comprehensive benchmarking