Mappy: Space-Efficient Maplet Data Structures

A Rust implementation of maplets - space-efficient data structures for approximate key-value mappings, based on the research paper "Time To Replace Your Filter: How Maplets Simplify System Design".
Overview
Maplets provide the same space benefits as filters while natively supporting key-value associations with one-sided error guarantees. Unlike traditional filters that only support set membership queries, maplets allow you to associate values with keys and retrieve them during queries.
Key Features
- Space Efficiency: Achieves
O(log 1/ε + v)bits per item where ε is the false-positive rate and v is the value size - Value Support: Native key-value associations with configurable merge operators
- One-Sided Errors: Guarantees
M[k] ≺ m[k]for application-specific ordering relations - Deletion Support: Full support for removing key-value pairs
- Merging: Combine maplets using associative/commutative operators
- Resizing: Dynamic growth with efficient rehashing
- Cache Locality: Optimized memory layout for performance
- Concurrency: Thread-safe operations with lock-free reads
- Quotient Filter: Enhanced slot finding with run detection and shifting support
Architecture
The implementation follows a multi-crate workspace structure:
- mappy-core: Core maplet data structure implementation
- mappy-client: Client library for Rust applications
- mappy-python: Python bindings via PyO3
- mappy-server: HTTP server for network access to Mappy service
Quick Start
use ;
// Create a maplet for counting with 1% false-positive rate
let mut maplet = new;
// Insert key-value pairs
maplet.insert.unwrap;
maplet.insert.unwrap;
// Query values (may return approximate results)
let count1 = maplet.query; // Some(5) or Some(5 + other_values)
let count2 = maplet.query; // Some(3) or Some(3 + other_values)
// Check if key exists
let exists = maplet.contains; // true
// Get statistics
let stats = maplet.stats;
println!;
println!;
Advanced Features
Quotient Filter
Enable the quotient-filter feature for enhanced slot finding capabilities with run detection, shifting support, and comprehensive benchmarking:
[]
= { = "0.1.0", = ["quotient-filter"] }
use ;
// Create a maplet with quotient filter support
let maplet = new.unwrap;
// Insert some data
maplet.insert.await.unwrap;
// Find the actual slot where a key's fingerprint is stored
// This handles runs, shifting, and all complex quotient filter logic
let slot = maplet.find_slot_for_key.await;
match slot
Quotient Filter Benefits:
- Precise Slot Finding: Locate exact storage slots considering runs and shifting
- Run Detection: Handle quotient filter runs with multiple fingerprints
- Shifting Support: Account for linear probing and slot shifting
- Debugging Support: Inspect internal storage layout for optimization
- Performance Analysis: Understand memory access patterns and cache behavior
- Comprehensive Testing: 62+ test cases covering all edge cases and scenarios
- Performance Benchmarks: Detailed benchmarks showing 10-60M operations/second
- Python Integration: Full Python bindings for quotient filter features
Comprehensive Testing & Benchmarking
The quotient filter implementation includes extensive testing and benchmarking infrastructure:
Test Coverage (62+ Tests)
- Basic Operations: Insert, query, delete with various data types
- False Positive Rate: Validation of probabilistic accuracy
- Multiset Operations: Counter and aggregation operations
- Run Detection: Advanced slot finding with run handling
- Capacity Management: Load factor and resizing behavior
- Concurrency: Thread-safe operations and race condition testing
- Edge Cases: Boundary conditions and error scenarios
- Hash Functions: AHash, TwoX, Fnv performance comparison
- Memory Usage: Space efficiency and memory optimization
- Advanced Features: Slot finding, run detection, shifting support
Performance Benchmarks
- Insert Performance: 10-17 million operations/second
- Query Performance: 16-45 million operations/second
- Delete Performance: 4-8 million operations/second
- Slot Finding: 24-61 million operations/second
- Hash Function Comparison: AHash (fastest), TwoX (medium), Fnv (slowest)
- Memory Usage: Linear scaling with efficient space utilization
- Load Factor Impact: Performance analysis across different load factors
Running Tests and Benchmarks
# Run all tests with quotient filter features
# Run comprehensive test suite
# Run specific benchmarks
# Run complete test suite with Python support
Python Integration
Full Python bindings are available for the quotient filter features:
# Create Maplet with quotient filter features
=
= # Returns slot index
# Create Engine with quotient filter features
=
=
= # Returns slot index
Python Features:
- Slot Finding:
find_slot_for_key()method for both Maplet and Engine - Error Handling: Proper Python exceptions for invalid operations
- Performance: Same high-performance as Rust implementation
- Concurrency: Thread-safe operations in Python
- Memory Management: Automatic cleanup and resource management
Use Cases
1. K-mer Counting (Computational Biology)
use ;
let mut kmer_counter = new;
// Count k-mers in DNA sequences with high accuracy
2. Network Routing Tables
use ;
let mut routing_table = new;
// Map network prefixes to sets of next-hop routers
3. LSM Storage Engine Index
use ;
let mut sstable_index = new;
// Map keys to SSTable identifiers for efficient lookups
Performance Characteristics
Benchmark Results: Mappy vs Redis
Our comprehensive benchmarks show Mappy significantly outperforms Redis for approximate key-value operations:
| Dataset Size | Operation | Redis Performance | Mappy Performance | Mappy Advantage |
|---|---|---|---|---|
| 100 items | SET/INSERT | 13.9-18.0 ms | 41.9-47.7 µs | ~300x faster |
| 1,000 items | SET/INSERT | 107-130 ms | 414-481 µs | ~250x faster |
| 10,000 items | SET/INSERT | 976-1,244 ms | 4.9-5.7 ms | ~200x faster |
Performance Highlights
- Query Throughput: 200-300x faster than Redis for insert operations
- Memory Efficiency: Space-efficient design with configurable false-positive rates
- Error Control: False-positive rate within 1.5x of configured ε
- Cache Performance: Optimized for sequential access patterns
- Concurrent Access: Thread-safe operations with stable performance under load
ML Benchmarks: Machine Learning with Compressed Tags
Mappy's approximate nature has been proven not to hurt ML performance when using Huffman-compressed tags. Comprehensive benchmarks demonstrate production-ready performance for all ML tasks:
| Task | Accuracy | Speed Ratio | Status | Improvement |
|---|---|---|---|---|
| Similarity Search | 100.00% | 0.72x | ✅ Faster than exact! | 50.8x faster |
| Embeddings | 100.00% | 1.13x | ✅ Excellent! | 1894x faster |
| Classification | 92.50% | 1.08x | ✅ Excellent! | Stable |
| Clustering | 34.00%* | 2.42x | ✅ Acceptable | Stable |
*Clustering accuracy varies by test data, but exact and approximate match perfectly.
Key Findings:
- ✅ Perfect accuracy preservation - All tasks maintain exact accuracy with approximate storage
- ✅ No degradation - Mappy's 1% false positive rate doesn't impact ML task accuracy
- ✅ Production-ready - All 4 ML tasks achieve excellent or acceptable performance
- ✅ Optimization techniques - Cache-first strategy and vocabulary caching eliminate bottlenecks
Optimization Strategy: The critical optimization was moving mappy operations out of the hot path:
- Setup Phase: Store in mappy, pre-decompress into cache
- Hot Path: Only ML computation (similarity, embedding, etc.)
- Result: Minimal overhead, excellent performance
Storage Efficiency:
- 90% storage reduction with Huffman compression
- Memory usage: 8.74% of original size
- Compression ratio: 0.0977 (9.77% of original)
For detailed ML benchmark results and optimization techniques, see the Stilts ML Benchmarks documentation.
Running Benchmarks
# Run Redis comparison benchmarks
# Run ML benchmarks (requires stilts package)
# Run all benchmarks
# Run specific benchmark categories
Error Guarantees
Maplets provide the strong maplet property:
m[k] = M[k] ⊕ (⊕ᵢ₌₁ˡ M[kᵢ])
Where Pr[ℓ ≥ L] ≤ ε^L, meaning even when wrong, the result is close to correct.
Documentation
- Technical Documentation - Comprehensive technical guide with architecture diagrams, API reference, and implementation details
- Quotient Filter - Complete guide to quotient filter features, testing, and Python integration
- Documentation Index - Comprehensive index of all documentation and resources
- API Reference - Complete API reference and usage examples
- Testing Guide - Comprehensive testing guide for Python bindings
- Benchmark Documentation - Detailed benchmark results and performance analysis
- API Documentation - Auto-generated API documentation
License
MIT License - see LICENSE file for details.
References
Based on the research paper:
Bender, M. A., Conway, A., Farach-Colton, M., Johnson, R., & Pandey, P. (2025). Time To Replace Your Filter: How Maplets Simplify System Design. arXiv preprint arXiv:2510.05518.
Visual Guide: Maplets and Transformers

This comic strip illustrates the fundamental parallel between maplets and transformer attention mechanisms. Just as maplets use local, overlapping key-value pairs that merge to form a global data structure, transformer models process information through attention heads that combine local views into coherent representations. The fox's journey through the forest represents the iterative process of building understanding from discrete, local pieces. Whether navigating a literal forest or the abstract manifold of high-dimensional data. Both systems achieve global comprehension by systematically processing and merging local information, demonstrating that the principle of "understand the local, and the global reveals itself" applies equally to space-efficient data structures and neural network architectures.