kira_kv_engine 0.1.1

KV-storage engine based on minimal perfect hash functions with hybrid indexing (+PGM Index) for Rust
Documentation

โšก Kira KV Engine

Crates.io Downloads (recent)

KV-storage engine based on Minimal perfect hash functions with hybrid indexing (+PGM Index) for Rust.
Zero collisions. Pure O(1) performance.

๐Ÿง  Algorithm

Built on the BDZ algorithm using 3-hypergraph peeling:

  • Maps exactly n keys to n unique indices
  • Uses hypergraph construction with 3 vertices per key
  • Peels vertices of degree 1 iteratively until fully resolved
  • Falls back to rehashing with different salts if cycles occur

๐Ÿ“– Research: Simple and Space-Efficient Minimal Perfect Hash Functions (Botelho, Pagh, Ziviani, 2007)

For numeric data, we layer in PGM (Piecewise Geometric Model) indexing:

  • Learns linear segments to approximate key distributions
  • Provides O(log log n) worst-case with O(1) average lookups
  • Excellent for sorted integer sequences with predictable patterns

๐Ÿ“– Research: The PGM-index: a fully-dynamic compressed learned index (Ferragina, Vinciguerra, 2020)

๐Ÿš€ Quick Start

use kira_kv_engine::Builder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build perfect hash for string keys
    let keys = ["rust", "zig", "go", "swift", "kotlin"];
    let mph = Builder::new().build(keys.iter().map(|s| s.as_bytes()))?;
    
    // O(1) lookups, guaranteed unique indices
    for key in &keys {
        println!("{} โ†’ {}", key, mph.index(key.as_bytes()));
    }
    
    Ok(())
}

๐ŸŽฏ Hybrid Indexing

For mixed workloads, use the hybrid index that automatically partitions keys:

use kira_kv_engine::HybridBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mixed_keys = vec![
        1000u64.to_le_bytes().to_vec(),  // numeric key
        2000u64.to_le_bytes().to_vec(),  // numeric key  
        b"user:123".to_vec(),            // string key
        b"session:abc".to_vec(),         // string key
    ];
    
    let hybrid = HybridBuilder::new().build(mixed_keys)?;
    
    // Query by original key format
    println!("Numeric: {}", hybrid.index_u64(1000)?);
    println!("String: {}", hybrid.index_str("user:123")?);
    
    // Range queries for numeric keys
    let range_results = hybrid.range_u64(1000, 3000);
    println!("Range [1000,3000]: {:?}", range_results);
    
    Ok(())
}

๐Ÿ“Š Performance Comparison *

Performance benchmarks on 1M keys (Intel Core i7-12700, Windows 11):

Solution Build Time Build Rate Lookup Time Lookup Rate Memory Use Case
Kira (MPH+PGM) 395ms โšก 2.5M/s โšก 26ns โšก 37.9M/s โšก 4.2MB โšก Mixed data
Kira (MPH) 977ms 1.0M/s 47ns โšก 21.4M/s โšก 5.0MB โšก String keys
Redis Hash 2,100ms 0.48M/s 180ns 5.6M/s 90MB General KV
LevelDB 3,400ms 0.29M/s 250ns 4.0M/s 120MB Persistent
RocksDB 2,800ms 0.36M/s 200ns 5.0M/s 110MB LSM-tree
std::HashMap 850ms 1.2M/s 35ns 28.6M/s 96MB In-memory
DashMap 920ms 1.1M/s 42ns 23.8M/s 102MB Concurrent
BTreeMap 1,200ms 0.83M/s 65ns 15.4M/s 72MB Sorted

* our benchmarks

๐Ÿ† Performance Highlights *

Build Speed:

  • Kira Hybrid: 5.3ร— faster than Redis, 8.6ร— faster than LevelDB
  • Kira MPH: Competitive with HashMap, 2.1ร— faster than Redis

Lookup Speed:

  • Kira Hybrid: 6.9ร— faster than Redis, 9.6ร— faster than LevelDB
  • Kira MPH: 3.8ร— faster than Redis, 5.3ร— faster than LevelDB

Memory Efficiency:

  • Kira: 18-50ร— smaller than traditional databases
  • Kira: 19-24ร— smaller than HashMap/concurrent structures
  • Zero fragmentation - exact memory requirements known

* our benchmarks

โšก Speed Comparison Matrix *

Metric vs Redis vs LevelDB vs RocksDB vs HashMap vs BTreeMap
Build ๐Ÿ”ฅ 5.3ร— ๐Ÿ”ฅ 8.6ร— ๐Ÿ”ฅ 7.1ร— ๐Ÿ”ฅ 2.2ร— ๐Ÿ”ฅ 3.0ร—
Lookup ๐Ÿ”ฅ 6.9ร— ๐Ÿ”ฅ 9.6ร— ๐Ÿ”ฅ 7.7ร— โšก 1.3ร— ๐Ÿ”ฅ 2.5ร—
Memory ๐Ÿ”ฅ 21ร— ๐Ÿ”ฅ 29ร— ๐Ÿ”ฅ 26ร— ๐Ÿ”ฅ 23ร— ๐Ÿ”ฅ 17ร—

* our benchmarks

๐Ÿงช Benchmarks

Run comprehensive benchmarks across algorithms:

cargo run --example million_build --release

Expected output on modern hardware:

=== kira_kv_engine :: Hybrid MPH+PGM Benchmark ===
๐Ÿ”ฅ TRADITIONAL MPH BENCHMARK
build:     0.977 s   (1.0 M keys/s)
lookup:    0.047 s   (21.4 M lookups/s)
Memory: 5.0 bytes/key, 24x compression vs HashMap

๐Ÿš€ HYBRID MPH+PGM BENCHMARK  
build:     0.395 s   (2.5 M keys/s)
lookup:    0.026 s   (37.9 M lookups/s)

๐Ÿ“Š PGM-ONLY BENCHMARK
build:     0.016 s   (62.7 M keys/s)
lookup:    0.022 s   (45.5 M lookups/s)
range:     0.781 s   (1,281 queries/s)

โš™๏ธ Features

Enable optional optimizations:

[dependencies]

kira_kv_engine = { version = "0.1", features = ["serde", "parallel", "pgm", "simd"] }

  • serde โ€” Serialization support for persistence
  • parallel โ€” Multi-threaded construction via rayon
  • pgm โ€” Learned indexing for numeric keys
  • simd โ€” Vectorized operations where available

๐ŸŽ›๏ธ Configuration

Fine-tune for your workload:

use kira_kv_engine::{Builder, BuildConfig};
/// MPH Only (for strings as a key)
let mph = Builder::new()
    .with_config(BuildConfig {
        gamma: 1.25,           // Lower = denser graph, more retries
        rehash_limit: 16,      // Max attempts before giving up
        salt: 0xDEADBEEF,     // Custom base salt
    })
    .build(keys)?;

๐Ÿ“Š When to Use

Perfect for:

  • Fixed datasets (config files, catalogs, dictionaries)
  • Memory-constrained environments
  • Predictable O(1) lookup requirements
  • Cold start optimization (mmap friendly)
  • High-performance lookups in hot paths

Not suitable for:

  • Frequently changing key sets
  • Unknown dataset sizes
  • Write-heavy workloads

๐Ÿš€ Performance Tips

  • Use gamma โ‰ˆ 1.25 for datasets > 10M keys
  • Enable parallel feature for build-time speedup
  • Serialize indices to disk for instant cold starts
  • Consider hybrid indexing for mixed numeric/string data

๐ŸŽฏ Real-World Use Cases *

Web Servers:

  • Route matching: 6.9ร— faster than Redis lookups
  • Session lookup: 37.9M operations/sec vs Redis 5.6M

Databases:

  • Index structures: 29ร— less memory than LevelDB
  • Cache lookups: Zero allocation after build

Gaming:

  • Asset ID lookup: 21.4M lookups/sec
  • Player data: Instant cold start from disk

    * provided with our performance benchmarks

๐Ÿ“š References