# Benchmarking emdb

This project ships four Criterion / raw-timing benches:

- [`benches/kv.rs`](../benches/kv.rs) — focused micro-bench for the
  `insert` / `get` / `remove` hot paths, ephemeral DB, single thread.
- [`benches/comparative.rs`](../benches/comparative.rs) — emdb vs.
  `redb` / `sled`, single-thread workload, default value sizes.
- [`benches/concurrent_reads.rs`](../benches/concurrent_reads.rs) —
  multi-thread read fan-out (1 / 2 / 4 / 8 reader threads) against a
  pre-populated DB, showcases the lock-free `Arc<Mmap>` read path.
- [`benches/lmdb_style.rs`](../benches/lmdb_style.rs) —
  apples-to-apples mirror of redb's published `lmdb_benchmark.rs`
  workload (5 M records, 24-byte random keys, 150-byte random values,
  full phase set including bulk load, individual writes, batch writes,
  nosync writes, random reads × 2, MT reads at 4 / 8 threads,
  removals, uncompacted / compacted size). Range-read phases are
  recorded as `N/A` for emdb because the hash index does not support
  sorted iteration unless `EmdbBuilder::enable_range_scans(true)` is
  set.
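
As a rough illustration of the `lmdb_style` record shape (24-byte keys, 150-byte values), the sketch below builds a repeatable dataset from a fixed seed. It is not the bench's own code: `SplitMix64` is a stand-in generator chosen because it needs no external crate, and `generate_pairs` is a hypothetical helper name.

```rust
/// SplitMix64: a small deterministic generator, used here for illustration only.
/// (Assumption: the RNG the real bench uses is not specified in this document.)
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Fill `buf` with pseudo-random bytes, up to eight at a time.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

const KEY_LEN: usize = 24;
const VALUE_LEN: usize = 150;

/// Hypothetical helper: build `count` key/value pairs from a fixed seed.
/// The same seed always yields the same pairs in the same order.
fn generate_pairs(seed: u64, count: usize) -> Vec<([u8; KEY_LEN], Vec<u8>)> {
    let mut rng = SplitMix64::new(seed);
    (0..count)
        .map(|_| {
            let mut key = [0u8; KEY_LEN];
            let mut value = vec![0u8; VALUE_LEN];
            rng.fill(&mut key);
            rng.fill(&mut value);
            (key, value)
        })
        .collect()
}
```

Calling `generate_pairs` twice with the same seed returns identical vectors, so each engine in a comparison could be fed byte-for-byte the same records.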

## Quick runs

```powershell
# emdb-only micro-benches
cargo bench --bench kv --features ttl

# emdb vs sled vs redb at default scale (20 K records)
cargo bench --bench comparative --features ttl,bench-compare

# Read scaling under thread fan-out
cargo bench --bench concurrent_reads --features ttl

# Apples-to-apples vs redb's published methodology (5 M records)
$env:EMDB_BENCH_RECORDS = "5000000"
cargo bench --bench lmdb_style --features ttl,bench-compare
```

`EMDB_BENCH_RECORDS` defaults to 1 M for `lmdb_style` and 20 K for
`comparative`. Override the env var to scale up or down.
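
A minimal sketch of how a bench might resolve the record count from `EMDB_BENCH_RECORDS` with a per-bench default. The helper names are hypothetical, and falling back to the default on an unparsable or zero value is an assumption; the real benches may handle bad input differently.

```rust
use std::env;

/// Hypothetical helper: interpret the raw variable value, falling back to
/// `default` when it is missing, not a number, or zero (assumed behaviour).
fn parse_record_count(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Hypothetical helper: read EMDB_BENCH_RECORDS from the environment.
fn record_count(default: usize) -> usize {
    parse_record_count(env::var("EMDB_BENCH_RECORDS").ok().as_deref(), default)
}
```

Under these assumptions, `record_count(1_000_000)` would model the `lmdb_style` default and `record_count(20_000)` the `comparative` default.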

## Optional comparison peers

Add additional engines via Cargo features:

```powershell
# RocksDB
cargo bench --bench comparative --features ttl,bench-compare,bench-rocksdb

# Redis (network protocol; requires a running Redis server)
docker run --rm -p 6379:6379 redis:7   # in another terminal
$env:EMDB_REDIS_URL = "redis://127.0.0.1/"
cargo bench --bench comparative --features ttl,bench-redis
```

Redis numbers include network overhead and are not directly
comparable to embedded engines on absolute latency.

## Reading results

Criterion writes detailed reports to
[target/criterion](../target/criterion). Key metrics:

- Throughput (elements/sec)
- Time per iteration (mean and median, with confidence intervals)
- Relative comparison between benchmark IDs in the same group

The `lmdb_style` bench prints a markdown table to stdout with
millisecond totals per phase per engine. The `concurrent_reads` bench
is Criterion-driven and reports throughput per thread count.
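
For orientation, a per-phase, per-engine table of millisecond totals could be rendered along the following lines. This is only a sketch of the output shape; `render_table` is a hypothetical function and the bench's actual formatting code may differ.

```rust
/// Render a markdown table with one row per phase and one column per engine.
/// A `None` cell is printed as `N/A` (for phases an engine does not support).
fn render_table(engines: &[&str], rows: &[(&str, Vec<Option<u64>>)]) -> String {
    let mut out = String::from("| phase |");
    for engine in engines {
        out.push_str(&format!(" {engine} |"));
    }
    out.push('\n');

    // Separator row: right-align the numeric columns.
    out.push_str("|---|");
    for _ in engines {
        out.push_str("---:|");
    }
    out.push('\n');

    for (phase, cells) in rows {
        out.push_str(&format!("| {phase} |"));
        for cell in cells {
            match cell {
                Some(ms) => out.push_str(&format!(" {ms} |")),
                None => out.push_str(" N/A |"),
            }
        }
        out.push('\n');
    }
    out
}
```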

## Reproducibility notes

For meaningful comparisons:

- Run through `cargo bench`, which builds with the optimized `bench`
  profile (it inherits from `release`); the dev profile is ~5-10×
  slower.
- Close heavy background apps; pin the CPU power mode if possible.
- Run multiple times and compare medians, not single-best values.
- Keep dataset size and value size identical when comparing engines.
- The `lmdb_style` bench seeds a deterministic RNG, so the same
  dataset is generated across runs and across engines within a run.
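
The advice to compare medians across repeated runs can be sketched for a raw-timing harness as follows. `time_runs` and `median` are hypothetical helpers written for this example, not part of emdb or Criterion.

```rust
use std::time::{Duration, Instant};

/// Run `work` `runs` times and return the wall-time of each run.
fn time_runs<F: FnMut()>(runs: usize, mut work: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect()
}

/// Median of the samples; for an even count, the mean of the two middle
/// values. Returns `None` for an empty slice.
fn median(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2
    })
}
```

The median is less sensitive than the minimum or the mean to a single run disturbed by background load, which is why it is the more stable number to report.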

## Suggested reporting format

When logging benchmark output into README or release notes, record:

- CPU and memory
- OS + Rust version
- Command used
- Record count (`EMDB_BENCH_RECORDS`)
- Wall-time per phase, or throughput per phase, for each engine
- Any notable caveats (network overhead, hardware differences,
  background load)

## Reference baseline (5 M records, Windows 11 NVMe)

Captured with:

```powershell
$env:EMDB_BENCH_RECORDS = "5000000"
cargo bench --bench lmdb_style --features ttl,bench-compare
```

Lower is better. Timing rows are wall-time milliseconds; the two size
rows are on-disk sizes in GiB or MiB. Bold marks the best result.

| phase                       |        emdb |    redb  |    sled  |
|-----------------------------|------------:|---------:|---------:|
| bulk load                   |    **4498** |    74496 |    60807 |
| batch writes                |        2814 |    11043 | **1972** |
| nosync writes               |     **220** |     1717 |     1136 |
| random reads (1 M)          |     **596** |     5289 |    11197 |
| random reads (4 threads)    |    **1083** |    17543 |    34605 |
| random reads (8 threads)    |     **653** |    17160 |    33284 |
| removals                    |   **11948** |    54905 |    46155 |
| compaction                  |   **11490** |    16506 |      N/A |
| uncompacted size            |    1.08 GiB | 4.00 GiB | 2.13 GiB |
| compacted size              | **498 MiB** | 1.64 GiB |      N/A |
| individual writes (fsync/op)|       27455 |      734 |  **316** |
| random range reads          |         N/A |     3958 |     9688 |
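
The millisecond totals can be converted into throughput or speed-up ratios with simple arithmetic. The helpers below are hypothetical. The worked figure assumes the 8-thread read phase performs 5 M reads in total, which is an inference from the record count and not something this document states.

```rust
/// Convert an operation count and a wall-time total in milliseconds
/// into operations per second.
fn ops_per_sec(ops: u64, millis: u64) -> f64 {
    ops as f64 / (millis as f64 / 1000.0)
}

/// How many times faster `ours_ms` is than `theirs_ms`
/// (both wall-time totals in milliseconds for the same phase).
fn speedup(ours_ms: u64, theirs_ms: u64) -> f64 {
    theirs_ms as f64 / ours_ms as f64
}
```

Under that assumption, `ops_per_sec(5_000_000, 653)` comes to roughly 7.66 M reads/sec, and `speedup(4498, 74496)` puts emdb's bulk load at about 16.6 times faster than redb's.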

Notes:

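- emdb posts the lowest time in every aggregate-throughput phase
  except `batch writes`, where sled is faster (1972 ms vs. 2814 ms).
  In several phases the margin is an order of magnitude. Aggregate
  read throughput at 8 threads is **~7.66 M reads/sec**.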
- The `individual writes` phase syncs after every record. emdb pays
  one Windows `FlushFileBuffers` call per write, and that cost
  dominates the result. Workloads that need durable writes can
  amortize it by batching via `db.transaction(...)` or
  `db.insert_many(...)` (see the `batch writes` and `bulk load`
  phases).
- `random range reads` is N/A because emdb's primary index is
  hash-keyed. Set `EmdbBuilder::enable_range_scans(true)` to enable
  the opt-in BTreeMap secondary index.
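
The cost difference between syncing per record and syncing once per batch can be illustrated with plain files. This sketch uses only `std::fs` and is not emdb's write path; the function names are hypothetical.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Write each record and force it to stable storage before the next one.
/// One sync per record, the pattern an fsync-per-op phase measures.
fn write_sync_each(path: &Path, records: &[Vec<u8>]) -> io::Result<()> {
    let mut file = File::create(path)?;
    for record in records {
        file.write_all(record)?;
        // `sync_all` maps to FlushFileBuffers on Windows and to an
        // fsync-style call on Unix.
        file.sync_all()?;
    }
    Ok(())
}

/// Write all records, then sync once, so the flush cost is paid per
/// batch and not per record.
fn write_sync_once(path: &Path, records: &[Vec<u8>]) -> io::Result<()> {
    let mut file = File::create(path)?;
    for record in records {
        file.write_all(record)?;
    }
    file.sync_all()
}
```

Both functions leave the same bytes on disk. The difference is the number of flushes, and with it the durability window: the batched variant guarantees nothing until its single sync returns.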

These numbers are workload- and hardware-specific. Reproduce them on
your target hardware before using them to make decisions.