# Iridium Architecture

iridium-db 0.2.0 — a high-performance vector-graph hybrid storage and indexing engine.

Iridium is a storage engine for hybrid graph + vector workloads, optimized for fast traversals, efficient delta updates, and low write amplification on NVMe-backed single-node systems.

## System Overview

```
┌──────────────────────────────────────────────────────────────┐
│                        Client Layer                          │
│              RustDriver  ·  gRPC (proto/iridium/v1)          │
└───────────────────────────┬──────────────────────────────────┘
┌───────────────────────────▼──────────────────────────────────┐
│                    Query + Runtime Layer                      │
│   parse() → validate() → explain() / execute() / fanout()   │
└──────────┬──────────────────────────────┬────────────────────┘
           │                              │
┌──────────▼──────────┐       ┌──────────▼────────────────────┐
│   Ingest Pipeline   │       │     Ops & Observability       │
│  IngestPipeline     │       │  BackgroundJobTracker·Metrics │
└──────────┬──────────┘       └───────────────────────────────┘
┌──────────▼──────────────────────────────────────────────────┐
│                      Storage Engine                          │
│                                                              │
│  MemTable → WAL → L0 SSTables → L1/L2/L3+ (compaction)     │
│  BufferPool (LRU-2)  ·  Manifest  ·  BitmapStore            │
│  HNSW Maintenance Scheduler                                  │
└──────────┬──────────────────────────────────────────────────┘
┌──────────▼──────────────────────────────────────────────────┐
│                      Core Primitives                         │
│      Reactor (I/O trait)  ·  Topology (shard routing)       │
└─────────────────────────────────────────────────────────────┘
```

## Modules

| Module | Path | Description |
|--------|------|-------------|
| [storage](api/storage.md) | `src/features/storage` | LSM storage engine — reads, writes, compaction, WAL, bitmap indexes |
| [ingest](api/ingest.md) | `src/features/ingest` | Real-time ingestion pipeline with batching and backpressure |
| [runtime](api/runtime.md) | `src/features/runtime` | Query execution engine (volcano-style operators, fanout) |
| [query](api/query.md) | `src/features/query` | Query parser and semantic validator |
| [ops](api/ops.md) | `src/features/ops` | Background jobs, health, and metrics |
| [client](api/client.md) | `src/features/client` | Embedded Rust driver and gRPC contract |
| [reactor](api/reactor.md) | `src/core/reactor` | I/O abstraction trait enabling deterministic testing |
| [topology](api/topology.md) | `src/core/topology` | Shard routing for thread-per-core deployments |

## Key Design Decisions

### LSM with Hybrid Compaction
Writes go to a WAL and an in-memory MemTable. Flushes produce L0 SSTables. Compaction uses a tiered policy for L0 (high write throughput) and leveled for L1+ (lower space amplification and predictable reads).
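The split can be pictured as a per-level policy choice. The `CompactionPolicy` enum, threshold values, and `policy_for_level` function below are an illustrative sketch, not the real iridium-db API:

```rust
/// Illustrative only: which compaction strategy applies at each LSM level.
#[derive(Debug, PartialEq)]
enum CompactionPolicy {
    /// L0: merge whole tiers of similarly-sized SSTables; cheap writes,
    /// but reads may touch several overlapping files.
    Tiered { max_sstables: usize },
    /// L1+: keep levels sorted and non-overlapping with a fixed fanout;
    /// lower space amplification and predictable read cost.
    Leveled { fanout: usize },
}

fn policy_for_level(level: u32) -> CompactionPolicy {
    if level == 0 {
        CompactionPolicy::Tiered { max_sstables: 4 } // threshold is made up
    } else {
        CompactionPolicy::Leveled { fanout: 10 } // fanout is made up
    }
}
```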

### Graph-Aware Entry Types
SSTables are polymorphic: `FullNode` entries store adjacency lists, `EdgeDelta` entries accumulate incremental edge changes, `VectorDelta` entries carry embedding updates, and `Tombstone` entries mark deletions. Compaction merges deltas into full nodes based on delta count per node, not just file size.
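A minimal sketch of the entry taxonomy and the delta-count trigger (field layouts and the threshold check are simplified stand-ins for the real types):

```rust
/// Simplified stand-ins for the polymorphic SSTable entry types.
#[allow(dead_code)]
enum Entry {
    FullNode { node_id: u64, adjacency: Vec<u64> },
    EdgeDelta { node_id: u64, added: Vec<u64>, removed: Vec<u64> },
    VectorDelta { node_id: u64, embedding: Vec<f32> },
    Tombstone { node_id: u64 },
}

/// Compaction promotes a node's deltas into a fresh FullNode once the
/// accumulated delta count crosses a threshold (value here is invented).
fn should_merge_node(delta_count: usize, threshold: usize) -> bool {
    delta_count >= threshold
}
```

The point of keying compaction on per-node delta count rather than file size is that read cost for a node grows with the number of deltas that must be replayed, not with how large the containing SSTables are.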

### Zero-Copy Payloads
FlatBuffers are used for variable-length payloads on disk and in memory, avoiding deserialization overhead on the read path.
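The effect FlatBuffers buys can be shown without the library: the read path hands out borrowed slices into the page buffer instead of decoding into owned structures. The length-prefixed layout below is purely illustrative:

```rust
/// Illustrative zero-copy access: borrow a payload slice straight out of a
/// page buffer. Hypothetical layout: u16 little-endian length prefix,
/// then the payload bytes. No allocation, no deserialization.
fn payload_at(page: &[u8], offset: usize) -> &[u8] {
    let len = u16::from_le_bytes([page[offset], page[offset + 1]]) as usize;
    &page[offset + 2..offset + 2 + len]
}
```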

### Reactor Pattern
All I/O is routed through the `Reactor` trait. `SystemReactor` calls real OS APIs. `DeterministicReactor` advances a fixed clock and produces repeatable random values, enabling fully deterministic integration tests.
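A sketch of the deterministic half of the pattern (the trait's method names here are invented; only the `Reactor` / `SystemReactor` / `DeterministicReactor` names come from the source):

```rust
/// Hypothetical slice of the Reactor trait: time and randomness are the
/// two nondeterminism sources a deterministic test must control.
trait Reactor {
    fn now_millis(&mut self) -> u64;
    fn next_random(&mut self) -> u64;
}

struct DeterministicReactor {
    clock: u64,
    seed: u64,
}

impl Reactor for DeterministicReactor {
    fn now_millis(&mut self) -> u64 {
        self.clock += 1; // fixed-step clock: each call advances 1 ms
        self.clock
    }
    fn next_random(&mut self) -> u64 {
        // xorshift64: a repeatable pseudo-random sequence from a fixed seed
        self.seed ^= self.seed << 13;
        self.seed ^= self.seed >> 7;
        self.seed ^= self.seed << 17;
        self.seed
    }
}
```

Two reactors built from the same seed observe identical clocks and random streams, so an integration test that injects a `DeterministicReactor` replays byte-for-byte.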

### Single-Writer Invariant
`StorageHandle` is single-writer. The data directory is protected by a lock file acquired on `open_store`. Thread-per-core sharding is modeled via `topology::shard_for_node` — each core owns a disjoint shard of node IDs and holds its own `StorageHandle`.
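The routing function can be as simple as a modulo over the shard count; the real `topology::shard_for_node` signature may differ, so treat this as a sketch:

```rust
/// Hypothetical sketch of shard routing: every node ID maps to exactly
/// one shard, so each core's StorageHandle owns a disjoint ID set and
/// the single-writer invariant holds per shard without coordination.
fn shard_for_node(node_id: u64, num_shards: u64) -> u64 {
    node_id % num_shards
}
```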

### Buffer Pool
The buffer pool is an 8 KiB page cache with LRU-2 eviction and explicit pin/unpin for pages that must stay resident. LRU-2 ranks pages by their second-most-recent access time, so a one-time scan cannot evict hot pages.
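The eviction bookkeeping can be sketched in a few lines (this toy tracks only the access history, not the pages or pin counts of the real `BufferPool`):

```rust
use std::collections::HashMap;

/// Toy LRU-2 bookkeeping: remember the last two access ticks per page and
/// evict the page whose *second*-most-recent access is oldest. Pages seen
/// only once have no second access, so they are preferred victims — a
/// one-time scan cannot push out hot pages.
struct Lru2 {
    tick: u64,
    history: HashMap<u64, (u64, Option<u64>)>, // page -> (last, previous)
}

impl Lru2 {
    fn new() -> Self {
        Lru2 { tick: 0, history: HashMap::new() }
    }

    fn access(&mut self, page: u64) {
        self.tick += 1;
        let prev = self.history.get(&page).map(|&(last, _)| last);
        self.history.insert(page, (self.tick, prev));
    }

    fn victim(&self) -> Option<u64> {
        // Pages with no second access rank as infinitely old (key 0).
        self.history
            .iter()
            .min_by_key(|(_, &(_, prev))| prev.unwrap_or(0))
            .map(|(&page, _)| page)
    }
}
```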

### Bitmap Indexes
Roaring bitmaps provide inverted indexes over node properties. Multiple named indexes can coexist. Posting lists survive across restarts via the `BitmapStore`.
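The query shape is plain posting-list intersection; this toy uses `BTreeSet` where the engine uses Roaring bitmaps, and the index/value names are invented:

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy inverted index over one node property: value -> posting list of
/// node IDs. The real engine stores these as Roaring bitmaps and can
/// persist them through the BitmapStore.
#[derive(Default)]
struct PropertyIndex {
    postings: HashMap<String, BTreeSet<u64>>,
}

impl PropertyIndex {
    fn insert(&mut self, value: &str, node_id: u64) {
        self.postings.entry(value.to_string()).or_default().insert(node_id);
    }

    /// Nodes matching both values: intersect the two posting lists.
    fn query_and(&self, a: &str, b: &str) -> Vec<u64> {
        match (self.postings.get(a), self.postings.get(b)) {
            (Some(x), Some(y)) => x.intersection(y).copied().collect(),
            _ => Vec::new(),
        }
    }
}
```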

### HNSW Vector Index
A separate LSM-backed HNSW index is maintained alongside the main storage. The `HnswMaintenanceScheduler` triggers a rebuild when the fraction of updated vectors exceeds a configurable threshold (default 5%).
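The trigger condition reduces to one comparison; function and parameter names below are illustrative, with the 5% default taken from the text:

```rust
/// Illustrative rebuild trigger: fire when the fraction of updated
/// vectors exceeds the configured threshold (default 0.05 per the docs).
fn needs_rebuild(updated_vectors: u64, total_vectors: u64, threshold: f64) -> bool {
    total_vectors > 0
        && (updated_vectors as f64 / total_vectors as f64) > threshold
}
```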

## Data Flow: Write Path

```
ingest_event()
  └── validate event
  └── queue event
  └── flush_once() when queue reaches max_batch_size
        └── put_full_node() / put_edge_deltas_batch() / put_vector_delta()
              └── WAL append (durability)
              └── MemTable insert
              └── flush MemTable → L0 SSTable when full
                    └── compact() merges L0 → L1+ asynchronously
```
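The batching step above can be sketched as a queue that flushes itself when it reaches `max_batch_size`. The names mirror the diagram, but the types and the `String` stand-in for events are invented:

```rust
/// Hypothetical sketch of the ingest batching trigger. A real flush_once
/// would call put_full_node / put_edge_deltas_batch / put_vector_delta
/// and append to the WAL; here it just counts batches.
struct IngestQueue {
    max_batch_size: usize,
    pending: Vec<String>, // stand-in for real ingest events
    flushed_batches: usize,
}

impl IngestQueue {
    fn ingest_event(&mut self, event: String) {
        self.pending.push(event); // "validate event" elided in this sketch
        if self.pending.len() >= self.max_batch_size {
            self.flush_once();
        }
    }

    fn flush_once(&mut self) {
        self.pending.clear();
        self.flushed_batches += 1;
    }
}
```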

## Data Flow: Read Path

```
execute(handle, params)
  └── planner selects physical ops (BitmapScan / VectorScan / NodeScan)
  └── scan node IDs in [scan_start, scan_end_exclusive)
  └── for each node: get_logical_node()
        └── check logical_node_cache (LRU)
        └── probe MemTable
        └── probe L0 SSTables (Bloom filter → fence pointer → binary search)
        └── merge FullNode + EdgeDeltas into LogicalNode
  └── apply Filter / Project / Limit operators
  └── return RowStream
```
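The final merge step — folding `EdgeDelta` entries into a `FullNode` base to produce the `LogicalNode` view — can be sketched like this (types are simplified stand-ins):

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for an EdgeDelta entry.
struct EdgeDelta {
    added: Vec<u64>,
    removed: Vec<u64>,
}

/// Fold deltas into the base adjacency list, oldest-first, so a later
/// delta overrides an earlier one. The result is the adjacency of the
/// LogicalNode the read path returns.
fn merge_logical_node(base_adjacency: &[u64], deltas: &[EdgeDelta]) -> Vec<u64> {
    let mut edges: BTreeSet<u64> = base_adjacency.iter().copied().collect();
    for delta in deltas {
        for &e in &delta.added {
            edges.insert(e);
        }
        for &e in &delta.removed {
            edges.remove(&e);
        }
    }
    edges.into_iter().collect()
}
```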

## Verification

```bash
cargo test              # all tests (requires Zig for vector kernels)
cargo test --no-default-features  # skip Zig-compiled kernels
npm run lint            # lint
```