featherdb-mvcc 1.0.0

# FeatherDB MVCC Layer

Multi-Version Concurrency Control (MVCC) provides snapshot isolation with single-writer, multiple-reader concurrency.

## Overview

MVCC allows multiple transactions to read the database concurrently while a single writer makes modifications, without blocking each other. Each transaction sees a consistent snapshot of the database as of its start time, enabling:

- **Non-blocking reads**: Readers never wait for writers
- **Non-blocking writes**: Writers never wait for readers
- **Consistent snapshots**: Each transaction sees a stable view of data
- **Conflict detection**: Write-write conflicts are detected and prevented

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        TransactionManager                                │
│  - Assigns unique transaction IDs                                        │
│  - Tracks active and aborted transactions                                │
│  - Enforces single-writer semantics via write lock                       │
│  - Manages version store for old versions                                │
│  - Provides GC watermark based on oldest active transaction              │
└────────────────────────┬───────────────────────────────────────────────-┘
                         │
        ┌────────────────┼────────────────┬────────────────────┐
        │                │                │                    │
        ▼                ▼                ▼                    ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐
│ Transaction  │  │   Snapshot   │  │ VersionStore │  │  AbortedTxnSet   │
│              │  │              │  │              │  │                  │
│ - id         │  │ - txn_id     │  │ - versions   │  │ Tracks aborted   │
│ - mode       │  │ - hwm        │  │ - next_ptr   │  │ transaction IDs  │
│ - snapshot   │  │ - in_progress│  │              │  │ for visibility   │
│ - manager ref│  │ - ancestors  │  │              │  │ checks           │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────────┘
```

## Key Components

### 1. TransactionManager (`transaction.rs`)

Central coordinator for all transactions in the database.

```rust
pub struct TransactionManager {
    next_txn_id: AtomicU64,                    // ID generator
    active_txns: RwLock<HashMap<...>>,         // Active transaction tracking
    write_lock: Mutex<()>,                     // Single-writer enforcement
    oldest_active: AtomicU64,                  // GC watermark
    version_store: RwLock<VersionStore>,       // Old versions storage
    aborted_txns: RwLock<HashSet<TransactionId>>, // Aborted transaction IDs
}
```

**Responsibilities:**
- Assign unique, monotonically increasing transaction IDs
- Track active transactions for snapshot creation
- Track aborted transactions for visibility checks
- Enforce single-writer semantics via a mutex
- Provide the oldest active transaction ID for garbage collection
- Coordinate transaction commit and abort (with optional WAL integration)

### 2. Transaction (`transaction.rs`)

Represents a unit of work in the database.

```rust
pub struct Transaction<'a> {
    id: TransactionId,
    mode: TransactionMode,     // ReadOnly or ReadWrite
    snapshot: Snapshot,
    manager: &'a TransactionManager,
}
```

**Transaction Modes:**
- `TransactionMode::ReadOnly` - Cannot perform writes, returns error on write attempts
- `TransactionMode::ReadWrite` - Full read and write capability

**Key Features:**
- **Auto-abort on drop**: If a transaction is dropped without explicit commit, it is automatically aborted
- **WAL integration**: Commit and abort can optionally log to the Write-Ahead Log

### 3. Snapshot (`snapshot.rs`)

Represents a point-in-time view of the database for consistent reads.

```rust
pub struct Snapshot {
    txn_id: TransactionId,                      // This transaction's ID
    high_water_mark: TransactionId,             // All IDs >= this are invisible
    in_progress_at_start: HashSet<TransactionId>, // Transactions active at snapshot time
    visible_ancestors: HashSet<TransactionId>,  // For savepoint/sub-transaction support
}
```

**Visibility Rules:**

A transaction can see changes made by `created_by` if:

```rust
fn can_see(&self, created_by: TransactionId) -> bool {
    // 1. Own changes are always visible
    if created_by == self.txn_id { return true; }

    // 2. Ancestor transactions are visible (savepoint support)
    if self.visible_ancestors.contains(&created_by) { return true; }

    // 3. Future transactions are never visible
    if created_by >= self.high_water_mark { return false; }

    // 4. Transactions in-progress at snapshot start are invisible
    !self.in_progress_at_start.contains(&created_by)
}
```

**Row Visibility:**

A row is visible if we can see its creation AND cannot see its deletion:

```rust
fn is_visible(&self, created_by: TransactionId, deleted_by: Option<TransactionId>) -> bool {
    // Must see the creation
    if !self.can_see(created_by) { return false; }

    // If deleted, must NOT see the deletion
    if let Some(deleted_by) = deleted_by {
        if self.can_see(deleted_by) { return false; }
    }

    true
}
```

### 4. Version Chain (`version.rs`)

Stores multiple versions of data for MVCC reads.

**VersionedValue** - Current (latest) version with MVCC metadata:

```rust
pub struct VersionedValue {
    current: Vec<Value>,               // Latest version data
    created_by: TransactionId,         // Transaction that created this version
    deleted_by: Option<TransactionId>, // Transaction that deleted this row (if any)
    version_chain: Option<VersionPtr>, // Link to older versions
}
```

**OldVersion** - Historical versions in the version chain:

```rust
pub struct OldVersion {
    data: Vec<Value>,
    created_by: TransactionId,
    deleted_by: Option<TransactionId>,
    prev_version: Option<VersionPtr>,  // Link to even older version
}
```

**Version Chain Structure:**

```
Current Version ──→ OldVersion₁ ──→ OldVersion₂ ──→ ... ──→ None
    (newest)                                            (oldest)
```

**VersionChain Iterator:**

The `VersionChain` struct provides an iterator for walking the version chain and finding visible versions:

```rust
let mut chain = VersionChain::new(&version_store, start_ptr);
if let Some(visible_version) = chain.find_visible(&snapshot) {
    // Use the visible version
}
```

### 5. Garbage Collection

Removes old versions that no active transaction can see.

**GC Rules:**
- A version is garbage if `deleted_by < oldest_active_txn`
- This means all active (and future) transactions can see the deletion
- The old version will never be needed again

```rust
pub fn gc(&mut self, oldest_active: Option<TransactionId>) -> GcStats {
    // If no active transactions, collect all deleted versions
    // Otherwise, collect versions where deleted_by < oldest_active
}
```

**GcStats:**
```rust
pub struct GcStats {
    versions_removed: usize,  // Number of versions collected
    bytes_freed: usize,       // Approximate memory freed
}
```

**Running GC:**
```rust
let stats = tm.run_gc();
println!("Removed {} versions, freed ~{} bytes",
         stats.versions_removed, stats.bytes_freed);
```

## Isolation Level

The MVCC implementation provides **Snapshot Isolation**:

| Property | Behavior |
|----------|----------|
| Dirty Reads | Prevented - uncommitted changes are invisible |
| Non-repeatable Reads | Prevented - snapshot is fixed at transaction start |
| Phantom Reads | Prevented - snapshot is fixed at transaction start |
| Write Skew | Possible - not prevented by snapshot isolation |

**Note:** Snapshot isolation is weaker than Serializable isolation. Write skew anomalies can occur when two transactions read overlapping data and make disjoint writes based on what they read.

## Write-Write Conflict Detection

Write-write conflicts are prevented through single-writer serialization:

```rust
// Only one read-write transaction can hold the write lock at a time
pub fn acquire_write_lock(&self) -> impl Drop + '_ {
    self.write_lock.lock()
}
```

Conflict detection at the row level:
```rust
// When attempting to modify a row, check if it was modified after our snapshot
if !self.snapshot.can_see(current_version.created_by) {
    // Another transaction modified this row after our snapshot started
    return Err(Error::WriteConflict);
}
```

## Aborted Transaction Handling

The implementation tracks aborted transactions to ensure their changes remain invisible:

```rust
// Check visibility accounting for aborted transactions
pub fn is_row_visible(
    &self,
    snapshot: &Snapshot,
    created_by: TransactionId,
    deleted_by: Option<TransactionId>,
) -> bool {
    // If creator was aborted, row is invisible
    if self.is_aborted(created_by) { return false; }

    // If deleter was aborted, treat as not deleted
    let effective_deleted_by = match deleted_by {
        Some(del_by) if self.is_aborted(del_by) => None,
        other => other,
    };

    snapshot.is_visible(created_by, effective_deleted_by)
}
```

Cleanup of old aborted transaction IDs:
```rust
// Remove aborted txn IDs older than any active transaction's snapshot
tm.cleanup_aborted_txns(older_than_txn_id);
```

## Transaction Lifecycle

```
1. BEGIN
   ├── Assign unique TransactionId
   ├── Create Snapshot (capture high_water_mark, in_progress set)
   └── Register in active_txns map

2. READ (can repeat)
   ├── Check current version visibility using snapshot
   ├── If not visible, walk version chain
   └── Return first visible version (or not found)

3. WRITE (can repeat, ReadWrite mode only)
   ├── Acquire write lock (blocks other writers)
   ├── Check for write-write conflict
   ├── Create new version with our txn_id as created_by
   └── Move old version to version chain

4. COMMIT or ABORT
   ├── COMMIT:
   │   ├── Log to WAL (if provided)
   │   ├── Remove from active_txns
   │   └── Update oldest_active watermark
   └── ABORT:
       ├── Log to WAL (if provided)
       ├── Remove from active_txns
       ├── Add to aborted_txns set
       └── Update oldest_active watermark

5. DROP (if not committed/aborted)
   └── Auto-abort the transaction
```

## Usage Examples

### Basic Transaction Usage

```rust
use featherdb_mvcc::{TransactionManager, TransactionMode};

let tm = TransactionManager::new();

// Start a read-write transaction
let txn = tm.begin(TransactionMode::ReadWrite);
println!("Transaction ID: {:?}", txn.id());

// Check if we can write (will error for read-only transactions)
txn.check_write()?;

// Check visibility of another transaction's changes
if txn.can_see(some_txn_id) {
    // Changes from some_txn_id are visible
}

// Commit the transaction
txn.commit()?;
```

### Read-Only Transaction

```rust
let txn = tm.begin_read_only();

// Read operations here...

// Attempting to write will fail
assert!(txn.check_write().is_err());

// Read-only transactions can commit or abort
txn.commit()?;
```

### Snapshot Visibility Check

```rust
let snapshot = txn.snapshot();

// Check if a version is visible
if snapshot.is_visible(created_by, deleted_by) {
    // The row is visible to this transaction
}
```

### Running Garbage Collection

```rust
// Run GC manually
let stats = tm.run_gc();
println!("GC removed {} versions, ~{} bytes freed",
         stats.versions_removed, stats.bytes_freed);

// Check version store size
println!("Remaining old versions: {}", tm.version_count());
println!("Active transactions: {}", tm.active_count());
println!("Tracked aborted transactions: {}", tm.aborted_count());
```

### WAL Integration

```rust
use featherdb_storage::Wal;

let mut wal = Wal::open("data/wal")?;
let txn = tm.begin_read_write();

// ... perform operations ...

// Commit with WAL logging
txn.commit_with_wal(&mut wal)?;

// Or abort with WAL logging
// txn.abort_with_wal(&mut wal)?;
```

### Savepoint/Sub-transaction Support

```rust
use std::collections::HashSet;

// Create a snapshot with visible ancestors (for sub-transactions)
let mut ancestors = HashSet::new();
ancestors.insert(parent_txn_id);

let snapshot = Snapshot::with_ancestors(
    child_txn_id,
    high_water_mark,
    in_progress,
    ancestors,
);

// The child snapshot can now see the parent's uncommitted changes
assert!(snapshot.can_see(parent_txn_id));
```

## Concurrency Guarantees

| Operation | Behavior |
|-----------|----------|
| Read-Read | No blocking - concurrent reads always allowed |
| Read-Write | No blocking - readers see their snapshot |
| Write-Read | No blocking - writer's uncommitted changes invisible to readers |
| Write-Write | Serialized - single writer lock prevents concurrent writes |

## Configuration Options

Currently, the MVCC layer uses sensible defaults. Configurable options include:

| Option | Default | Description |
|--------|---------|-------------|
| Initial TxnID | 1 | Starting transaction ID |
| Write Lock | Mutex | Single-writer enforcement |
| Version Store | In-memory HashMap | Storage for old versions |

## Performance Considerations

1. **Version Chain Length**: Long chains slow down reads as they require traversal. Run GC regularly to keep chains short.

2. **Active Transaction Count**: Long-running transactions prevent GC from reclaiming old versions. Keep transactions short.

3. **Write Lock Contention**: All writes are serialized through a single lock. This simplifies implementation but can bottleneck write-heavy workloads.

4. **Aborted Transaction Set**: The aborted transaction set grows until cleanup. Call `cleanup_aborted_txns()` periodically.

5. **Snapshot Memory**: Each snapshot stores the set of in-progress transactions. Many concurrent transactions increase memory usage.

## Testing

```bash
# Run all MVCC tests (using Make)
make test-crate CRATE=featherdb-mvcc

# Or with cargo directly
cargo test -p featherdb-mvcc

# Test specific components
cargo test -p featherdb-mvcc snapshot
cargo test -p featherdb-mvcc transaction
cargo test -p featherdb-mvcc version

# Run with output
cargo test -p featherdb-mvcc -- --nocapture

# Run coverage (from project root)
make coverage  # or: cargo llvm-cov --workspace
```

## Transaction Timeout

Transactions can be configured with automatic timeouts to prevent long-running transactions from blocking the system.

### Configuration

```rust
use featherdb_core::TransactionConfig;
use std::time::Duration;

// Create config with 5 second timeout
let config = TransactionConfig::new()
    .with_timeout(Duration::from_secs(5).as_millis() as u64);

let txn = tm.begin_with_config(config);
```

### Timeout Behavior

When a transaction exceeds its configured timeout:
- All subsequent operations fail with `Error::TransactionTimeout`
- The transaction must be rolled back
- Timeout is checked automatically on every operation

### Timeout API

```rust
// Check if transaction has expired
if txn.is_expired() {
    println!("Transaction timed out!");
}

// Get remaining time before timeout
if let Some(remaining) = txn.remaining_time() {
    println!("{}ms remaining", remaining.as_millis());
}

// Get elapsed time since transaction started
let elapsed = txn.elapsed();
println!("Running for {}ms", elapsed.as_millis());
```

### Error Handling

```rust
match txn.execute("UPDATE accounts SET balance = 100") {
    Ok(_) => println!("Success"),
    Err(Error::TransactionTimeout { transaction_id, elapsed_ms, timeout_ms }) => {
        println!("Transaction {} timed out after {}ms (limit: {}ms)",
                 transaction_id, elapsed_ms, timeout_ms);
        txn.rollback()?;
    },
    Err(e) => println!("Other error: {}", e),
}
```

## Deadlock Detection

Deadlock detection prevents circular wait conditions between transactions using cycle detection in a wait-for graph.

### How It Works

1. **Wait Graph**: Tracks which transactions are waiting for which other transactions
2. **Cycle Detection**: Uses DFS to detect cycles in the wait graph
3. **Victim Selection**: Aborts the youngest transaction (highest ID) in the cycle
4. **Automatic Resolution**: Deadlocks are detected and resolved automatically

### WaitGraph API

```rust
use featherdb_mvcc::WaitGraph;

let mut graph = WaitGraph::new();

// Record that txn1 is waiting for txn2
graph.add_wait(txn1, txn2);
graph.add_wait(txn2, txn3);
graph.add_wait(txn3, txn1); // Creates a cycle

// Detect the cycle
if let Some(cycle) = graph.detect_cycle() {
    println!("Deadlock detected: {:?}", cycle);
    let victim = graph.select_victim(&cycle);
    println!("Aborting transaction {}", victim);
}

// Clean up when transaction completes
graph.remove_transaction(txn1);
```

### Integration Example

```rust
// Deadlock detection is integrated into the transaction manager
// When a transaction waits for a lock:
wait_graph.add_wait(waiter_txn, holder_txn);

if let Some(cycle) = wait_graph.detect_cycle() {
    let victim = wait_graph.select_victim(&cycle);
    // Abort the victim transaction
    tm.abort(victim)?;
    return Err(Error::Deadlock);
}
```

## Long-Running Transaction Detection

Monitor long-running transactions to identify potential GC blockers and performance issues.

### Detection API

```rust
use std::time::Duration;

// Find transactions running longer than 5 seconds
let long_running = tm.long_running_transactions(Duration::from_secs(5));

for txn_info in &long_running {
    println!("Transaction {}: running for {:?}", txn_info.id, txn_info.age);
}
```

### GC Status Monitoring

```rust
// Check if GC is blocked
let gc_status = tm.gc_status();

if gc_status.blocked {
    println!("GC blocked by transaction {:?}", gc_status.oldest_active_txn);
    println!("Reason: {:?}", gc_status.blocked_reason);
}
```

### TransactionInfo

```rust
pub struct TransactionInfo {
    pub id: u64,           // Transaction ID
    pub age: Duration,     // How long it has been running
}
```

### GcStatus

```rust
pub struct GcStatus {
    pub oldest_active_txn: Option<u64>,        // Oldest transaction blocking GC
    pub blocked: bool,                         // Whether GC is blocked
    pub blocked_reason: Option<String>,        // Human-readable reason
}
```

### Use Cases

**Database Monitoring:**
```rust
// Periodic health check
let long_txns = tm.long_running_transactions(Duration::from_secs(5));
if !long_txns.is_empty() {
    warn!("Found {} long-running transactions", long_txns.len());
}
```

**GC Health Check:**
```rust
let status = tm.gc_status();
if status.blocked {
    info!("GC blocked: {}", status.blocked_reason.unwrap());
    // Take action: notify admin, log warning, etc.
}
```

**Performance Analysis:**
```rust
// Correlate long-running transactions with GC blocking
let long_running = tm.long_running_transactions(Duration::from_secs(1));
let gc_status = tm.gc_status();

if let Some(oldest) = gc_status.oldest_active_txn {
    if long_running.iter().any(|t| t.id == oldest) {
        warn!("Long-running transaction {} is blocking GC!", oldest);
    }
}
```

## Adaptive Garbage Collection

Adaptive GC automatically adjusts collection frequency based on workload patterns and metrics.

### Configuration

```rust
use featherdb_mvcc::gc::{GcConfig, GcScheduler};

let config = GcConfig {
    min_interval_ms: 1000,           // Minimum 1 second between GC runs
    max_versions_before_gc: 10000,   // Trigger GC at 10k versions
    adaptive: true,                  // Enable adaptive scheduling
};

let mut scheduler = GcScheduler::new(config);
```

### GC Metrics

```rust
// Get current metrics
let metrics = scheduler.metrics().snapshot();

println!("Versions created: {}", metrics.versions_created);
println!("Versions cleaned: {}", metrics.versions_cleaned);
println!("GC runs: {}", metrics.gc_runs);
println!("GC time: {}μs", metrics.gc_time_us);
println!("Pending: {}", metrics.versions_pending);
```

### Adaptive Scheduling

```rust
// Check if GC should run based on adaptive heuristics
if scheduler.should_run_gc() {
    let start = Instant::now();
    let stats = tm.run_gc();
    scheduler.record_gc_run(start.elapsed());
    scheduler.record_versions_cleaned(stats.versions_removed as u64);
}

// Record version creation
scheduler.metrics().record_version_created();
```

### How Adaptive GC Works

1. **Metrics Tracking**: Monitors versions created, cleaned, and pending
2. **Heuristics**: Decides when to run GC based on:
   - Time since last GC run (respects min_interval)
   - Number of pending versions (respects max_versions_before_gc)
   - Historical GC efficiency
3. **Performance**: Prevents unnecessary GC runs when system is idle
4. **Safety**: Ensures GC runs when memory pressure is high

## Future Improvements

- [ ] First-committer-wins write-write conflict detection
- [ ] Serializable isolation level (prevent write skew)
- [ ] Multi-version indexes
- [ ] Background GC thread with configurable interval
- [ ] Read-only transaction optimization (skip active tracking)