cesiumdb 0.1.0

Blazing fast, persistent key-value store for Rust
# CesiumDB Integration Tests

This directory contains large-scale integration tests for the compaction system.

## Running the Tests

These tests are marked with `#[ignore]` because they:
- Test with large datasets (up to 10GB)
- Take significant time to run
- Use substantial disk space

### Run all integration tests:
```bash
cargo test --test compaction_integration -- --ignored --test-threads=1
```

### Run a specific test:
```bash
cargo test --test compaction_integration test_10gb_write_heavy_workload -- --ignored
```

## Test Suite Overview

### 1. `test_10gb_write_heavy_workload`
- **Dataset**: 10 million entries × 1KB = ~10GB
- **Focus**: Write-heavy workload with periodic compaction
- **Validates**:
  - Large-scale data ingestion
  - Background compaction during writes
  - Data integrity after compaction
  - Write throughput metrics

### 2. `test_concurrent_reads_during_compaction`
- **Dataset**: 1 million entries × 1KB = ~1GB
- **Focus**: Concurrent read operations while compacting
- **Validates**:
  - Read consistency during compaction
  - No data corruption under concurrent access
  - Read throughput during compaction
  - Zero read errors

### 3. `test_mixed_workload_with_updates`
- **Dataset**: 1 million keys with 500k updates
- **Focus**: Mixed read/write workload with updates
- **Validates**:
  - Update correctness (newer values win)
  - Concurrent reads and writes
  - Compaction with duplicate keys
  - Mixed workload throughput

### 4. `test_deletion_with_compaction`
- **Dataset**: 500k entries with 50% deletions
- **Focus**: Delete operations and space reclamation
- **Validates**:
  - Tombstone handling
  - Space reclamation after compaction
  - Deleted keys return None
  - Remaining keys are intact

### 5. `test_background_compaction_triggers`
- **Dataset**: 200k entries
- **Focus**: Automatic background compaction
- **Validates**:
  - Background thread triggers compaction
  - Compaction jobs complete automatically
  - Data integrity after background compaction
  - Compaction statistics reporting

### 6. `test_point_lookup_performance`
- **Dataset**: 1 million entries
- **Focus**: Read performance before vs. after compaction
- **Validates**:
  - Read performance improvement after compaction
  - Point lookup correctness
  - Performance metrics
  - Compaction benefit quantification

## Batch API Optimization

All tests have been optimized to use the batch API for maximum performance:
- **Batch size**: 1000 operations per batch (writes) or 500 (mixed workloads)
- **Automatic memtable swapping**: The batch API automatically swaps memtables when full
- **Retry logic**: If a batch fails due to full memtable, it automatically swaps and retries

This provides:
- **10-100x faster** writes compared to individual `put()` calls
- Better stress testing of the batch code path
- More realistic production workload simulation

## Expected Results

### Performance Targets (Approximate)

With batch API optimization:
- **Write throughput**: 100k-500k writes/sec (varies by hardware, 10-100x faster than individual puts)
- **Read throughput**: 100k-500k reads/sec (before compaction)
- **Read throughput**: 200k-1M reads/sec (after compaction, 2-4x improvement)
- **Compaction time**: Varies by data size and hardware

### Data Integrity

All tests validate:
- ✓ Zero data corruption
- ✓ Zero read errors during compaction
- ✓ Correct handling of updates (latest version wins)
- ✓ Proper tombstone behavior for deletions
- ✓ Deterministic value verification

## System Requirements

### Minimum:
- **Disk Space**: 15GB free (for 10GB test data + overhead)
- **RAM**: 4GB
- **CPU**: 4 cores

### Recommended:
- **Disk Space**: 20GB free
- **RAM**: 8GB+
- **CPU**: 8+ cores
- **Storage**: NVMe SSD (for realistic performance)

## Test Output

Each test provides detailed progress output:

```
=== Testing 10GB Write-Heavy Workload ===
Writing 10000000 entries of 1KB each...
Progress: 100000/10000000 (1.0%) - 45231 writes/sec
Progress: 200000/10000000 (2.0%) - 48532 writes/sec
...
Write phase complete: 10000000 entries in 187.42s (53358 writes/sec)

Running final compaction...
Final compaction took 23.15s

Verifying data integrity (sampling 10k random entries)...
Verification complete: 10000 samples in 0.84s

Final Compaction Stats:
Compaction: queued=0, in_progress=0, completed=23, parallel_util=67%, pattern=WriteHeavy

=== Test Complete ===
Total time: 211.41s
Data written: ~10.00GB
```

## Troubleshooting

### Out of Disk Space
- Reduce test data size by modifying constants:
  ```rust
  const TARGET_ENTRIES: u64 = 1_000_000; // Instead of 10_000_000
  ```

### Tests Timeout
- Increase test timeout:
  ```bash
  RUST_TEST_TIMEOUT=600 cargo test --test compaction_integration -- --ignored
  ```

### Memory Issues
- Run with single thread to reduce memory pressure:
  ```bash
  cargo test --test compaction_integration -- --ignored --test-threads=1
  ```

### Performance Issues on HDD
- These tests are optimized for SSD
- On HDD, expect 10-50x slower performance
- Consider reducing data sizes for HDD testing

## Contributing

When adding new integration tests:
1. Mark with `#[ignore]` attribute
2. Add progress reporting for long-running tests
3. Include data integrity verification
4. Document expected behavior in test comments
5. Update this README with test description