# Database Module
A content-addressed, incremental compilation database designed for language servers and build systems. This database provides snapshot isolation, automatic dependency tracking, and adaptive query parallelization for high-performance concurrent workloads.
## Table of Contents
- [Overview](#overview)
- [Core Concepts](#core-concepts)
- [Architecture](#architecture)
- [Writing Data](#writing-data)
- [Reading Data](#reading-data)
- [Advanced Topics](#advanced-topics)
- [Common Patterns](#common-patterns)
- [Important Conventions](#important-conventions)
- [Testing](#testing)
- [File Structure](#file-structure)
- [Summary](#summary)
## Overview
### Purpose
This database is purpose-built for incremental compilation systems and language servers where:
- **Incremental recompilation** requires tracking which computations depend on which inputs
- **Concurrent reads** must see consistent snapshots while writes are happening
- **Query performance** needs to adapt based on data distribution
- **Cache invalidation** must be precise and automatic via content addressing
### Design Goals
1. **Incremental Compilation**: Content-addressed chunks enable automatic cache invalidation - when source changes, its content hash changes, creating a new chunk rather than replacing the old one.
2. **Snapshot Isolation**: Readers see a consistent view of the database at a point in time, isolated from concurrent writes.
3. **Dependency Tracking**: Every query automatically tracks which chunks were read, building a DAG from source files through compilation stages.
4. **Adaptive Performance**: The system measures sequential vs parallel query execution and automatically chooses the faster approach for each query pattern.
5. **Thread Safety**: Low-contention concurrent access via DashMap (a sharded map with per-shard locks rather than a lock-free structure), cheap cloning via Arc.
### When to Use This Database
**Use this database when you need:**
- Incremental recompilation with automatic cache invalidation
- Dependency tracking to rebuild only what changed
- Concurrent reads with snapshot isolation
- Integration with async task systems
**Don't use this database when you need:**
- Traditional ACID transactions with rollback
- Mutable records (everything here is immutable)
- SQL queries or relational joins
- Persistent storage (this is an in-memory database)
## Core Concepts
### Content-Addressed Storage
Chunks are identified by their content hash, not by location or name. This has important implications:
**Immutability**: Once a chunk is created, it never changes. This is enforced in `mod.rs:55-62`:
```rust
// Chunks are immutable once created
// If you need to update data, create a new chunk with different content
// The new content will have a different hash, so it will be a different chunk
```
**Automatic Deduplication**: If two computations produce identical results, they get the same `ChunkId` and share storage.
**No Replacement**: When source code changes, parsing it creates a new chunk with a different hash. The old chunk remains until garbage collected. This is why incremental compilation works - you can detect "has this input changed?" by comparing content hashes.
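The deduplication and change-detection behavior described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `SketchChunkId`, `SketchStore`, and `content_id` are stand-ins rather than the module's types, and the standard library's `DefaultHasher` substitutes for whatever hash the real `ContentHash` uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the real `ChunkId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SketchChunkId(u64);

/// Derives an identifier purely from the content.
pub fn content_id(content: &str) -> SketchChunkId {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    SketchChunkId(hasher.finish())
}

/// Hypothetical content-addressed store: one entry per distinct content.
#[derive(Default)]
pub struct SketchStore {
    chunks: HashMap<SketchChunkId, String>,
}

impl SketchStore {
    /// Stores the content and returns its id; identical content is kept once.
    pub fn put(&mut self, content: &str) -> SketchChunkId {
        let id = content_id(content);
        self.chunks.entry(id).or_insert_with(|| content.to_string());
        id
    }

    /// Number of distinct chunks currently stored.
    pub fn len(&self) -> usize {
        self.chunks.len()
    }
}
```

Storing the same text twice yields the same id and a single entry, while edited text yields a new id and leaves the old entry in place until something removes it.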
### Partition Keys and Sort Keys
Records are organized hierarchically:
- **Partition Key** (`Ident`): Logical grouping, typically represents record type or source file
- **Sort Key** (`String`): Hierarchical key within partition, enables range queries
Example from diagnostics:
```rust
// Partition key groups all diagnostics together
const DIAGNOSTICS_PK: Ident = Ident::new("diagnostics");
// Sort key enables querying by file, severity, sequence
let sort_key = format!("{}|{:01}|{:04}", source_key, severity, sequence);
writer.insert(DIAGNOSTICS_PK, sort_key, diagnostic);
```
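Because the sort key leads with the source key, this structure enables efficient prefix queries like "all diagnostics for file X" or "all errors (severity=2) in file X". Filtering by severity across all files would need a key that leads with severity instead.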
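To make the prefix-query idea concrete, here is a minimal sketch over a plain `BTreeMap<String, String>` standing in for one partition's sorted index. The helper names `diagnostic_sort_key` and `begins_with` are hypothetical; only the key format comes from the example above.

```rust
use std::collections::BTreeMap;

/// Builds a sort key in the `source|severity|sequence` format shown above.
pub fn diagnostic_sort_key(source_key: &str, severity: u8, sequence: u32) -> String {
    format!("{}|{:01}|{:04}", source_key, severity, sequence)
}

/// Returns the values whose sort key starts with `prefix`, in key order.
/// The scan starts at the first key >= prefix and stops at the first
/// key that no longer shares the prefix.
pub fn begins_with<'a>(index: &'a BTreeMap<String, String>, prefix: &str) -> Vec<&'a str> {
    index
        .range(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(_, value)| value.as_str())
        .collect()
}
```

A prefix of `"a.rs|"` selects every diagnostic for that file, and `"a.rs|2|"` narrows the scan to one severity within it.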
### Chunks and Dependencies
A `Chunk` represents the immutable result of a computation (`chunk.rs:23-35`):
```rust
pub struct Chunk<S: RecordStorage> {
    id: ChunkId, // Content hash
    task_id: Ident, // Stable task identifier
    dependencies: Vec<ChunkId>, // Chunks read during computation
    index: /* ... */, // Records organized by partition/sort key
    // ...
}
```
The `dependencies` field forms a directed acyclic graph (DAG):
- **Source file chunk** → **Parse chunk** → **Symbol resolution chunk** → **Type check chunk**
This DAG enables incremental recompilation: if a source file changes (new content hash), the system knows to rebuild all dependent chunks.
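One way to picture "rebuild all dependent chunks" is a breadth-first walk over inverted dependency edges. The sketch below is an illustration under stated assumptions, not the module's algorithm: chunk ids are plain `u32` values and `stale_chunks` is a hypothetical helper.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Given `dependencies` (chunk -> chunks it read) and a changed chunk,
/// returns every chunk that transitively depends on the changed one.
pub fn stale_chunks(dependencies: &HashMap<u32, Vec<u32>>, changed: u32) -> HashSet<u32> {
    // Invert the edges: input chunk -> chunks that read it.
    let mut dependents: HashMap<u32, Vec<u32>> = HashMap::new();
    for (&chunk, inputs) in dependencies {
        for &input in inputs {
            dependents.entry(input).or_default().push(chunk);
        }
    }

    // Breadth-first walk from the changed chunk along the inverted edges.
    let mut stale = HashSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(current) = queue.pop_front() {
        for &dependent in dependents.get(&current).into_iter().flatten() {
            if stale.insert(dependent) {
                queue.push_back(dependent);
            }
        }
    }
    stale
}
```

With the chain source (1) → parse (2) → resolve (3) → typecheck (4), changing chunk 1 marks 2, 3, and 4 as stale, while an unrelated chain is left alone.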
### Snapshot Isolation (Lamport Clocks)
Every chunk receives a monotonically increasing timestamp when added to the database (`mod.rs:165`):
```rust
let commit_time = self.current_timestamp.fetch_add(1, Ordering::SeqCst);
```
When you create a `QueryClient`, it captures the current timestamp (`query/client.rs:67-74`). Queries only see chunks with `commit_time <= snapshot_time`, providing a consistent view even as new chunks are added concurrently.
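The visibility rule can be sketched with a single atomic counter. `SketchDb` and its methods are hypothetical. Note one deliberate assumption: this sketch stamps each commit with the post-increment counter value so that a chunk committed after a snapshot is strictly newer than it. How the real code treats that boundary is not shown here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical miniature database: a clock plus (commit_time, name) pairs.
#[derive(Default)]
pub struct SketchDb {
    clock: AtomicU64,
    chunks: Vec<(u64, String)>,
}

impl SketchDb {
    /// Adds a chunk and stamps it with the next clock value.
    pub fn commit(&mut self, name: &str) -> u64 {
        // Assumption of this sketch: use the value after the increment.
        let commit_time = self.clock.fetch_add(1, Ordering::SeqCst) + 1;
        self.chunks.push((commit_time, name.to_string()));
        commit_time
    }

    /// Captures the current clock value as a snapshot timestamp.
    pub fn snapshot(&self) -> u64 {
        self.clock.load(Ordering::SeqCst)
    }

    /// Returns the chunks with `commit_time <= snapshot_time`.
    pub fn visible(&self, snapshot_time: u64) -> Vec<&str> {
        self.chunks
            .iter()
            .filter(|(commit_time, _)| *commit_time <= snapshot_time)
            .map(|(_, name)| name.as_str())
            .collect()
    }
}
```

A snapshot taken between two commits keeps seeing only the first chunk, no matter how many chunks are added afterwards.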
### Fragmentation as a Feature
Multiple chunks can contain records for the same partition key. This is intentional (`mod.rs:63-69`):
**Why this is good:**
- Enables parallel query processing across chunks
- Natural result of incremental compilation (many tasks produce diagnostics)
- No defragmentation overhead
**How performance is maintained:**
- Adaptive parallelization automatically distributes work
- For small chunk counts, sequential processing is faster (avoids overhead)
- For large chunk counts, parallel processing dominates
## Architecture
### Core Types
#### Database<S: RecordStorage> (`mod.rs:74-108`)
The central data structure, cheaply clonable via Arc:
```rust
pub struct Database<S: RecordStorage> {
    chunks: Arc<DashMap<ChunkId, Arc<Chunk<S>>>>, // All chunks by content hash
    primary_index: Arc<DashMap<Ident, Vec<ChunkId>>>, // Partition key → chunks
    content_index: Arc<DashMap<ContentHash, (Ident, SortKey)>>, // Hash → location
    entry_chunks: Arc<DashMap<Ident, ChunkId>>, // GC roots
    query_perf_decisions: Arc<DashMap<QueryKey, QueryModeDecision>>, // Adaptive perf
    current_timestamp: Arc<AtomicU64>, // Lamport clock
}
```
**Cloning is cheap** (`mod.rs:110-121`): All fields are Arc-wrapped, so `Database::clone()` just increments reference counts. Pass clones to async tasks freely.
#### Chunk<S: RecordStorage> (`chunk.rs:23-35`)
Immutable computation result:
```rust
pub struct Chunk<S: RecordStorage> {
    id: ChunkId, // Content hash
    task_id: Ident, // Stable task identifier
    parent_task_id: Option<Ident>, // Task that spawned this
    commit_time: u64, // Lamport timestamp
    record_count: usize, // Total records
    index: IdentHashMap<BTreeMap<SortKey, S::Index>>, // Partition → sorted records
    storage: S, // Actual record data
    dependencies: Vec<ChunkId>, // Chunks read during computation
}
```
Records are stored in `BTreeMap` for efficient range queries (O(log n) lookup, ordered iteration).
#### RecordStorage Trait (`storage.rs:51-138`)
Abstraction for different storage backends:
```rust
pub trait RecordStorage: Send + Sync + 'static {
    type Index: /* ... */; // Opaque handle to stored records
    type RecordRef<'a>: /* ... */; // Borrowed view of a record
    type Builder: RecordStorageBuilder<Storage = Self>;
    fn get(&self, index: &Self::Index) -> Self::RecordRef<'_>;
    fn content_hash(&self) -> ContentHash;
    // ...
}
```
This allows pluggable implementations:
- In-memory with Vec (see `TestStorage` in `tests/storage.rs:79-139`)
- Compressed storage for large datasets
- Persistent storage backends
- Distributed storage
#### QueryClient (`query/client.rs:49-56`)
Snapshot-isolated query interface with dependency tracking:
```rust
pub struct QueryClient<'a, S: RecordStorage> {
    db: &'a Database<S>,
    snapshot_time: u64, // Snapshot timestamp
    accessed_chunks: RefCell<HashSet<ChunkId>>, // Dependency tracking
    pending_deps: RefCell<HashSet<Ident>>, // Missing partition keys
}
```
The `accessed_chunks` field is automatically populated during queries, building the dependency DAG for incremental recompilation.
#### RecordWriter (`chunk.rs:134-204`)
Builder for creating chunks:
```rust
pub struct RecordWriter<S: RecordStorage> {
    task_id: Ident, // Stable task identifier
    parent_task_id: Option<Ident>, // Parent task
    dependencies: Vec<ChunkId>, // Input chunks
    storage_builder: S::Builder, // Building storage
    index: /* ... */, // Building index
}
```
Call `writer.build()` to finalize into an immutable `Chunk` (`chunk.rs:179-204`).
### Index Structure
The database maintains multiple indexes for different access patterns:
1. **Primary Index** (`primary_index`): Maps partition key → list of chunk IDs containing records for that partition. Used by all queries.
2. **Content Index** (`content_index`): Maps content hash → (partition key, sort key). Used to look up records by their content hash.
3. **Entry Chunks** (`entry_chunks`): Maps source identifier → chunk ID for source files (GC roots). These chunks have no `parent_task_id`.
4. **Query Performance Index** (`query_perf_decisions`): Maps query pattern → performance decision (sequential vs parallel). Populated by adaptive parallelization.
### Thread Safety Model
All data structures use Arc + DashMap:
- **Arc**: Cheap cloning, shared ownership across threads
- **DashMap**: Concurrent hash map with internal sharding (each shard is guarded by its own lock, so there is no global lock)
This design enables:
- Concurrent reads without blocking
- Concurrent writes to different partitions without contention
- Snapshot isolation via Lamport clocks (readers don't block writers)
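The "clone the handle, share the data" model can be sketched with the standard library alone. This is a sketch under stated assumptions: `SharedDb` and `write_from_threads` are hypothetical, and `Arc<RwLock<HashMap>>` stands in for the `Arc<DashMap>` fields, so it has a single lock where DashMap has one per shard.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical handle whose clones all point at the same map.
#[derive(Clone, Default)]
pub struct SharedDb {
    chunks: Arc<RwLock<HashMap<u64, String>>>,
}

impl SharedDb {
    /// Inserts through a shared reference; the lock provides the mutability.
    pub fn insert(&self, id: u64, data: &str) {
        self.chunks.write().unwrap().insert(id, data.to_string());
    }

    /// Number of entries visible through any clone.
    pub fn len(&self) -> usize {
        self.chunks.read().unwrap().len()
    }
}

/// Spawns one thread per worker, each writing through its own clone.
pub fn write_from_threads(db: &SharedDb, workers: u64) {
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let db = db.clone(); // Only bumps the Arc reference count
            thread::spawn(move || db.insert(id, "chunk"))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Every clone observes writes made through any other clone, which is why passing clones to tasks is cheap and safe.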
## Writing Data
### RecordWriter API
The typical write workflow (`chunk.rs:134-163`):
```rust
// 1. Create writer with stable task_id
let mut writer = RecordWriter::new(task_id, dependencies);
// 2. Insert records
writer.insert(partition_key, sort_key, record_data);
writer.insert(partition_key, another_key, more_data);
// 3. Build chunk (finalizes storage, computes content hash)
let chunk = writer.build();
// 4. Add to database (assigns timestamp, updates indexes)
db.add_chunk(chunk)?;
```
In practice, the task system handles steps 3-4 automatically. You just create the writer, insert records, and return it from your task (see `scheduler/task/mod.rs:95-166`).
### Stable Task IDs
**Critical**: Task IDs must be stable across runs for incremental compilation to work.
From `chunk.rs:70-128`:
```rust
// ✓ GOOD - stable, includes input context
let task_id = Ident::new(&format!("parse:{}", uri));
let task_id = Ident::new(&format!("resolve:{}:{}", module, symbol));
// ✗ BAD - not stable
let task_id = Ident::new("parse"); // No input context
let task_id = Ident::new(&format!("parse:{}", timestamp)); // Non-deterministic
```
**Must include**: computation type, input identifiers, parameters
**Must NOT include**: timestamps, random values, version numbers, output data
### Building Chunks
When you call `writer.build()` (`chunk.rs:179-204`):
1. Storage builder is finalized
2. Content hash is computed from: task_id + storage content hash + dependencies
3. Immutable `Chunk` is created with the computed `ChunkId`
The content hash means: same inputs + same computation + same dependencies = same ChunkId.
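A minimal sketch of that property, assuming plain `u64` hashes and the standard library's `DefaultHasher` in place of the real hashing scheme (`sketch_chunk_id` is a hypothetical name):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Combines the three inputs named above into a single identifier.
pub fn sketch_chunk_id(task_id: &str, storage_hash: u64, dependencies: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    task_id.hash(&mut hasher); // Which computation ran
    storage_hash.hash(&mut hasher); // What it produced
    dependencies.hash(&mut hasher); // What it read
    hasher.finish()
}
```

Changing any one of the three inputs produces a different id, while repeating the same computation on the same inputs reproduces the original id.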
### Integration with Scheduler
The scheduler automatically integrates with the database (`scheduler/task/mod.rs:95-166`):
```rust
// Your task returns a RecordWriter
scheduler.queue(move |_ctx| async move {
    let mut writer = RecordWriter::new(task_id, vec![]);
    writer.insert(pk, sk, data);
    Some(writer) // Scheduler builds chunk and adds to DB
}, DEFAULT_LANE);
```
The scheduler:
1. Polls the async task
2. Receives `RecordWriter` result
3. Calls `writer.build()` to create chunk
4. Calls `db.add_chunk(chunk)` to add to database
5. Triggers watchers for affected partition keys
## Reading Data
### QueryClient and Snapshot Isolation
Create a query client from `TaskContext` (`scheduler/task/task_context.rs:91-93`):
```rust
let query_client = ctx.query_client();
```
The client captures the current timestamp (`query/client.rs:67-74`), providing a consistent snapshot:
```rust
let snapshot_time = db.current_timestamp.load(Ordering::SeqCst);
```
All queries filter chunks by `chunk.commit_time <= snapshot_time` (`query/client.rs:77-83`).
### Query API
The fluent `QueryBuilder` API (`query/mod.rs:140-299`):
```rust
// Query all records in a partition
let all_diagnostics = query_client
    .query(DIAGNOSTICS_PK)
    .execute()
    .await;
// Exact match
let record = query_client
    .query(SYMBOLS_PK)
    .sort_key("main.rs|function|main")
    .execute()
    .await;
// Prefix search (hierarchical queries)
let file_diagnostics = query_client
    .query(DIAGNOSTICS_PK)
    .sort_key_begins_with("file:///src/main.rs|")
    .execute()
    .await;
// Range queries (these examples assume sort keys that begin with a zero-padded severity)
let errors_and_warnings = query_client
    .query(DIAGNOSTICS_PK)
    .sort_key_between("001", "003") // severity range
    .execute()
    .await;
let low_severity = query_client
    .query(DIAGNOSTICS_PK)
    .sort_key_less_than("002")
    .execute()
    .await;
let high_severity = query_client
    .query(DIAGNOSTICS_PK)
    .sort_key_greater_than("002")
    .execute()
    .await;
```
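The range operators map naturally onto `BTreeMap::range`. The following sketch shows one plausible reading and is not the module's implementation: `between`, `less_than`, and `greater_than` are hypothetical free functions, and treating `between` as inclusive at both ends is an assumption.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Values whose key lies in `low..=high` (inclusive bounds are assumed).
pub fn between(index: &BTreeMap<String, u32>, low: &str, high: &str) -> Vec<u32> {
    if low > high {
        return Vec::new(); // `range` panics on an inverted range
    }
    index
        .range::<str, _>((Bound::Included(low), Bound::Included(high)))
        .map(|(_, value)| *value)
        .collect()
}

/// Values whose key sorts strictly before `high`.
pub fn less_than(index: &BTreeMap<String, u32>, high: &str) -> Vec<u32> {
    index
        .range::<str, _>((Bound::Unbounded, Bound::Excluded(high)))
        .map(|(_, value)| *value)
        .collect()
}

/// Values whose key sorts strictly after `low`.
pub fn greater_than(index: &BTreeMap<String, u32>, low: &str) -> Vec<u32> {
    index
        .range::<str, _>((Bound::Excluded(low), Bound::Unbounded))
        .map(|(_, value)| *value)
        .collect()
}
```

All comparisons are lexicographic on the key strings, which is why numeric components need zero-padding to sort as expected.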
### QueryResults Iteration
Results implement `QueryResults` (`query_results.rs:24-67`):
```rust
let results = query_client.query(pk).execute().await;
// Check if empty
if results.is_empty() {
    return Ok(());
}
// Get count
let count = results.len();
// Iterate over (metadata, record_ref) pairs
for (metadata, record_ref) in results.iter() {
    // metadata: &RecordMetadata - partition_key, sort_key, chunk_id
    // record_ref: S::RecordRef<'_> - actual record data
    process_record(record_ref)?;
}
// Access by index
if let Some(record) = results.get(&results.records()[0]) {
    // ...
}
```
### Dependency Tracking
Every query automatically tracks accessed chunks (`query/client.rs:365-367`):
```rust
// Record this chunk as a dependency
self.accessed_chunks.borrow_mut().insert(*chunk_id);
```
After your task completes, call `query_client.dependencies()` to get the list of chunks read. The scheduler uses this to build the dependency DAG.
If a query doesn't find a partition key, it records a pending dependency (`query/client.rs:370-391`):
```rust
self.pending_deps.borrow_mut().insert(partition_key);
```
This allows the system to track "task X depends on partition Y, but Y doesn't exist yet" relationships.
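The two tracking sets can be sketched as follows. `TrackingClient` is a hypothetical miniature of the query client: it uses a plain `HashMap` as the primary index, `u64` chunk ids, and `BTreeSet` instead of `HashSet` so the output order is deterministic.

```rust
use std::cell::RefCell;
use std::collections::{BTreeSet, HashMap};

/// Hypothetical client that records what each query touched.
pub struct TrackingClient<'a> {
    primary_index: &'a HashMap<String, Vec<u64>>,
    accessed_chunks: RefCell<BTreeSet<u64>>,
    pending_deps: RefCell<BTreeSet<String>>,
}

impl<'a> TrackingClient<'a> {
    pub fn new(primary_index: &'a HashMap<String, Vec<u64>>) -> Self {
        Self {
            primary_index,
            accessed_chunks: RefCell::new(BTreeSet::new()),
            pending_deps: RefCell::new(BTreeSet::new()),
        }
    }

    /// Looks up a partition and records either the chunks read or the miss.
    pub fn query(&self, partition_key: &str) -> Vec<u64> {
        match self.primary_index.get(partition_key) {
            Some(chunk_ids) => {
                self.accessed_chunks
                    .borrow_mut()
                    .extend(chunk_ids.iter().copied());
                chunk_ids.clone()
            }
            None => {
                self.pending_deps
                    .borrow_mut()
                    .insert(partition_key.to_string());
                Vec::new()
            }
        }
    }

    /// Chunks read so far, in ascending id order.
    pub fn dependencies(&self) -> Vec<u64> {
        self.accessed_chunks.borrow().iter().copied().collect()
    }

    /// Partition keys that were queried but did not exist.
    pub fn pending(&self) -> Vec<String> {
        self.pending_deps.borrow().iter().cloned().collect()
    }
}
```

Interior mutability via `RefCell` is what lets `query` take `&self` while still recording dependencies as a side effect.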
### Batch Operations
Query multiple records efficiently (`query/client.rs:115-136`):
```rust
let items = vec![
(SYMBOLS_PK, "main.rs|function|main".to_string()),
(SYMBOLS_PK, "lib.rs|function|init".to_string()),
(TYPES_PK, "main.rs|struct|Config".to_string()),
];
let results = query_client.batch_get_items(items).await;
```
## Advanced Topics
### Adaptive Parallelization
The system automatically learns whether sequential or parallel execution is faster for each query pattern.
**How it works** (`query/mod.rs:27-59`, `query/client.rs:912-989`):
1. Queries are bucketed by `(partition_key, query_type, chunk_count_bucket)`
2. Chunk count < 10: Always sequential (overhead not worth it)
3. Chunk count > 10,000: Always parallel (obvious win)
4. In between: Measure both approaches
**Measurement cycle** (`query/client.rs:912-943`):
1. Run query sequentially, record execution time
2. Run query in parallel, record execution time
3. Compare and lock in the faster mode for this query pattern
4. Store decision in `query_perf_decisions` index
**Chunk count bucketing** (`query/mod.rs:71-98`):
```rust
// Buckets: 0-10, 11-100, 101-1000, 1001-10000, 10001+
let bucket = match chunk_count {
    0..=10 => ChunkCountBucket::Small,
    11..=100 => ChunkCountBucket::Medium,
    // ...
};
```
This prevents re-measuring when chunk count changes slightly.
**Why this matters**: Some queries benefit from parallelization (many chunks, CPU-bound filtering), while others are faster sequentially (few chunks, overhead dominates). The system adapts to actual workload characteristics.
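The decision flow above can be sketched as a small lookup table. This is an illustration under stated assumptions: `SketchBucket`, `SketchMode`, and `SketchDecisions` are hypothetical names, the bucket variants are named after their ranges rather than after the real enum, and the key omits the query type for brevity.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Chunk-count ranges from the text; variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SketchBucket {
    UpTo10,
    UpTo100,
    UpTo1000,
    UpTo10000,
    Above10000,
}

pub fn bucket_for(chunk_count: usize) -> SketchBucket {
    match chunk_count {
        0..=10 => SketchBucket::UpTo10,
        11..=100 => SketchBucket::UpTo100,
        101..=1000 => SketchBucket::UpTo1000,
        1001..=10_000 => SketchBucket::UpTo10000,
        _ => SketchBucket::Above10000,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchMode {
    Sequential,
    Parallel,
}

/// Locked-in decisions per (partition, bucket) pattern.
#[derive(Default)]
pub struct SketchDecisions {
    locked: HashMap<(String, SketchBucket), SketchMode>,
}

impl SketchDecisions {
    /// Returns the mode to use, or `None` when both modes still need measuring.
    pub fn mode_for(&self, partition: &str, chunk_count: usize) -> Option<SketchMode> {
        if chunk_count < 10 {
            return Some(SketchMode::Sequential); // Overhead not worth it
        }
        if chunk_count > 10_000 {
            return Some(SketchMode::Parallel); // Obvious win
        }
        self.locked
            .get(&(partition.to_string(), bucket_for(chunk_count)))
            .copied()
    }

    /// Locks in whichever measured mode was faster for this pattern.
    pub fn record(
        &mut self,
        partition: &str,
        chunk_count: usize,
        sequential: Duration,
        parallel: Duration,
    ) -> SketchMode {
        let mode = if parallel < sequential {
            SketchMode::Parallel
        } else {
            SketchMode::Sequential
        };
        self.locked
            .insert((partition.to_string(), bucket_for(chunk_count)), mode);
        mode
    }
}
```

Because decisions are keyed by bucket, a measurement taken at 500 chunks is reused at 650 chunks, but a query in a different bucket is measured separately.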
### Garbage Collection
Mark-and-sweep GC from entry chunks (`mod.rs:228-303`):
**Entry chunks** are GC roots - chunks with no `parent_task_id`, typically representing source files (`mod.rs:172-175`):
```rust
if chunk.parent_task_id().is_none() {
    self.entry_chunks.insert(task_id, chunk_id);
}
```
**Mark phase** (`mod.rs:243-263`): Walk dependency graph from entry chunks, marking reachable chunks.
**Sweep phase** (`mod.rs:278-303`): Remove unmarked chunks from all indexes:
- Remove from `chunks` map
- Remove from `primary_index` (if no longer referenced)
- Remove from `content_index`
- Remove from `entry_chunks` (if no longer an entry)
This ensures referential integrity - you can't have dangling references to non-existent chunks.
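The two phases can be sketched generically. This is a sketch under stated assumptions, not the code in `mod.rs`: chunk ids are `u64`, `mark` and `sweep` are hypothetical free functions, and `edges` maps each chunk to the chunks it keeps alive without asserting which direction the real implementation walks.

```rust
use std::collections::{HashMap, HashSet};

/// Mark phase: collects every chunk reachable from the roots.
pub fn mark(roots: &[u64], edges: &HashMap<u64, Vec<u64>>) -> HashSet<u64> {
    let mut marked = HashSet::new();
    let mut stack: Vec<u64> = roots.to_vec();
    while let Some(id) = stack.pop() {
        // `insert` returns false for already-marked chunks, so each
        // chunk is expanded at most once.
        if marked.insert(id) {
            if let Some(next) = edges.get(&id) {
                stack.extend(next.iter().copied());
            }
        }
    }
    marked
}

/// Sweep phase: drops unmarked chunks and returns how many were removed.
pub fn sweep(chunks: &mut HashMap<u64, String>, marked: &HashSet<u64>) -> usize {
    let before = chunks.len();
    chunks.retain(|id, _| marked.contains(id));
    before - chunks.len()
}
```

A real sweep would apply the same `retain`-style filtering to every index listed above so that no index keeps an id the `chunks` map no longer holds.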
### Custom Storage Backends
Implement `RecordStorage` and `RecordStorageBuilder` traits (`storage.rs:51-138`):
```rust
pub struct MyStorage {
    records: Vec<MyRecordType>,
}
impl RecordStorage for MyStorage {
    type Index = usize; // Vec index
    type RecordRef<'a> = &'a MyRecordType;
    type Builder = MyStorageBuilder;
    fn get(&self, index: &Self::Index) -> Self::RecordRef<'_> {
        &self.records[*index]
    }
    fn content_hash(&self) -> ContentHash {
        // Hash all records
        let mut hasher = ContentHasher::new();
        for record in &self.records {
            record.hash(&mut hasher);
        }
        hasher.finish()
    }
}
pub struct MyStorageBuilder {
    records: Vec<MyRecordType>,
}
impl RecordStorageBuilder for MyStorageBuilder {
    type Storage = MyStorage;
    fn insert(&mut self, record: impl LaburnumRecord) -> usize {
        self.records.push(record.downcast());
        self.records.len() - 1
    }
    fn finalize(self) -> MyStorage {
        MyStorage { records: self.records }
    }
}
```
See `tests/storage.rs:79-139` for a complete example.
### Performance Characteristics
- **Query parallelization**: Automatic work distribution across CPU cores for large chunk counts
- **Low-contention concurrent access**: DashMap with internal sharding (per-shard locks), no global lock
- **Cheap cloning**: Arc-based database clones (reference count increment only)
- **Content deduplication**: Identical chunks share storage via content hashing
- **Adaptive thresholds**: System learns optimal execution modes per query pattern
- **BTreeMap indexes**: O(log n) range queries within chunks, ordered iteration
## Common Patterns
### Pattern 1: Parse and Write Results
Parsing a file and writing the results:
```rust
async fn on_file_version(
    uri: Uri,
    source: Arc<Source>,
    source_key: SourceKey,
    _ctx: TaskContext<MyStorage, MyServer>,
    mut writer: RecordWriter<MyStorage>,
) -> Result<RecordWriter<MyStorage>, LaburnumError> {
    // Parse the source (reify_content converts rope/string to owned String)
    let content = source.reify_content().ok_or(LaburnumError::FileEvicted)?;
    let (tokens, lex_errors, lex_state) = lex(source_key, &content);
    let (ast, parse_errors, parse_state) = parse(lex_state, &tokens);
    // Write structured data
    build_symbol_table(&uri, source_key, ast, &source, &mut writer).await?;
    // Write diagnostics
    for (sequence, error) in lex_errors.iter().enumerate() {
        let diagnostic = error_to_diagnostic(error, source_key);
        let severity = diagnostic.severity()?;
        let sort_key = format!("{}|{:01}|{:04}", source_key, severity, sequence);
        writer.insert(DIAGNOSTICS_PK, sort_key, diagnostic);
    }
    // Return writer - scheduler builds chunk and adds to DB
    Ok(writer)
}
```
### Pattern 2: Query and React to Changes (Watchers)
Defining a watcher that reacts to partition changes:
```rust
use std::{collections::HashMap, future::Future, pin::Pin, sync::Arc};
// Define watcher for partition key
laburnum::watchers! {
    (Server: MyServer, Storage: MyStorage),
    SYMBOL_TABLE_PK => handle_symbol_table_changes,
}
fn handle_symbol_table_changes<'a>(
    ctx: &'a mut TaskContext<MyStorage, MyServer>,
    writer: &'a mut RecordWriter<MyStorage>,
) -> Pin<Box<dyn Future<Output = ()> + Send + 'a>> {
    Box::pin(async move {
        let query_client = ctx.query_client();
        // Query all symbols
        let all_symbols = query_client
            .query(SYMBOL_TABLE_PK)
            .execute()
            .await;
        // Build lookup map
        let mut symbol_map = HashMap::new();
        for (_metadata, record_ref) in all_symbols.iter() {
            if let MyRecordRef::Symbol(sym) = record_ref {
                symbol_map.insert(sym.name.clone(), Arc::clone(sym));
            }
        }
        // Get changed keys from watcher context
        let updated = ctx.matched_keys_updated().to_vec();
        // Process each changed symbol
        for key in updated.iter() {
            let query_results = key.get_record(query_client).await;
            // Resolve references, update derived data, etc.
            for (_metadata, record_ref) in query_results.iter() {
                // The future resolves to `()`, so `?` cannot propagate errors here
                if let Err(_err) = process_symbol(record_ref, &symbol_map, writer) {
                    // Report or log the failure as appropriate for your server
                }
            }
        }
    })
}
```
### Pattern 3: Scheduler Integration
From `scheduler/mod.rs:238-244`:
```rust
// Queue a task that writes to the database
scheduler.queue(move |ctx| async move {
    let result = compute_something().await?;
    // Create writer with stable task_id
    let task_id = Ident::new(&format!("compute:{}", input_id));
    let mut writer = ctx.new_record_writer(task_id);
    // Insert results
    writer.insert(OUTPUT_PK, sort_key, result);
    // Return writer - scheduler handles chunk building and DB insertion
    Some(writer)
}, DEFAULT_LANE);
```
## Important Conventions
### Sort Key Formatting
From project CLAUDE.md:
**NO HEX FORMATTING** in sort keys - hex wastes memory with letters a-f. Use zero-padded decimal for proper lexicographic sorting:
```rust
// ❌ BAD - hex wastes memory
format!("{}|{:x}", source_key, line)
// ✅ GOOD - zero-padded decimal for lexicographic sorting
format!("{}|{:010}", source_key, line)
```
Zero-padding (`:010`) is required for numeric sort keys to ensure proper lexicographic ordering: `"0000000001"` < `"0000000010"` < `"0000000100"`.
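The effect of padding is easy to demonstrate by sorting keys as strings. The helpers below are hypothetical and exist only for this illustration; the `"main.rs"` source key is an arbitrary example value.

```rust
/// Sort key with an unpadded line number (sorts incorrectly as text).
pub fn unpadded_key(source_key: &str, line: u32) -> String {
    format!("{}|{}", source_key, line)
}

/// Sort key in the zero-padded format recommended above.
pub fn padded_key(source_key: &str, line: u32) -> String {
    format!("{}|{:010}", source_key, line)
}

/// Returns the line numbers in the order their keys sort lexicographically.
pub fn lines_in_key_order(lines: &[u32], make_key: fn(&str, u32) -> String) -> Vec<u32> {
    let mut keyed: Vec<(String, u32)> = lines
        .iter()
        .map(|&line| (make_key("main.rs", line), line))
        .collect();
    keyed.sort();
    keyed.into_iter().map(|(_, line)| line).collect()
}
```

With unpadded keys, line 9 sorts after lines 10 and 100 because `'9'` compares greater than `'1'`. With padded keys the string order matches the numeric order.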
### Built-in Partition Key Constants
Built-in Laburnum features export partition key constants:
```rust
use laburnum::diagnostics::DIAGNOSTICS_PK;
writer.insert(DIAGNOSTICS_PK, sort_key, diagnostic);
```
When adding new built-in features:
1. Define `const FEATURE_PK: Ident = Ident::new("feature_name")` in feature module
2. Export the constant publicly
3. Document sort key format in module-level documentation
4. Provide helper functions for sort key generation if complex
5. Use the constant consistently - never create the Ident inline
Example:
```rust
// In src/my_feature/mod.rs
pub const MY_FEATURE_PK: Ident = Ident::new("my_feature");
pub fn my_feature_sort_key(id: u64, timestamp: u64) -> String {
    // Zero-padded decimal, not hex; 20 digits covers the full u64 range
    format!("{:020}|{:020}", id, timestamp)
}
```
### Stable Task Identifier Rules
From `chunk.rs:70-128`, task IDs must be:
**Stable across runs:**
- Same computation on same input must produce same task_id
- Enables incremental compilation cache hits
**Include in task_id:**
- Computation type (parse, resolve, typecheck, etc.)
- Input identifiers (file URI, symbol name, etc.)
- Parameters that affect output (configuration flags, etc.)
**Exclude from task_id:**
- Timestamps or wall clock time
- Random values or UUIDs
- Version numbers (unless they affect output)
- Output data or results
- Process IDs or thread IDs
**Examples:**
```rust
// ✓ GOOD
Ident::new(&format!("parse:{}", file_uri))
Ident::new(&format!("resolve:{}:{}", module_name, symbol_name))
Ident::new(&format!("typecheck:{}:strict={}", function_id, strict_mode))
// ✗ BAD
Ident::new("parse") // Missing input context
Ident::new(&format!("parse:{}:{}", uri, timestamp)) // Non-deterministic
Ident::new(&format!("resolve:{}", Uuid::new_v4())) // Random component
```
## Testing
### TestStorage Example
The test suite includes a complete `RecordStorage` implementation (`tests/storage.rs:79-139`):
```rust
pub enum TestRecordData {
    Module { exports: Vec<Ident> },
    Function { name: Ident, body: String },
    Struct { name: Ident, fields: Vec<Ident> },
    Laburnum(LaburnumRecord),
}
pub struct TestStorage {
    modules: Vec<TestRecordData>, // Separate Vecs for each variant
    functions: Vec<TestRecordData>,
    structs: Vec<TestRecordData>,
    laburnum: Vec<LaburnumRecord>,
}
pub enum TestIndex {
    Module(usize),
    Function(usize),
    Struct(usize),
    Laburnum(usize),
}
impl RecordStorage for TestStorage {
    type Index = TestIndex;
    type RecordRef<'a> = TestRecordRef<'a>;
    type Builder = TestStorageBuilder;
    fn get(&self, index: &Self::Index) -> Self::RecordRef<'_> {
        match index {
            TestIndex::Module(i) => TestRecordRef::Module(&self.modules[*i]),
            TestIndex::Function(i) => TestRecordRef::Function(&self.functions[*i]),
            // ...
        }
    }
    // ...
}
```
This demonstrates:
- Enum-based record types with pattern matching
- Indexed storage via separate Vecs per variant
- Content hashing implementation
- `LaburnumRecordRef` downcasting support
### Test Usage Patterns
From `tests/generic_database.rs:34-95`:
```rust
#[test]
async fn test_basic_write_and_read() {
    // Create scheduler (contains database)
    let (scheduler, _conn) = test_scheduler();
    // Write task
    scheduler.queue(move |_ctx| async move {
        let task_id = Ident::new(&format!("module:{}", module_name));
        let mut writer = RecordWriter::new(task_id, vec![]);
        writer.insert(
            task_id,
            module_name.clone(),
            TestRecordData::Module { exports: vec![/* ... */] },
        );
        Some(writer)
    }, DEFAULT_LANE);
    scheduler.spawn_workers();
    // Read task
    scheduler.queue(move |mut ctx| async move {
        let query_client = ctx.query_client();
        let results = query_client
            .get_record(module_id, module_name.clone())
            .await;
        assert!(!results.is_empty());
        for (_metadata, record_ref) in results.iter() {
            match record_ref {
                TestRecordRef::Module(data) => {
                    // Verify data
                }
                _ => panic!("Expected module record"),
            }
        }
        None
    }, DEFAULT_LANE);
}
```
## File Structure
Quick reference of module contents:
| File | Contents |
|------|----------|
| `mod.rs` | Core `Database<S>` struct, `add_chunk()`, garbage collection, chunk visibility filtering |
| `chunk.rs` | `Chunk<S>`, `ChunkId`, `RecordWriter<S>` types and builders |
| `storage.rs` | `RecordStorage` and `RecordStorageBuilder` traits for pluggable storage backends |
| `query_results.rs` | `QueryResults<S>` container for query output, iteration support |
| `stats.rs` | Statistics collection (`DbStats`, `TaskStats`) for monitoring |
| `query/mod.rs` | Query enums, `QueryBuilder` fluent API, adaptive parallelization types |
| `query/client.rs` | `QueryClient<S>` with parallel/sequential execution, snapshot isolation, dependency tracking |
| `tests/mod.rs` | Test module exports and test helpers |
| `tests/storage.rs` | `TestStorage` implementation example |
| `tests/generic_database.rs` | Basic database operation tests (write, read, query, GC) |
| `tests/compilation_dag.rs` | Dependency graph and incremental compilation tests |
## Summary
This database provides a robust foundation for incremental compilation systems and language servers. Key takeaways:
1. **Content addressing** enables automatic cache invalidation - when inputs change, content hashes change, creating new chunks rather than mutating old ones.
2. **Snapshot isolation** via Lamport clocks provides consistent reads during concurrent writes without blocking.
3. **Dependency tracking** automatically builds a DAG from source files through compilation stages, enabling precise incremental recompilation.
4. **Adaptive parallelization** measures actual performance and chooses the optimal execution mode for each query pattern.
5. **Pluggable storage** via the `RecordStorage` trait allows customization for different workload characteristics.
The architecture embraces immutability, uses content hashing for cache invalidation, and provides both high-level ergonomics (fluent query API, scheduler integration) and low-level control (custom storage implementations).