worktable 0.8.21

WorkTable is in-memory storage
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

WorkTable is an in-memory storage system with on-disk persistence capabilities, written in Rust. It uses a declarative macro (`worktable!`) to generate type-safe table structures, indexes, and queries at compile time.

## Build, Test, and Development Commands

### Building

```bash
# Build all workspace members
cargo build

# Build with performance measurements feature
cargo build --features perf_measurements

# Build specific workspace member
cargo build -p worktable_codegen
```

### Testing

```bash
# Run all tests
cargo test

# Run specific test module
cargo test worktable::base

# Run tests in a specific workspace member
cargo test -p worktable

# Run a single test file
cargo test --test mod tests::worktable::base

# Run only the library crate's unit tests (integration tests in tests/ are skipped)
cargo test --lib

# Run tests whose names contain "bench", with performance measurements enabled
cargo test bench --features perf_measurements
```

### Code Quality

```bash
# Check for compile errors without producing binaries
cargo check

# Run clippy for linting
cargo clippy

# Format code
cargo fmt

# Build documentation
cargo doc --open
```

## Architecture

### Macro-Generated Code Architecture

The `worktable!` macro generates complete table implementations:

- `<Name>WorkTable` - Main table struct wrapping the core `WorkTable`
- `<Name>Row` - Row struct with defined columns
- `<Name>PrimaryKey` - Type-safe primary key wrapper (newtype around the PK type)
- `<Name>Wrapper` - Wrapper for serialization (with `GhostWrapper` for deferred deserialization)
- `<Name>Lock` - Lock types for row-level concurrency control
- Custom query methods based on declared queries and indexes
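For illustration, the primary-key wrapper can be pictured as a plain newtype. This is a hand-written sketch, not macro output: `TestPrimaryKey`, `OrderPrimaryKey`, and `describe_test_key` are hypothetical names, and the derives the macro actually emits may differ. It shows why the wrapper exists: two tables with `u64` keys get distinct key types, so one table's key cannot be passed where the other's is expected.

```rust
/// Hypothetical key type for a table named `Test` whose primary key column is `id: u64`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct TestPrimaryKey(pub u64);

impl From<u64> for TestPrimaryKey {
    fn from(id: u64) -> Self {
        TestPrimaryKey(id)
    }
}

/// A second table's key: same inner type, but not interchangeable with `TestPrimaryKey`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct OrderPrimaryKey(pub u64);

/// Accepts only `Test` keys; passing an `OrderPrimaryKey` is a compile error.
pub fn describe_test_key(pk: TestPrimaryKey) -> String {
    format!("Test row #{}", pk.0)
}
```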

### Internal Storage Architecture

**DataPages** (`src/in_memory/pages.rs`):
- Manages data as fixed-size pages (configurable, default 4KB)
- Lock-free insertions using atomic operations
- Empty space tracking with `empty_links` stack for efficient space reuse
- Link-based navigation: `Link { page_id, offset, length }`
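The page and `Link` model can be sketched with std types only. `SimplePages` is a hypothetical, single-threaded stand-in: the real `DataPages` uses atomics for lock-free inserts, and its reuse policy for `empty_links` may differ from the exact-size match assumed here.

```rust
/// Address of a stored row: which page, where in it, and how many bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Link {
    pub page_id: u32,
    pub offset: u32,
    pub length: u32,
}

/// Hypothetical single-threaded stand-in for `DataPages`.
pub struct SimplePages {
    page_size: usize,
    pages: Vec<Vec<u8>>,    // each page holds at most `page_size` bytes
    empty_links: Vec<Link>, // freed slots; the most recently freed is on top
}

impl SimplePages {
    pub fn new(page_size: usize) -> Self {
        SimplePages { page_size, pages: vec![Vec::new()], empty_links: Vec::new() }
    }

    /// Stores `bytes` and returns the `Link` that addresses them.
    pub fn insert(&mut self, bytes: &[u8]) -> Link {
        assert!(bytes.len() <= self.page_size, "row larger than a page");
        // Reuse the top-most freed slot of exactly the same length, if any.
        if let Some(pos) = self.empty_links.iter().rposition(|l| l.length as usize == bytes.len()) {
            let link = self.empty_links.remove(pos);
            let start = link.offset as usize;
            self.pages[link.page_id as usize][start..start + bytes.len()].copy_from_slice(bytes);
            return link;
        }
        // Otherwise append to the last page, opening a new page when it would overflow.
        if self.pages.last().map_or(true, |p| p.len() + bytes.len() > self.page_size) {
            self.pages.push(Vec::new());
        }
        let page_id = self.pages.len() - 1;
        let offset = self.pages[page_id].len();
        self.pages[page_id].extend_from_slice(bytes);
        Link { page_id: page_id as u32, offset: offset as u32, length: bytes.len() as u32 }
    }

    /// Resolves a `Link` back to the stored bytes.
    pub fn get(&self, link: Link) -> &[u8] {
        let start = link.offset as usize;
        &self.pages[link.page_id as usize][start..start + link.length as usize]
    }

    /// Marks the slot as free; its bytes stay in place until the slot is reused.
    pub fn delete(&mut self, link: Link) {
        self.empty_links.push(link);
    }

    pub fn page_count(&self) -> usize {
        self.pages.len()
    }
}
```

Deleting only pushes the `Link` onto the free list, which is why freed space is reused by later inserts rather than being compacted immediately.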

**Indexing** (`src/index/`):
- `PrimaryIndex` - Concurrent B-tree map (from the `indexset` crate) from primary key to `Link`
- `SecondaryIndexes` - Additional indexes for efficient non-PK searches
- Uses `IndexMap` and `IndexMultiMap` from the indexset crate (concurrent B-trees)
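A rough picture of the two index shapes, with `std::collections::BTreeMap` standing in for the concurrent `IndexMap`/`IndexMultiMap` from `indexset` (whose API is not reproduced here). `UniqueIndex` and `MultiIndex` are hypothetical names: a unique index maps each key to one `Link` and rejects duplicates, while a non-unique index maps a key to a set of `Link`s.

```rust
use std::collections::btree_map::Entry;
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Link {
    pub page_id: u32,
    pub offset: u32,
    pub length: u32,
}

/// One Link per key, like the primary index or a `unique` secondary index.
pub struct UniqueIndex<K> {
    map: BTreeMap<K, Link>,
}

impl<K: Ord> UniqueIndex<K> {
    pub fn new() -> Self {
        UniqueIndex { map: BTreeMap::new() }
    }

    /// Returns `Err(existing)` if the key is already taken.
    pub fn insert(&mut self, key: K, link: Link) -> Result<(), Link> {
        match self.map.entry(key) {
            Entry::Occupied(e) => Err(*e.get()),
            Entry::Vacant(e) => {
                e.insert(link);
                Ok(())
            }
        }
    }

    pub fn get(&self, key: &K) -> Option<Link> {
        self.map.get(key).copied()
    }
}

/// Many Links per key, like a non-unique secondary index.
pub struct MultiIndex<K> {
    map: BTreeMap<K, BTreeSet<Link>>,
}

impl<K: Ord> MultiIndex<K> {
    pub fn new() -> Self {
        MultiIndex { map: BTreeMap::new() }
    }

    pub fn insert(&mut self, key: K, link: Link) {
        self.map.entry(key).or_default().insert(link);
    }

    /// Removes one Link; drops the key entirely once no Links remain.
    pub fn remove(&mut self, key: &K, link: Link) {
        if let Some(set) = self.map.get_mut(key) {
            set.remove(&link);
            if set.is_empty() {
                self.map.remove(key);
            }
        }
    }

    pub fn get(&self, key: &K) -> Vec<Link> {
        self.map.get(key).map(|s| s.iter().copied().collect()).unwrap_or_default()
    }
}
```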

**Locking** (`src/lock/`):
- `LockMap` - Row-level locking for concurrent operations
- Smart locking for partial row updates (non-overlapping fields can be updated simultaneously)
- `WorkTableLock` - Coordinates table-wide locks including vacuum page locks
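One way to picture the smart-locking rule is a per-row bitmask of held fields: two partial updates on the same row can proceed together when their field masks do not intersect. `FieldLocks` and the `PRICE`/`QUANTITY` bits are hypothetical, and the sketch uses a non-blocking `try_lock` for simplicity; the crate's `LockMap` presumably waits instead of failing.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical field bits for a row type.
pub const PRICE: u64 = 1 << 0;
pub const QUANTITY: u64 = 1 << 1;

/// Row key -> bitmask of fields currently locked by in-flight updates.
pub struct FieldLocks {
    held: Mutex<HashMap<u64, u64>>,
}

impl FieldLocks {
    pub fn new() -> Self {
        FieldLocks { held: Mutex::new(HashMap::new()) }
    }

    /// Locks the `mask` fields of row `key`; fails if any of them is already held.
    pub fn try_lock(&self, key: u64, mask: u64) -> bool {
        let mut held = self.held.lock().unwrap();
        let current = held.entry(key).or_insert(0);
        if *current & mask != 0 {
            return false; // overlapping fields: this update must wait
        }
        *current |= mask;
        true
    }

    /// Releases the `mask` fields; forgets the row once nothing is held.
    pub fn unlock(&self, key: u64, mask: u64) {
        let mut held = self.held.lock().unwrap();
        if let Some(current) = held.get_mut(&key) {
            *current &= !mask;
            if *current == 0 {
                held.remove(&key);
            }
        }
    }
}
```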

**Persistence** (`src/persistence/`):
- Change Data Capture (CDC) for tracking modifications
- `PersistenceEngine` - Async persistence with operation queues
- File extensions: `.wt.data` for data, `.wt.idx` for indexes
- Operations: `Insert`, `Update`, `Delete`, with CDC events
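The CDC idea, sketched with a synchronous `std::sync::mpsc` channel in place of the async engine: the table emits one event per modification, and the persistence side drains and applies them in order. The `Operation` enum shape, `cdc_channel`, and `drain` are assumptions for illustration, and the `Vec` of `(pk, bytes)` pairs stands in for the `.wt.data` file.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical CDC event carrying what is needed to replay a change.
#[derive(Debug, Clone, PartialEq)]
pub enum Operation {
    Insert { pk: u64, row: Vec<u8> },
    Update { pk: u64, row: Vec<u8> },
    Delete { pk: u64 },
}

/// Table side: emits one event per modification.
pub struct CdcSender {
    tx: Sender<Operation>,
}

impl CdcSender {
    pub fn emit(&self, op: Operation) {
        // A send error only means the persistence side was dropped.
        let _ = self.tx.send(op);
    }
}

pub fn cdc_channel() -> (CdcSender, Receiver<Operation>) {
    let (tx, rx) = channel();
    (CdcSender { tx }, rx)
}

/// Persistence side: applies every queued event, in order, to a stand-in file.
pub fn drain(rx: &Receiver<Operation>, file: &mut Vec<(u64, Vec<u8>)>) -> usize {
    let mut applied = 0;
    for op in rx.try_iter() {
        match op {
            Operation::Insert { pk, row } => file.push((pk, row)),
            Operation::Update { pk, row } => {
                if let Some(entry) = file.iter_mut().find(|e| e.0 == pk) {
                    entry.1 = row;
                }
            }
            Operation::Delete { pk } => file.retain(|e| e.0 != pk),
        }
        applied += 1;
    }
    applied
}
```

Applying events strictly in emission order is what makes the on-disk state converge to the in-memory one; reordering an `Update` before its `Insert` would lose data.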

### Key Data Flow

1. **Insert**: Row serialized via `rkyv` → stored in DataPages → returns Link → indexes updated
2. **Select**: Lookup Link in primary index → fetch from DataPages → deserialize
3. **Update**: Acquire row lock → reinsert row (creates new Link) → update all indexes → delete old data
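The three flows can be traced in a toy table. This sketch assumes a fixed 16-byte row encoding in place of `rkyv`, a `Vec` of slots in place of `DataPages` (a slot index plays the role of `Link`), and a `BTreeMap` in place of the primary index; row locking is omitted because `&mut self` already gives exclusive access. `MiniTable` and `Row` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical fixed-layout row.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub id: u64,
    pub value: i64,
}

fn read8(b: &[u8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    out.copy_from_slice(&b[..8]);
    out
}

impl Row {
    fn to_bytes(self) -> Vec<u8> {
        let mut bytes = self.id.to_le_bytes().to_vec();
        bytes.extend_from_slice(&self.value.to_le_bytes());
        bytes
    }
    fn from_bytes(bytes: &[u8]) -> Row {
        Row { id: u64::from_le_bytes(read8(&bytes[0..8])), value: i64::from_le_bytes(read8(&bytes[8..16])) }
    }
}

pub struct MiniTable {
    slots: Vec<Option<Vec<u8>>>,   // stand-in for DataPages; index = Link
    primary: BTreeMap<u64, usize>, // stand-in for PrimaryIndex: pk -> Link
}

impl MiniTable {
    pub fn new() -> Self {
        MiniTable { slots: Vec::new(), primary: BTreeMap::new() }
    }

    fn store(&mut self, bytes: Vec<u8>) -> usize {
        self.slots.push(Some(bytes));
        self.slots.len() - 1
    }

    /// Insert: serialize -> store -> get Link -> update index.
    pub fn insert(&mut self, row: Row) -> Result<usize, &'static str> {
        if self.primary.contains_key(&row.id) {
            return Err("duplicate primary key");
        }
        let link = self.store(row.to_bytes());
        self.primary.insert(row.id, link);
        Ok(link)
    }

    /// Select: index lookup -> fetch bytes -> deserialize.
    pub fn select(&self, id: u64) -> Option<Row> {
        let link = *self.primary.get(&id)?;
        self.slots[link].as_deref().map(Row::from_bytes)
    }

    /// Update: store new bytes (new Link) -> repoint index -> delete old data.
    pub fn update(&mut self, row: Row) -> Result<usize, &'static str> {
        let old = *self.primary.get(&row.id).ok_or("not found")?;
        let new = self.store(row.to_bytes());
        self.primary.insert(row.id, new);
        self.slots[old] = None;
        Ok(new)
    }
}
```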

### Generated Type Naming Conventions

Defined in `codegen/src/name_generator.rs`:
- Row: `<Name>Row`
- Primary Key: `<Name>PrimaryKey`
- Table: `<Name>WorkTable`
- Wrapper: `<Name>Wrapper`
- Lock: `<Name>Lock`
- Available types enum: `<Name>AvailableTypes`
- Indexes tuple: `<Name>AvailableIndexes`
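The conventions above amount to appending a fixed suffix to the table name. The helper below is a hypothetical mirror for quick reference, not the code in `name_generator.rs`.

```rust
/// Hypothetical mirror of the naming table (not the actual codegen).
pub struct GeneratedNames {
    pub row: String,
    pub primary_key: String,
    pub table: String,
    pub wrapper: String,
    pub lock: String,
    pub available_types: String,
    pub available_indexes: String,
}

impl GeneratedNames {
    /// `name` is the value given to `name:` in the `worktable!` invocation.
    pub fn for_table(name: &str) -> Self {
        GeneratedNames {
            row: format!("{}Row", name),
            primary_key: format!("{}PrimaryKey", name),
            table: format!("{}WorkTable", name),
            wrapper: format!("{}Wrapper", name),
            lock: format!("{}Lock", name),
            available_types: format!("{}AvailableTypes", name),
            available_indexes: format!("{}AvailableIndexes", name),
        }
    }
}
```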

## Code Organization

### Runtime Code (`src/`)

- `lib.rs` - Module exports and prelude
- `table/mod.rs` - Core `WorkTable` struct with CRUD operations
- `in_memory/` - `DataPages`, `Data` (page storage), `StorableRow` trait
- `index/` - Primary and secondary index implementations
- `lock/` - Concurrency control (row-level locks, page locks for vacuum)
- `persistence/` - CDC, persistence engine, space management
- `table/vacuum/` - Vacuum implementation for defragmentation
- `table/select/` - Query builder for `select_all()` with filtering/sorting

### Code Generation (`codegen/`)

- `worktable/` - Main macro implementation
  - `model/` - Parsed AST from macro input
  - `generator/` - Code generation for tables, indexes, queries
  - `parser/` - Token parsing for columns, indexes, queries
- `name_generator.rs` - Naming conventions for generated types
- `persist_table.rs` - Persistence-related code generation
- `persist_index.rs` - Index persistence code generation

### Tests (`tests/`)

- `worktable/base.rs` - Basic CRUD operations
- `worktable/in_place.rs` - `update_in_place` query tests
- `worktable/unsized_.rs` - Variable-size types (String)
- `worktable/index/` - Index-related tests
- `worktable/vacuum.rs` - Vacuum/defragmentation tests
- `persistence/` - Persistence functionality tests

## Working with the Codebase

### Adding a New Feature

1. **Runtime changes** - Update `src/` as needed
2. **Macro changes** - Update `codegen/src/worktable/`
3. **Generated types** - Follow naming conventions in `name_generator.rs`
4. **Tests** - Add tests in `tests/worktable/` covering the feature

### Modifying the Macro

The `worktable!` macro structure:
```rust
worktable!(
    name: TableName,
    columns: {
        col_name: Type primary_key autoincrement,
        other_col: Type optional,
    },
    indexes: {
        idx_name: column unique,  // unique is optional
    },
    queries: {
        update: { FieldById(field) by id },
        delete: { ByField() by field },
        in_place: { FieldMutById(field) by id },
    }
);
```

Test macro changes:
```bash
cargo test -p worktable_codegen  # Test macro compilation
cargo test worktable              # Test generated code
```

### Column Types and Flags

- `primary_key` - Marks the column as (part of) the primary key; mark several columns for a composite PK
- `autoincrement` - Auto-incrementing PK for integer types
- `custom` - Custom PK generation (provide your own `PrimaryKeyGenerator`)
- `optional` - Column type becomes `Option<T>`
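What `autoincrement` provides can be modeled as a shared atomic counter, and `custom` as any type that produces the next key. The `KeyGenerator` trait, `AutoIncrement`, and `PrefixedIds` are hypothetical stand-ins; the crate's own `PrimaryKeyGenerator` may have a different signature.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical generator trait (not the crate's `PrimaryKeyGenerator`).
pub trait KeyGenerator<K> {
    fn next(&self) -> K;
}

/// `autoincrement`-style generator: a counter shared by all inserting threads.
pub struct AutoIncrement {
    counter: AtomicU64,
}

impl AutoIncrement {
    pub fn starting_at(n: u64) -> Self {
        AutoIncrement { counter: AtomicU64::new(n) }
    }
}

impl KeyGenerator<u64> for AutoIncrement {
    fn next(&self) -> u64 {
        // fetch_add returns the previous value atomically, so no two callers get the same key.
        self.counter.fetch_add(1, Ordering::Relaxed)
    }
}

/// `custom`-style generator: string keys with a prefix, built on the same counter.
pub struct PrefixedIds {
    prefix: String,
    inner: AutoIncrement,
}

impl PrefixedIds {
    pub fn new(prefix: &str) -> Self {
        PrefixedIds { prefix: prefix.to_string(), inner: AutoIncrement::starting_at(0) }
    }
}

impl KeyGenerator<String> for PrefixedIds {
    fn next(&self) -> String {
        format!("{}-{}", self.prefix, self.inner.next())
    }
}
```

`Ordering::Relaxed` suffices here because only the uniqueness of each returned value matters, not its ordering relative to other memory operations.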

### Query Types

- **`update` queries**: Partial row update with smart locking (non-overlapping fields)
- **`delete` queries**: Delete by indexed field
- **`in_place` queries**: Update field via closure without selecting row first (thread-safe)
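The `in_place` pattern can be sketched as a closure that mutates one field while the row's lock is held, so the caller never copies the row out and writes it back. The two-level locking (`RwLock` over the map, a `Mutex` per row), the `Table` type, and the method name `update_counter_in_place` are assumptions for this sketch.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u64,
    pub counter: i64,
    pub name: String,
}

/// Hypothetical table: the RwLock guards the map, each row has its own Mutex.
pub struct Table {
    rows: RwLock<HashMap<u64, Mutex<Row>>>,
}

impl Table {
    pub fn new() -> Self {
        Table { rows: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, row: Row) {
        self.rows.write().unwrap().insert(row.id, Mutex::new(row));
    }

    /// Runs `f` on the `counter` field while holding that row's lock.
    /// Returns false if the row does not exist.
    pub fn update_counter_in_place<F: FnOnce(&mut i64)>(&self, id: u64, f: F) -> bool {
        let rows = self.rows.read().unwrap();
        match rows.get(&id) {
            Some(row) => {
                f(&mut row.lock().unwrap().counter);
                true
            }
            None => false,
        }
    }

    pub fn counter(&self, id: u64) -> Option<i64> {
        self.rows.read().unwrap().get(&id).map(|r| r.lock().unwrap().counter)
    }
}
```

Because the read-modify-write happens inside the closure under the row lock, concurrent increments cannot overwrite each other, which a select-then-update sequence would not guarantee without extra locking.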

### Current Development Focus

The codebase is actively developing **vacuum/defragmentation** functionality (see `docs/vacuum-implementation-plan.md`):
- Page consolidation to free empty pages
- Row migration with index updates
- Concurrent-safe vacuum operations
- CDC integration for persistence
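Page consolidation can be pictured as repacking live rows into as few pages as possible and repointing every moved row's index entry at its new `Link`. This is a simplified, single-threaded sketch: `vacuum` is a hypothetical function, a page is a fixed number of slots holding only the row's primary key, and the page locking and CDC steps listed above are left out.

```rust
use std::collections::BTreeMap;

/// Simplified Link: (page index, slot index).
pub type Link = (usize, usize);

/// Hypothetical vacuum pass. `pages[p][s]` is `Some(pk)` for a live row and
/// `None` for a hole; `index` maps each pk to its current Link.
/// Returns how many pages were freed.
pub fn vacuum(
    pages: &mut Vec<Vec<Option<u64>>>,
    index: &mut BTreeMap<u64, Link>,
    capacity: usize,
) -> usize {
    assert!(capacity > 0, "pages must hold at least one row");
    let before = pages.len();
    // Collect live rows in their current page order.
    let live: Vec<u64> = pages.iter().flatten().filter_map(|slot| *slot).collect();
    let mut packed: Vec<Vec<Option<u64>>> = Vec::new();
    for pk in live {
        // Open a new page only when the current one is full.
        if packed.last().map_or(true, |page| page.len() == capacity) {
            packed.push(Vec::with_capacity(capacity));
        }
        let page = packed.len() - 1;
        packed[page].push(Some(pk));
        // The row moved, so its index entry must point at the new Link.
        index.insert(pk, (page, packed[page].len() - 1));
    }
    *pages = packed;
    before.saturating_sub(pages.len())
}
```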

### Dependencies

- `rkyv` - Zero-copy deserialization
- `indexset` - Concurrent B-tree maps/multimaps
- `data_bucket` - Local crate for page-based storage primitives
- `tokio` - Async runtime
- `parking_lot` - Efficient RwLock
- `convert_case` - Case conversion for code generation

### Important Constraints

- **Primary keys**: Cannot be updated after insert (returns `PrimaryUpdateTry` error)
- **Link updates**: When rows are reinserted (update/vacuum), all indexes must be updated atomically
- **GhostWrapper**: Rows are stored as "ghosts" initially, only validated after indexes are updated
- **Page locks**: During vacuum, pages are locked - concurrent operations must wait via `await_page_lock()`
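The ghost ordering can be shown with an explicit two-step insert: stage the row invisibly, then publish it only once the index accepts the key. `GhostTable`, `stage`, `commit`, and `InsertError` are hypothetical names chosen for this sketch; it is not the `GhostWrapper` API.

```rust
use std::collections::BTreeMap;

/// A stored row plus a flag saying whether readers may see it yet.
struct Slot {
    bytes: Vec<u8>,
    ghost: bool,
}

#[derive(Debug, PartialEq)]
pub enum InsertError {
    AlreadyExists,
}

pub struct GhostTable {
    slots: Vec<Option<Slot>>,
    index: BTreeMap<u64, usize>,
}

impl GhostTable {
    pub fn new() -> Self {
        GhostTable { slots: Vec::new(), index: BTreeMap::new() }
    }

    /// Step 1: store the row as a ghost (present in storage, invisible to readers).
    pub fn stage(&mut self, bytes: Vec<u8>) -> usize {
        self.slots.push(Some(Slot { bytes, ghost: true }));
        self.slots.len() - 1
    }

    /// Step 2: update the index, and only then clear the ghost flag.
    pub fn commit(&mut self, pk: u64, link: usize) -> Result<(), InsertError> {
        if self.index.contains_key(&pk) {
            // Conflict: discard the ghost instead of ever exposing it.
            self.slots[link] = None;
            return Err(InsertError::AlreadyExists);
        }
        self.index.insert(pk, link);
        if let Some(slot) = self.slots[link].as_mut() {
            slot.ghost = false;
        }
        Ok(())
    }

    /// Readers skip ghosts, so a half-finished insert is never observed.
    pub fn scan(&self) -> Vec<&[u8]> {
        self.slots
            .iter()
            .flatten()
            .filter(|s| !s.ghost)
            .map(|s| s.bytes.as_slice())
            .collect()
    }
}
```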