# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Contract-First Design
This project follows contract-first development with provable-contracts.
Contracts live in `../provable-contracts/contracts/trueno-db/`.
Run `pmat comply check` to validate contract compliance.
## Project Overview
**Trueno-DB** is a GPU-first, SIMD-fallback embedded analytics database built on Apache Arrow and Trueno. It provides high-performance aggregations with graceful degradation from GPU → AVX-512 → AVX2 → SSE2 → Scalar, including WASM/WebGPU support for browser deployment.
**Design Philosophy:**
- GPU-first compute with automatic fallback to SIMD
- Zero-copy operations via Arrow columnar format
- WASM-portable for browser analytics
- Multi-GPU scaling capability
## Code Search (pmat query)
**NEVER use grep or rg for code discovery.** Use `pmat query` instead -- it returns quality-annotated, ranked results with TDG scores and fault annotations.
```bash
# Find functions by intent
pmat query "query execution" --limit 10
# Find high-quality code
pmat query "sql parser" --min-grade A --exclude-tests
# Find with fault annotations (unwrap, panic, unsafe, etc.)
pmat query "index scan" --faults
# Filter by complexity
pmat query "join algorithm" --max-complexity 10
# Cross-project search
pmat query "gpu kernel" --include-project ../trueno
# Git history search (find code by commit intent via RRF fusion)
pmat query "fix query planner" -G
pmat query "fix query planner" --git-history
# Enrichment flags (combine freely)
pmat query "executor" --churn # git volatility (commit count, churn score)
pmat query "scan operator" --duplicates # code clone detection (MinHash+LSH)
pmat query "aggregate" --entropy # pattern diversity (repetitive vs unique)
pmat query "query engine" --churn --duplicates --entropy --faults -G # full audit
```
## Architecture
### Core Components
**Backend Selection Hierarchy:**
1. Multi-GPU distribution (if multiple GPUs available and workload supports it)
2. Single GPU via wgpu (for datasets >100K rows)
3. SIMD via Trueno (AVX-512 → AVX2 → SSE2, chosen by auto-detection)
4. Scalar fallback
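
The hierarchy above can be sketched as a small selection function. This is a minimal illustration, not the actual dispatcher: `BackendChoice`, `Environment`, and `select_backend` are hypothetical names, and the sketch assumes the 100K-row threshold also gates the multi-GPU path.

```rust
/// Hypothetical backend tiers mirroring the selection hierarchy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice {
    MultiGpu,
    SingleGpu,
    Simd,
    Scalar,
}

/// Hypothetical description of the available hardware and the workload.
pub struct Environment {
    pub gpu_count: usize,
    pub simd_available: bool,
    pub workload_partitionable: bool,
}

/// Row threshold from the hierarchy: datasets above 100K rows favour the GPU.
pub const GPU_ROW_THRESHOLD: usize = 100_000;

/// Walks the tiers from most to least capable and returns the first that applies.
pub fn select_backend(rows: usize, env: &Environment) -> BackendChoice {
    // Assumption: small datasets never go to a GPU, even when several exist.
    let large = rows > GPU_ROW_THRESHOLD;
    if large && env.gpu_count > 1 && env.workload_partitionable {
        BackendChoice::MultiGpu
    } else if large && env.gpu_count >= 1 {
        BackendChoice::SingleGpu
    } else if env.simd_available {
        BackendChoice::Simd
    } else {
        BackendChoice::Scalar
    }
}
```

With one GPU and SIMD available, `select_backend(100_001, &env)` picks `SingleGpu`, while exactly 100,000 rows stays on `Simd`.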
**Storage Layer:**
- Apache Arrow columnar format (Int32Array, Float32Array, StringArray, etc.)
- Zero-copy GPU transfers (Arrow buffers → GPU VRAM)
- Parquet and CSV readers
**Query Engine:**
- SQL-like subset (SELECT, WHERE, GROUP BY, aggregations, JOINs, window functions)
- Cost-based backend dispatcher
- LRU cache for hot queries
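
As a rough illustration of the LRU cache for hot queries, the sketch below keys cached results by SQL text and evicts the least recently used entry when full. `QueryCache` is a hypothetical type written for this example; the real cache may be keyed, sized, and implemented differently.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU cache keyed by SQL text (not the actual trueno-db type).
pub struct QueryCache<V> {
    capacity: usize,
    entries: HashMap<String, V>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl<V: Clone> QueryCache<V> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns a cached result and marks the query as most recently used.
    pub fn get(&mut self, sql: &str) -> Option<V> {
        let value = self.entries.get(sql).cloned()?;
        self.touch(sql);
        Some(value)
    }

    /// Stores a result, evicting the least recently used entry when over capacity.
    pub fn put(&mut self, sql: &str, value: V) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.insert(sql.to_string(), value).is_some() {
            // Existing key: only refresh its recency.
            self.touch(sql);
            return;
        }
        self.order.push_back(sql.to_string());
        if self.entries.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
    }

    /// Moves `sql` to the most-recently-used end of the recency queue.
    fn touch(&mut self, sql: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == sql) {
            if let Some(key) = self.order.remove(pos) {
                self.order.push_back(key);
            }
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The linear scan in `touch` keeps the sketch short; a production cache would more likely use a linked structure for O(1) recency updates.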
**Compute Backends:**
- GpuBackend: wgpu compute shaders (Vulkan/Metal/DX12/WebGPU)
- TruenoBackend: SIMD acceleration via trueno crate
- WASM: WebGPU + SIMD128 for browser deployment
## Development Commands
### Project Setup
```bash
make build # Build the project
make build-release # Release build
make test # Run all tests
make lint # Lint with clippy (zero tolerance)
make check # Lint + test
```
### Quality Gates (EXTREME TDD)
```bash
# All of these must pass before any commit:
make test # 100% tests pass
make lint # Zero clippy warnings
make coverage # >90% code coverage required
make tdg # TDG score ≥B+ (85/100)
make mutants # ≥80% mutation kill rate (optional)
make quality-gate # Run all gates (lint + test + coverage)
```
**IMPORTANT**: Always use `make coverage` instead of raw `cargo llvm-cov` commands.
The Makefile handles mold linker conflicts automatically.
### WASM Build
```bash
cargo build --target wasm32-unknown-unknown --release
```
### Benchmarking
```bash
# Benchmarks are required for all performance claims
cargo bench # Run benchmarks
cargo bench --bench aggregations # Specific benchmark suite
```
## Critical Dependencies
### trueno Integration
```toml
[dependencies]
trueno = "0.3.0" # SIMD fallback (ALWAYS use crates.io version)
wgpu = "22" # GPU compute
arrow = "53" # Columnar format
parquet = "53" # Parquet reader
sqlparser = "0.52" # SQL parsing
tokio = { version = "1", features = ["full"] }
rayon = "1.8" # CPU parallelism
```
**IMPORTANT:** Always use the latest trueno from crates.io, never git dependencies.
Check for updates:
```bash
cargo search trueno # Check latest version
```
## Testing Requirements
### Backend Equivalence Testing
GPU results MUST exactly match SIMD and scalar results for correctness:
```rust
#[test]
fn test_sum_backend_equivalence() {
    let data = generate_test_data(1_000_000);

    // Run the same aggregation on every backend.
    let gpu_result = execute_on_backend(Backend::Gpu, &data);
    let simd_result = execute_on_backend(Backend::Trueno(trueno::Backend::Auto), &data);
    let scalar_result = execute_on_backend(Backend::Scalar, &data);

    assert_eq!(gpu_result, simd_result);
    assert_eq!(simd_result, scalar_result);
}
```
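
One caveat when writing these tests: floating-point addition is not associative, so backends that add in different orders can differ in the last bits of an `f32` sum. The sketch below is a standalone illustration with hypothetical helpers (`sum_scalar`, `sum_lanes`, `approx_eq`). It shows a case where exact equality holds (integer-valued data) and a relative-tolerance comparison as one possible option if exact equality proves too strict for a given aggregate.

```rust
/// Scalar reference: sums left to right.
pub fn sum_scalar(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// Stand-in for a SIMD backend: eight independent lane accumulators that are
/// combined at the end, which changes the order of the additions.
pub fn sum_lanes(data: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 8];
    for chunk in data.chunks(8) {
        for (lane, value) in lanes.iter_mut().zip(chunk) {
            *lane += *value;
        }
    }
    lanes.iter().sum()
}

/// Relative comparison with a floor of 1.0 so values near zero compare sensibly.
pub fn approx_eq(a: f32, b: f32, rel_tol: f32) -> bool {
    (a - b).abs() <= rel_tol * a.abs().max(b.abs()).max(1.0)
}
```

For small integer-valued inputs every intermediate sum is exactly representable, so `sum_scalar` and `sum_lanes` agree bit for bit; for fractional inputs, `approx_eq(sum_scalar(&d), sum_lanes(&d), 1e-3)` is the more robust check.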
### Property-Based Testing
Use property tests to check algebraic invariants of operations, for example that SUM does not depend on row order:
```rust
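#[quickcheck]
fn prop_sum_commutative(xs: Vec<i32>) -> bool {
    // `test_db` and `shuffle` are assumed test helpers: `test_db` registers each
    // (name, rows) pair as a table with a single column `x`.
    // Integer input keeps the property free of f32 rounding and NaN effects.
    let shuffled = shuffle(xs.clone());
    let db = test_db(&[("data", xs), ("shuffled", shuffled)]);

    let result1 = db.query("SELECT sum(x) FROM data").execute();
    let result2 = db.query("SELECT sum(x) FROM shuffled").execute();
    result1 == result2
}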
```
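
The same order-independence property can be exercised without a database fixture or external crates. The sketch below is a simplified stand-in, not the project's test harness: `sum_i64`, `xorshift`, `shuffle`, and `sum_is_order_independent` are hypothetical helpers, and the generator is a basic xorshift used only to produce a deterministic permutation.

```rust
/// Reference aggregate: widen to i64 so test-sized inputs cannot overflow.
pub fn sum_i64(values: &[i32]) -> i64 {
    values.iter().map(|&v| i64::from(v)).sum()
}

/// Small deterministic generator (xorshift64) so the sketch needs no crates.
pub fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Fisher-Yates shuffle driven by the generator above.
pub fn shuffle(values: &mut [i32], state: &mut u64) {
    for i in (1..values.len()).rev() {
        let j = (xorshift(state) % (i as u64 + 1)) as usize;
        values.swap(i, j);
    }
}

/// Property: SUM over a permutation of the rows equals SUM over the rows.
pub fn sum_is_order_independent(values: &[i32], seed: u64) -> bool {
    let mut shuffled = values.to_vec();
    let mut state = seed.max(1); // xorshift must not start at zero
    shuffle(&mut shuffled, &mut state);
    sum_i64(values) == sum_i64(&shuffled)
}
```

A property-testing framework would supply `values` and `seed` automatically; here they are passed in explicitly, for example `sum_is_order_independent(&[1, -2, 3], 42)`.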
### Performance Regression Tests
Every benchmark must prove claimed speedups:
```rust
// Target: 50-100x faster than CPU for 100M row SUM
assert!(gpu_time < cpu_time / 50.0);
```
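
A speedup assertion like the one above needs timings to compare. The following is a minimal sketch using only the standard library, with hypothetical helper names (`best_time`, `speedup`, `meets_target`). Real performance claims should still come from the `cargo bench` suites, which handle warm-up and statistical noise.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `iterations` times and returns the fastest run (the least noisy sample).
pub fn best_time<F: FnMut()>(iterations: usize, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..iterations {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    best
}

/// Speedup of `candidate` over `baseline`; 50.0 means 50x faster.
pub fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}

/// True when the candidate reaches at least `min_speedup` over the baseline.
pub fn meets_target(baseline: Duration, candidate: Duration, min_speedup: f64) -> bool {
    speedup(baseline, candidate) >= min_speedup
}
```

A regression test would then assert something like `meets_target(cpu_time, gpu_time, 50.0)` for the 100M-row SUM case.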
## GPU Kernel Development
GPU operations are implemented in WGSL (WebGPU Shading Language):
```wgsl
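// Example: parallel sum reduction. WGSL atomics are integer-only, so each workgroup
// reduces in shared memory and writes one partial sum; a second pass (or the host) adds them.
@group(0) @binding(0) var<storage, read> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> partial_sums: array<f32>;
var<workgroup> scratch: array<f32, 256>;

@compute @workgroup_size(256)
fn sum_kernel(@builtin(global_invocation_id) gid: vec3<u32>,
              @builtin(local_invocation_id) lid: vec3<u32>,
              @builtin(workgroup_id) wid: vec3<u32>) {
    // Out-of-range invocations contribute 0.0 so every lane reaches the barriers.
    scratch[lid.x] = 0.0;
    if (gid.x < arrayLength(&input)) { scratch[lid.x] = input[gid.x]; }
    workgroupBarrier();
    for (var stride = 128u; stride > 0u; stride = stride / 2u) {
        if (lid.x < stride) { scratch[lid.x] += scratch[lid.x + stride]; }
        workgroupBarrier();
    }
    if (lid.x == 0u) { partial_sums[wid.x] = scratch[0]; }
}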
```
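
When writing reduction kernels, keep in mind that WGSL atomics operate on `i32` and `u32` only, so integer-adding bit-cast floats does not produce a float sum. The Rust sketch below demonstrates this on the CPU and models a 256-wide workgroup tree reduction that yields one partial sum per workgroup. Both functions are hypothetical reference helpers, not part of the trueno-db API.

```rust
/// Adds the IEEE-754 bit patterns as integers, which is what an integer
/// atomic add over bit-cast floats would compute.
pub fn add_bit_patterns(a: f32, b: f32) -> f32 {
    f32::from_bits(a.to_bits().wrapping_add(b.to_bits()))
}

/// CPU model of a 256-wide workgroup tree reduction: each chunk of the input
/// yields one partial sum, which a later pass would add together.
pub fn workgroup_partial_sums(input: &[f32]) -> Vec<f32> {
    const WORKGROUP_SIZE: usize = 256;
    input
        .chunks(WORKGROUP_SIZE)
        .map(|chunk| {
            // Pad with zeros, mirroring out-of-range invocations.
            let mut scratch = [0.0f32; WORKGROUP_SIZE];
            scratch[..chunk.len()].copy_from_slice(chunk);
            // Halve the active range each step, as the GPU lanes would.
            let mut stride = WORKGROUP_SIZE / 2;
            while stride > 0 {
                for i in 0..stride {
                    scratch[i] += scratch[i + stride];
                }
                stride /= 2;
            }
            scratch[0]
        })
        .collect()
}
```

For example, `add_bit_patterns(1.0, 1.0)` yields the float with bit pattern `0x7F00_0000`, not `2.0`, while `workgroup_partial_sums` over 600 ones returns three partial sums that total 600.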
### Performance Targets
| Operation | Target speedup vs CPU | Status |
|-----------|-----------------------|--------|
| SUM | 50-100x | Must meet |
| AVG | 50-100x | Must meet |
| COUNT | 30-50x | Must meet |
| MIN/MAX | 50-100x | Must meet |
| FILTER | 20-40x | Must meet |
## Code Quality Standards
### Pre-commit Requirements
- All tests pass (`cargo test --all-features`)
- Zero clippy warnings (`cargo clippy -- -D warnings`)
- Code coverage >90%
- TDG score ≥B+ (85/100)
- Mutation testing ≥80% kill rate
### Pull Request Requirements
- Include benchmarks proving performance claims
- Add property-based tests for correctness
- Backend equivalence tests (GPU == SIMD == Scalar)
- Update CHANGELOG.md (keep-a-changelog format)
### Release Requirements
- Repository score ≥90/110 (`pmat repo-score`)
- Performance regression tests vs previous version
- WASM build succeeds
- Multi-GPU tests pass (if GPUs available)
## Implementation Phases
### Phase 1: Core Engine (Current)
- Arrow storage backend (Parquet, CSV readers)
- SQL parser (SELECT, WHERE, GROUP BY, aggregations)
- GPU kernels (sum, avg, count, min, max)
- SIMD fallback via Trueno
- 100+ unit tests with property-based and equivalence testing
- Benchmarks vs DuckDB, SQLite, Polars
### Phase 2: Multi-GPU
- Local multi-GPU data partitioning (range/hash)
- Cost-based query planner
- Multi-GPU aggregation with reduce
- 2 GPU vs 1 GPU vs CPU benchmarks
### Phase 3: Distribution
- gRPC worker protocol
- Distributed query execution
- Fault tolerance (retry, failover)
- Remote multi-GPU benchmarks
### Phase 4: WASM
- wasm32-unknown-unknown build target
- WebGPU backend integration
- Browser example dashboard
- WebGPU vs SIMD128 browser benchmarks
## Example Usage Patterns
### Embedded Rust
```rust
// Assumes `Backend` is exported at the crate root alongside `Database`.
use trueno_db::{Backend, Database};

// The statements below run inside an async fn that returns a Result.
let db = Database::builder()
    .backend(Backend::Gpu)
    .fallback(Backend::Trueno(trueno::Backend::Auto))
    .cache_size_mb(512)
    .build()?;

db.load_table("events", "data/events.parquet").await?;

let result = db.query(
    "SELECT category, sum(value) AS total
     FROM events
     WHERE timestamp > '2025-11-01'
     GROUP BY category
     ORDER BY total DESC"
).execute().await?;
```
### WASM/Browser
```javascript
const db = new Database({
  backend: 'webgpu',
  fallback: 'simd128'
});

await db.loadTable('events', '/data/events.parquet');
const result = await db.query('SELECT category, sum(value) FROM events GROUP BY category');
```
## Dogfooding Opportunities
This database is designed to be used in:
- **assetsearch**: GPU-accelerated aggregations replacing PostgreSQL
- **assetgen**: Real-time AI model metadata dashboards
- **bashrs**: Command history analytics
- **auth-billing**: Real-time usage tracking
- **interactive.paiml.com**: WASM analytics dashboards
## Academic Foundations
Key papers informing the architecture:
- MonetDB/X100: Vectorized execution (CIDR 2005)
- HeavyDB: GPU database patterns (SIGMOD 2017)
- Apache Arrow: Columnar format (VLDB 2020)
- Volcano Optimizer: Cost-based optimization (IEEE 1993)
- Morsel-Driven Parallelism: NUMA-aware execution (SIGMOD 2014)
See `docs/specifications/db-spec-v1.md`, section 8, for complete references.
## Backend Story Policy (CRITICAL - NEVER VIOLATE)
### Zero Tolerance Backend Requirements
**ALL operations in trueno-db MUST work on ALL backends:**
| Backend | Implementation | Use case |
|---------|----------------|----------|
| **Scalar** | Reference implementation | Fallback, testing |
| **SIMD** | trueno Vector ops (SSE2/AVX/AVX2/AVX512/NEON) | Default CPU path |
| **GPU** | wgpu compute shaders (Vulkan/Metal/DX12/WebGPU) | Large datasets (>10MB) |
| **WASM** | SIMD128 + WebGPU | Browser deployment |
### Adding New Operations - Step by Step
When adding ANY new operation to trueno-db:
1. **Implement Scalar version first** (reference implementation)
2. **Implement SIMD version** using `trueno::Vector` ops
3. **Implement GPU version** in `src/gpu/` with WGSL shader
4. **Add equivalence test** to `tests/backend_story.rs`
5. **Update BackendDispatcher** FLOP estimation if needed
6. **Verify** with `cargo test --test backend_story`
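
The first, second, and fourth steps can be sketched for a hypothetical `min` operation as follows. The function names are illustrative only, `min_lanes` is a plain-Rust stand-in shaped like a SIMD kernel rather than real `trueno::Vector` code, and the GPU step is omitted because it needs a WGSL shader and a device.

```rust
/// Step 1: scalar reference implementation.
pub fn min_scalar(data: &[f32]) -> Option<f32> {
    data.iter().copied().reduce(f32::min)
}

/// Step 2 (stand-in): lane-wise version shaped like a SIMD kernel, with
/// eight accumulators that are combined at the end.
pub fn min_lanes(data: &[f32]) -> Option<f32> {
    if data.is_empty() {
        return None;
    }
    let mut lanes = [f32::INFINITY; 8];
    for chunk in data.chunks(8) {
        for (lane, value) in lanes.iter_mut().zip(chunk) {
            *lane = lane.min(*value);
        }
    }
    lanes.iter().copied().reduce(f32::min)
}

/// Step 4: equivalence check of the kind that belongs in `tests/backend_story.rs`.
pub fn backends_agree(data: &[f32]) -> bool {
    min_scalar(data) == min_lanes(data)
}
```

Starting from the scalar version gives every later backend a reference result to be compared against, which is what makes the equivalence test meaningful.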
### Enforcement Mechanisms
1. **Pre-commit hook**: Runs `cargo test --test backend_story` before every commit
2. **CI pipeline**: Blocks PRs that break backend story tests
3. **CLAUDE.md**: This policy is read by Claude Code for enforcement
4. **Code review**: Backend equivalence is a mandatory review criterion
### Common Violations to Avoid
```rust
// BAD: GPU-only implementation
pub async fn new_aggregation(&self) -> Result<f32> {
    self.gpu_engine.compute() // NO! What about SIMD/Scalar?
}

// GOOD: All backends supported
pub fn new_aggregation_scalar(data: &[f32]) -> f32 { ... }
pub fn new_aggregation_simd(data: &[f32]) -> f32 { Vector::from_slice(data).operation()... }
pub async fn new_aggregation_gpu(&self, data: &Arrow) -> Result<f32> { ... }
```
### Backend Story Tests
Run these tests before ANY commit:
```bash
# CPU backends (always runs)
cargo test --test backend_story
# With GPU backends (requires hardware)
cargo test --test backend_story --features gpu
```
## Toyota Way Principles
### Jidoka (Built-in Quality)
- EXTREME TDD: Tests before implementation
- Pre-commit hooks enforce quality gates
- Backend equivalence: GPU == SIMD == Scalar
### Kaizen (Continuous Improvement)
- Benchmarks required for all optimizations
- Performance regression tests detect slowdowns
- Mutation testing finds coverage gaps
### Respect for People
- Simple DuckDB-like API
- Graceful degradation (works on any hardware)
- Clear error messages explaining GPU failures and fallbacks
## Stack Documentation Search (RAG Oracle)
**IMPORTANT: Proactively use the batuta RAG oracle when:**
- Looking up patterns from other stack components
- Finding cross-language equivalents (Python HuggingFace → Rust)
- Understanding how similar problems are solved elsewhere in the stack
```bash
# Search across the entire Sovereign AI Stack
batuta oracle --rag "your question here"
# Reindex after changes (auto-runs via post-commit hook + ora-fresh)
batuta oracle --rag-index
# Check index freshness (runs automatically on shell login)
ora-fresh
```
The RAG index covers 5000+ documents across the Sovereign AI Stack.
Index auto-updates via post-commit hooks and `ora-fresh` on shell login.
To manually check freshness: `ora-fresh`
To force full reindex: `batuta oracle --rag-index --force`