batuta 0.7.2

Sovereign AI orchestration: autonomous agents, ML serving, code analysis, and transpilation pipelines
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Batuta is the orchestration framework for the **Sovereign AI Stack** — a pure-Rust ecosystem for privacy-preserving ML infrastructure. It coordinates stack components (trueno, aprender, pacha, realizar) and provides transpilation pipelines for converting Python/C/Shell to Rust.

### Stack Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      batuta (Orchestration)                 │
├─────────────────────────────────────────────────────────────┤
│  whisper.apr (ASR)  │  realizar (Inference)  │ pacha (Reg)  │
├─────────────────────┴────────────────────────┴──────────────┤
│   aprender (ML)   │  entrenar (Training)  │ jugar (Games)   │
├───────────────────┴───────────────────────┴─────────────────┤
│   simular (Simulation)   │   profesor (Education)           │
├──────────────────────────┴──────────────────────────────────┤
│                 repartir (Distributed Compute)              │
│           CPU (Rayon) │ GPU (wgpu) │ Remote (TCP/TLS)       │
├─────────────────────────────────────────────────────────────┤
│  trueno-zram (Compression)  │  trueno-ublk (Block Device)   │
├─────────────────────────────┴───────────────────────────────┤
│               trueno (SIMD/GPU Compute Primitives)          │
│         AVX2/AVX-512/NEON │ wgpu │ LZ4/ZSTD compression     │
└─────────────────────────────────────────────────────────────┘
```

## Build and Development Commands

```bash
# Build
cargo build                    # Debug build
cargo build --release --locked # Release build

# Testing (uses nextest for parallelism)
make test-fast                 # Fast unit tests (<30s target)
make test                      # Standard tests (<2min target)
make test-full                 # All features enabled
cargo test --lib               # Unit tests only
cargo test --test '*'          # Integration tests only

# Single test
cargo test test_name           # Run specific test
cargo nextest run test_name    # With nextest

# Linting and formatting
make lint                      # Clippy with -D warnings
make fmt                       # Format code
make fmt-check                 # Check formatting

# Coverage (two-phase pattern, temporarily disables mold linker)
make coverage                  # HTML + LCOV reports in target/coverage/

# Quality tiers (Certeza Methodology)
make tier1                     # On-save (<1s): fmt, clippy, check
make tier2                     # Pre-commit (<5s): test --lib, clippy
make tier3                     # Pre-push (1-5min): full tests
make tier4                     # CI/CD: release tests + pmat analysis

# Mutation testing
make mutants-fast              # Quick sample (~5 min)
make mutants                   # Full suite (~30-60 min)
make mutants-file FILE=src/backend.rs  # Specific file

# WASM build
make wasm                      # Debug WASM
make wasm-release              # Optimized WASM

# Documentation
make book                      # Build mdBook
make book-serve                # Serve at localhost:3000
cargo doc --no-deps --open     # API docs
```

## Code Search (pmat query)

Use `pmat query` instead of grep for code discovery. It returns quality-annotated, ranked results.

**NEVER use grep or rg for code discovery. ALWAYS use pmat query.**

```bash
# Find functions by intent
pmat query "pipeline transpilation" --limit 10

# Find high-quality code
pmat query "oracle recommendation" --min-grade A --exclude-tests

# Find with fault annotations (unwrap, panic, unsafe)
pmat query "backend dispatch" --faults

# Filter by complexity
pmat query "stack dependency" --max-complexity 15

# Cross-project search
pmat query "simd kernel" --include-project ../trueno
pmat query "training loop" --include-project ../entrenar

# Include source code in results
pmat query "bug hunter" --include-source --limit 5

# Git history search (find code by commit intent via RRF fusion)
pmat query "fix clippy warnings" -G
pmat query "pipeline refactor" --git-history

# Enrichment flags (combine freely)
pmat query "oracle dispatch" --churn               # git volatility (commit count, churn score)
pmat query "stack dependency" --duplicates         # code clone detection (MinHash+LSH)
pmat query "recipe handler" --entropy              # pattern diversity (repetitive vs unique)
pmat query "bug hunter" --churn --duplicates --entropy --faults -G  # full audit
```

## Architecture

### Core Modules

- **`src/pipeline.rs`**: 5-phase transpilation pipeline (Analysis → Transpilation → Optimization → Validation → Build) with Jidoka stop-on-error validation
- **`src/backend.rs`**: Cost-based GPU/SIMD/Scalar selection using 5× PCIe rule (Gregg & Hazelwood, 2011)
- **`src/oracle/`**: Knowledge graph for stack component recommendations with natural language queries. All code examples (34 cookbook recipes + 5 recommender snippets) include TDD test companions (`#[cfg(test)]` modules). Use `--format code` to get code + test companions.
- **`src/serve/`**: Model serving with failover, circuit breakers, privacy tiers (Sovereign/Private/Standard)
- **`src/stack/`**: Dependency graph management, release orchestration, quality gates across stack components
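
The 5× PCIe rule can be sketched as a simple cost model: offload to GPU only when the local compute time saved exceeds five times the transfer cost. This is an illustrative model, not the actual `src/backend.rs` code; the function name `select_backend` and the throughput constants are assumptions:

```rust
/// Which compute backend to dispatch a workload to (sketch, not the real enum).
#[derive(Debug, PartialEq)]
enum Backend {
    Scalar,
    Simd,
    Gpu,
}

/// Cost-based selection per the 5x PCIe rule (Gregg & Hazelwood, 2011):
/// the GPU wins only when SIMD time exceeds 5x transfer cost plus GPU time.
fn select_backend(flops: f64, bytes_transferred: f64) -> Backend {
    const PCIE_BYTES_PER_S: f64 = 16e9; // assumed PCIe bandwidth
    const GPU_FLOPS: f64 = 10e12;       // assumed GPU throughput
    const SIMD_FLOPS: f64 = 100e9;      // assumed SIMD throughput

    let transfer_s = bytes_transferred / PCIE_BYTES_PER_S;
    let gpu_s = flops / GPU_FLOPS;
    let simd_s = flops / SIMD_FLOPS;

    if simd_s > 5.0 * transfer_s + gpu_s {
        Backend::Gpu // compute-bound: transfer cost is amortized
    } else if flops > 1e4 {
        Backend::Simd // medium workloads: vectorize locally
    } else {
        Backend::Scalar // tiny workloads: avoid dispatch overhead
    }
}

fn main() {
    // A large matmul is compute-bound, so the transfer pays for itself.
    assert_eq!(select_backend(1e12, 1e8), Backend::Gpu);
    // A tiny op never leaves the CPU.
    assert_eq!(select_backend(1e3, 1e3), Backend::Scalar);
}
```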

### ML Converters

- **`src/numpy_converter.rs`**: NumPy → Trueno operation mapping
- **`src/sklearn_converter.rs`**: scikit-learn → Aprender algorithm mapping
- **`src/pytorch_converter.rs`**: PyTorch → Realizar operation mapping (inference-only)
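
Conceptually, each converter is a table from source-framework calls to stack equivalents, with unmapped operations falling back to plain Rust. A minimal sketch of the NumPy case; the target names on the right are illustrative assumptions, not the converters' actual output:

```rust
/// Map a NumPy call to its trueno equivalent (illustrative names only;
/// see src/numpy_converter.rs for the real mapping).
fn map_numpy_call(func: &str) -> Option<&'static str> {
    Some(match func {
        "np.dot" | "np.matmul" => "trueno::Vector::dot", // assumed target name
        "np.add" => "trueno::Vector::add",
        "np.sum" => "trueno::Vector::sum",
        "np.sqrt" => "trueno::Vector::sqrt",
        _ => return None, // unmapped ops fall back to a scalar Rust loop
    })
}

fn main() {
    assert_eq!(map_numpy_call("np.dot"), Some("trueno::Vector::dot"));
    assert_eq!(map_numpy_call("np.einsum"), None); // no SIMD mapping: fallback
}
```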

### Feature Flags

- `native` (default): Full CLI, filesystem, tracing, TUI dashboard
- `wasm`: Browser-compatible build (no filesystem, in-memory analysis)
- `trueno-integration`: SIMD/GPU tensor operations
- `oracle-mode`: Knowledge graph with trueno-graph and trueno-db
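
For example, a downstream crate that wants only the browser build might depend on batuta like this (illustrative Cargo.toml fragment; the exact feature interplay is an assumption):

```toml
[dependencies]
batuta = { version = "0.7", default-features = false, features = ["wasm"] }
```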

### External Tool Integration

Batuta orchestrates external transpilers detected via PATH:
- **Depyler**: Python → Rust
- **Bashrs**: Shell → Rust
- **Decy**: C/C++ → Rust
- **PMAT**: Quality analysis and TDG scoring

## Design Principles

Toyota Production System principles applied:
- **Jidoka**: Stop-on-error in pipelines, automatic failover
- **Poka-Yoke**: Privacy tiers prevent data leakage
- **Heijunka**: Load leveling via spillover routing
- **Muda**: Cost circuit breakers prevent waste
- **Kaizen**: Continuous optimization via MoE backend selection

## LAYOUT-002: Row-Major Mandate (Stack-Wide Policy)

**The entire Sovereign AI Stack uses ROW-MAJOR tensor layout. GGUF column-major data is transposed at import.**

This is a critical architectural decision that affects aprender, realizar, and all model conversion pipelines.

### Layout Architecture

```
External Formats                    Stack Internal (Row-Major)
────────────────                    ──────────────────────────
SafeTensors (row-major) ──────────► APR v2 ──► realizar ──► output
                         (native)       ↑
GGUF (column-major) ─────────────────┘
                    (transposed by aprender)
```

### Why Row-Major?

1. **PyTorch/SafeTensors compatibility** - Native HuggingFace format is row-major
2. **Cache efficiency** - Row-major matches C memory layout (contiguous rows)
3. **Kernel simplicity** - realizar's fused Q4K/Q6K kernels expect row-major
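
The contract can be stated concretely: element (r, c) of an R×C matrix lives at offset `r * C + c` in row-major storage and at `c * R + r` in column-major. A minimal sketch of the import-time transpose; the real aprender converter also handles quantized blocks:

```rust
/// Row-major offset of element (r, c) in a matrix with `cols` columns.
fn idx_row_major(r: usize, c: usize, cols: usize) -> usize {
    r * cols + c
}

/// Transpose column-major data into row-major at import, as aprender does
/// for GGUF tensors (sketch only).
fn col_to_row_major(col: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut row = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            // column-major stores (r, c) at c * rows + r
            row[idx_row_major(r, c, cols)] = col[c * rows + r];
        }
    }
    row
}

fn main() {
    // The 2x3 matrix [[1,2,3],[4,5,6]] stored column-major:
    let col = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    assert_eq!(
        col_to_row_major(&col, 2, 3),
        vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    );
}
```

Feeding the left-hand layout directly into a row-major kernel scrambles every row, which is exactly the "garbage output" failure mode described below.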

### Implementation

| Component | Responsibility |
|-----------|----------------|
| **aprender** | Transposes GGUF→row-major during `apr import` |
| **realizar** | Assumes row-major, uses fused_q4k_parallel_matvec |
| **trueno** | Provides both colmajor/rowmajor kernels (use row-major for APR) |

### Garbage Output = Layout Bug

If you see output like `"olumbia+lsi nunca/localENTS"` instead of coherent text:
- **Root cause**: Column-major data fed to row-major kernel
- **Fix**: Ensure GGUF was converted through aprender's converter
- **Documentation**: See `aprender/CLAUDE.md` LAYOUT-002 section

## Quality Standards

- 95% minimum test coverage (90% enforced, 95% preferred)
- Zero clippy warnings (with `-D warnings`)
- Mutation testing target: >80% mutation score
- TDG Score: maintain A grade (≥85)
- Pre-commit checks must complete in <30s

## Sovereign AI Stack Ecosystem

### Checking for Updates

```bash
# Check latest versions of all PAIML stack crates
make stack-versions              # or: batuta stack versions

# JSON output for tooling
make stack-versions-json         # or: batuta stack versions --format json

# Check local vs crates.io
make stack-outdated

# Update dependencies
cargo update trueno aprender realizar pacha renacer
```

### Publish Status (O(1) Cached)

```bash
# Check which crates need publishing - O(1) with cache
make stack-publish-status        # or: batuta stack publish-status

# Force refresh (cold cache)
make stack-publish-status-refresh

# Performance:
# - Cold cache: ~7s (parallel crates.io fetches)
# - Warm cache: <100ms (hash-based invalidation)
```

Cache invalidation triggers:
- Cargo.toml content changed
- Git HEAD moved (new commit)
- crates.io TTL expired (15 min)
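
Those triggers amount to a three-part cache key; a sketch with hypothetical field names (the real implementation's hashing and storage may differ):

```rust
use std::time::{Duration, Instant};

/// Publish-status cache entry (hypothetical structure).
struct CacheEntry {
    cargo_toml_hash: u64, // content hash of Cargo.toml
    git_head: String,     // commit the cache was built at
    fetched_at: Instant,  // for the crates.io TTL
}

const CRATES_IO_TTL: Duration = Duration::from_secs(15 * 60);

impl CacheEntry {
    /// Valid only while all three invalidation triggers are quiet.
    fn is_valid(&self, current_hash: u64, current_head: &str) -> bool {
        self.cargo_toml_hash == current_hash
            && self.git_head == current_head
            && self.fetched_at.elapsed() < CRATES_IO_TTL
    }
}

fn main() {
    let entry = CacheEntry {
        cargo_toml_hash: 42,
        git_head: "abc123".into(),
        fetched_at: Instant::now(),
    };
    assert!(entry.is_valid(42, "abc123")); // warm cache: <100ms path
    assert!(!entry.is_valid(42, "def456")); // new commit: refetch
    assert!(!entry.is_valid(7, "abc123")); // Cargo.toml changed: refetch
}
```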

### Stack Components (crates.io)

| Layer | Crate | Version | Purpose |
|-------|-------|---------|---------|
| Compute | `trueno` | **0.14.x** | SIMD/GPU primitives (AVX2/AVX-512/NEON, wgpu, LZ4) |
| Compute | `trueno-db` | 0.3.x | GPU-first analytics database, SQL interface |
| Compute | `trueno-graph` | 0.1.x | Graph database for code analysis |
| Compute | `trueno-rag` | 0.1.x | RAG pipeline (chunking, BM25+vector, RRF) |
| Compute | `trueno-viz` | 0.1.x | Terminal/PNG visualization |
| Compression | `trueno-zram-core` | 0.3.x | SIMD compression (LZ4/ZSTD, AVX2/AVX-512/NEON, CUDA) |
| Block Device | `trueno-ublk` | 0.1.x | GPU-accelerated ZRAM replacement via ublk |
| Distribution | `repartir` | 2.0.x | Distributed compute (CPU/GPU/Remote, work-stealing) |
| ML | `aprender` | **0.24.x** | ML algorithms, APR v2 format (LZ4/ZSTD compression) |
| Training | `entrenar` | 0.5.x | Autograd, LoRA/QLoRA, quantization, model merge, CITL |
| Inference | `realizar` | **0.5.x** | APR v2/GGUF/SafeTensors inference, GPU kernels |
| Speech | `whisper-apr` | 0.1.x | Pure Rust Whisper ASR (WASM-first, Int4/Int8 quant) |
| Simulation | `simular` | 0.1.x | Unified simulation (Monte Carlo, physics, optimization) |
| Games | `jugar` | 0.1.x | Game engine (ECS, physics, AI, render, audio, WASM) |
| Education | `profesor` | 0.1.x* | Educational platform (courses, quizzes, labs) |
| Data | `alimentar` | 0.2.x | Zero-copy Parquet/Arrow data loading |
| Registry | `pacha` | 0.1.x | Model registry with Ed25519 signatures |
| Tracing | `renacer` | 0.7.x | Syscall tracer with source correlation |
| Quality | `apr-qa` | 0.1.x | APR model QA playbook (test gen, runner, reports) |
| Quality | `provable-contracts` | 0.1.x | YAML contract → Kani verification for ML kernels |
| Quality | `tiny-model-ground-truth` | 0.1.x | Popperian falsification for model conversion parity |
| Transpilers | `depyler`, `bashrs`, `decy` | - | Python/Shell/C → Rust |
| Orchestration | `batuta` | 0.7.x | Stack coordination and CLI |

*Not yet published to crates.io

### APR v2 Model Format

The `.apr` format is the stack's native model serialization:

| Feature | APR v1 | APR v2 |
|---------|--------|--------|
| Tensor Compression | None | LZ4/ZSTD |
| Index Format | JSON | Binary |
| Zero-Copy Loading | Partial | Full |
| Quantization | Int8 | Int4/Int8 |
| Streaming | No | Yes |

```rust
// APR v2 with compression
use aprender::apr::{AprModel, Compression};

let model = AprModel::load_compressed("model.apr", Compression::Lz4)?;
```

### repartir Feature Flags

| Feature | Purpose |
|---------|---------|
| `cpu` (default) | Local multi-core execution with work-stealing |
| `gpu` | wgpu GPU compute (Vulkan/Metal/DX12/WebGPU) |
| `remote` | TCP-based distributed execution across machines |
| `remote-tls` | TLS-secured remote execution |
| `tensor` | trueno SIMD tensor integration |
| `checkpoint` | trueno-db + Parquet state persistence |
| `tui` | Job flow TUI visualization |
| `full` | All features enabled |

### Stack Quality Metrics (PMAT)

| Crate | Files | Functions | Health | Complexity | Coverage |
|-------|-------|-----------|--------|------------|----------|
| `jugar` | 104 | 429 | 68.3% | 50/100 | 65% |
| `simular` | 47 | 88 | 70.0% | 55/100 | 65% |
| `realizar` | 79 | 446 | 68.3% | 50/100 | 65% |
| `aprender` | 331 | 1008 | 68.3% | 50/100 | 65% |
| `entrenar` | 253 | 3087 | 68.3% | 50/100 | 65% |
| `profesor` | 24 | 53 | 83.3% | 95/100 | 65% |
| `provable-contracts` | - | - | - | - | - |
| `tiny-model-ground-truth` | - | - | - | - | - |

### Staying Current

My knowledge has a cutoff date. To get the latest stack features:

```bash
# Fetch latest from crates.io (cached 15 min)
batuta stack versions

# Check docs.rs for API changes
# https://docs.rs/trueno, https://docs.rs/aprender, etc.

# RSS feeds for releases
# https://crates.io/api/v1/crates/{crate}/versions.rss
```

## Key Dependencies

- **trueno**: SIMD/GPU compute with LZ4 compression (0.11.x)
- **repartir**: Distributed compute with CPU/GPU/Remote executors (2.0.x)
- **aprender**: ML algorithms with APR v2 format, LZ4/ZSTD compression (0.24.x)
- **realizar**: Inference engine with APR v2, GPU kernels (0.5.x)
- **whisper-apr**: Pure Rust Whisper ASR, WASM-first (0.1.x)
- **trueno-zram-core**: SIMD/GPU memory compression (0.3.x)
- **trueno-ublk**: GPU-accelerated block device via ublk (0.1.x)
- **entrenar**: Training with autograd, LoRA/QLoRA, CITL (0.5.x)
- **simular**: Simulation engine with Jidoka guards, Heijunka scheduling (0.1.x)
- **jugar**: Game engine with ECS, physics, AI, WASM support (0.1.x)
- **profesor**: Educational platform with quizzes, labs (0.1.x, not on crates.io)
- **renacer**: Syscall tracing for semantic validation (0.9.x)
- **pacha**: Model registry integration (0.2.x)
- **alimentar**: Data loading with Parquet/Arrow (0.2.x)

### Stack Inter-dependencies

```
whisper-apr ► trueno (0.11), aprender (0.24), realizar (0.5)
realizar ───► trueno (0.11), aprender (0.24), alimentar (0.2), pacha (0.2)
aprender ───► trueno (0.11), alimentar (0.2), entrenar (0.5)
entrenar ───► trueno (0.11), aprender (0.24), trueno-db, trueno-rag
trueno-zram-core ► trueno (0.11), CUDA optional
trueno-ublk ► trueno-zram-core, trueno-zram-adaptive, libublk
repartir ───► trueno (0.6+), trueno-db (checkpoint), wgpu (gpu)
jugar ──────► trueno (0.11), aprender (0.24)
simular ────► jugar-probar (testing)
profesor ───► (no_std, minimal deps for WASM)
```

### GPU Kernel Capabilities (realizar)

| Kernel | Purpose |
|--------|---------|
| `GemmKernel` | Matrix multiplication (naive, tiled, tensor core) |
| `AttentionKernel` | FlashAttention-style tiled attention |
| `SoftmaxKernel` | Numerically stable with warp shuffle |
| `LayerNormKernel` | Fused layer normalization |
| `QuantizeKernel` | Q4_K dequantization fused with matmul |
| `Q5KKernel` | Q5_K dequantization |
| `Q6KKernel` | Q6_K dequantization |

### trueno-zram Compression

| Algorithm | Throughput | Use Case |
|-----------|------------|----------|
| LZ4 | 3+ GB/s | High-speed, general purpose |
| ZSTD | 13 GB/s (AVX-512) | Better ratio, compressible data |
| Same-Fill | 2048:1 | Zero/repeated pages |

```rust
use trueno_zram_core::{CompressorBuilder, Algorithm};

let compressor = CompressorBuilder::new()
    .algorithm(Algorithm::Lz4)
    .build()?;

let compressed = compressor.compress(&page)?;
```
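
Same-fill pages reach such extreme ratios because a page whose machine words are all identical can be stored as the fill value alone. A detection sketch (illustrative, not trueno-zram's actual API):

```rust
/// Return the fill word if every u64 word in the page is identical.
/// Such a page is stored as just that word instead of compressed data.
fn same_fill_value(page: &[u64]) -> Option<u64> {
    let first = *page.first()?;
    page.iter().all(|&w| w == first).then_some(first)
}

fn main() {
    // A zeroed 4 KiB page, viewed as 512 u64 words.
    let zero_page = vec![0u64; 512];
    assert_eq!(same_fill_value(&zero_page), Some(0));

    // One differing word disqualifies the page from the same-fill path.
    let mut mixed = vec![0u64; 512];
    mixed[7] = 0xDEAD_BEEF;
    assert_eq!(same_fill_value(&mixed), None);
}
```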

### Distributed Computing with repartir

```bash
# Run distributed computing example
cargo run --example repartir_distributed --features distributed

# Start remote worker (on each node)
cargo run --bin repartir-worker --features remote -- --bind 0.0.0.0:9000

# TUI job flow monitor
cargo run --bin job-flow --features tui,remote
```

Multi-machine GPU/SIMD pattern:
```rust
use repartir::{Pool, task::{Task, Backend}};
use repartir::executor::remote::RemoteExecutor;

// Connect to GPU workers across machines
let executor = RemoteExecutor::builder()
    .add_worker("node1:9000")  // GPU node 1
    .add_worker("node2:9000")  // GPU node 2
    .build().await?;

let task = Task::builder()
    .binary("./gpu-workload")
    .backend(Backend::Gpu)
    .build()?;

let result = executor.execute(task).await?;
```

## Project-Specific Commands

```bash
# Stack orchestration
batuta stack check              # Dependency health
batuta stack status             # TUI dashboard
batuta stack versions           # Check crates.io versions
batuta stack quality            # Quality matrix
batuta stack gate               # CI quality gate

# Oracle mode (natural language queries)
batuta oracle "How do I train a model?"
batuta oracle --list            # List all components
batuta oracle --recipe ml-random-forest --format code  # Code + TDD test companion
batuta oracle --cookbook --format code   # All recipes with test companions

# Oracle RAG mode (indexed documentation search)
batuta oracle --rag-index       # Index stack docs + ground truth corpora
batuta oracle --rag "tokenization"  # Search indexed docs

# Oracle PMAT query (function-level quality-annotated search)
batuta oracle --pmat-query "error handling"                    # Search functions
batuta oracle --pmat-query "serialize" --pmat-min-grade A      # Grade filter
batuta oracle --pmat-query "cache" --pmat-max-complexity 10    # Complexity filter
batuta oracle --pmat-query "error" --rag                       # Combined function + doc search
batuta oracle --pmat-query "alloc" --pmat-include-source       # Include source code

# Analysis
batuta analyze --languages --tdg .
```

## Ground Truth Corpora

The Oracle RAG mode indexes external ground truth corpora for cross-language knowledge:

### HuggingFace Ground Truth Corpus

Location: `../hf-ground-truth-corpus`

A curated collection of production-ready Python recipes for HuggingFace ML workflows:
- **95%+ test coverage** with property-based testing (Hypothesis)
- **Module structure**: `hf_gtc.hub`, `hf_gtc.inference`, `hf_gtc.preprocessing`, `hf_gtc.training`
- **Cross-references**: Maps Python patterns to Rust equivalents (candle/trueno)

Oracle query examples:
```bash
batuta oracle --rag "How do I tokenize text for BERT?"
# Returns: hf_gtc/preprocessing/tokenization.py + candle equivalent

batuta oracle --rag "sentiment analysis pipeline"
# Returns: hf_gtc/inference/pipelines.py patterns
```

### TGI Ground Truth Corpus

Location: `../tgi-ground-truth-corpus`

Production-ready Rust patterns for LLM inference serving, adapted from HuggingFace TGI:
- **Inference patterns**: continuous batching, KV cache, speculative decoding
- **Quantization**: Q4/Q5/Q6 kernels, calibration strategies
- **Serving**: router, scheduler, streaming SSE, request validation
- **Integration**: Maps TGI patterns to Sovereign AI Stack (realizar)

Oracle query examples:
```bash
batuta oracle --rag "continuous batching implementation"
# Returns: tgi-ground-truth-corpus/src/batching.rs + book/patterns/batching.md

batuta oracle --rag "KV cache optimization"
# Returns: tgi-ground-truth-corpus/src/kv_cache.rs patterns
```

### Databricks Ground Truth Corpus

Location: `../databricks-ground-truth-corpus`

Popperian falsification corpus for Databricks open-source projects:
- **Methodology**: Attempt to break, not verify (129/322 tests passing)
- **Domains**: SDK parity, MegaBlocks MoE, Lilac data quality, Spark extensions, Benchmarks
- **Test signals**: PII detection, dedup, language ID, text statistics, pandas API parity
- **Integration**: Validates Databricks ecosystem patterns

Oracle query examples:
```bash
batuta oracle --rag "PII detection patterns"
# Returns: lilac/scripts/test_pii_detection.py patterns

batuta oracle --rag "MinHash near-duplicate detection"
# Returns: lilac/scripts/test_dedup_detection.py implementation
```

### Ludwig Ground Truth Corpus

Location: `../ludwig-ground-truth-corpus`

Popperian falsification corpus for Ludwig declarative deep learning framework:
- **Methodology**: Attempt to break, not verify (~280 pre-registered tests)
- **Domains**: Config validation, feature encoding, preprocessing, ECD architecture,
  training determinism, serving fidelity, LLM fine-tuning
- **Test signals**: Shape invariants, normalization properties, LoRA math, loss monotonicity
- **Integration**: Validates declarative ML patterns (encoder-combiner-decoder)

Oracle query examples:
```bash
batuta oracle --rag "Ludwig config validation"
# Returns: config-validation/scripts/test_config_validation.py patterns

batuta oracle --rag "LoRA adapter properties"
# Returns: llm-fine-tuning/scripts/test_lora_properties.py implementation
```

### Tiny Model Ground Truth Corpus

Location: `../tiny-model-ground-truth`

Popperian falsification test suite for model conversion parity:
- **Methodology**: Generate HuggingFace oracle outputs, validate against realizar inference
- **Domains**: GGUF/SafeTensors/APR format conversions, quantization drift
- **Test signals**: Output parity, KL divergence, roundtrip fidelity
- **Integration**: Validates realizar and aprender conversion pipelines

Oracle query examples:
```bash
batuta oracle --rag "model conversion parity"
# Returns: tiny-model-ground-truth oracle generation patterns

batuta oracle --rag "quantization drift measurement"
# Returns: tiny-model-ground-truth drift validation tests
```
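
Quantization drift is typically quantified with the KL divergence D(P‖Q) between the oracle's output distribution and the converted model's. A minimal sketch; the corpus's actual tooling may differ:

```rust
/// KL divergence D(P || Q) over discrete distributions; zero iff P == Q
/// (on P's support), strictly positive once the converted model drifts.
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .copied()
        .zip(q.iter().copied())
        .filter(|&(pi, _)| pi > 0.0) // terms with p_i = 0 contribute nothing
        .map(|(pi, qi)| pi * (pi / qi).ln())
        .sum()
}

fn main() {
    let oracle = [0.5, 0.5];
    // Identical distributions: no drift.
    assert!(kl_divergence(&oracle, &oracle).abs() < 1e-12);
    // A drifted output distribution has positive divergence.
    assert!(kl_divergence(&oracle, &[0.9, 0.1]) > 0.0);
}
```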

### Extending Ground Truth

To add new ground truth corpora:
1. **Rust corpora**: Add to `rust_corpus_dirs` in `src/cli/oracle/rag_index.rs:IndexConfig::new()`
2. **Python corpora**: Add to `python_corpus_dirs` in the same function
3. Ensure corpus has CLAUDE.md and README.md for P0/P1 indexing
4. Source in `src/**/*.rs` or `src/**/*.py` is indexed as P2
5. mdBook docs in `book/src/**/*.md` are indexed as P1
6. Run `batuta oracle --rag-index` to rebuild index

### Private Repos (`.batuta-private.toml`)

For private intellectual property that should be discoverable via RAG but never committed to GitHub, create a `.batuta-private.toml` file at the project root (git-ignored):

```toml
[private]
rust_stack_dirs = [
    "../rmedia",
    "../infra",
]

rust_corpus_dirs = [
    "../internal-cookbook",
]

python_corpus_dirs = []
```

Private directories are merged into the standard index at runtime. The CLI shows a confirmation:

```
Private: 2 private directories merged from .batuta-private.toml
```

- Missing file: silently ignored (no warning)
- Malformed TOML: warning printed, indexing continues without private dirs
- Empty `[private]` section: no-op
- Nonexistent directories: handled gracefully at scan time
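
A sketch of that fallback behavior (hypothetical structure; the real loader uses a full TOML parser inside batuta's oracle indexing code):

```rust
use std::fs;
use std::path::Path;

/// Directories merged from .batuta-private.toml (simplified to one list).
#[derive(Default, Debug, PartialEq)]
struct PrivateDirs {
    dirs: Vec<String>,
}

/// Minimal stand-in parser: pulls quoted paths out of the file. It exists
/// only so the fallback logic below is runnable without dependencies.
fn parse(text: &str) -> Result<PrivateDirs, String> {
    if !text.contains("[private]") {
        return Err("missing [private] section".into());
    }
    let dirs = text.split('"').skip(1).step_by(2).map(str::to_owned).collect();
    Ok(PrivateDirs { dirs })
}

/// Missing file: silently ignored. Malformed content: warn and continue
/// with no private dirs, exactly as listed above.
fn load_private_config(path: &Path) -> PrivateDirs {
    let Ok(text) = fs::read_to_string(path) else {
        return PrivateDirs::default();
    };
    parse(&text).unwrap_or_else(|e| {
        eprintln!("warning: ignoring {}: {}", path.display(), e);
        PrivateDirs::default()
    })
}

fn main() {
    // Nonexistent file: quiet default, indexing proceeds.
    let missing = Path::new("/nonexistent/.batuta-private.toml");
    assert_eq!(load_private_config(missing), PrivateDirs::default());

    // Well-formed content yields the listed directories.
    let parsed = parse("[private]\nrust_stack_dirs = [\"../rmedia\"]").unwrap();
    assert_eq!(parsed.dirs, vec!["../rmedia".to_string()]);
}
```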

## Claude Code Integration

This project includes `.claude/commands/` for quick access to common tasks:

- `/stack-versions` - Check latest PAIML crate versions from crates.io
- `/stack-check` - Run dependency health check
- `/quality` - Run full quality gate (fmt, clippy, test, coverage)
- `/update-deps` - Check and apply stack dependency updates

These commands provide pre-configured workflows for maintaining the Sovereign AI Stack.