codetether-agent 4.0.0

# RLM Oracle System

A deterministic oracle system that validates RLM REPL trace outputs (`FINAL()` answers). Verified traces become synthetic training data for the 90M BitNet distilled navigation model.

## Architecture

```
RLM Analysis Result
        ↓
Query Classification (pattern-match / structural / semantic)
        ↓
    ┌───────────────────┐
    │  Pattern-Match?   │──→ Grep Oracle
    └───────────────────┘           ↓
    ┌───────────────────┐      Verify against
    │  Structural?      │──→ Tree-sitter Oracle  actual grep/AST
    └───────────────────┘           ↓
    ┌───────────────────┐      ┌────────────┐
    │  Semantic?        │──→   │  Unverified│
    └───────────────────┘      └────────────┘
                                      ↓
                              ┌──────────────┐
                              │    Golden    │ → JSONL
                              │   (verified) │
                              └──────────────┘
                              ┌──────────────┐
                              │    Failed    │ → Discard/Review
                              │  (disagrees) │
                              └──────────────┘
```

## Components

### 1. Grep Oracle (`src/rlm/oracle/grep_oracle.rs`)

**Purpose**: Verify pattern-match queries by comparing claimed results against actual `grep -n` output.

**Supported Queries**:
- "Find all async functions"
- "List all structs matching pattern X"
- "Count occurrences of Y"
- "Find all error handling patterns"

**Verification Results**:
- `ExactMatch` - Answer matches ground truth exactly
- `UnorderedMatch` - Answer contains the same items as ground truth, in a different order
- `SubsetMatch` - Answer is a subset of ground truth
- `HasFalsePositives` - Answer contains claims not in source
- `HasFalseNegatives` - Answer is missing items from source
- `Mismatch` - Answer is completely different
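The "false positive" and "false negative" outcomes above come down to two set differences between the claimed lines and the ground-truth lines. The following is a minimal sketch of that comparison, not the oracle's actual implementation; `to_line_set` and `diff_claims` are hypothetical helper names, and treating each non-empty line as one claim is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: splits text into its trimmed, non-empty lines.
fn to_line_set(text: &str) -> BTreeSet<&str> {
    text.lines().map(str::trim).filter(|l| !l.is_empty()).collect()
}

/// Returns `(false_positives, false_negatives)` for a claimed answer.
/// False positives: claimed lines that are absent from the ground truth.
/// False negatives: ground-truth lines that are absent from the claim.
/// Ordering is ignored, so a reordered but complete answer yields two empty lists.
fn diff_claims<'a>(claimed: &'a str, truth: &'a str) -> (Vec<&'a str>, Vec<&'a str>) {
    let claimed_set = to_line_set(claimed);
    let truth_set = to_line_set(truth);
    let false_positives: Vec<&str> = claimed_set.difference(&truth_set).copied().collect();
    let false_negatives: Vec<&str> = truth_set.difference(&claimed_set).copied().collect();
    (false_positives, false_negatives)
}
```

Both lists empty corresponds to a match (exact or unordered); a non-empty first list corresponds to false positives, and a non-empty second list to missing items.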

**Example**:
```rust
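// Assumes `GrepVerification` is exported from the same module as `GrepOracle`.
use codetether_agent::rlm::oracle::{GrepOracle, GrepVerification};

// `source_code` holds the contents of the file being verified.
let oracle = GrepOracle::new(source_code);
// Arguments: the claimed answer in `grep -n` format, then the original query.
let result = oracle.verify(
    "5:async fn foo()\n10:async fn bar()",
    "Find all async functions",
);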

match result {
    GrepVerification::ExactMatch => println!("✓ Verified!"),
    _ => println!("✗ Verification failed"),
}
```
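For reference, the `<line>:<text>` answer format used above mirrors `grep -n` output. The sketch below shows how such ground truth could be produced in plain Rust; `grep_n` is a hypothetical function, and it uses fixed-string matching rather than the regex matching the oracle relies on.

```rust
/// Produces `grep -n`-style output ("<line>:<text>") for every line of
/// `source` that contains `needle`. Line numbers are 1-based.
/// Uses fixed-string matching (similar to `grep -F`), not regular expressions.
fn grep_n(source: &str, needle: &str) -> String {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .map(|(idx, line)| format!("{}:{}", idx + 1, line))
        .collect::<Vec<_>>()
        .join("\n")
}
```

An answer can then be checked by comparing it with `grep_n(source, "async fn")` line by line.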

### 2. Tree-sitter Oracle (`src/rlm/oracle/tree_sitter_oracle.rs`)

**Purpose**: Verify structural queries using AST parsing.

**Supported Queries**:
- Function signatures (name, args, return type)
- Struct/enum definitions and field listings
- Impl blocks and trait implementations
- Error handling patterns (Result, match arms, ? operator)

**API**:
```rust
use codetether_agent::rlm::oracle::TreeSitterOracle;

// The calls below use `?`, so this must run inside a function returning `Result`.
let mut oracle = TreeSitterOracle::new(source_code);

// Get all functions
let functions = oracle.get_functions()?;
for func in functions {
    println!("{}: {} ({})", func.line, func.name, func.params);
}

// Execute custom AST query
let result = oracle.query("(function_item name: (identifier) @name)")?;

// Get struct fields
let structs = oracle.get_structs()?;

// Count error patterns
let errors = oracle.count_error_patterns()?;
```

**New DSL Command**: `ast_query()`
```
ast_query("(function_item name: (identifier) @name)")
```

Returns formatted AST matches with line numbers and captures.
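As an illustration of how a REPL might recognize this command, here is a minimal sketch that extracts the S-expression from an `ast_query("...")` call. `parse_ast_query` is a hypothetical name, not the function in `repl.rs`, and the sketch assumes the argument contains no escaped quotes.

```rust
/// Hypothetical parser for the `ast_query("...")` DSL command.
/// Returns the inner S-expression, or `None` if the input is not an
/// `ast_query` call with a non-empty double-quoted argument.
/// Escaped quotes inside the argument are not handled.
fn parse_ast_query(command: &str) -> Option<&str> {
    let inner = command
        .trim()
        .strip_prefix("ast_query(\"")?
        .strip_suffix("\")")?;
    if inner.is_empty() {
        None
    } else {
        Some(inner)
    }
}
```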

### 3. Trace Validator (`src/rlm/oracle/validator.rs`)

**Purpose**: Routes each query to the appropriate oracle and emits golden traces for verified answers.

**Usage**:
```rust
use codetether_agent::rlm::oracle::{TraceValidator, OracleResult};

let validator = TraceValidator::new();
let result = validator.validate(&analysis_result, &source_code, Some("path/to/file.rs"));

match result {
    OracleResult::Golden(trace) => {
        // Write to JSONL for SFT training
        println!("Golden trace: {}", trace.trace_id);
    }
    OracleResult::Unverified { reason } => {
        println!("No oracle available: {}", reason);
    }
    OracleResult::Failed { reason, trace: _ } => {
        // Discard the trace or keep it for review
        println!("Verification failed: {}", reason);
    }
}
```

**Batch Validation**:
```rust
let stats = validator.batch_validate(traces);
println!("Golden rate: {:.1}%", stats.golden_rate() * 100.0);
stats.write_jsonl("golden_traces.jsonl")?;
```
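The golden rate printed above is a fraction that gets scaled to a percentage. A minimal sketch of how such a statistic could be computed follows; `BatchStats` and its fields are hypothetical, and counting unverified traces in the denominator is an assumption rather than confirmed behavior.

```rust
/// Hypothetical stand-in for the statistics returned by batch validation.
#[derive(Debug, Default)]
struct BatchStats {
    golden: usize,
    unverified: usize,
    failed: usize,
}

impl BatchStats {
    /// Total number of traces seen, across all three outcomes.
    fn total(&self) -> usize {
        self.golden + self.unverified + self.failed
    }

    /// Fraction of traces verified as golden, in the range 0.0..=1.0.
    /// Returns 0.0 for an empty batch to avoid dividing by zero.
    fn golden_rate(&self) -> f64 {
        match self.total() {
            0 => 0.0,
            n => self.golden as f64 / n as f64,
        }
    }
}
```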

### 4. Context Tracer (`src/rlm/context_trace.rs`)

**Purpose**: Track token budget per RLM iteration.

**Events Traced**:
- `SystemPrompt` - Initial system message
- `GrepResult` - Grep operation results
- `LlmQueryResult` - Sub-LLM call results
- `AssistantCode` - Code generated by assistant
- `ExecutionOutput` - Code execution output
- `Final` - Final answer
- `ToolCall` - Structured tool calls
- `ToolResult` - Tool call results

**Usage**:
```rust
use codetether_agent::rlm::context_trace::{ContextTrace, ContextEvent};

let mut trace = ContextTrace::new(8000); // 8k token budget

trace.log_event(ContextEvent::SystemPrompt {
    content: system_prompt,
    tokens: 200,
});

trace.log_event(ContextEvent::GrepResult {
    pattern: "async fn".to_string(),
    matches: 5,
    tokens: 150,
});

let summary = trace.summary();
println!("Budget used: {:.1}%", summary.budget_used_percent);
```
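The budget percentage in the summary is simply tokens used divided by the budget. The sketch below shows that bookkeeping in isolation; `BudgetTracker` is a hypothetical type for illustration and is not the crate's `ContextTrace`.

```rust
/// Hypothetical minimal token-budget tracker.
struct BudgetTracker {
    budget: usize,
    used: usize,
}

impl BudgetTracker {
    fn new(budget: usize) -> Self {
        Self { budget, used: 0 }
    }

    /// Adds the token cost of one event to the running total.
    fn record(&mut self, tokens: usize) {
        self.used = self.used.saturating_add(tokens);
    }

    /// Percentage of the budget consumed so far (can exceed 100.0).
    fn used_percent(&self) -> f64 {
        if self.budget == 0 {
            0.0
        } else {
            self.used as f64 / self.budget as f64 * 100.0
        }
    }

    /// Tokens left before the budget is exhausted; never underflows.
    fn remaining(&self) -> usize {
        self.budget.saturating_sub(self.used)
    }
}
```

With the numbers from the usage example (an 8000-token budget, then events costing 200 and 150 tokens), 350 tokens are used and 7650 remain.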

## Integration Points

### RLM REPL

The oracle system is integrated into the RLM REPL at several points:

1. **Tool Definitions** (`tools.rs`):
   - Added `rlm_ast_query` tool for AST queries
   - Tool dispatcher handles `ast_query` command

2. **DSL Commands** (`repl.rs`):
   - `ast_query("s-expr")` - Execute tree-sitter queries
   - Updated help text to include AST query

3. **Module Exports** (`mod.rs`):
   - Exported oracle types and validators
   - Exported context trace types

### Training Data Generation

Golden traces are output as JSONL with the following structure:

```json
{
  "query": "Find all async functions",
  "answer": "5:async fn foo()\n10:async fn bar()",
  "iterations": 2,
  "subcalls": 0,
  "input_tokens": 150,
  "output_tokens": 80,
  "elapsed_ms": 500,
  "source_path": "src/main.rs",
  "verification_method": "GrepOracle",
  "timestamp": 1234567890,
  "trace_id": "uuid-here"
}
```
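In JSONL, each record occupies exactly one line, so newlines inside fields such as `answer` must be escaped. The sketch below renders a subset of the fields above by hand to make that visible; `json_escape` and `to_jsonl_line` are hypothetical helpers, and a real implementation would more likely use a serialization library such as `serde_json`.

```rust
/// Escapes a string for use inside a JSON string literal.
/// Covers quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders a subset of the golden-trace fields as a single JSONL line
/// (no trailing newline; the writer appends one per record).
fn to_jsonl_line(query: &str, answer: &str, iterations: u32, method: &str) -> String {
    format!(
        "{{\"query\":\"{}\",\"answer\":\"{}\",\"iterations\":{},\"verification_method\":\"{}\"}}",
        json_escape(query),
        json_escape(answer),
        iterations,
        json_escape(method)
    )
}
```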

## Query Type Classification

| Pattern | Type | Oracle |
|---------|------|--------|
| "find all X" | PatternMatch | Grep |
| "list all X" | PatternMatch | Grep |
| "count X" | PatternMatch | Grep |
| "search for X" | PatternMatch | Grep |
| "X signature" | Structural | Tree-sitter |
| "parameters of X" | Structural | Tree-sitter |
| "fields of X" | Structural | Tree-sitter |
| "implements X" | Structural | Tree-sitter |
| "explain X" | Semantic | None (unverified) |
| "why does X" | Semantic | None (unverified) |
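The table can be read as a keyword-based classifier. The following is a minimal sketch under stated assumptions: the variant names come from the table, but `classify`, the precedence (pattern-match prefixes checked first), the case-insensitive matching, and the fallback to `Semantic` for unrecognized queries are all assumptions rather than the crate's confirmed behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QueryType {
    PatternMatch,
    Structural,
    Semantic,
}

/// Hypothetical classifier mirroring the table above.
fn classify(query: &str) -> QueryType {
    let q = query.trim().to_lowercase();

    // Pattern-match queries are recognized by how they start.
    const PATTERN_PREFIXES: [&str; 4] = ["find all ", "list all ", "count ", "search for "];
    // Structural queries are recognized by a marker anywhere in the text.
    const STRUCTURAL_MARKERS: [&str; 4] =
        ["signature", "parameters of ", "fields of ", "implements "];

    if PATTERN_PREFIXES.iter().any(|p| q.starts_with(p)) {
        QueryType::PatternMatch
    } else if STRUCTURAL_MARKERS.iter().any(|m| q.contains(m)) {
        QueryType::Structural
    } else {
        // "explain X", "why does X", and anything unrecognized: no oracle applies.
        QueryType::Semantic
    }
}
```

Matching prefixes with `starts_with` avoids false hits such as "count" inside "account".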

## Performance Considerations

- **Grep Oracle**: O(n) in the file size n; matching is delegated to the regex engine
- **Tree-sitter Oracle**: the file is parsed once in O(n); each query is O(m) in the AST size m
- **Context Trace**: O(1) per logged event; events are kept in a circular buffer capped at 1000 entries
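To illustrate the O(1) logging claim, here is a minimal sketch of a capped event buffer built on `std::collections::VecDeque`. `BoundedLog` is a hypothetical type; the context tracer's real data structure may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded log: keeps only the most recent `capacity` events.
struct BoundedLog<T> {
    capacity: usize,
    events: VecDeque<T>,
}

impl<T> BoundedLog<T> {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            events: VecDeque::with_capacity(capacity),
        }
    }

    /// Evicts the oldest event when the buffer is full, then appends.
    /// Both steps are constant time on a `VecDeque`.
    fn push(&mut self, event: T) {
        if self.capacity == 0 {
            return;
        }
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        self.events.push_back(event);
    }

    fn len(&self) -> usize {
        self.events.len()
    }

    /// The oldest event still retained, if any.
    fn oldest(&self) -> Option<&T> {
        self.events.front()
    }
}
```

Constructing it with `BoundedLog::new(1000)` would match the cap stated above.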

## Testing

Run oracle tests:
```bash
cargo test oracle
cargo test context_trace
```

Run integration tests:
```bash
cargo test rlm::oracle
```

## Future Work

- **Self-consistency oracle**: For semantic queries, run inference three times and take the consensus answer
- **Execution oracle**: Run generated tests (security considerations needed)
- **Multi-file oracle**: Support cross-file queries
- **Incremental verification**: Cache parsed ASTs between runs
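As a rough idea of what the self-consistency check could look like, the sketch below accepts an answer only when a strict majority of runs agree on it. This is speculative: `consensus` is a hypothetical function, and comparing trimmed answers by exact string equality is an assumption that a real semantic oracle would probably need to relax.

```rust
use std::collections::HashMap;

/// Returns the answer given by a strict majority of runs, if there is one.
/// Answers are compared by exact equality after trimming whitespace.
fn consensus<'a>(answers: &[&'a str]) -> Option<&'a str> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for &answer in answers {
        *counts.entry(answer.trim()).or_insert(0) += 1;
    }
    // At most one answer can exceed half the runs, so the result is unambiguous.
    counts
        .into_iter()
        .find(|&(_, n)| n * 2 > answers.len())
        .map(|(answer, _)| answer)
}
```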