# RLM Oracle System
A deterministic oracle system that validates RLM REPL trace outputs (`FINAL()` answers), so that verified traces can be used as synthetic training data for the 90M BitNet distilled navigation model.
## Architecture
```
RLM Analysis Result
         ↓
Query Classification (pattern-match / structural / semantic)
         ↓
┌───────────────────┐
│  Pattern-Match?   │──→ Grep Oracle
└───────────────────┘                                ↓
┌───────────────────┐                         Verify against
│  Structural?      │──→ Tree-sitter Oracle   actual grep/AST
└───────────────────┘                                ↓
┌───────────────────┐    ┌────────────┐
│  Semantic?        │──→ │ Unverified │
└───────────────────┘    └────────────┘
                                                     ↓
                                             ┌──────────────┐
                                             │    Golden    │ → JSONL
                                             │  (verified)  │
                                             └──────────────┘
                                             ┌──────────────┐
                                             │    Failed    │ → Discard/Review
                                             │  (disagrees) │
                                             └──────────────┘
```
## Components
### 1. Grep Oracle (`src/rlm/oracle/grep_oracle.rs`)
**Purpose**: Verify pattern-match queries by comparing claimed results against actual `grep -n` output.
**Supported Queries**:
- "Find all async functions"
- "List all structs matching pattern X"
- "Count occurrences of Y"
- "Find all error handling patterns"
**Verification Results**:
- `ExactMatch` - Answer matches ground truth exactly
- `UnorderedMatch` - Answer matches but in different order
- `SubsetMatch` - Answer is a subset of ground truth
- `HasFalsePositives` - Answer contains claims not in source
- `HasFalseNegatives` - Answer is missing items from source
- `Mismatch` - Answer is completely different
**Example**:
```rust
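// GrepVerification is assumed to be exported from the same module as GrepOracle.
use codetether_agent::rlm::oracle::{GrepOracle, GrepVerification};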
let oracle = GrepOracle::new(source_code);
let result = oracle.verify(
"5:async fn foo()\n10:async fn bar()",
"Find all async functions"
);
match result {
GrepVerification::ExactMatch => println!("✓ Verified!"),
_ => println!("✗ Verification failed"),
}
```
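To make the verification outcomes above concrete, here is a minimal, standard-library-only sketch of how a claimed answer could be compared with ground truth. It is not the actual `GrepOracle` implementation: the `Verification` enum, `grep_lines`, and `classify` are hypothetical names, a plain substring match stands in for the regex engine, and matched lines are trimmed before comparison. `HasFalseNegatives` is omitted because the list above does not say how it differs from `SubsetMatch`.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of the verification outcomes (not the real enum).
#[derive(Debug, PartialEq)]
enum Verification {
    ExactMatch,
    UnorderedMatch,
    SubsetMatch,
    HasFalsePositives,
    Mismatch,
}

/// Emulates `grep -n` with a substring match: one "line_no:text" entry per hit.
/// Assumption: matched lines are trimmed so indentation does not affect comparison.
fn grep_lines(source: &str, needle: &str) -> Vec<String> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .map(|(i, line)| format!("{}:{}", i + 1, line.trim()))
        .collect()
}

/// Compares a claimed answer (one entry per line) with the ground truth.
fn classify(claimed: &str, truth: &[String]) -> Verification {
    let claimed: Vec<&str> = claimed
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .collect();
    let truth: Vec<&str> = truth.iter().map(String::as_str).collect();

    if claimed == truth {
        return Verification::ExactMatch;
    }
    let c: BTreeSet<&str> = claimed.iter().copied().collect();
    let t: BTreeSet<&str> = truth.iter().copied().collect();
    if c == t {
        Verification::UnorderedMatch
    } else if c.is_disjoint(&t) {
        Verification::Mismatch
    } else if c.is_subset(&t) {
        Verification::SubsetMatch
    } else {
        // At least one claimed entry does not appear in the source.
        Verification::HasFalsePositives
    }
}

fn main() {
    let source = "fn a() {}\nasync fn foo() {}\nstruct S;\nasync fn bar() {}";
    let truth = grep_lines(source, "async fn");
    let verdict = classify("4:async fn bar() {}\n2:async fn foo() {}", &truth);
    println!("{:?}", verdict);
}
```

Comparing sets after the exact check is what separates `ExactMatch` from `UnorderedMatch`; the remaining branches only look at set overlap.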
### 2. Tree-sitter Oracle (`src/rlm/oracle/tree_sitter_oracle.rs`)
**Purpose**: Verify structural queries using AST parsing.
**Supported Queries**:
- Function signatures (name, args, return type)
- Struct/enum definitions and field listings
- Impl blocks and trait implementations
- Error handling patterns (Result, match arms, ? operator)
**API**:
```rust
use codetether_agent::rlm::oracle::TreeSitterOracle;
let mut oracle = TreeSitterOracle::new(source_code);
// Get all functions
let functions = oracle.get_functions()?;
for func in functions {
println!("{}: {} ({})", func.line, func.name, func.params);
}
// Execute custom AST query
let result = oracle.query("(function_item name: (identifier) @name)")?;
// Get struct fields
let structs = oracle.get_structs()?;
// Count error patterns
let errors = oracle.count_error_patterns()?;
```
**New DSL Command**: `ast_query()`
```
ast_query("(function_item name: (identifier) @name)")
```
Returns formatted AST matches with line numbers and captures.
### 3. Trace Validator (`src/rlm/oracle/validator.rs`)
**Purpose**: Orchestrates validation by routing each query to the appropriate oracle and emitting golden traces.
**Usage**:
```rust
use codetether_agent::rlm::oracle::{TraceValidator, OracleResult};
let validator = TraceValidator::new();
let result = validator.validate(&analysis_result, &source_code, Some("path/to/file.rs"));
match result {
OracleResult::Golden(trace) => {
// Write to JSONL for SFT training
println!("Golden trace: {}", trace.trace_id);
}
OracleResult::Unverified { reason } => {
println!("No oracle available: {}", reason);
}
OracleResult::Failed { reason, trace } => {
println!("Verification failed: {}", reason);
}
}
```
**Batch Validation**:
```rust
let stats = validator.batch_validate(traces);
println!("Golden rate: {:.1}%", stats.golden_rate() * 100.0);
stats.write_jsonl("golden_traces.jsonl")?;
```
### 4. Context Tracer (`src/rlm/context_trace.rs`)
**Purpose**: Track token budget per RLM iteration.
**Events Traced**:
- `SystemPrompt` - Initial system message
- `GrepResult` - Grep operation results
- `LlmQueryResult` - Sub-LLM call results
- `AssistantCode` - Code generated by assistant
- `ExecutionOutput` - Code execution output
- `Final` - Final answer
- `ToolCall` - Structured tool calls
- `ToolResult` - Tool call results
**Usage**:
```rust
use codetether_agent::rlm::context_trace::{ContextTrace, ContextEvent};
let mut trace = ContextTrace::new(8000); // 8k token budget
trace.log_event(ContextEvent::SystemPrompt {
content: system_prompt,
tokens: 200,
});
trace.log_event(ContextEvent::GrepResult {
pattern: "async fn".to_string(),
matches: 5,
tokens: 150,
});
let summary = trace.summary();
println!("Budget used: {:.1}%", summary.budget_used_percent);
```
## Integration Points
### RLM REPL
The oracle system is integrated into the RLM REPL at several points:
1. **Tool Definitions** (`tools.rs`):
- Added `rlm_ast_query` tool for AST queries
- Tool dispatcher handles `ast_query` command
2. **DSL Commands** (`repl.rs`):
- `ast_query("s-expr")` - Execute tree-sitter queries
- Updated help text to include AST query
3. **Module Exports** (`mod.rs`):
- Exported oracle types and validators
- Exported context trace types
### Training Data Generation
Golden traces are output as JSONL with the following structure:
```json
{
"query": "Find all async functions",
"answer": "5:async fn foo()\n10:async fn bar()",
"iterations": 2,
"subcalls": 0,
"input_tokens": 150,
"output_tokens": 80,
"elapsed_ms": 500,
"source_path": "src/main.rs",
"verification_method": "GrepOracle",
"timestamp": 1234567890,
"trace_id": "uuid-here"
}
```
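Because the `answer` field can contain newlines, each record has to be escaped so that it stays on a single line of the JSONL file. The sketch below illustrates this with the standard library only; the `GoldenTrace` struct here is a hypothetical subset of the fields shown above, and `escape_json`, `to_jsonl_line`, and `write_jsonl` are illustrative helpers rather than the project's actual serialization code, which may well use a serialization crate instead.

```rust
use std::io::{self, Write};

/// Hypothetical subset of the golden trace record shown above.
struct GoldenTrace {
    query: String,
    answer: String,
    iterations: u32,
    verification_method: String,
}

/// Escapes the characters that must not appear raw inside a JSON string.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders one trace as a single-line JSON object.
fn to_jsonl_line(t: &GoldenTrace) -> String {
    format!(
        r#"{{"query":"{}","answer":"{}","iterations":{},"verification_method":"{}"}}"#,
        escape_json(&t.query),
        escape_json(&t.answer),
        t.iterations,
        escape_json(&t.verification_method)
    )
}

/// Writes one record per line to any writer (a file, stdout, or a buffer).
fn write_jsonl<W: Write>(mut w: W, traces: &[GoldenTrace]) -> io::Result<()> {
    for t in traces {
        writeln!(w, "{}", to_jsonl_line(t))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let trace = GoldenTrace {
        query: "Find all async functions".to_string(),
        answer: "5:async fn foo()\n10:async fn bar()".to_string(),
        iterations: 2,
        verification_method: "GrepOracle".to_string(),
    };
    write_jsonl(io::stdout(), &[trace])
}
```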
## Query Type Classification
| Query Pattern | Query Type | Oracle |
|---------------|------------|--------|
| "find all X" | PatternMatch | Grep |
| "list all X" | PatternMatch | Grep |
| "count X" | PatternMatch | Grep |
| "search for X" | PatternMatch | Grep |
| "X signature" | Structural | Tree-sitter |
| "parameters of X" | Structural | Tree-sitter |
| "fields of X" | Structural | Tree-sitter |
| "implements X" | Structural | Tree-sitter |
| "explain X" | Semantic | None (unverified) |
| "why does X" | Semantic | None (unverified) |
## Performance Considerations
- **Grep Oracle**: O(n), where n is the file size; matching uses the regex engine
- **Tree-sitter Oracle**: O(n) to parse once, then O(m) per query, where m is the AST size
- **Context Trace**: O(1) per logged event; events are kept in a circular buffer capped at 1000 entries
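The O(1) logging cost follows from using a bounded queue: once the cap is reached, the oldest event is dropped before the new one is appended. Below is a minimal sketch of that pattern with `std::collections::VecDeque`. `BoundedTrace` and its fields are hypothetical and much simpler than `ContextTrace`; in particular, the sketch assumes that evicted events still count toward the token budget, which the real tracer may handle differently.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded event log: (event kind, token count) pairs.
struct BoundedTrace {
    events: VecDeque<(String, usize)>,
    capacity: usize,
    budget: usize,
    used: usize,
}

impl BoundedTrace {
    fn new(budget: usize, capacity: usize) -> Self {
        assert!(budget > 0 && capacity > 0, "budget and capacity must be non-zero");
        BoundedTrace {
            events: VecDeque::with_capacity(capacity),
            capacity,
            budget,
            used: 0,
        }
    }

    /// Logs an event in O(1): evicts the oldest entry once the cap is reached.
    fn log_event(&mut self, kind: &str, tokens: usize) {
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        self.events.push_back((kind.to_string(), tokens));
        // Assumption: evicted events still count against the budget.
        self.used += tokens;
    }

    fn budget_used_percent(&self) -> f64 {
        self.used as f64 / self.budget as f64 * 100.0
    }
}

fn main() {
    let mut trace = BoundedTrace::new(8000, 1000);
    trace.log_event("SystemPrompt", 200);
    trace.log_event("GrepResult", 150);
    println!(
        "{} events, budget used: {:.1}%",
        trace.events.len(),
        trace.budget_used_percent()
    );
}
```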
## Testing
Run oracle tests:
```bash
cargo test oracle
cargo test context_trace
```
Run integration tests:
```bash
cargo test rlm::oracle
```
## Future Work
- **Self-consistency oracle**: For semantic queries, run inference three times and accept the answer only if the runs reach consensus
- **Execution oracle**: Run generated tests (security considerations needed)
- **Multi-file oracle**: Support cross-file queries
- **Incremental verification**: Cache parsed ASTs between runs
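As a rough illustration of the self-consistency idea, the sketch below picks the answer that at least `min_agree` of the runs agree on. Nothing here exists in the codebase yet: `consensus` is a hypothetical helper, answers are compared as trimmed strings (a real oracle would probably need a more forgiving notion of equality), and `min_agree` should be a strict majority so that the winner is unambiguous.

```rust
use std::collections::BTreeMap;

/// Returns the most frequent answer if at least `min_agree` runs produced it.
fn consensus(answers: &[&str], min_agree: usize) -> Option<String> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for a in answers {
        // Compare answers after trimming surrounding whitespace.
        *counts.entry(a.trim()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by_key(|&(_, n)| n)
        .filter(|&(_, n)| n >= min_agree)
        .map(|(answer, _)| answer.to_string())
}

fn main() {
    // Three hypothetical inference runs for the same semantic query.
    let runs = ["retries three times", "retries three times ", "retries once"];
    match consensus(&runs, 2) {
        Some(answer) => println!("Consensus: {}", answer),
        None => println!("No consensus, trace stays unverified"),
    }
}
```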