# AGENTS.md
Instructions for AI coding agents working on CodeTether Agent.
## Setup Commands
```bash
# Install dependencies and build
cargo build
# Run tests
cargo test
# Run clippy lints
cargo clippy --all-features
# Build release binary
cargo build --release
# Install locally
cargo install --path .
# Run the TUI
codetether tui
# Run a single prompt
codetether run "your message"
# Run as A2A worker (connects to server, registers codebases)
codetether worker --server http://localhost:8001 --codebases /path/to/project --auto-approve safe
# Deploy as a service (one-command wrapper)
./deploy-worker.sh --codebases /path/to/project
```
## Code Style
- Rust 2024 edition with `rust-version = "1.85"`
- Use `anyhow::Result` for fallible functions in application code
- Use `thiserror` for library error types
- Prefer `tracing` over `println!` for logging
- Always use `tracing::info!`, `tracing::warn!`, `tracing::error!` with structured fields, e.g., `tracing::info!(tool = %name, "Executing tool")`
- Inline format args when possible: `format!("{name}")` not `format!("{}", name)`
- Collapse if statements per clippy `collapsible_if`
- Use method references over closures when possible: `.map(String::as_str)` not `.map(|s| s.as_str())`
- Prefer exhaustive `match` statements over wildcard arms when the enum is small
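Several of the rules above can be seen together in a small, standard-library-only sketch. The `AgentState` enum and both functions are hypothetical names chosen for illustration; they are not types from the crate.

```rust
/// Hypothetical enum, used only to illustrate the style rules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Running,
    Failed,
}

/// Exhaustive `match`: adding a variant later becomes a compile error here
/// instead of silently falling into a `_` arm.
pub fn describe(name: &str, state: AgentState) -> String {
    let label = match state {
        AgentState::Idle => "idle",
        AgentState::Running => "running",
        AgentState::Failed => "failed",
    };
    // Inline format args: `{name}` rather than `format!("{} is {}", name, label)`.
    format!("{name} is {label}")
}

/// Method reference instead of a closure: `.map(String::as_str)`.
pub fn longest(names: &[String]) -> Option<&str> {
    names.iter().map(String::as_str).max_by_key(|s| s.len())
}
```

For example, `describe("builder", AgentState::Running)` produces `"builder is running"`.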
## Project Structure
```
src/
├── main.rs # CLI entry point (clap-based)
├── lib.rs # Library root - re-exports all modules
├── a2a/ # A2A protocol client/server/worker
├── agent/ # Agent definitions (builtin agents)
├── audit/ # System-wide audit trail (JSON Lines, queryable)
│ └── mod.rs # AuditEntry, AuditLog, global singleton, query filters
├── cli/ # CLI commands (run, tui, serve, ralph, swarm, etc.)
├── config/ # Configuration loading (includes LSP/linter server settings under [lsp])
├── k8s/ # Kubernetes self-deployment manager
│ └── mod.rs # K8sManager, cluster detection, pod lifecycle, reconcile loop
├── mcp/ # MCP protocol implementation
├── provider/ # LLM provider implementations
├── ralph/ # Autonomous PRD-driven development loop
├── rlm/ # Recursive Language Model processing
├── secrets/ # HashiCorp Vault secrets management
├── server/ # HTTP server (axum)
│ ├── mod.rs # Routes, middleware, AppState, policy middleware
│ ├── auth.rs # Mandatory Bearer token auth (cannot be disabled)
│ └── policy.rs # OPA policy client (local eval + HTTP)
├── session/ # Conversation session management
├── swarm/ # Parallel sub-agent orchestration
├── tool/ # Tool implementations (27+ tools)
│ ├── mod.rs # Tool registry
│ └── sandbox.rs # Plugin sandboxing, Ed25519 signing, resource limits
├── tui/ # Terminal UI (ratatui + crossterm)
└── worktree/ # Git worktree management for isolation
```
## Adding a New Tool
1. Create `src/tool/your_tool.rs`:
```rust
use super::{Tool, ToolDefinition, ToolResult};
use anyhow::Result;
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use serde_json::Value;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct YourToolInput {
    pub required_field: String,
    #[serde(default)]
    pub optional_field: Option<String>,
}

pub struct YourTool;

#[async_trait]
impl Tool for YourTool {
    fn name(&self) -> &'static str {
        "your_tool"
    }

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: self.name().to_string(),
            description: "What the tool does".to_string(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "required_field": {
                        "type": "string",
                        "description": "Description of this field"
                    },
                    "optional_field": {
                        "type": "string",
                        "description": "Optional field description"
                    }
                },
                "required": ["required_field"]
            }),
        }
    }

    async fn execute(&self, input: Value) -> Result<ToolResult> {
        let params: YourToolInput = serde_json::from_value(input)?;
        // Tool implementation here
        Ok(ToolResult {
            output: "Result text".to_string(),
            success: true,
        })
    }
}
```
2. Register in `src/tool/mod.rs`:
- Add `pub mod your_tool;`
- Add to `ToolRegistry::new()` or `ToolRegistry::with_provider_arc()`
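The registration step can be pictured with a simplified, synchronous stand-in. This is only a sketch: `SimpleTool`, `EchoTool`, and `SimpleRegistry` are hypothetical names, and the crate's real `Tool` trait is async and its `ToolRegistry` API is not shown in this document.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for the crate's async `Tool` trait.
pub trait SimpleTool {
    fn name(&self) -> &'static str;
    fn execute(&self, input: &str) -> Result<String, String>;
}

/// Hypothetical example tool that echoes its input back.
pub struct EchoTool;

impl SimpleTool for EchoTool {
    fn name(&self) -> &'static str {
        "echo"
    }

    fn execute(&self, input: &str) -> Result<String, String> {
        if input.is_empty() {
            return Err("input cannot be empty".to_string());
        }
        Ok(input.to_string())
    }
}

/// Hypothetical registry keyed by tool name.
#[derive(Default)]
pub struct SimpleRegistry {
    tools: HashMap<&'static str, Box<dyn SimpleTool>>,
}

impl SimpleRegistry {
    /// Registers a tool under the name it reports for itself.
    pub fn register(&mut self, tool: Box<dyn SimpleTool>) {
        self.tools.insert(tool.name(), tool);
    }

    /// Looks up a tool by name and runs it.
    pub fn run(&self, name: &str, input: &str) -> Result<String, String> {
        let tool = self
            .tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))?;
        tool.execute(input)
    }
}
```

Keying the map by `name()` keeps the registry and the tool definition from drifting apart, which is presumably why the real trait exposes `name()` as well.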
## Adding a New Provider
1. Create `src/provider/your_provider.rs` implementing `Provider` trait:
```rust
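#[async_trait]
impl Provider for YourProvider {
    fn name(&self) -> &str { "yourprovider" }
    fn models(&self) -> Vec<String> { vec!["model-1".into()] }
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse> {
        // Send `request` to the provider API and map the reply into a `CompletionResponse`.
        todo!("call the provider's completion endpoint")
    }
    async fn stream(&self, request: CompletionRequest) -> Result<Pin<Box<dyn Stream<Item = Result<StreamChunk>>>>> {
        // Return a stream that yields `StreamChunk`s as they arrive.
        todo!("call the provider's streaming endpoint")
    }
}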
```
2. Register in `src/provider/mod.rs` within `ProviderRegistry::from_vault()`
3. Add Vault secret path: `secret/codetether/providers/yourprovider`
## Secrets Management
**CRITICAL**: Never hardcode API keys. All secrets come from HashiCorp Vault.
```rust
// Good - load from Vault
let registry = ProviderRegistry::from_vault().await?;
// Bad - never do this
let api_key = "sk-..."; // NEVER
```
Environment variables for Vault:
- `VAULT_ADDR` - Vault server address
- `VAULT_TOKEN` - Authentication token
- `VAULT_MOUNT` - KV mount (default: `secret`)
- `VAULT_SECRETS_PATH` - Path prefix (default: `codetether/providers`)
### Security-Hardened Deployments
By default, `from_vault()` falls back to environment variables (`OPENAI_API_KEY`, etc.)
for local development convenience. For production or security-conscious deployments,
disable this fallback to ensure all credentials come exclusively from Vault:
```bash
# Disable environment variable fallback (security-hardened mode)
export CODETETHER_DISABLE_ENV_FALLBACK=1
codetether serve
```
When set, only Vault-configured providers will be available. This prevents accidental
credential leakage via environment variables which may be exposed in process listings,
logs, or container inspection.
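The fallback rule can be sketched as a pure function. `resolve_api_key` and its parameters are hypothetical and exist only to illustrate the precedence described above; the actual logic inside `from_vault()` may differ.

```rust
/// Hypothetical helper: picks the credential source for one provider.
///
/// `vault_value` is what Vault returned, `env_value` is the matching
/// environment variable, and `disable_env_fallback` mirrors whether
/// `CODETETHER_DISABLE_ENV_FALLBACK` is set.
pub fn resolve_api_key(
    vault_value: Option<&str>,
    env_value: Option<&str>,
    disable_env_fallback: bool,
) -> Option<String> {
    match (vault_value, env_value) {
        // Vault always wins when it has a value.
        (Some(key), _) => Some(key.to_string()),
        // Fall back to the environment only when the fallback is allowed.
        (None, Some(key)) if !disable_env_fallback => Some(key.to_string()),
        // Hardened mode, or nothing configured anywhere.
        _ => None,
    }
}
```

Passing the values in as arguments, rather than reading the environment inside the function, keeps the rule testable without mutating process state.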
## Testing
```bash
# Run all tests
cargo test
# Run tests for a specific module
cargo test --lib session
# Run a specific test
cargo test test_name
# Run with output
cargo test -- --nocapture
```
### Test Patterns
- Use `#[tokio::test]` for async tests
- Use `tempfile::tempdir()` for file system tests
- Mock providers for unit tests, use real providers in integration tests
- Integration tests go in `tests/` directory
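A minimal sketch of the mock-provider pattern, using only the standard library. `Completer`, `MockCompleter`, and `summarize` are hypothetical; the crate's real `Provider` trait is async, so an actual test would use `#[tokio::test]` instead of `#[test]`.

```rust
/// Simplified, synchronous stand-in for the crate's async `Provider` trait.
pub trait Completer {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Mock that returns a canned reply and never touches the network.
pub struct MockCompleter {
    pub reply: String,
}

impl Completer for MockCompleter {
    fn complete(&self, _prompt: &str) -> Result<String, String> {
        Ok(self.reply.clone())
    }
}

/// Hypothetical code under test: depends on the trait, not a concrete provider.
pub fn summarize(provider: &dyn Completer, text: &str) -> Result<String, String> {
    let reply = provider.complete(&format!("Summarize: {text}"))?;
    Ok(reply.trim().to_string())
}

#[cfg(test)]
mod mock_provider_tests {
    use super::*;

    #[test]
    fn summarize_trims_provider_reply() {
        let mock = MockCompleter {
            reply: "  short summary \n".to_string(),
        };
        assert_eq!(summarize(&mock, "long text").unwrap(), "short summary");
    }
}
```

Because `summarize` only sees the trait, an integration test can pass a real provider to the same function without changing it.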
## Rustdoc & Documentation Standards
> **This is an open-source project.** Every public type, function, and module
> must be documented well enough that a junior developer can use it without
> reading the implementation. When in doubt, over-document.
### Running Doc Tests
```bash
# Run ONLY doc tests (fast, catches broken examples)
cargo test --doc
# Run doc tests for a single module
cargo test --doc session
# Generate HTML docs and open in browser
cargo doc --open --no-deps
```
### Doc Comment Cheat Sheet
Rust doc comments use `///` for items and `//!` for module-level docs.
```rust
//! This is a module-level doc comment.
//!
//! It appears at the top of a file (usually `mod.rs` or `lib.rs`)
//! and describes what the entire module is for.
/// A single-line doc comment for an item below it.
///
/// A longer description goes here. You can use **bold**, *italic*,
/// and [`links to other types`](crate::session::Session).
///
/// # Arguments
///
/// * `name` — Description of the parameter.
///
/// # Returns
///
/// What the function returns and when it errors.
///
/// # Examples
///
/// ```rust
/// let result = 2 + 2;
/// assert_eq!(result, 4);
/// ```
pub fn my_function(name: &str) -> String {
    format!("Hello, {name}")
}
```
### Runnable vs Non-Runnable Examples
This project uses **four** doc example modes (rustdoc supports others, such as `should_panic` and `compile_fail`). Use the right one:
| Annotation | Compiles? | Runs? | Use for |
|------------|-----------|-------|---------|
| ` ```rust ` or ` ``` ` | Yes | Yes | **Default. Pure logic, no I/O.** |
| ` ```rust,no_run ` | Yes | No | Compiles but needs network/files at runtime. |
| ` ```rust,ignore ` | No | No | Pseudocode or needs external context. |
| ` ```text ` | No | No | Output examples, diagrams, CLI output. |
**Rule: Prefer runnable (` ``` `) whenever possible.** If the example compiles but cannot run
in a test environment (it needs network access, files, or external services), use `no_run`.
Only use `ignore` as a last resort.
### Writing Runnable Doc Examples
Runnable examples are real Rust code that `cargo test --doc` compiles and executes.
They act as both documentation AND tests — if the example breaks, CI catches it.
#### Pattern 1: Simple function (fully runnable)
```rust
/// Truncate a string to `max_len` bytes, appending "..." if truncated.
///
/// # Examples
///
/// ```rust
/// use codetether_agent::tui::truncate_str;
///
/// assert_eq!(truncate_str("hello", 10), "hello");
/// assert_eq!(truncate_str("hello world", 8), "hello...");
/// ```
pub fn truncate_str(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        s.to_string()
    } else {
        let boundary = s.floor_char_boundary(max_len.saturating_sub(3));
        format!("{}...", &s[..boundary])
    }
}
```
#### Pattern 2: Struct with builder (fully runnable)
```rust
/// Result from executing a tool.
///
/// # Examples
///
/// ```rust
/// use codetether_agent::tool::ToolResult;
///
/// // Success case
/// let ok = ToolResult::success("file written");
/// assert!(ok.success);
/// assert_eq!(ok.output, "file written");
///
/// // Error case
/// let err = ToolResult::error("permission denied");
/// assert!(!err.success);
/// ```
pub struct ToolResult {
    pub output: String,
    pub success: bool,
}
```
#### Pattern 3: Async function (`no_run`: reads session files at runtime)
```rust
/// Load a session from disk by its UUID.
///
/// # Examples
///
/// ```rust,no_run
/// # tokio::runtime::Runtime::new().unwrap().block_on(async {
/// use codetether_agent::session::Session;
/// use std::path::Path;
///
/// let session = Session::load(Path::new("/tmp/sessions"), "abc-123")
/// .await
/// .expect("session should exist");
/// println!("Loaded {} messages", session.messages.len());
/// # });
/// ```
```
#### Pattern 4: Error handling (fully runnable)
```rust
/// Parse a tool call ID from a string.
///
/// # Errors
///
/// Returns `Err` if the string is empty.
///
/// # Examples
///
/// ```rust
/// fn parse_id(s: &str) -> Result<String, String> {
/// if s.is_empty() {
/// return Err("ID cannot be empty".into());
/// }
/// Ok(s.to_uppercase())
/// }
///
/// assert_eq!(parse_id("abc").unwrap(), "ABC");
/// assert!(parse_id("").is_err());
/// ```
```
#### Pattern 5: Enum with match (fully runnable)
```rust
/// Outcome of an audited action.
///
/// # Examples
///
/// ```rust
/// use codetether_agent::audit::AuditOutcome;
///
/// let outcome = AuditOutcome::Success;
/// match outcome {
/// AuditOutcome::Success => println!("action succeeded"),
/// AuditOutcome::Failure => println!("action failed"),
/// AuditOutcome::Denied => println!("action denied by policy"),
/// }
/// ```
```
### Hidden Lines in Doc Examples
Use `# ` (hash + space) to hide boilerplate lines. They still compile but
don't show in the rendered docs:
```rust
/// # Examples
///
/// ```rust
/// # use std::collections::HashMap;
/// # fn main() {
/// let mut map = HashMap::new();
/// map.insert("key", 42);
/// assert_eq!(map["key"], 42);
/// # }
/// ```
```
The user sees:
```rust
let mut map = HashMap::new();
map.insert("key", 42);
assert_eq!(map["key"], 42);
```
But `cargo test --doc` compiles the full version with imports and `fn main()`.
### Required Doc Sections
Every public item **must** have at minimum:
| Item | Required documentation |
|------|------------------------|
| Module (`//!`) | Purpose, key types, usage overview |
| Struct | Purpose, `# Examples` with construction |
| Enum | Purpose, variants list, `# Examples` with match |
| Function | Purpose, `# Arguments`, `# Returns`, `# Examples` |
| Trait | Purpose, `# Implementors` or `# Examples` |
| Method | One-line summary + `# Examples` if non-obvious |
### When to Use `# Errors` and `# Panics`
```rust
/// # Errors
///
/// Returns [`anyhow::Error`] if:
/// - The session file does not exist
/// - The JSON is malformed
///
/// # Panics
///
/// Panics if `max_retries` is zero (this is a programming error).
```
**Rule:** Document `# Errors` for every function returning `Result`.
Document `# Panics` for every function that can panic.
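Applied to a complete function, the two sections might look like the sketch below. `parse_retry_budget` is a hypothetical function written for this illustration, not part of the crate.

```rust
/// Parses a retry count from user input and scales it by `multiplier`.
///
/// # Errors
///
/// Returns `Err` if `raw` is not a valid `u32` after trimming whitespace.
///
/// # Panics
///
/// Panics if `multiplier` is zero (this is a programming error).
pub fn parse_retry_budget(raw: &str, multiplier: u32) -> Result<u32, std::num::ParseIntError> {
    // Caller bug, not bad input: fail loudly instead of returning `Err`.
    assert!(multiplier > 0, "multiplier must be non-zero");
    let retries: u32 = raw.trim().parse()?;
    Ok(retries.saturating_mul(multiplier))
}
```

The split mirrors the rule: bad input that a caller can recover from goes under `# Errors`, while a violated precondition goes under `# Panics`.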
### Linking to Other Types
Use intra-doc links so docs stay valid even if modules move:
```rust
/// Sends a message through the [`Session`] and records it
/// in the [`AuditLog`](crate::audit::AuditLog).
///
/// See also: [`ToolResult::success`]
```
### Module-Level Docs
Every `mod.rs` must start with `//!` docs:
```rust
//! # Session Management
//!
//! This module handles conversation persistence, message history,
//! and session lifecycle (create, load, save, list, delete).
//!
//! ## Quick Start
//!
//! ```rust,no_run
//! # tokio::runtime::Runtime::new().unwrap().block_on(async {
//! use codetether_agent::session::Session;
//! use std::path::Path;
//!
//! // Create a new session
//! let mut session = Session::new(Path::new("./sessions"));
//! session.add_user_message("Hello!");
//! session.save().await.unwrap();
//! # });
//! ```
//!
//! ## Architecture
//!
//! Sessions are stored as JSON files in the sessions directory.
//! Each session has a UUID, a list of messages, and metadata.
```
### CI Enforcement
Doc tests run in CI alongside unit tests. A broken doc example **blocks the PR**.
```bash
# This is what CI runs:
cargo test --doc # All doc examples must pass
cargo doc --no-deps 2>&1 # No rustdoc warnings allowed
```
### Common Mistakes
1. **Using `ignore` when `no_run` works** — If the code compiles, use `no_run`
so the compiler still checks it. `ignore` lets examples silently rot.
2. **Forgetting `use` imports** — Doc examples run in isolation. You must
`use codetether_agent::module::Type` even for crate-internal types.
3. **Missing `# tokio::runtime::...` wrapper** — Async examples need a runtime.
Use the hidden-line trick to wrap in `block_on` without cluttering the docs.
4. **No `assert!` in examples** — Examples without assertions are just pretty
printing. Add `assert_eq!` or `assert!` to make them actual tests.
5. **Stale examples after refactoring** — `cargo test --doc` catches this.
Run it locally before pushing.
## TUI Development
The TUI uses ratatui 0.30.0 + crossterm 0.29.0.
### Styling Conventions
```rust
// Good - use Stylize helpers
"text".dim()
"text".cyan().bold()
vec!["prefix".dim(), "content".into()].into()
// Avoid - verbose style construction
Span::styled("text", Style::default().fg(Color::Cyan))
```
### Color Guidelines
- User messages: default foreground
- Assistant messages: Cyan (not Blue - better readability)
- System/status: Dim
- Errors: Red
- Success: Green
### Key Bindings
- `Ctrl+C` / `Ctrl+Q`: Quit
- `?`: Help
- `Tab`: Switch agent
- `↑↓`: Navigate/scroll
- `Enter`: Submit input
## Swarm Sub-Agents
When implementing swarm sub-agents:
1. **Filter interactive tools**: Sub-agents must be autonomous, so filter out the `question` tool:
```rust
.filter(|t| t.name != "question")
```
2. **Worktree isolation**: Use `inject_workspace_stub()` for Cargo workspace isolation:
```rust
mgr.inject_workspace_stub(&worktree_path)?;
```
3. **Token limits**: Sub-agents may hit context limits. Handle gracefully with truncation.
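One way to handle the truncation step is sketched below. `truncate_for_context` is a hypothetical helper that budgets by bytes; the real sub-agent code may count tokens instead.

```rust
/// Hypothetical helper: keeps at most `max_bytes` of the original `text`,
/// cutting on a UTF-8 character boundary and appending a marker when
/// content was dropped. The marker itself is not counted in the budget.
pub fn truncate_for_context(text: &str, max_bytes: usize) -> String {
    if text.len() <= max_bytes {
        return text.to_string();
    }
    // Walk back from the limit until we land on a character boundary.
    // Index 0 is always a boundary, so this loop terminates.
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n[truncated {} bytes]", &text[..end], text.len() - end)
}
```

Slicing a `&str` at a non-boundary index panics, which is why the loop backs up before slicing.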
## Ralph (Autonomous Loop)
Ralph implements PRD-driven development. Key files:
- `src/ralph/ralph_loop.rs` - Main loop
- `src/ralph/types.rs` - PRD structures
### PRD Structure
```json
{
  "project": "project-name",
  "feature": "Feature Name",
  "quality_checks": {
    "typecheck": "cargo check",
    "lint": "cargo clippy",
    "test": "cargo test",
    "build": "cargo build --release"
  },
  "user_stories": [
    {
      "id": "US-001",
      "title": "Story title",
      "description": "What to implement",
      "acceptance_criteria": ["Criterion 1", "Criterion 2"],
      "priority": 1,
      "depends_on": [],
      "passes": false
    }
  ]
}
```
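The `priority`, `depends_on`, and `passes` fields suggest how a loop could choose its next story. The sketch below is an assumption about that selection rule, not Ralph's actual implementation; `UserStory` here is a trimmed-down stand-in for the structures in `src/ralph/types.rs`, and it assumes a lower `priority` number means more urgent.

```rust
/// Trimmed-down stand-in with only the scheduling fields from the PRD above.
#[derive(Debug, Clone, PartialEq)]
pub struct UserStory {
    pub id: String,
    pub priority: u32,
    pub depends_on: Vec<String>,
    pub passes: bool,
}

/// `true` when a story with this `id` exists and has already passed.
fn has_passed(stories: &[UserStory], id: &str) -> bool {
    stories.iter().any(|s| s.id == id && s.passes)
}

/// Hypothetical selection rule: among stories that have not passed and whose
/// dependencies have all passed, pick the one with the lowest `priority` number.
pub fn next_story(stories: &[UserStory]) -> Option<&UserStory> {
    stories
        .iter()
        .filter(|s| !s.passes)
        .filter(|s| s.depends_on.iter().all(|dep| has_passed(stories, dep)))
        .min_by_key(|s| s.priority)
}
```

When every story passes, or every remaining story is blocked on an unfinished dependency, `next_story` returns `None`.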
### Memory Persistence
Ralph uses file-based memory (not context accumulation):
- `progress.txt` - Agent writes learnings/blockers
- `prd.json` - Tracks pass/fail status
- Git history - Shows what changed per iteration
## Message Roles
When building conversation messages:
```rust
// User message
Message { role: Role::User, content: vec![ContentPart::Text { text }] }
// Assistant message
Message { role: Role::Assistant, content: vec![ContentPart::Text { text }] }
// Tool result (MUST use Role::Tool, not Role::User)
Message { role: Role::Tool, content: vec![ContentPart::ToolResult { tool_call_id, content }] }
```
## Security Modules
### Authentication (`src/server/auth.rs`)
- Mandatory Bearer token middleware — **cannot be disabled**
- Auto-generates an HMAC-SHA256 token if the `CODETETHER_AUTH_TOKEN` env var is not set
- Only `/health` is exempt from auth
- Integrate by adding `AuthLayer` to the axum router
### Audit Trail (`src/audit/mod.rs`)
- Global `AUDIT_LOG` singleton initialized at server startup
- Log events with `log_event(actor, action, resource, outcome, metadata)`
- Query with `query_audit_log(filters)` — filter by actor, action, resource, time range
- Backend: append-only JSON Lines file
### Policy Engine (`src/server/policy.rs` + `mod.rs`)
- **OPA integration**: `check_policy()` sends authz queries to OPA sidecar via HTTP, falls back to `evaluate_local()` with compiled-in `data.json`
- **Structs**: `PolicyUser` (sub, roles, tenant_id, auth_source, scopes), `PolicyResource` (resource_type, resource_id, owner, tenant_id)
- **Middleware**: `policy_middleware()` in `mod.rs` intercepts requests, maps path+method to a permission string via `POLICY_ROUTES`, and calls `check_policy()`
- **`POLICY_ROUTES`**: Static array of `(&str, &str, &str)` tuples — `(path_prefix, http_method, permission)` — covering ~30 Axum endpoints
- **Adding a new endpoint**: Add a `(path, method, "resource:action")` entry to `POLICY_ROUTES` in `mod.rs`
- **Roles**: `admin`, `a2a-admin`, `operator`, `editor`, `viewer` — permissions defined in `policies/data.json`
- **Env vars**: `OPA_URL` (default `http://localhost:8181`), `OPA_ENABLED` (default `true`)
- **Testing**: `cargo test policy` runs 9 unit tests covering role-based access, API key scopes, and tenant isolation
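The route-to-permission lookup described above can be sketched as follows. The table entries and the `required_permission` function are made up for illustration; the real `POLICY_ROUTES` lives in `mod.rs` and its matching rules may differ.

```rust
/// Illustrative subset in the `(path_prefix, http_method, permission)` shape
/// described above. These entries are made up, not copied from `mod.rs`.
pub static POLICY_ROUTES: &[(&str, &str, &str)] = &[
    ("/api/sessions", "GET", "sessions:read"),
    ("/api/sessions", "POST", "sessions:write"),
    ("/api/audit", "GET", "audit:read"),
];

/// Returns the permission required for a request, or `None` when no entry
/// matches. First match wins, so more specific prefixes should come first.
pub fn required_permission(path: &str, method: &str) -> Option<&'static str> {
    POLICY_ROUTES
        .iter()
        .find(|(prefix, m, _)| path.starts_with(*prefix) && m.eq_ignore_ascii_case(method))
        .map(|(_, _, permission)| *permission)
}
```

Under this sketch, adding an endpoint is a one-line change to the table, which matches the "Adding a new endpoint" step above.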
### Plugin Sandbox (`src/tool/sandbox.rs`)
- `ToolManifest`: tool_id, version, sha256_hash, signature, allowed_resources, limits
- `ToolSandbox::execute()`: validates manifest signature (Ed25519), checks SHA-256 integrity, enforces resource policy
- `ManifestStore`: registry for signed tool manifests
- Sandbox policies: `Default`, `Restricted`, `Custom`
### Kubernetes Manager (`src/k8s/mod.rs`)
- `K8sManager::detect_cluster()` checks for `KUBERNETES_SERVICE_HOST`
- `ensure_deployment()` creates or updates Deployments
- `scale(replicas)` adjusts replica count
- `reconcile_loop()` runs every 30s in background
- All ops use the `kube` crate with in-cluster config
## Common Pitfalls
1. **Tool results must use `Role::Tool`** - Using `Role::User` causes API errors with tool call validation
2. **Kimi K2.5 requires `temperature=1.0`** - Other temperatures may cause issues
3. **Worktrees need Cargo workspace isolation** - Use `inject_workspace_stub()` to prepend `[workspace]` to Cargo.toml
4. **TodoStatus enum has aliases** - Accepts both `inprogress` and `in_progress` variants
5. **Session must call `.save()` after modifications** - Persist to disk for session continuity
6. **Auth is mandatory** - The `AuthLayer` in `src/server/auth.rs` cannot be conditionally removed. All endpoints except `/health` require a Bearer token.
7. **Audit log must be initialized before serving** - Call `init_audit_log()` in `serve()` before binding the router. The `AUDIT_LOG` OnceCell can only be set once.
8. **Policy middleware ordering** - Policy middleware runs between audit and auth middleware. Auth verifies the token, then policy middleware checks permissions via OPA. If you add a new route, add a corresponding `POLICY_ROUTES` entry.
9. **`unsafe` in tests** - In the Rust 2024 edition, `std::env::set_var()` and `std::env::remove_var()` are `unsafe`, so tests that modify the environment must wrap those calls in `unsafe {}` (see `auth.rs`).
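Pitfall 4 can be illustrated with a hand-written parser. This is a sketch only: the variant list is assumed, and the crate's `TodoStatus` may implement its aliases differently (for example through serde attributes).

```rust
use std::str::FromStr;

/// Stand-in for the crate's `TodoStatus`; the variant list here is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TodoStatus {
    Pending,
    InProgress,
    Done,
}

impl FromStr for TodoStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pending" => Ok(Self::Pending),
            // Both spellings map to the same variant.
            "inprogress" | "in_progress" => Ok(Self::InProgress),
            "done" => Ok(Self::Done),
            other => Err(format!("unknown todo status: {other}")),
        }
    }
}
```

Code that compares status strings directly, instead of parsing them into the enum, will miss one of the two spellings.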
## PR/Commit Guidelines
- Format: `type: brief description`
- Types: `feat`, `fix`, `refactor`, `docs`, `test`, `chore`
- Keep commits atomic and focused
- Run `cargo fmt` and `cargo clippy` before committing
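The subject-line format can be checked mechanically. `is_valid_commit_subject` is a hypothetical helper for illustration (for example in a local hook); it is not an existing part of the repository.

```rust
/// Commit types allowed by the guideline above.
const COMMIT_TYPES: [&str; 6] = ["feat", "fix", "refactor", "docs", "test", "chore"];

/// Hypothetical check: `true` when `subject` looks like `type: brief description`.
pub fn is_valid_commit_subject(subject: &str) -> bool {
    match subject.split_once(": ") {
        Some((kind, description)) => {
            COMMIT_TYPES.iter().any(|t| *t == kind) && !description.trim().is_empty()
        }
        None => false,
    }
}
```

The check is deliberately strict about the `": "` separator, so `fix:typo` and `feature: x` are both rejected.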
## Before Finalizing Changes
1. `cargo fmt` - Format code
2. `cargo clippy --all-features` - Check for lints
3. `cargo test` - Run tests
4. `cargo build --release` - Verify release build
## Debugging
Enable detailed logging:
```bash
RUST_LOG=codetether=debug codetether tui
RUST_LOG=codetether::session=trace codetether run "test"
```
## Performance Notes
- Startup target: <15ms
- Memory target: <20MB idle
- Use `Arc` for shared provider state
- Prefer streaming responses for large outputs
## Forage System (`src/forage/mod.rs`)
Forage is an autonomous OKR-driven work selection and execution system that
selects high-value work items from active OKRs and optionally executes them
in isolated git worktrees.
### Purpose
- Scans active/draft/on_hold OKRs for actionable opportunities
- Ranks opportunities by KR progress, moonshot alignment, and scoring heuristics
- Executes work in git worktrees for isolation (with `--execute`)
- Reports progress via the agent bus and S3 event sink
### Key Types
- `ForageOpportunity`: Scored work item with OKR/KR linkage, moonshot alignment
- `ForageRunSummary`: Execution summary with cycles, selections, and failures
- `MoonshotRubric`: Optional goals for alignment scoring
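The ranking step can be sketched with a minimal stand-in type. The fields and the scoring formula below are assumptions made for illustration; the actual `ForageOpportunity` fields and heuristics in `src/forage/mod.rs` are not described in this document.

```rust
/// Minimal stand-in for `ForageOpportunity`; fields are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct Opportunity {
    pub title: String,
    /// Key-result progress in the range 0.0..=1.0.
    pub kr_progress: f64,
    /// Moonshot alignment in the range 0.0..=1.0.
    pub moonshot_alignment: f64,
}

impl Opportunity {
    /// Hypothetical score: more remaining KR work and closer moonshot
    /// alignment both raise priority.
    pub fn score(&self) -> f64 {
        (1.0 - self.kr_progress) + self.moonshot_alignment
    }
}

/// Returns the `top` highest-scoring opportunities, best first.
pub fn rank(mut items: Vec<Opportunity>, top: usize) -> Vec<Opportunity> {
    // `total_cmp` gives a total order on floats, so NaN cannot break the sort.
    items.sort_by(|a, b| b.score().total_cmp(&a.score()));
    items.truncate(top);
    items
}
```

The `top` parameter plays the role of the `--top N` flag: only the best `N` candidates survive each cycle.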
### CLI Usage
```bash
# Scan for opportunities (dry run)
codetether forage --codebases /path/to/project
# Execute top opportunity in a worktree
codetether forage --codebases /path/to/project --execute --top 1
# Continuous loop with moonshot alignment
codetether forage --codebases /path/to/project --loop --moonshot "Build AI tools" --execute
```
### CLI Flags
- `--top N`: Show top-N opportunities each cycle (default: 3)
- `--loop`: Keep running continuously
- `--interval-secs N`: Seconds between loop cycles (default: 120)
- `--max-cycles N`: Maximum cycles when looping (0 = unlimited)
- `--execute`: Execute selected opportunities via `codetether run`
- `--no-s3`: Disable S3/MinIO archival requirement (for local-only)
- `--moonshot MISSION`: Moonshot mission statement(s) for prioritization
### Integration Points
- Uses `OkrRepository` for OKR/KR queries
- Creates worktrees via `worktree::WorktreeManager`
- Executes via `SwarmExecutor` for parallel agent work
- Reports to `AgentBus` for inter-agent communication
- Persists traces to S3 via `BusS3Sink`
**Important:** We are adopting hard formatting rules: the Single Responsibility Principle (SRP), modular cohesion, and a 50-line file limit.
## Hard Code Quality Rules
### **Modular Cohesion & Single Responsibility Principle (SRP)**
- **NEVER** mix concerns in a single file or function
- **EACH** module/file/function must have ONE clear responsibility
- **WHEN** a file handles multiple concerns, immediately refactor into separate modules
- **ALL** controllers must only handle HTTP concerns (request/response parsing)
- **ALL** business logic must be in separate model/service layers
- **ALL** database operations must be in dedicated repository/query modules
### **50-Line File Limit**
- **STRICT** 50-line maximum per file (excluding comments and blank lines)
- **WHEN** a file exceeds 50 lines, **MUST** split into smaller modules
- **IF** you're at 45+ lines, proactively refactor before hitting the limit
- **FILES** should be focused: one struct, one function group, or one concern
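The counting rule (comments and blank lines excluded) can be approximated with a few lines of Rust. `count_code_lines` and `within_limit` are hypothetical helpers, not an existing project tool, and this sketch does not handle `/* ... */` block comments.

```rust
/// Counts lines that are neither blank nor `//` comments (which also covers
/// `///` and `//!` doc comments). Block comments are not handled in this sketch.
pub fn count_code_lines(source: &str) -> usize {
    source
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with("//"))
        .count()
}

/// `true` when the file is within the 50-line limit.
pub fn within_limit(source: &str) -> bool {
    count_code_lines(source) <= 50
}
```

Because doc comments are excluded from the count, thorough rustdoc does not push a file over the limit.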
### **Type Safety Enforcement**
- **NEVER** use the `any` type - the project maintainer treats every `any` as a sign of careless work and will have to fix it without asking
- **ALWAYS** define explicit types for function parameters and return values
- **USE** TypeScript strict mode everywhere
- **PREFER** type inference (`const x = ...`) only when the type is obvious
### **Code Review Expectations**
These are **hard rules**, not suggestions. Violations will be rejected in code review.
**Important:** Never run `cargo build` or `cargo check`; let CI catch any build or type errors.