CodeTether Agent
Linux binary (v0.1.5): direct | tar.gz | SHA256SUMS
A high-performance AI coding agent with first-class A2A (Agent-to-Agent) protocol support, written in Rust. Features a rich terminal UI with dedicated views for swarm orchestration and autonomous PRD-driven development. Part of the CodeTether ecosystem.

What's New in v0.1.5
- Perpetual Persona Swarms (Phase 0) — Always-on cognition runtime with persona lineage, SSE event stream, and control APIs.
- Bedrock Provider — Native Amazon Bedrock Converse API support (including region-aware configuration).
- Provider Model Discovery — Added default model catalogs for OpenAI-compatible providers (cerebras, novita, minimax).
- Worker API Alignment — Updated worker registration, task, and heartbeat paths to the /v1/opencode/* namespace.
- Model ID Translation Fix — Preserves model IDs that use : for version suffixes (for example, amazon.nova-micro-v1:0).
See full release notes.
Features
- A2A-Native: Built from the ground up for the A2A protocol - works as a worker agent for the CodeTether platform
- AI-Powered Coding: Intelligent code assistance using multiple AI providers (OpenAI, Anthropic, Google, Moonshot, GitHub Copilot, etc.)
- Swarm Execution: Parallel sub-agent execution with real-time per-agent event streaming and dedicated TUI detail view
- Ralph Loop: Autonomous PRD-driven development with dedicated TUI view — give it a spec, watch it work story by story
- Interactive TUI: Rich terminal interface with webview layout, model selector, session picker, swarm view, and Ralph view
- RLM Processing: Handle context larger than model windows via recursive language model approach
- Secure Secrets: All API keys loaded exclusively from HashiCorp Vault - no environment variable secrets
- FunctionGemma Tool Router: Separates reasoning from tool-call formatting — a tiny local model handles structured output so your primary LLM can focus on thinking (see why this matters)
- 27+ Tools: Comprehensive tool system for file ops, LSP, code search, web fetch, and more
- Session Management: Persistent session history with git-aware storage
- High Performance: Written in Rust — 13ms startup, <20MB idle memory, true parallelism via tokio
Installation
One-Click Install (Recommended)
No Rust toolchain required. Downloads the latest pre-built binary and installs to /usr/local/bin (or ~/.local/bin). Also downloads the FunctionGemma model (~292 MB) for local tool-call routing.
# Skip FunctionGemma model download
# Download only the FunctionGemma model (existing install)
From crates.io
This installs the codetether binary to ~/.cargo/bin/.
From GitHub Releases
Download pre-built binaries from GitHub Releases.
From Source
# Binary at target/release/codetether
# Build without FunctionGemma (smaller binary)
FunctionGemma Tool-Call Router
The Problem
Modern LLMs can call tools. But they're doing two fundamentally different jobs at once: figuring out what to do (reasoning) and formatting how to express it (structured JSON tool calls). These are very different skills, and coupling them has real costs:
- You pay frontier prices for formatting. A $15/M-token model spends tokens producing {"name": "read_file", "arguments": {"path": "src/main.rs"}} — the same structured output a 270M-parameter model produces perfectly.
- Tool-call quality varies wildly. Even models that "support" tool calling often hallucinate tool names, malform arguments, or choose the wrong tool. The reasoning is good, but the formatting is unreliable.
- You're locked to one model's quirks. Switch from Claude to Gemini and tool-call behavior changes. Every provider implements it slightly differently. Your agent has to handle all of them.
- Retries are expensive. When a tool call is malformed, you burn another full cloud round-trip to fix it.
The Solution
CodeTether separates the two jobs. Your primary LLM does what it's best at — reasoning, planning, understanding code. A tiny local model (FunctionGemma, 270M params by Google) runs on your CPU and handles the structured output formatting. It reads what the LLM said it wants to do and produces clean, reliable tool calls.
This is the same principle behind compiler design (parsing vs. code generation), microservices (single responsibility), and even how teams work (the architect decides what to build, the engineer handles how to express it in code).
Why This Is Novel
- No other coding agent separates these concerns. Cursor, Continue, Aider, and opencode all require the primary LLM to handle both reasoning and tool-call formatting in a single pass. That works until it doesn't.
- Provider-agnostic tool calling. Switch models freely — Claude, GPT-4o, Llama, Qwen, Kimi, a self-hosted fine-tune — and tool-call behavior stays consistent because the formatting layer is local and deterministic.
- Cheaper at scale. The reasoning model doesn't waste tokens on JSON syntax. The formatting model runs locally for free. At 1000 tool calls/day, this adds up fast.
- More reliable. A dedicated 270M model trained specifically for function calling is more consistent at structured output than a 400B generalist model doing it as a side task.
- Zero overhead when unnecessary. If your LLM already returns structured tool calls, FunctionGemma is never invoked — pure passthrough, zero latency added.
- Safe degradation. If FunctionGemma fails, the original response is returned unchanged. It never breaks anything.
How It Works
- Your primary LLM (Claude, GPT-4o, Kimi, Llama, etc.) returns a response
- Response already has structured tool calls? → passthrough (zero cost)
- Response is text-only? → FunctionGemma translates it into <tool_call> blocks locally (~5-50ms on CPU)
- The agent processes the structured calls as normal
- Any error? → original response returned unchanged
Setup
The installer downloads the model by default. To enable the router, set these environment variables:
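A minimal sketch of enabling the router from a shell profile. The variable names are the ones documented in the table below; the model and tokenizer paths are illustrative placeholders — point them at wherever the installer placed the files:

```bash
# Enable the FunctionGemma tool-call router (paths are placeholders)
export CODETETHER_TOOL_ROUTER_ENABLED=true
export CODETETHER_TOOL_ROUTER_MODEL_PATH="$HOME/.local/share/codetether/functiongemma.gguf"
export CODETETHER_TOOL_ROUTER_TOKENIZER_PATH="$HOME/.local/share/codetether/tokenizer.json"

# Optional tuning (documented defaults shown)
export CODETETHER_TOOL_ROUTER_DEVICE=auto
export CODETETHER_TOOL_ROUTER_MAX_TOKENS=512
export CODETETHER_TOOL_ROUTER_TEMPERATURE=0.1
```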
Configuration
| Variable | Default | Description |
|---|---|---|
| CODETETHER_TOOL_ROUTER_ENABLED | false | true / 1 to activate the router |
| CODETETHER_TOOL_ROUTER_MODEL_PATH | — | Path to the FunctionGemma .gguf model |
| CODETETHER_TOOL_ROUTER_TOKENIZER_PATH | — | Path to tokenizer.json |
| CODETETHER_TOOL_ROUTER_ARCH | gemma3 | Architecture hint |
| CODETETHER_TOOL_ROUTER_DEVICE | auto | auto / cpu / cuda |
| CODETETHER_TOOL_ROUTER_MAX_TOKENS | 512 | Max decode tokens |
| CODETETHER_TOOL_ROUTER_TEMPERATURE | 0.1 | Sampling temperature |
Opting Out
- At install time: the --no-functiongemma flag skips the model download
- At build time: cargo build --release --no-default-features excludes the feature
- At runtime: simply don't set CODETETHER_TOOL_ROUTER_ENABLED (disabled by default)
Crash Reporting (Opt-In)
CodeTether can automatically capture catastrophic crashes (panic message, location, stack trace, version, OS/arch, and command) and send them to a remote endpoint on next startup.
- Disabled by default.
- On first interactive TUI run, CodeTether asks for explicit consent.
- No source files or API keys are included.
- Reports are queued locally in the data directory under crash-reports/ before upload.
- Uploads use a versioned schema envelope (codetether.crash.v1) with legacy fallback for older endpoints.
Enable:
Disable:
Set a custom endpoint:
If your crash endpoint requires authentication, set one of these environment variables:
# or
Quick Start
1. Configure HashiCorp Vault
All API keys are stored in HashiCorp Vault for security. Set up your Vault connection:
Store your provider API keys in Vault:
# Moonshot AI (default provider)
# OpenRouter (access to many models)
# Google AI
# Anthropic (or via Azure)
# Azure Anthropic
# StepFun
If You See "No providers available"
This means CodeTether can run, but it cannot find any API keys in Vault.
Use this copy/paste checklist:
# 1) Set Vault connection details (replace with your real values)
# 2) Add one provider key (example: OpenRouter)
# 3) Verify the key exists
# 4) Test CodeTether
If you are logged in as root, do not use sudo in install commands.
For worker/service setups, make sure the same VAULT_* variables are present in your service environment (for example /etc/default/codetether-agent) before restarting.
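As a sketch, such a service environment file might look like the following (all values are placeholders; the variable names are the documented VAULT_* settings):

```bash
# /etc/default/codetether-agent — environment for the worker service (placeholder values)
VAULT_ADDR=https://vault.example.com:8200
VAULT_TOKEN=hvs.xxxxxxxxxxxx
VAULT_MOUNT=secret
VAULT_SECRETS_PATH=codetether/providers
```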
Supported Providers
| Provider | Default Model | Notes |
|---|---|---|
| moonshotai | kimi-k2.5 | Default - excellent for coding |
| github-copilot | claude-opus-4 | GitHub Copilot models (Claude, GPT, Gemini) |
| openrouter | stepfun/step-3.5-flash:free | Access to many models |
| google | gemini-2.5-pro | Google AI |
| anthropic | claude-sonnet-4-20250514 | Direct or via Azure |
| stepfun | step-3.5-flash | Chinese reasoning model |
2. Connect to CodeTether Platform
# Connect as a worker to the CodeTether A2A server
# Or with authentication
# Or use the one-command deploy script (from repo root)
3. Or Use Interactive Mode
# Start the TUI in current directory
# Start in a specific project
CLI Quick Reference
# Interactive TUI (like opencode)
# Chat mode (no tools)
# Swarm mode - parallel sub-agents for complex tasks
# Ralph - autonomous PRD-driven development
# Generate a PRD template
# Start HTTP server
# Show config
Usage
Default Mode: A2A Worker
By default, codetether runs as an A2A worker that connects to the CodeTether platform:
# Connect to CodeTether platform
# With custom worker name
Environment variables:
- CODETETHER_SERVER - A2A server URL
- CODETETHER_TOKEN - Authentication token
- CODETETHER_WORKER_NAME - Worker name
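For instance (the server URL is taken from the architecture diagram later in this README; the token and worker name are placeholders):

```bash
export CODETETHER_SERVER=https://api.codetether.run   # A2A server URL
export CODETETHER_TOKEN=...                           # authentication token (placeholder)
export CODETETHER_WORKER_NAME=my-worker               # optional worker name
```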
Interactive TUI

The TUI provides:
- Webview layout: Dashboard with sidebar, chat, and inspector (/webview or Ctrl+B)
- Model selector: Browse and pick models at runtime (/model or Ctrl+M)
- Swarm view: /swarm <task> with real-time per-agent progress, tool calls, and detail view (Enter on a subtask)
- Ralph view: /ralph [prd.json] with per-story progress, quality gates, and sub-agent activity
- Session management: /sessions picker, /resume, /new
- Real-time tool streaming: See tool calls as they execute
- Theme support: Customizable colors via config with hot-reload
TUI Slash Commands
| Command | Description |
|---|---|
| /swarm <task> | Run task in parallel swarm mode |
| /ralph [path] | Start autonomous PRD loop (default: prd.json) |
| /model [name] | Open model picker or set model directly |
| /sessions | Open session picker to resume a previous session |
| /resume [id] | Resume most recent or specific session |
| /new | Start a fresh session |
| /webview | Switch to dashboard layout |
| /classic | Switch to single-pane layout |
| /inspector | Toggle inspector pane |
| /refresh | Refresh workspace and session cache |
| /view | Toggle swarm view |
TUI Keyboard Shortcuts
| Key | Action |
|---|---|
| Ctrl+M | Open model selector |
| Ctrl+B | Toggle webview/classic layout |
| Ctrl+S / F2 | Toggle swarm view |
| F3 | Toggle inspector pane |
| Tab | Switch between build/plan agents |
| Alt+j/k | Scroll down/up |
| Alt+u/d | Half-page scroll |
| Ctrl+R | Search command history |
| ? | Toggle help overlay |
Non-Interactive Mode (Chat - No Tools)
# Run a single prompt (chat only, no file editing tools)
# Continue from last session
# Use a specific model
Note: codetether run is chat-only mode without tools. For coding tasks, use swarm or ralph.
HTTP Server
# Start the API server
Configuration Management
# Show current config
# Initialize default config
Configuration
Configuration is stored in ~/.config/codetether-agent/config.toml:
[]
= "anthropic"
= "claude-sonnet-4-20250514"
[]
= true
= true
[]
= "dark"
[]
= true
Note: API keys are NOT stored in config files. They must be stored in HashiCorp Vault.
HashiCorp Vault Setup
Vault Secret Structure
secret/codetether/providers/
├── openai → { "api_key": "sk-...", "organization": "org-..." }
├── anthropic → { "api_key": "sk-ant-..." }
├── google → { "api_key": "AIza..." }
├── deepseek → { "api_key": "..." }
└── ...
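A sketch of writing and verifying one of these entries with the Vault CLI, assuming the default secret mount and codetether/providers prefix from the table below:

```bash
# Write a multi-field provider secret matching the structure above
vault kv put secret/codetether/providers/openai api_key="sk-..." organization="org-..."

# Read it back to confirm the fields landed where the agent expects them
vault kv get secret/codetether/providers/openai
```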
Environment Variables
| Variable | Description |
|---|---|
| VAULT_ADDR | Vault server address (e.g., https://vault.example.com:8200) |
| VAULT_TOKEN | Vault authentication token |
| VAULT_MOUNT | KV secrets engine mount path (default: secret) |
| VAULT_SECRETS_PATH | Path prefix for provider secrets (default: codetether/providers) |
| CODETETHER_DEFAULT_MODEL | Default model to use (e.g., moonshotai/kimi-k2.5) |
| CODETETHER_SERVER | A2A server URL |
| CODETETHER_TOKEN | Authentication token |
| CODETETHER_WORKER_NAME | Worker name |
| CODETETHER_COGNITION_ENABLED | Enable perpetual cognition runtime (true/false, default: true) |
| CODETETHER_COGNITION_AUTO_START | Auto-start cognition loop on serve startup (default: true) |
| CODETETHER_COGNITION_LOOP_INTERVAL_MS | Loop interval in milliseconds (default: 2000) |
| CODETETHER_COGNITION_MAX_SPAWN_DEPTH | Max persona lineage depth (default: 4) |
| CODETETHER_COGNITION_MAX_BRANCHING_FACTOR | Max active children per persona (default: 4) |
| CODETETHER_COGNITION_MAX_EVENTS | In-memory event buffer size (default: 2000) |
| CODETETHER_COGNITION_MAX_SNAPSHOTS | In-memory snapshot buffer size (default: 128) |
| CODETETHER_COGNITION_THINKER_ENABLED | Enable model-backed thought generation (true/false, default: true) |
| CODETETHER_COGNITION_THINKER_BACKEND | Thinker backend: openai_compat or candle (default: openai_compat) |
| CODETETHER_COGNITION_THINKER_BASE_URL | OpenAI-compatible base URL for thinker model (default: http://127.0.0.1:11434/v1) |
| CODETETHER_COGNITION_THINKER_MODEL | Model id for thought generation (default: qwen2.5:3b-instruct) |
| CODETETHER_COGNITION_THINKER_API_KEY | Optional API key for thinker endpoint |
| CODETETHER_COGNITION_THINKER_TEMPERATURE | Thinker temperature (default: 0.2) |
| CODETETHER_COGNITION_THINKER_TOP_P | Optional thinker top-p |
| CODETETHER_COGNITION_THINKER_MAX_TOKENS | Max generated tokens per thought step (default: 256) |
| CODETETHER_COGNITION_THINKER_TIMEOUT_MS | Thinker request timeout in ms (default: 12000) |
| CODETETHER_COGNITION_THINKER_CANDLE_MODEL_PATH | GGUF model path for in-process Candle inference |
| CODETETHER_COGNITION_THINKER_CANDLE_TOKENIZER_PATH | tokenizer.json path used by Candle backend |
| CODETETHER_COGNITION_THINKER_CANDLE_ARCH | Candle model architecture (llama or qwen2, default: auto from GGUF metadata) |
| CODETETHER_COGNITION_THINKER_CANDLE_DEVICE | Candle device selection: auto, cpu, or cuda (default: auto) |
| CODETETHER_COGNITION_THINKER_CANDLE_CUDA_ORDINAL | CUDA device ordinal when using cuda (default: 0) |
| CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_PENALTY | Candle repetition penalty (default: 1.1) |
| CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_LAST_N | Token window for repetition penalty (default: 64) |
| CODETETHER_COGNITION_THINKER_CANDLE_SEED | Base sampling seed for Candle thinker (default: 42) |
GPU execution requires building with --features candle-cuda (or candle-cudnn).
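For a CUDA-capable build (assuming the CUDA toolkit is available on the build host):

```bash
# Enable GPU execution for the Candle backend
cargo build --release --features candle-cuda
# or, with cuDNN kernels
cargo build --release --features candle-cudnn
```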
Using Vault Agent
For production, use Vault Agent for automatic token renewal:
# vault-agent.hcl
vault {
  address = "https://vault.example.com:8200"
}

auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "codetether-agent"
    }
  }

  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}
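Run it with the standard Vault Agent invocation:

```bash
vault agent -config=vault-agent.hcl
```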
Agents
Build Agent
Full access to development tools. Can read, write, edit files and execute commands.
Plan Agent
Read-only access for analysis and exploration. Perfect for understanding codebases before making changes.
Explore Agent
Specialized for code navigation and discovery.
Tools
CodeTether Agent includes 27+ tools for comprehensive development automation:
File Operations
| Tool | Description |
|---|---|
| read_file | Read file contents |
| write_file | Write content to files |
| list_dir | List directory contents |
| glob | Find files by pattern |
| edit | Apply search/replace patches |
| multiedit | Batch edits across multiple files |
| apply_patch | Apply unified diff patches |
Code Intelligence
| Tool | Description |
|---|---|
| lsp | Language Server Protocol operations (definition, references, hover, completion) |
| grep | Search file contents with regex |
| codesearch | Semantic code search |
Execution
| Tool | Description |
|---|---|
| bash | Execute shell commands |
| batch | Run multiple tool calls in parallel |
| task | Background task execution |
Web & External
| Tool | Description |
|---|---|
| webfetch | Fetch web pages with smart extraction |
| websearch | Search the web for information |
Agent Orchestration
| Tool | Description |
|---|---|
| ralph | Autonomous PRD-driven agent loop |
| rlm | Recursive Language Model for large contexts |
| prd | Generate and manage PRD documents |
| plan_enter/plan_exit | Switch to planning mode |
| question | Ask clarifying questions |
| skill | Execute learned skills |
| todo_read/todo_write | Track task progress |
A2A Protocol
CodeTether Agent is built for the A2A (Agent-to-Agent) protocol:
- Worker Mode (default): Connect to the CodeTether platform and process tasks
- Server Mode: Accept tasks from other agents (codetether serve)
- Client Mode: Dispatch tasks to other A2A agents
AgentCard
When running as a server, the agent exposes its capabilities via /.well-known/agent.json:
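A quick way to inspect the card is a plain HTTP request; the host and port here are illustrative — use whatever address codetether serve is listening on:

```bash
# Fetch the AgentCard from a locally running server (address is illustrative)
curl -s http://localhost:8080/.well-known/agent.json
```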
Perpetual Persona Swarms API (Phase 0)
When running codetether serve, the agent also exposes cognition + swarm control APIs:
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/cognition/start | Start perpetual cognition loop |
| POST | /v1/cognition/stop | Stop cognition loop |
| GET | /v1/cognition/status | Runtime status and buffer metrics |
| GET | /v1/cognition/stream | SSE stream of thought events |
| GET | /v1/cognition/snapshots/latest | Latest compressed memory snapshot |
| POST | /v1/swarm/personas | Create a root persona |
| POST | /v1/swarm/personas/{id}/spawn | Spawn child persona |
| POST | /v1/swarm/personas/{id}/reap | Reap a persona (optional cascade) |
| GET | /v1/swarm/lineage | Current persona lineage graph |
/v1/cognition/start auto-seeds a default root-thinker persona when no personas exist, unless a seed_persona is provided.
See docs/perpetual_persona_swarms.md for request/response contracts.
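A hedged curl walkthrough against a local server; the base URL is illustrative, and request/response bodies follow the contracts in that doc:

```bash
BASE=http://localhost:8080   # illustrative; wherever `codetether serve` is listening

# Start the cognition loop (auto-seeds a root-thinker persona if none exist)
curl -s -X POST "$BASE/v1/cognition/start"

# Check runtime status and buffer metrics
curl -s "$BASE/v1/cognition/status"

# Follow the SSE thought stream (-N disables curl's output buffering)
curl -N "$BASE/v1/cognition/stream"

# Inspect the current persona lineage graph
curl -s "$BASE/v1/swarm/lineage"
```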
CUDA Build/Deploy Helpers
From codetether-agent/:
- make build-cuda - Build a CUDA-enabled binary locally.
- make deploy-spike2-cuda - Sync source to spike2, build with --features candle-cuda, install, and restart the service.
- make status-spike2-cuda - Check service status, active Candle device config, and GPU usage on spike2.
Architecture
┌─────────────────────────────────────────────────────────┐
│ CodeTether Platform │
│ (A2A Server at api.codetether.run) │
└────────────────────────┬────────────────────────────────┘
│ SSE/JSON-RPC
▼
┌─────────────────────────────────────────────────────────┐
│ codetether-agent │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ A2A │ │ Agent │ │ Tool │ │ Provider│ │
│ │ Worker │ │ System │ │ System │ │ Layer │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ └────────────┴────────────┴────────────┘ │
│ │ │
│ ┌──────────────────────┴──────────────────────────┐ │
│ │ HashiCorp Vault │ │
│ │ (API Keys & Secrets) │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Swarm: Parallel Sub-Agent Execution
The swarm command decomposes complex tasks into parallelizable subtasks and executes them concurrently:
# Execute a complex task with parallel sub-agents (uses CODETETHER_DEFAULT_MODEL or defaults to moonshotai/kimi-k2.5)
# Specify a model explicitly
# Control parallelism and strategy
# Generate JSON output
Decomposition Strategies
| Strategy | Description |
|---|---|
| auto | LLM-driven automatic decomposition (default) |
| domain | Split by domain expertise (frontend, backend, etc.) |
| data | Split by data partitions |
| stage | Split by pipeline stages (analyze → implement → test) |
| none | Execute as single task |
RLM: Recursive Language Model Processing
The rlm command handles large contexts that exceed model context windows using the Recursive Language Model approach:
# Analyze a large source file
# Analyze multiple files
# Analyze stdin content
# JSON output for programmatic use
How RLM Works
Based on the "Recursive Language Model" paper approach:
- Context Loading: Large content is loaded into a REPL-like environment
- LLM Analysis: The LLM writes code to explore the context (head, tail, grep, etc.)
- Sub-LM Calls: The LLM can call llm_query() for semantic sub-questions
- FINAL Answer: After 1-5 iterations, the LLM returns a synthesized answer
RLM Commands (Internal REPL)
| Command | Description |
|---|---|
| head(n) | First n lines of context |
| tail(n) | Last n lines of context |
| grep("pattern") | Search for regex pattern |
| count("pattern") | Count pattern occurrences |
| llm_query("question") | Ask semantic sub-question |
| FINAL("answer") | Return final answer |
Ralph: Autonomous PRD-Driven Agent Loop
Ralph is an autonomous agent loop that implements features from a structured PRD (Product Requirements Document). Each iteration is a fresh agent instance with clean context, while memory persists via git history, progress.txt, and the PRD itself.
# Create a new PRD template
# Run Ralph to implement the PRD (note: -p or --prd is required for custom PRD path)
# Or using short flags
# Check status
How Ralph Works
- Load PRD: Read user stories with acceptance criteria, priorities, and dependencies
- Select Story: Pick the highest-priority incomplete story with satisfied dependencies
- Implement: The AI agent has full tool access to read, write, edit, and execute
- Quality Check: Run all quality checks (cargo check, clippy, test, build)
- Mark Complete: Update PRD with pass/fail status
- Repeat: Continue until all stories pass or max iterations reached
PRD Structure
Memory Across Iterations
Ralph maintains memory across iterations without context window bloat:
| Memory Source | Purpose |
|---|---|
| Git history | Commits from previous iterations show what changed |
| progress.txt | Agent writes learnings, blockers, and context |
| prd.json | Tracks which stories pass/fail |
| Quality checks | Error output guides next iteration |
Dogfooding: Self-Implementing Agent
This project demonstrates true dogfooding—using the agent to build its own features.
What We Accomplished
Using ralph and swarm, the agent autonomously implemented:
LSP Client Implementation (10 stories):
- US-001: LSP Transport Layer - stdio implementation
- US-002: JSON-RPC Message Framework
- US-003: LSP Initialize Handshake
- US-004: Text Document Synchronization - didOpen
- US-005: Text Document Synchronization - didChange
- US-006: Text Document Completion
- US-007: Text Document Hover
- US-008: Text Document Definition
- US-009: LSP Shutdown and Exit
- US-010: LSP Client Configuration and Server Management
Missing Features (10 stories):
- MF-001: External Directory Tool
- MF-002: RLM Pool - Connection Pooling
- MF-003: Truncation Utilities
- MF-004: LSP Full Integration - Server Management
- MF-005: LSP Transport - stdio Communication
- MF-006: LSP Requests - textDocument/definition
- MF-007: LSP Requests - textDocument/references
- MF-008: LSP Requests - textDocument/hover
- MF-009: LSP Requests - textDocument/completion
- MF-010: RLM Router Enhancement
Results
| Metric | Value |
|---|---|
| Total User Stories | 20 |
| Stories Passed | 20 (100%) |
| Total Iterations | 20 |
| Quality Checks Per Story | 4 (check, clippy, test, build) |
| Lines of Code Generated | ~6,000+ |
| Time to Complete | ~30 minutes |
| Model Used | Kimi K2.5 (Moonshot AI) |
Efficiency Comparison
| Approach | Time | Cost | Notes |
|---|---|---|---|
| Manual Development | 80 hours | $8,000 | Senior dev @ $100/hr, 50-100 LOC/day |
| opencode + subagents | 100 min | ~$11.25 | Bun runtime, Kimi K2.5 (same model) |
| codetether swarm | 29.5 min | $3.75 | Native Rust, Kimi K2.5 |
- vs Manual: 163x faster, 2133x cheaper
- vs opencode: 3.4x faster, ~3x cheaper (same Kimi K2.5 model)
Key advantages over opencode subagents (model parity):
- Native Rust binary (13ms startup vs 25-50ms Bun)
- Direct API calls vs TypeScript HTTP overhead
- PRD-driven state in files vs subagent process spawning
- ~3x fewer tokens due to reduced subagent initialization overhead
Note: Both have LLM-based compaction. The efficiency gain comes from PRD-driven architecture (state in prd.json + progress.txt) vs. spawning subprocesses with rebuilt context.
How to Replicate
# 1. Create a PRD for your feature
# 2. Run Ralph
# 3. Watch as your feature gets implemented autonomously
Why This Matters
- Proof of Capability: The agent can implement non-trivial features end-to-end
- Quality Assurance: Every story passes cargo check, clippy, test, and build
- Autonomous Operation: No human intervention during implementation
- Reproducible Process: PRD-driven development is structured and repeatable
- Self-Improvement: The agent literally improved itself
Content Types
RLM auto-detects content type for optimized processing:
| Type | Detection | Optimization |
|---|---|---|
| code | Function definitions, imports | Semantic chunking by symbols |
| logs | Timestamps, log levels | Time-based chunking |
| conversation | Chat markers, turns | Turn-based chunking |
| documents | Markdown headers, paragraphs | Section-based chunking |
Example Output
{
}
Performance: Why Rust Over Bun/TypeScript
CodeTether Agent is written in Rust for measurable performance advantages over JavaScript/TypeScript runtimes like Bun:
Benchmark Results
| Metric | CodeTether (Rust) | opencode (Bun) | Advantage |
|---|---|---|---|
| Binary Size | 12.5 MB | ~90 MB (bun + deps) | 7.2x smaller |
| Startup Time | 13 ms | 25-50 ms | 2-4x faster |
| Memory (idle) | ~15 MB | ~50-80 MB | 3-5x less |
| Memory (swarm, 10 agents) | ~45 MB | ~200+ MB | 4-5x less |
| Process Spawn | 1.5 ms | 5-10 ms | 3-7x faster |
| Cold Start (container) | ~50 ms | ~200-500 ms | 4-10x faster |
Why This Matters for Sub-Agents
- Lower Memory Per Agent: With 3-5x less memory per agent, you can run more concurrent sub-agents on the same hardware. A 4GB container can run ~80 Rust sub-agents vs ~15-20 Bun sub-agents.
- Faster Spawn Time: Sub-agents spawn in 1.5ms vs 5-10ms. For a swarm of 100 agents, that's 150ms vs 500-1000ms just in spawn overhead.
- No GC Pauses: Rust has no garbage collector. JavaScript/Bun has GC pauses that can add latency spikes of 10-50ms during high-memory operations.
- True Parallelism: Rust's tokio runtime uses OS threads with work-stealing. Bun uses a single-threaded event loop that can bottleneck on CPU-bound decomposition.
- Smaller Attack Surface: Smaller binary = fewer dependencies = smaller CVE surface. Critical for agents with shell access.
Resource Efficiency for Swarm Workloads
┌─────────────────────────────────────────────────────────────────┐
│ Memory Usage Comparison │
│ │
│ Sub-Agents CodeTether (Rust) opencode (Bun) │
│ ────────────────────────────────────────────────────────────── │
│ 1 15 MB 60 MB │
│ 5 35 MB 150 MB │
│ 10 55 MB 280 MB │
│ 25 105 MB 650 MB │
│ 50 180 MB 1200 MB │
│ 100 330 MB 2400 MB │
│ │
│ At 100 sub-agents: Rust uses 7.3x less memory │
└─────────────────────────────────────────────────────────────────┘
Real-World Impact
For a typical swarm task (e.g., "Implement feature X with tests"):
| Scenario | CodeTether | opencode (Bun) |
|---|---|---|
| Task decomposition | 50ms | 150ms |
| Spawn 5 sub-agents | 8ms | 35ms |
| Peak memory | 45 MB | 180 MB |
| Total overhead | ~60ms | ~200ms |
Result: 3.3x faster task initialization, 4x less memory, more capacity for actual AI inference.
Measured: Dogfooding Task (20 User Stories)
Actual resource usage from implementing 20 user stories autonomously:
┌─────────────────────────────────────────────────────────────────┐
│ Dogfooding Task: 20 Stories, Same Model (Kimi K2.5) │
│ │
│ Metric CodeTether opencode (estimated) │
│ ────────────────────────────────────────────────────────────── │
│ Total Time 29.5 min 100 min (3.4x slower) │
│ Wall Clock 1,770 sec 6,000 sec │
│ Iterations 20 20 │
│ Spawn Overhead 20 × 1.5ms = 30ms 20 × 7.5ms = 150ms │
│ Startup Overhead 20 × 13ms = 260ms 20 × 37ms = 740ms │
│ Peak Memory ~55 MB ~280 MB │
│ Tokens Used 500K ~1.5M (subagent init) │
│ Token Cost $3.75 ~$11.25 │
│ │
│ Total Overhead 290ms 890ms (3.1x more) │
│ Memory Efficiency 5.1x less peak RAM │
│ Cost Efficiency ~3x cheaper │
└─────────────────────────────────────────────────────────────────┘
Computation Notes:
- Spawn overhead: iterations × spawn_time (1.5ms Rust vs 7.5ms Bun avg)
- Startup overhead: iterations × startup_time (13ms Rust vs 37ms Bun avg)
- Token difference: opencode has compaction, but subagent spawns rebuild system prompt + context each time (~3x more tokens)
- Memory: Based on 10-agent swarm profile (55 MB vs 280 MB)
- Cost: Same Kimi K2.5 pricing, difference is from subagent initialization overhead
Note: opencode uses LLM-based compaction for long sessions (similar to codetether). The token difference comes from subagent process spawning overhead, not lack of context management.
Benchmark Methodology
Run benchmarks yourself:
Benchmarks performed on:
- Ubuntu 24.04, x86_64
- 48 CPU threads, 32GB RAM
- Rust 1.85, Bun 1.x
- HashiCorp Vault for secrets
Development
# Run in development mode
# Run tests
# Build release binary
# Run benchmarks
License
MIT