CodeTether Agent
Linux binary (v0.1.5): direct | tar.gz | SHA256SUMS
A high-performance AI coding agent with first-class A2A (Agent-to-Agent) protocol support, written in Rust. Features a rich terminal UI with dedicated views for swarm orchestration and autonomous PRD-driven development. Part of the CodeTether ecosystem.

What's New in v0.1.5
- Perpetual Persona Swarms (Phase 0) — Always-on cognition runtime with persona lineage, SSE event stream, and control APIs.
- Bedrock Provider — Native Amazon Bedrock Converse API support (including region-aware configuration).
- Provider Model Discovery — Added default model catalogs for OpenAI-compatible providers (`cerebras`, `novita`, `minimax`).
- Worker API Alignment — Updated worker registration, task, and heartbeat paths to the `/v1/opencode/*` namespace.
- Model ID Translation Fix — Preserves model IDs that use `:` for version suffixes (for example `amazon.nova-micro-v1:0`).
See full release notes.
Features
- A2A-Native: Built from the ground up for the A2A protocol - works as a worker agent for the CodeTether platform
- AI-Powered Coding: Intelligent code assistance using multiple AI providers (OpenAI, Anthropic, Google, Moonshot, GitHub Copilot, etc.)
- Swarm Execution: Parallel sub-agent execution with real-time per-agent event streaming and dedicated TUI detail view
- Ralph Loop: Autonomous PRD-driven development with dedicated TUI view — give it a spec, watch it work story by story
- Interactive TUI: Rich terminal interface with webview layout, model selector, session picker, swarm view, and Ralph view
- RLM Processing: Handle context larger than model windows via recursive language model approach
- Secure Secrets: All API keys loaded exclusively from HashiCorp Vault - no environment variable secrets
- 27+ Tools: Comprehensive tool system for file ops, LSP, code search, web fetch, and more
- Session Management: Persistent session history with git-aware storage
- High Performance: Written in Rust — 13ms startup, <20MB idle memory, true parallelism via tokio
Installation
One-Click Install (Recommended)
No Rust toolchain required. Downloads the latest pre-built binary and installs to /usr/local/bin (or ~/.local/bin).
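For example (the script URL below is an assumption; use the one-liner from the official releases page):

```bash
# Assumed install-script URL -- verify before piping a remote script to your shell
curl -fsSL https://codetether.run/install.sh | bash
```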
From crates.io
This installs the codetether binary to ~/.cargo/bin/.
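A sketch, assuming the crate is published as `codetether-agent`:

```bash
# Crate name assumed; the installed binary is `codetether`
cargo install codetether-agent
```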
From GitHub Releases
Download pre-built binaries from GitHub Releases.
From Source
# Binary at target/release/codetether
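A minimal sketch of the usual cargo workflow (repository URL is an assumption):

```bash
git clone https://github.com/codetether/codetether-agent.git   # URL assumed
cd codetether-agent
cargo build --release
# Binary at target/release/codetether
```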
Crash Reporting (Opt-In)
CodeTether can automatically capture catastrophic crashes (panic message, location, stack trace, version, OS/arch, and command) and send them to a remote endpoint on next startup.
- Disabled by default.
- On first interactive TUI run, CodeTether asks for explicit consent.
- No source files or API keys are included.
- Reports are queued locally in the data directory under `crash-reports/` before upload.
- Uploads use a versioned schema envelope (`codetether.crash.v1`) with legacy fallback for older endpoints.
Enable:
Disable:
Set a custom endpoint:
If your crash endpoint requires authentication, set the corresponding authentication token environment variable.
Quick Start
1. Configure HashiCorp Vault
All API keys are stored in HashiCorp Vault for security. Set up your Vault connection:
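For example, using the Vault environment variables documented below (values are placeholders):

```bash
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.xxxxxxxx"
# Optional overrides (defaults shown)
export VAULT_MOUNT="secret"
export VAULT_SECRETS_PATH="codetether/providers"
```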
Store your provider API keys in Vault:
# Moonshot AI (default provider)
# OpenRouter (access to many models)
# Google AI
# Anthropic (or via Azure)
# Azure Anthropic
# StepFun
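A sketch of the corresponding `vault kv put` commands, following the layout shown under "Vault Secret Structure" below (secret names are assumed to match the provider IDs in the table; keys are placeholders):

```bash
# Moonshot AI (default provider)
vault kv put secret/codetether/providers/moonshotai api_key="sk-..."
# OpenRouter (access to many models)
vault kv put secret/codetether/providers/openrouter api_key="sk-or-..."
# Google AI
vault kv put secret/codetether/providers/google api_key="AIza..."
# Anthropic (direct)
vault kv put secret/codetether/providers/anthropic api_key="sk-ant-..."
```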
If You See "No providers available"
This means CodeTether can run, but it cannot find any API keys in Vault.
Use this copy/paste checklist:
# 1) Set Vault connection details (replace with your real values)
# 2) Add one provider key (example: OpenRouter)
# 3) Verify the key exists
# 4) Test CodeTether
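A concrete version of that checklist (addresses, tokens, and keys are placeholders):

```bash
# 1) Set Vault connection details (replace with your real values)
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.xxxxxxxx"

# 2) Add one provider key (example: OpenRouter)
vault kv put secret/codetether/providers/openrouter api_key="sk-or-..."

# 3) Verify the key exists
vault kv get secret/codetether/providers/openrouter

# 4) Test CodeTether (should no longer report "No providers available")
codetether
```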
If you are logged in as root, do not use sudo in install commands.
For worker/service setups, make sure the same VAULT_* variables are present in your service environment (for example /etc/default/codetether-agent) before restarting.
Supported Providers
| Provider | Default Model | Notes |
|---|---|---|
| `moonshotai` | `kimi-k2.5` | Default - excellent for coding |
| `github-copilot` | `claude-opus-4` | GitHub Copilot models (Claude, GPT, Gemini) |
| `openrouter` | `stepfun/step-3.5-flash:free` | Access to many models |
| `google` | `gemini-2.5-pro` | Google AI |
| `anthropic` | `claude-sonnet-4-20250514` | Direct or via Azure |
| `stepfun` | `step-3.5-flash` | Chinese reasoning model |
2. Connect to CodeTether Platform
# Connect as a worker to the CodeTether A2A server
# Or with authentication
# Or use the one-command deploy script (from repo root)
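A sketch using the documented environment variables; server URL and token are placeholders, and flag-based equivalents are not shown here:

```bash
# Connect as a worker to the CodeTether A2A server
export CODETETHER_SERVER="https://api.codetether.run"
codetether

# Or with authentication and a custom worker name
export CODETETHER_TOKEN="your-token"
export CODETETHER_WORKER_NAME="my-worker"
codetether
```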
3. Or Use Interactive Mode
# Start the TUI in current directory
# Start in a specific project
CLI Quick Reference
# Interactive TUI (like opencode)
# Chat mode (no tools)
# Swarm mode - parallel sub-agents for complex tasks
# Ralph - autonomous PRD-driven development
# Generate a PRD template
# Start HTTP server
# Show config
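The subcommand names below come from the sections that follow; argument shapes and flags are illustrative assumptions:

```bash
codetether run "summarize the failing tests"            # Chat mode, no tools
codetether swarm "add a /health endpoint with tests"    # Parallel sub-agents
codetether ralph --prd prd.json                         # Autonomous PRD-driven loop
codetether prd                                          # Generate a PRD template (arguments assumed)
codetether serve                                        # Start the HTTP/A2A server
```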
Usage
Default Mode: A2A Worker
By default, codetether runs as an A2A worker that connects to the CodeTether platform:
# Connect to CodeTether platform
# With custom worker name
Environment variables:
- `CODETETHER_SERVER` - A2A server URL
- `CODETETHER_TOKEN` - Authentication token
- `CODETETHER_WORKER_NAME` - Worker name
Interactive TUI

The TUI provides:
- Webview layout: Dashboard with sidebar, chat, and inspector (`/webview` or `Ctrl+B`)
- Model selector: Browse and pick models at runtime (`/model` or `Ctrl+M`)
- Swarm view: `/swarm <task>` with real-time per-agent progress, tool calls, and detail view (`Enter` on a subtask)
- Ralph view: `/ralph [prd.json]` with per-story progress, quality gates, and sub-agent activity
- Session management: `/sessions` picker, `/resume`, `/new`
- Theme support: Customizable colors via config with hot-reload
TUI Slash Commands
| Command | Description |
|---|---|
| `/swarm <task>` | Run task in parallel swarm mode |
| `/ralph [path]` | Start autonomous PRD loop (default: `prd.json`) |
| `/model [name]` | Open model picker or set model directly |
| `/sessions` | Open session picker to resume a previous session |
| `/resume [id]` | Resume most recent or specific session |
| `/new` | Start a fresh session |
| `/webview` | Switch to dashboard layout |
| `/classic` | Switch to single-pane layout |
| `/inspector` | Toggle inspector pane |
| `/refresh` | Refresh workspace and session cache |
| `/view` | Toggle swarm view |
TUI Keyboard Shortcuts
| Key | Action |
|---|---|
| `Ctrl+M` | Open model selector |
| `Ctrl+B` | Toggle webview/classic layout |
| `Ctrl+S` / `F2` | Toggle swarm view |
| `F3` | Toggle inspector pane |
| `Tab` | Switch between build/plan agents |
| `Alt+j/k` | Scroll down/up |
| `Alt+u/d` | Half-page scroll |
| `Ctrl+R` | Search command history |
| `?` | Toggle help overlay |
Non-Interactive Mode (Chat - No Tools)
# Run a single prompt (chat only, no file editing tools)
# Continue from last session
# Use a specific model
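For example (the session-continuation and model-selection flag names are assumptions):

```bash
# Run a single prompt (chat only, no file-editing tools)
codetether run "What does the a2a module do?"

# Continue from the last session (flag assumed)
codetether run --continue "And how is it tested?"

# Use a specific model (flag assumed)
codetether run --model moonshotai/kimi-k2.5 "Summarize recent changes"
```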
Note: codetether run is chat-only mode without tools. For coding tasks, use swarm or ralph.
HTTP Server
# Start the API server
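A minimal sketch; listen-address flags are not shown and defaults are assumed:

```bash
codetether serve
```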
Configuration Management
# Show current config
# Initialize default config
Configuration
Configuration is stored in `~/.config/codetether-agent/config.toml`. The default config sets the provider (`anthropic`) and model (`claude-sonnet-4-20250514`), the TUI theme (`dark`), and a few boolean feature toggles.
Note: API keys are NOT stored in config files. They must be stored in HashiCorp Vault.
HashiCorp Vault Setup
Vault Secret Structure
secret/codetether/providers/
├── openai → { "api_key": "sk-...", "organization": "org-..." }
├── anthropic → { "api_key": "sk-ant-..." }
├── google → { "api_key": "AIza..." }
├── deepseek → { "api_key": "..." }
└── ...
Environment Variables
| Variable | Description |
|---|---|
| `VAULT_ADDR` | Vault server address (e.g., `https://vault.example.com:8200`) |
| `VAULT_TOKEN` | Vault authentication token |
| `VAULT_MOUNT` | KV secrets engine mount path (default: `secret`) |
| `VAULT_SECRETS_PATH` | Path prefix for provider secrets (default: `codetether/providers`) |
| `CODETETHER_DEFAULT_MODEL` | Default model to use (e.g., `moonshotai/kimi-k2.5`) |
| `CODETETHER_SERVER` | A2A server URL |
| `CODETETHER_TOKEN` | Authentication token |
| `CODETETHER_WORKER_NAME` | Worker name |
| `CODETETHER_COGNITION_ENABLED` | Enable perpetual cognition runtime (true/false, default: true) |
| `CODETETHER_COGNITION_AUTO_START` | Auto-start cognition loop on serve startup (default: true) |
| `CODETETHER_COGNITION_LOOP_INTERVAL_MS` | Loop interval in milliseconds (default: 2000) |
| `CODETETHER_COGNITION_MAX_SPAWN_DEPTH` | Max persona lineage depth (default: 4) |
| `CODETETHER_COGNITION_MAX_BRANCHING_FACTOR` | Max active children per persona (default: 4) |
| `CODETETHER_COGNITION_MAX_EVENTS` | In-memory event buffer size (default: 2000) |
| `CODETETHER_COGNITION_MAX_SNAPSHOTS` | In-memory snapshot buffer size (default: 128) |
| `CODETETHER_COGNITION_THINKER_ENABLED` | Enable model-backed thought generation (true/false, default: true) |
| `CODETETHER_COGNITION_THINKER_BACKEND` | Thinker backend: `openai_compat` or `candle` (default: `openai_compat`) |
| `CODETETHER_COGNITION_THINKER_BASE_URL` | OpenAI-compatible base URL for thinker model (default: `http://127.0.0.1:11434/v1`) |
| `CODETETHER_COGNITION_THINKER_MODEL` | Model id for thought generation (default: `qwen2.5:3b-instruct`) |
| `CODETETHER_COGNITION_THINKER_API_KEY` | Optional API key for thinker endpoint |
| `CODETETHER_COGNITION_THINKER_TEMPERATURE` | Thinker temperature (default: 0.2) |
| `CODETETHER_COGNITION_THINKER_TOP_P` | Optional thinker top-p |
| `CODETETHER_COGNITION_THINKER_MAX_TOKENS` | Max generated tokens per thought step (default: 256) |
| `CODETETHER_COGNITION_THINKER_TIMEOUT_MS` | Thinker request timeout in ms (default: 12000) |
| `CODETETHER_COGNITION_THINKER_CANDLE_MODEL_PATH` | GGUF model path for in-process Candle inference |
| `CODETETHER_COGNITION_THINKER_CANDLE_TOKENIZER_PATH` | `tokenizer.json` path used by Candle backend |
| `CODETETHER_COGNITION_THINKER_CANDLE_ARCH` | Candle model architecture (`llama` or `qwen2`, default: auto from GGUF metadata) |
| `CODETETHER_COGNITION_THINKER_CANDLE_DEVICE` | Candle device selection: `auto`, `cpu`, or `cuda` (default: `auto`) |
| `CODETETHER_COGNITION_THINKER_CANDLE_CUDA_ORDINAL` | CUDA device ordinal when using `cuda` (default: 0) |
| `CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_PENALTY` | Candle repetition penalty (default: 1.1) |
| `CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_LAST_N` | Token window for repetition penalty (default: 64) |
| `CODETETHER_COGNITION_THINKER_CANDLE_SEED` | Base sampling seed for Candle thinker (default: 42) |
GPU execution requires building with --features candle-cuda (or candle-cudnn).
Using Vault Agent
For production, use Vault Agent for automatic token renewal:
# vault-agent.hcl
vault {
  address = "https://vault.example.com:8200"
}

auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "codetether-agent"
    }
  }

  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}
Agents
Build Agent
Full access to development tools. Can read, write, edit files and execute commands.
Plan Agent
Read-only access for analysis and exploration. Perfect for understanding codebases before making changes.
Explore Agent
Specialized for code navigation and discovery.
Tools
CodeTether Agent includes 27+ tools for comprehensive development automation:
File Operations
| Tool | Description |
|---|---|
| `read_file` | Read file contents |
| `write_file` | Write content to files |
| `list_dir` | List directory contents |
| `glob` | Find files by pattern |
| `edit` | Apply search/replace patches |
| `multiedit` | Batch edits across multiple files |
| `apply_patch` | Apply unified diff patches |
Code Intelligence
| Tool | Description |
|---|---|
| `lsp` | Language Server Protocol operations (definition, references, hover, completion) |
| `grep` | Search file contents with regex |
| `codesearch` | Semantic code search |
Execution
| Tool | Description |
|---|---|
| `bash` | Execute shell commands |
| `batch` | Run multiple tool calls in parallel |
| `task` | Background task execution |
Web & External
| Tool | Description |
|---|---|
| `webfetch` | Fetch web pages with smart extraction |
| `websearch` | Search the web for information |
Agent Orchestration
| Tool | Description |
|---|---|
| `ralph` | Autonomous PRD-driven agent loop |
| `rlm` | Recursive Language Model for large contexts |
| `prd` | Generate and manage PRD documents |
| `plan_enter` / `plan_exit` | Switch to planning mode |
| `question` | Ask clarifying questions |
| `skill` | Execute learned skills |
| `todo_read` / `todo_write` | Track task progress |
A2A Protocol
CodeTether Agent is built for the A2A (Agent-to-Agent) protocol:
- Worker Mode (default): Connect to the CodeTether platform and process tasks
- Server Mode: Accept tasks from other agents (`codetether serve`)
- Client Mode: Dispatch tasks to other A2A agents
AgentCard
When running as a server, the agent exposes its capabilities via /.well-known/agent.json:
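For example, against a locally running server (host and port are assumptions):

```bash
curl -s http://127.0.0.1:8080/.well-known/agent.json | jq .
```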
Perpetual Persona Swarms API (Phase 0)
When running codetether serve, the agent also exposes cognition + swarm control APIs:
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/cognition/start` | Start perpetual cognition loop |
| POST | `/v1/cognition/stop` | Stop cognition loop |
| GET | `/v1/cognition/status` | Runtime status and buffer metrics |
| GET | `/v1/cognition/stream` | SSE stream of thought events |
| GET | `/v1/cognition/snapshots/latest` | Latest compressed memory snapshot |
| POST | `/v1/swarm/personas` | Create a root persona |
| POST | `/v1/swarm/personas/{id}/spawn` | Spawn child persona |
| POST | `/v1/swarm/personas/{id}/reap` | Reap a persona (optional cascade) |
| GET | `/v1/swarm/lineage` | Current persona lineage graph |
`/v1/cognition/start` auto-seeds a default `root-thinker` persona when no personas exist, unless a `seed_persona` is provided.
See docs/perpetual_persona_swarms.md for request/response contracts.
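A quick way to exercise the cognition runtime from the shell; the base URL is an assumption, and request bodies follow the contracts in docs/perpetual_persona_swarms.md:

```bash
BASE=http://127.0.0.1:8080   # assumed listen address

# Start the loop (auto-seeds a root-thinker persona if none exist)
curl -s -X POST "$BASE/v1/cognition/start"

# Check runtime status and buffer metrics
curl -s "$BASE/v1/cognition/status"

# Follow the SSE thought stream
curl -N "$BASE/v1/cognition/stream"
```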
CUDA Build/Deploy Helpers
From codetether-agent/:
- `make build-cuda` - Build a CUDA-enabled binary locally.
- `make deploy-spike2-cuda` - Sync source to `spike2`, build with `--features candle-cuda`, install, and restart the service.
- `make status-spike2-cuda` - Check service status, active Candle device config, and GPU usage on `spike2`.
Architecture
┌─────────────────────────────────────────────────────────┐
│ CodeTether Platform │
│ (A2A Server at api.codetether.run) │
└────────────────────────┬────────────────────────────────┘
│ SSE/JSON-RPC
▼
┌─────────────────────────────────────────────────────────┐
│ codetether-agent │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ A2A │ │ Agent │ │ Tool │ │ Provider│ │
│ │ Worker │ │ System │ │ System │ │ Layer │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ └────────────┴────────────┴────────────┘ │
│ │ │
│ ┌──────────────────────┴──────────────────────────┐ │
│ │ HashiCorp Vault │ │
│ │ (API Keys & Secrets) │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Swarm: Parallel Sub-Agent Execution
The swarm command decomposes complex tasks into parallelizable subtasks and executes them concurrently:
# Execute a complex task with parallel sub-agents (uses CODETETHER_DEFAULT_MODEL or defaults to moonshotai/kimi-k2.5)
# Specify a model explicitly
# Control parallelism and strategy
# Generate JSON output
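Illustrative invocations; the model, parallelism, and output flag names are assumptions, while the strategy values come from the table below:

```bash
# Uses CODETETHER_DEFAULT_MODEL, or falls back to moonshotai/kimi-k2.5
codetether swarm "Add input validation to all API handlers"

# Specify a model explicitly (flag assumed)
codetether swarm --model moonshotai/kimi-k2.5 "Refactor the provider layer"

# Control parallelism and strategy (flags assumed)
codetether swarm --max-parallel 5 --strategy stage "Implement feature X with tests"
```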
Decomposition Strategies
| Strategy | Description |
|---|---|
| `auto` | LLM-driven automatic decomposition (default) |
| `domain` | Split by domain expertise (frontend, backend, etc.) |
| `data` | Split by data partitions |
| `stage` | Split by pipeline stages (analyze → implement → test) |
| `none` | Execute as single task |
RLM: Recursive Language Model Processing
The rlm command handles large contexts that exceed model context windows using the Recursive Language Model approach:
# Analyze a large source file
# Analyze multiple files
# Analyze stdin content
# JSON output for programmatic use
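Illustrative invocations; argument shapes and the JSON flag are assumptions:

```bash
# Analyze a large source file
codetether rlm src/main.rs "Where are providers initialized?"

# Analyze stdin content
cat build.log | codetether rlm "Why did the build fail?"

# JSON output for programmatic use (flag assumed)
codetether rlm src/main.rs "List the public APIs" --json
```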
How RLM Works
Based on the "Recursive Language Model" paper approach:
- Context Loading: Large content is loaded into a REPL-like environment
- LLM Analysis: The LLM writes code to explore the context (head, tail, grep, etc.)
- Sub-LM Calls: The LLM can call `llm_query()` for semantic sub-questions
- FINAL Answer: After 1-5 iterations, the LLM returns a synthesized answer
RLM Commands (Internal REPL)
| Command | Description |
|---|---|
| `head(n)` | First n lines of context |
| `tail(n)` | Last n lines of context |
| `grep("pattern")` | Search for regex pattern |
| `count("pattern")` | Count pattern occurrences |
| `llm_query("question")` | Ask semantic sub-question |
| `FINAL("answer")` | Return final answer |
Ralph: Autonomous PRD-Driven Agent Loop
Ralph is an autonomous agent loop that implements features from a structured PRD (Product Requirements Document). Each iteration is a fresh agent instance with clean context, while memory persists via git history, progress.txt, and the PRD itself.
# Create a new PRD template
# Run Ralph to implement the PRD (note: -p or --prd is required for custom PRD path)
# Or using short flags
# Check status
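For example; the `-p`/`--prd` flag is documented above, while the template and status invocations are assumptions:

```bash
# Create a new PRD template (arguments assumed)
codetether prd

# Run Ralph to implement the PRD (custom path requires -p/--prd)
codetether ralph --prd my-feature.prd.json
codetether ralph -p my-feature.prd.json   # short flag
```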
How Ralph Works
- Load PRD: Read user stories with acceptance criteria, priorities, and dependencies
- Select Story: Pick the highest-priority incomplete story with satisfied dependencies
- Implement: The AI agent has full tool access to read, write, edit, and execute
- Quality Check: Run all quality checks (cargo check, clippy, test, build)
- Mark Complete: Update PRD with pass/fail status
- Repeat: Continue until all stories pass or max iterations reached
PRD Structure
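A hedged sketch of a PRD; the field names are inferred from the workflow described above (acceptance criteria, priorities, dependencies, pass/fail tracking), not taken from the actual schema:

```json
{
  "project": "codetether-agent",
  "stories": [
    {
      "id": "US-001",
      "title": "LSP Transport Layer - stdio implementation",
      "priority": 1,
      "dependencies": [],
      "acceptance_criteria": [
        "cargo check, clippy, test, and build all pass",
        "LSP messages are framed per the base protocol"
      ],
      "passes": false
    }
  ]
}
```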
Memory Across Iterations
Ralph maintains memory across iterations without context window bloat:
| Memory Source | Purpose |
|---|---|
| Git history | Commits from previous iterations show what changed |
| progress.txt | Agent writes learnings, blockers, and context |
| prd.json | Tracks which stories pass/fail |
| Quality checks | Error output guides next iteration |
Dogfooding: Self-Implementing Agent
This project demonstrates true dogfooding—using the agent to build its own features.
What We Accomplished
Using ralph and swarm, the agent autonomously implemented:
LSP Client Implementation (10 stories):
- US-001: LSP Transport Layer - stdio implementation
- US-002: JSON-RPC Message Framework
- US-003: LSP Initialize Handshake
- US-004: Text Document Synchronization - didOpen
- US-005: Text Document Synchronization - didChange
- US-006: Text Document Completion
- US-007: Text Document Hover
- US-008: Text Document Definition
- US-009: LSP Shutdown and Exit
- US-010: LSP Client Configuration and Server Management
Missing Features (10 stories):
- MF-001: External Directory Tool
- MF-002: RLM Pool - Connection Pooling
- MF-003: Truncation Utilities
- MF-004: LSP Full Integration - Server Management
- MF-005: LSP Transport - stdio Communication
- MF-006: LSP Requests - textDocument/definition
- MF-007: LSP Requests - textDocument/references
- MF-008: LSP Requests - textDocument/hover
- MF-009: LSP Requests - textDocument/completion
- MF-010: RLM Router Enhancement
Results
| Metric | Value |
|---|---|
| Total User Stories | 20 |
| Stories Passed | 20 (100%) |
| Total Iterations | 20 |
| Quality Checks Per Story | 4 (check, clippy, test, build) |
| Lines of Code Generated | ~6,000+ |
| Time to Complete | ~30 minutes |
| Model Used | Kimi K2.5 (Moonshot AI) |
Efficiency Comparison
| Approach | Time | Cost | Notes |
|---|---|---|---|
| Manual Development | 80 hours | $8,000 | Senior dev @ $100/hr, 50-100 LOC/day |
| opencode + subagents | 100 min | ~$11.25 | Bun runtime, Kimi K2.5 (same model) |
| codetether swarm | 29.5 min | $3.75 | Native Rust, Kimi K2.5 |
vs Manual: 163x faster, 2133x cheaper. vs opencode: 3.4x faster, ~3x cheaper (same Kimi K2.5 model).
Key advantages over opencode subagents (model parity):
- Native Rust binary (13ms startup vs 25-50ms Bun)
- Direct API calls vs TypeScript HTTP overhead
- PRD-driven state in files vs subagent process spawning
- ~3x fewer tokens due to reduced subagent initialization overhead
Note: Both have LLM-based compaction. The efficiency gain comes from PRD-driven architecture (state in prd.json + progress.txt) vs. spawning subprocesses with rebuilt context.
How to Replicate
# 1. Create a PRD for your feature
# 2. Run Ralph
# 3. Watch as your feature gets implemented autonomously
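Concretely (subcommands as above, flags and file names illustrative):

```bash
codetether prd                      # 1. create a PRD template, then edit it
codetether ralph --prd prd.json     # 2. run Ralph
# 3. watch progress in the Ralph TUI view or in progress.txt
```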
Why This Matters
- Proof of Capability: The agent can implement non-trivial features end-to-end
- Quality Assurance: Every story passes cargo check, clippy, test, and build
- Autonomous Operation: No human intervention during implementation
- Reproducible Process: PRD-driven development is structured and repeatable
- Self-Improvement: The agent literally improved itself
Content Types
RLM auto-detects content type for optimized processing:
| Type | Detection | Optimization |
|---|---|---|
| `code` | Function definitions, imports | Semantic chunking by symbols |
| `logs` | Timestamps, log levels | Time-based chunking |
| `conversation` | Chat markers, turns | Turn-based chunking |
| `documents` | Markdown headers, paragraphs | Section-based chunking |
Example Output
{
}
Performance: Why Rust Over Bun/TypeScript
CodeTether Agent is written in Rust for measurable performance advantages over JavaScript/TypeScript runtimes like Bun:
Benchmark Results
| Metric | CodeTether (Rust) | opencode (Bun) | Advantage |
|---|---|---|---|
| Binary Size | 12.5 MB | ~90 MB (bun + deps) | 7.2x smaller |
| Startup Time | 13 ms | 25-50 ms | 2-4x faster |
| Memory (idle) | ~15 MB | ~50-80 MB | 3-5x less |
| Memory (swarm, 10 agents) | ~45 MB | ~200+ MB | 4-5x less |
| Process Spawn | 1.5 ms | 5-10 ms | 3-7x faster |
| Cold Start (container) | ~50 ms | ~200-500 ms | 4-10x faster |
Why This Matters for Sub-Agents
- Lower Memory Per Agent: With 3-5x less memory per agent, you can run more concurrent sub-agents on the same hardware. A 4GB container can run ~80 Rust sub-agents vs ~15-20 Bun sub-agents.
- Faster Spawn Time: Sub-agents spawn in 1.5ms vs 5-10ms. For a swarm of 100 agents, that's 150ms vs 500-1000ms just in spawn overhead.
- No GC Pauses: Rust has no garbage collector. JavaScript/Bun has GC pauses that can add latency spikes of 10-50ms during high-memory operations.
- True Parallelism: Rust's tokio runtime uses OS threads with work-stealing. Bun uses a single-threaded event loop that can bottleneck on CPU-bound decomposition.
- Smaller Attack Surface: Smaller binary = fewer dependencies = smaller CVE surface. Critical for agents with shell access.
Resource Efficiency for Swarm Workloads
┌─────────────────────────────────────────────────────────────────┐
│ Memory Usage Comparison │
│ │
│ Sub-Agents CodeTether (Rust) opencode (Bun) │
│ ────────────────────────────────────────────────────────────── │
│ 1 15 MB 60 MB │
│ 5 35 MB 150 MB │
│ 10 55 MB 280 MB │
│ 25 105 MB 650 MB │
│ 50 180 MB 1200 MB │
│ 100 330 MB 2400 MB │
│ │
│ At 100 sub-agents: Rust uses 7.3x less memory │
└─────────────────────────────────────────────────────────────────┘
Real-World Impact
For a typical swarm task (e.g., "Implement feature X with tests"):
| Scenario | CodeTether | opencode (Bun) |
|---|---|---|
| Task decomposition | 50ms | 150ms |
| Spawn 5 sub-agents | 8ms | 35ms |
| Peak memory | 45 MB | 180 MB |
| Total overhead | ~60ms | ~200ms |
Result: 3.3x faster task initialization, 4x less memory, more capacity for actual AI inference.
Measured: Dogfooding Task (20 User Stories)
Actual resource usage from implementing 20 user stories autonomously:
┌─────────────────────────────────────────────────────────────────┐
│ Dogfooding Task: 20 Stories, Same Model (Kimi K2.5) │
│ │
│ Metric CodeTether opencode (estimated) │
│ ────────────────────────────────────────────────────────────── │
│ Total Time 29.5 min 100 min (3.4x slower) │
│ Wall Clock 1,770 sec 6,000 sec │
│ Iterations 20 20 │
│ Spawn Overhead 20 × 1.5ms = 30ms 20 × 7.5ms = 150ms │
│ Startup Overhead 20 × 13ms = 260ms 20 × 37ms = 740ms │
│ Peak Memory ~55 MB ~280 MB │
│ Tokens Used 500K ~1.5M (subagent init) │
│ Token Cost $3.75 ~$11.25 │
│ │
│ Total Overhead 290ms 890ms (3.1x more) │
│ Memory Efficiency 5.1x less peak RAM │
│ Cost Efficiency ~3x cheaper │
└─────────────────────────────────────────────────────────────────┘
Computation Notes:
- Spawn overhead: `iterations × spawn_time` (1.5ms Rust vs 7.5ms Bun avg)
- Startup overhead: `iterations × startup_time` (13ms Rust vs 37ms Bun avg)
- Token difference: opencode has compaction, but subagent spawns rebuild system prompt + context each time (~3x more tokens)
- Memory: Based on 10-agent swarm profile (55 MB vs 280 MB)
- Cost: Same Kimi K2.5 pricing, difference is from subagent initialization overhead
Note: opencode uses LLM-based compaction for long sessions (similar to codetether). The token difference comes from subagent process spawning overhead, not lack of context management.
Benchmark Methodology
Run benchmarks yourself:
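Assuming the standard cargo benchmarking setup:

```bash
cargo bench
```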
Benchmarks performed on:
- Ubuntu 24.04, x86_64
- 48 CPU threads, 32GB RAM
- Rust 1.85, Bun 1.x
- HashiCorp Vault for secrets
Development
# Run in development mode
# Run tests
# Build release binary
# Run benchmarks
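The standard cargo workflow covers all of these:

```bash
cargo run               # run in development mode
cargo test              # run tests
cargo build --release   # build release binary
cargo bench             # run benchmarks
```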
License
MIT