CodeTether Agent
Linux binary (v0.1.5): direct | tar.gz | SHA256SUMS
A high-performance AI coding agent with first-class A2A (Agent-to-Agent) protocol support, written in Rust. Features a rich terminal UI with dedicated views for swarm orchestration and autonomous PRD-driven development. Part of the CodeTether ecosystem.

What's New in v0.1.5
- Perpetual Persona Swarms (Phase 0) — Always-on cognition runtime with persona lineage, SSE event stream, and control APIs.
- Bedrock Provider — Native Amazon Bedrock Converse API support (including region-aware configuration).
- Provider Model Discovery — Added default model catalogs for OpenAI-compatible providers (cerebras, novita, minimax).
- Worker API Alignment — Updated worker registration, task, and heartbeat paths to the /v1/opencode/* namespace.
- Model ID Translation Fix — Preserves model IDs that use : for version suffixes (for example, amazon.nova-micro-v1:0).
See full release notes.
Features
- A2A-Native: Built from the ground up for the A2A protocol - works as a worker agent for the CodeTether platform
- AI-Powered Coding: Intelligent code assistance using multiple AI providers (OpenAI, Anthropic, Google, Moonshot, GitHub Copilot, etc.)
- Swarm Execution: Parallel sub-agent execution with real-time per-agent event streaming and dedicated TUI detail view
- Ralph Loop: Autonomous PRD-driven development with dedicated TUI view — give it a spec, watch it work story by story
- Interactive TUI: Rich terminal interface with webview layout, model selector, session picker, swarm view, and Ralph view
- RLM Processing: Handle context larger than model windows via recursive language model approach
- Secure Secrets: All API keys loaded exclusively from HashiCorp Vault - no environment variable secrets
- FunctionGemma Tool Router: Separates reasoning from tool-call formatting — a tiny local model handles structured output so your primary LLM can focus on thinking (see why this matters)
- 27+ Tools: Comprehensive tool system for file ops, LSP, code search, web fetch, and more
- Session Management: Persistent session history with git-aware storage
- High Performance: Written in Rust — 13ms startup, <20MB idle memory, true parallelism via tokio
Installation
One-Click Install (Recommended)
No Rust toolchain required. Downloads the latest pre-built binary and installs to /usr/local/bin (or ~/.local/bin). Also downloads the FunctionGemma model (~292 MB) for local tool-call routing.
# Skip FunctionGemma model download
# Download only the FunctionGemma model (existing install)
From crates.io
This installs the codetether binary to ~/.cargo/bin/.
From GitHub Releases
Download pre-built binaries from GitHub Releases.
From Source
# Binary at target/release/codetether
# Build without FunctionGemma (smaller binary)
FunctionGemma Tool-Call Router
The Problem
Modern LLMs can call tools. But they're doing two fundamentally different jobs at once: figuring out what to do (reasoning) and formatting how to express it (structured JSON tool calls). These are very different skills, and coupling them has real costs:
- You pay frontier prices for formatting. A $15/M-token model spends tokens producing {"name": "read_file", "arguments": {"path": "src/main.rs"}} — the same structured output a 270M-parameter model produces perfectly.
- Tool-call quality varies wildly. Even models that "support" tool calling often hallucinate tool names, malform arguments, or choose the wrong tool. The reasoning is good, but the formatting is unreliable.
- You're locked to one model's quirks. Switch from Claude to Gemini and tool-call behavior changes. Every provider implements it slightly differently. Your agent has to handle all of them.
- Retries are expensive. When a tool call is malformed, you burn another full cloud round-trip to fix it.
The Solution
CodeTether separates the two jobs. Your primary LLM does what it's best at — reasoning, planning, understanding code. A tiny local model (FunctionGemma, 270M params by Google) runs on your CPU and handles the structured output formatting. It reads what the LLM said it wants to do and produces clean, reliable tool calls.
This is the same principle behind compiler design (parsing vs. code generation), microservices (single responsibility), and even how teams work (the architect decides what to build, the engineer handles how to express it in code).
Why This Is Novel
- No other coding agent separates these concerns. Cursor, Continue, Aider, and opencode all require the primary LLM to handle both reasoning and tool-call formatting in a single pass. That works until it doesn't.
- Provider-agnostic tool calling. Switch models freely — Claude, GPT-4o, Llama, Qwen, Kimi, a self-hosted fine-tune — and tool-call behavior stays consistent because the formatting layer is local and deterministic.
- Cheaper at scale. The reasoning model doesn't waste tokens on JSON syntax. The formatting model runs locally for free. At 1000 tool calls/day, this adds up fast.
- More reliable. A dedicated 270M model trained specifically for function calling is more consistent at structured output than a 400B generalist model doing it as a side task.
- Zero overhead when unnecessary. If your LLM already returns structured tool calls, FunctionGemma is never invoked — pure passthrough, zero latency added.
- Safe degradation. If FunctionGemma fails, the original response is returned unchanged. It never breaks anything.
How It Works
- Your primary LLM (Claude, GPT-4o, Kimi, Llama, etc.) returns a response
- Response already has structured tool calls? → passthrough (zero cost)
- Response is text-only? → FunctionGemma translates it into <tool_call> blocks locally (~5-50ms on CPU)
- The agent processes the structured calls as normal
- Any error? → original response returned unchanged
Setup
The installer downloads the model by default. To enable the router, set these environment variables:
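A minimal sketch of enabling the router from a shell profile. The variable names are the ones documented in the table below; the model and tokenizer paths are illustrative placeholders — point them at wherever the installer placed the files:

```bash
# Enable the FunctionGemma tool-call router (paths are placeholders)
export CODETETHER_TOOL_ROUTER_ENABLED=true
export CODETETHER_TOOL_ROUTER_MODEL_PATH="$HOME/.local/share/codetether/functiongemma.gguf"
export CODETETHER_TOOL_ROUTER_TOKENIZER_PATH="$HOME/.local/share/codetether/tokenizer.json"

# Optional tuning (documented defaults shown)
export CODETETHER_TOOL_ROUTER_DEVICE=auto
export CODETETHER_TOOL_ROUTER_MAX_TOKENS=512
export CODETETHER_TOOL_ROUTER_TEMPERATURE=0.1
```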
Configuration
| Variable | Default | Description |
|---|---|---|
| CODETETHER_TOOL_ROUTER_ENABLED | false | true / 1 to activate the router |
| CODETETHER_TOOL_ROUTER_MODEL_PATH | — | Path to the FunctionGemma .gguf model |
| CODETETHER_TOOL_ROUTER_TOKENIZER_PATH | — | Path to tokenizer.json |
| CODETETHER_TOOL_ROUTER_ARCH | gemma3 | Architecture hint |
| CODETETHER_TOOL_ROUTER_DEVICE | auto | auto / cpu / cuda |
| CODETETHER_TOOL_ROUTER_MAX_TOKENS | 512 | Max decode tokens |
| CODETETHER_TOOL_ROUTER_TEMPERATURE | 0.1 | Sampling temperature |
Opting Out
- At install time: the --no-functiongemma flag skips the model download
- At build time: cargo build --release --no-default-features excludes the feature
- At runtime: simply don't set CODETETHER_TOOL_ROUTER_ENABLED (disabled by default)
Crash Reporting (Opt-In)
CodeTether can automatically capture catastrophic crashes (panic message, location, stack trace, version, OS/arch, and command) and send them to a remote endpoint on next startup.
- Disabled by default.
- On first interactive TUI run, CodeTether asks for explicit consent.
- No source files or API keys are included.
- Reports are queued locally in the data directory under crash-reports/ before upload.
- Uploads use a versioned schema envelope (codetether.crash.v1) with legacy fallback for older endpoints.
Enable:
Disable:
Set a custom endpoint:
If your crash endpoint requires authentication, set one of these environment variables:
# or
Quick Start
1. Configure HashiCorp Vault
All API keys are stored in HashiCorp Vault for security. Set up your Vault connection:
Store your provider API keys in Vault:
# Moonshot AI (default provider)
# OpenRouter (access to many models)
# Google AI
# Anthropic (or via Azure)
# Azure Anthropic
# StepFun
If You See "No providers available"
This means CodeTether can run, but it cannot find any API keys in Vault.
Use this copy/paste checklist:
# 1) Set Vault connection details (replace with your real values)
# 2) Add one provider key (example: OpenRouter)
# 3) Verify the key exists
# 4) Test CodeTether
If you are logged in as root, do not use sudo in install commands.
For worker/service setups, make sure the same VAULT_* variables are present in your service environment (for example /etc/default/codetether-agent) before restarting.
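As a sketch, such a service environment file might look like the following (all values are placeholders; the variable names are the documented VAULT_* settings):

```bash
# /etc/default/codetether-agent — environment for the worker service (placeholder values)
VAULT_ADDR=https://vault.example.com:8200
VAULT_TOKEN=hvs.xxxxxxxxxxxx
VAULT_MOUNT=secret
VAULT_SECRETS_PATH=codetether/providers
```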
Supported Providers
| Provider | Default Model | Notes |
|---|---|---|
| moonshotai | kimi-k2.5 | Default - excellent for coding |
| github-copilot | claude-opus-4 | GitHub Copilot models (Claude, GPT, Gemini) |
| openrouter | stepfun/step-3.5-flash:free | Access to many models |
| google | gemini-2.5-pro | Google AI |
| anthropic | claude-sonnet-4-20250514 | Direct or via Azure |
| stepfun | step-3.5-flash | Chinese reasoning model |
2. Connect to CodeTether Platform
# Connect as a worker to the CodeTether A2A server
# Or with authentication
# Or use the one-command deploy script (from repo root)
3. Or Use Interactive Mode
# Start the TUI in current directory
# Start in a specific project
CLI Quick Reference
# Interactive TUI (like opencode)
# Chat mode (no tools)
# Swarm mode - parallel sub-agents for complex tasks
# Ralph - autonomous PRD-driven development
# Generate a PRD template
# Start HTTP server
# Show config
Usage
Default Mode: A2A Worker
By default, codetether runs as an A2A worker that connects to the CodeTether platform:
# Connect to CodeTether platform
# With custom worker name
Environment variables:
- CODETETHER_SERVER - A2A server URL
- CODETETHER_TOKEN - Authentication token
- CODETETHER_WORKER_NAME - Worker name
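For instance (the server URL is taken from the architecture diagram later in this README; the token and worker name are placeholders):

```bash
export CODETETHER_SERVER=https://api.codetether.run   # A2A server URL
export CODETETHER_TOKEN=...                           # authentication token (placeholder)
export CODETETHER_WORKER_NAME=my-worker               # optional worker name
```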
Interactive TUI

The TUI provides:
- Webview layout: Dashboard with sidebar, chat, and inspector (/webview or Ctrl+B)
- Model selector: Browse and pick models at runtime (/model or Ctrl+M)
- Swarm view: /swarm <task> with real-time per-agent progress, tool calls, and detail view (Enter on a subtask)
- Ralph view: /ralph [prd.json] with per-story progress, quality gates, and sub-agent activity
- Session management: /sessions picker, /resume, /new
- Real-time tool streaming: See tool calls as they execute
- Theme support: Customizable colors via config with hot-reload
TUI Slash Commands
| Command | Description |
|---|---|
| /swarm <task> | Run task in parallel swarm mode |
| /ralph [path] | Start autonomous PRD loop (default: prd.json) |
| /model [name] | Open model picker or set model directly |
| /sessions | Open session picker to resume a previous session |
| /resume [id] | Resume most recent or specific session |
| /new | Start a fresh session |
| /webview | Switch to dashboard layout |
| /classic | Switch to single-pane layout |
| /inspector | Toggle inspector pane |
| /refresh | Refresh workspace and session cache |
| /view | Toggle swarm view |
TUI Keyboard Shortcuts
| Key | Action |
|---|---|
| Ctrl+M | Open model selector |
| Ctrl+B | Toggle webview/classic layout |
| Ctrl+S / F2 | Toggle swarm view |
| F3 | Toggle inspector pane |
| Tab | Switch between build/plan agents |
| Alt+j/k | Scroll down/up |
| Alt+u/d | Half-page scroll |
| Ctrl+R | Search command history |
| ? | Toggle help overlay |
Non-Interactive Mode (Chat - No Tools)
# Run a single prompt (chat only, no file editing tools)
# Continue from last session
# Use a specific model
Note: codetether run is chat-only mode without tools. For coding tasks, use swarm or ralph.
HTTP Server
# Start the API server
Configuration Management
# Show current config
# Initialize default config
Configuration
Configuration is stored in ~/.config/codetether-agent/config.toml:
[]
= "anthropic"
= "claude-sonnet-4-20250514"
[]
= true
= true
[]
= "dark"
[]
= true
Note: API keys are NOT stored in config files. They must be stored in HashiCorp Vault.
HashiCorp Vault Setup
Vault Secret Structure
secret/codetether/providers/
├── openai → { "api_key": "sk-...", "organization": "org-..." }
├── anthropic → { "api_key": "sk-ant-..." }
├── google → { "api_key": "AIza..." }
├── deepseek → { "api_key": "..." }
└── ...
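A sketch of writing and verifying one of these entries with the Vault CLI, assuming the default secret mount and codetether/providers prefix from the table below:

```bash
# Write a multi-field provider secret matching the structure above
vault kv put secret/codetether/providers/openai api_key="sk-..." organization="org-..."

# Read it back to confirm the fields landed where the agent expects them
vault kv get secret/codetether/providers/openai
```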
Environment Variables
| Variable | Description |
|---|---|
| VAULT_ADDR | Vault server address (e.g., https://vault.example.com:8200) |
| VAULT_TOKEN | Vault authentication token |
| VAULT_MOUNT | KV secrets engine mount path (default: secret) |
| VAULT_SECRETS_PATH | Path prefix for provider secrets (default: codetether/providers) |
| CODETETHER_DEFAULT_MODEL | Default model to use (e.g., moonshotai/kimi-k2.5) |
| CODETETHER_SERVER | A2A server URL |
| CODETETHER_TOKEN | Authentication token |
| CODETETHER_WORKER_NAME | Worker name |
| CODETETHER_COGNITION_ENABLED | Enable perpetual cognition runtime (true/false, default: true) |
| CODETETHER_COGNITION_AUTO_START | Auto-start cognition loop on serve startup (default: true) |
| CODETETHER_COGNITION_LOOP_INTERVAL_MS | Loop interval in milliseconds (default: 2000) |
| CODETETHER_COGNITION_MAX_SPAWN_DEPTH | Max persona lineage depth (default: 4) |
| CODETETHER_COGNITION_MAX_BRANCHING_FACTOR | Max active children per persona (default: 4) |
| CODETETHER_COGNITION_MAX_EVENTS | In-memory event buffer size (default: 2000) |
| CODETETHER_COGNITION_MAX_SNAPSHOTS | In-memory snapshot buffer size (default: 128) |
| CODETETHER_COGNITION_THINKER_ENABLED | Enable model-backed thought generation (true/false, default: true) |
| CODETETHER_COGNITION_THINKER_BACKEND | Thinker backend: openai_compat or candle (default: openai_compat) |
| CODETETHER_COGNITION_THINKER_BASE_URL | OpenAI-compatible base URL for thinker model (default: http://127.0.0.1:11434/v1) |
| CODETETHER_COGNITION_THINKER_MODEL | Model id for thought generation (default: qwen2.5:3b-instruct) |
| CODETETHER_COGNITION_THINKER_API_KEY | Optional API key for thinker endpoint |
| CODETETHER_COGNITION_THINKER_TEMPERATURE | Thinker temperature (default: 0.2) |
| CODETETHER_COGNITION_THINKER_TOP_P | Optional thinker top-p |
| CODETETHER_COGNITION_THINKER_MAX_TOKENS | Max generated tokens per thought step (default: 256) |
| CODETETHER_COGNITION_THINKER_TIMEOUT_MS | Thinker request timeout in ms (default: 12000) |
| CODETETHER_COGNITION_THINKER_CANDLE_MODEL_PATH | GGUF model path for in-process Candle inference |
| CODETETHER_COGNITION_THINKER_CANDLE_TOKENIZER_PATH | tokenizer.json path used by Candle backend |
| CODETETHER_COGNITION_THINKER_CANDLE_ARCH | Candle model architecture (llama or qwen2, default: auto from GGUF metadata) |
| CODETETHER_COGNITION_THINKER_CANDLE_DEVICE | Candle device selection: auto, cpu, or cuda (default: auto) |
| CODETETHER_COGNITION_THINKER_CANDLE_CUDA_ORDINAL | CUDA device ordinal when using cuda (default: 0) |
| CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_PENALTY | Candle repetition penalty (default: 1.1) |
| CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_LAST_N | Token window for repetition penalty (default: 64) |
| CODETETHER_COGNITION_THINKER_CANDLE_SEED | Base sampling seed for Candle thinker (default: 42) |
GPU execution requires building with --features candle-cuda (or candle-cudnn).
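For a CUDA-capable build (assuming the CUDA toolkit is available on the build host):

```bash
# Enable GPU execution for the Candle backend
cargo build --release --features candle-cuda
# or, with cuDNN kernels
cargo build --release --features candle-cudnn
```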
Using Vault Agent
For production, use Vault Agent for automatic token renewal:
# vault-agent.hcl
vault {
  address = "https://vault.example.com:8200"
}

auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "codetether-agent"
    }
  }

  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}
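Run it with the standard Vault Agent invocation:

```bash
vault agent -config=vault-agent.hcl
```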
Agents
Build Agent
Full access to development tools. Can read, write, edit files and execute commands.
Plan Agent
Read-only access for analysis and exploration. Perfect for understanding codebases before making changes.
Explore Agent
Specialized for code navigation and discovery.
Tools
CodeTether Agent includes 27+ tools for comprehensive development automation:
File Operations
| Tool | Description |
|---|---|
| read_file | Read file contents |
| write_file | Write content to files |
| list_dir | List directory contents |
| glob | Find files by pattern |
| edit | Apply search/replace patches |
| multiedit | Batch edits across multiple files |
| apply_patch | Apply unified diff patches |
Code Intelligence
| Tool | Description |
|---|---|
| lsp | Language Server Protocol operations (definition, references, hover, completion) |
| grep | Search file contents with regex |
| codesearch | Semantic code search |
Execution
| Tool | Description |
|---|---|
| bash | Execute shell commands |
| batch | Run multiple tool calls in parallel |
| task | Background task execution |
Web & External
| Tool | Description |
|---|---|
| webfetch | Fetch web pages with smart extraction |
| websearch | Search the web for information |
Agent Orchestration
| Tool | Description |
|---|---|
| ralph | Autonomous PRD-driven agent loop |
| rlm | Recursive Language Model for large contexts |
| prd | Generate and manage PRD documents |
| plan_enter/plan_exit | Switch to planning mode |
| question | Ask clarifying questions |
| skill | Execute learned skills |
| todo_read/todo_write | Track task progress |
A2A Protocol
CodeTether Agent is built for the A2A (Agent-to-Agent) protocol:
- Worker Mode (default): Connect to the CodeTether platform and process tasks
- Server Mode: Accept tasks from other agents (codetether serve)
- Client Mode: Dispatch tasks to other A2A agents
AgentCard
When running as a server, the agent exposes its capabilities via /.well-known/agent.json:
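A quick way to inspect the card is a plain HTTP request; the host and port here are illustrative — use whatever address codetether serve is listening on:

```bash
# Fetch the AgentCard from a locally running server (address is illustrative)
curl -s http://localhost:8080/.well-known/agent.json
```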
Perpetual Persona Swarms API (Phase 0)
When running codetether serve, the agent also exposes cognition + swarm control APIs:
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/cognition/start | Start perpetual cognition loop |
| POST | /v1/cognition/stop | Stop cognition loop |
| GET | /v1/cognition/status | Runtime status and buffer metrics |
| GET | /v1/cognition/stream | SSE stream of thought events |
| GET | /v1/cognition/snapshots/latest | Latest compressed memory snapshot |
| POST | /v1/swarm/personas | Create a root persona |
| POST | /v1/swarm/personas/{id}/spawn | Spawn child persona |
| POST | /v1/swarm/personas/{id}/reap | Reap a persona (optional cascade) |
| GET | /v1/swarm/lineage | Current persona lineage graph |
/v1/cognition/start auto-seeds a default root-thinker persona when no personas exist, unless a seed_persona is provided.
See docs/perpetual_persona_swarms.md for request/response contracts.
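A hedged curl walkthrough against a local server; the base URL is illustrative, and request/response bodies follow the contracts in that doc:

```bash
BASE=http://localhost:8080   # illustrative; wherever `codetether serve` is listening

# Start the cognition loop (auto-seeds a root-thinker persona if none exist)
curl -s -X POST "$BASE/v1/cognition/start"

# Check runtime status and buffer metrics
curl -s "$BASE/v1/cognition/status"

# Follow the SSE thought stream (-N disables curl's output buffering)
curl -N "$BASE/v1/cognition/stream"

# Inspect the current persona lineage graph
curl -s "$BASE/v1/swarm/lineage"
```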
CUDA Build/Deploy Helpers
From codetether-agent/:
- make build-cuda - Build a CUDA-enabled binary locally.
- make deploy-spike2-cuda - Sync source to spike2, build with --features candle-cuda, install, and restart the service.
- make status-spike2-cuda - Check service status, active Candle device config, and GPU usage on spike2.
Architecture
┌─────────────────────────────────────────────────────────┐
│ CodeTether Platform │
│ (A2A Server at api.codetether.run) │
└────────────────────────┬────────────────────────────────┘
│ SSE/JSON-RPC
▼
┌─────────────────────────────────────────────────────────┐
│ codetether-agent │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ A2A │ │ Agent │ │ Tool │ │ Provider│ │
│ │ Worker │ │ System │ │ System │ │ Layer │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ └────────────┴────────────┴────────────┘ │
│ │ │
│ ┌──────────────────────┴──────────────────────────┐ │
│ │ HashiCorp Vault │ │
│ │ (API Keys & Secrets) │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Swarm: Parallel Sub-Agent Execution
The swarm command decomposes complex tasks into parallelizable subtasks and executes them concurrently:
# Execute a complex task with parallel sub-agents (uses CODETETHER_DEFAULT_MODEL or defaults to moonshotai/kimi-k2.5)
# Specify a model explicitly
# Control parallelism and strategy
# Generate JSON output
Decomposition Strategies
| Strategy | Description |
|---|---|
| auto | LLM-driven automatic decomposition (default) |
| domain | Split by domain expertise (frontend, backend, etc.) |
| data | Split by data partitions |
| stage | Split by pipeline stages (analyze → implement → test) |
| none | Execute as single task |
RLM: Recursive Language Model Processing
The rlm command handles large contexts that exceed model context windows using the Recursive Language Model approach:
# Analyze a large source file
# Analyze multiple files
# Analyze stdin content
# JSON output for programmatic use
How RLM Works
Based on the "Recursive Language Model" paper approach:
- Context Loading: Large content is loaded into a REPL-like environment
- LLM Analysis: The LLM writes code to explore the context (head, tail, grep, etc.)
- Sub-LM Calls: The LLM can call llm_query() for semantic sub-questions
- FINAL Answer: After 1-5 iterations, the LLM returns a synthesized answer
RLM Commands (Internal REPL)
| Command | Description |
|---|---|
| head(n) | First n lines of context |
| tail(n) | Last n lines of context |
| grep("pattern") | Search for regex pattern |
| count("pattern") | Count pattern occurrences |
| llm_query("question") | Ask semantic sub-question |
| FINAL("answer") | Return final answer |
Ralph: Autonomous PRD-Driven Agent Loop
Ralph is an autonomous agent loop that implements features from a structured PRD (Product Requirements Document). Each iteration is a fresh agent instance with clean context, while memory persists via git history, progress.txt, and the PRD itself.
# Create a new PRD template
# Run Ralph to implement the PRD (note: -p or --prd is required for custom PRD path)
# Or using short flags
# Check status
How Ralph Works
- Load PRD: Read user stories with acceptance criteria, priorities, and dependencies
- Select Story: Pick the highest-priority incomplete story with satisfied dependencies
- Implement: The AI agent has full tool access to read, write, edit, and execute
- Quality Check: Run all quality checks (cargo check, clippy, test, build)
- Mark Complete: Update PRD with pass/fail status
- Repeat: Continue until all stories pass or max iterations reached
PRD Structure
Memory Across Iterations
Ralph maintains memory across iterations without context window bloat:
| Memory Source | Purpose |
|---|---|
| Git history | Commits from previous iterations show what changed |
| progress.txt | Agent writes learnings, blockers, and context |
| prd.json | Tracks which stories pass/fail |
| Quality checks | Error output guides next iteration |
Dogfooding: Self-Implementing Agent
This project demonstrates true dogfooding—using the agent to build its own features.
What We Accomplished
Using ralph and swarm, the agent autonomously implemented:
LSP Client Implementation (10 stories):
- US-001: LSP Transport Layer - stdio implementation
- US-002: JSON-RPC Message Framework
- US-003: LSP Initialize Handshake
- US-004: Text Document Synchronization - didOpen
- US-005: Text Document Synchronization - didChange
- US-006: Text Document Completion
- US-007: Text Document Hover
- US-008: Text Document Definition
- US-009: LSP Shutdown and Exit
- US-010: LSP Client Configuration and Server Management
Missing Features (10 stories):
- MF-001: External Directory Tool
- MF-002: RLM Pool - Connection Pooling
- MF-003: Truncation Utilities
- MF-004: LSP Full Integration - Server Management
- MF-005: LSP Transport - stdio Communication
- MF-006: LSP Requests - textDocument/definition
- MF-007: LSP Requests - textDocument/references
- MF-008: LSP Requests - textDocument/hover
- MF-009: LSP Requests - textDocument/completion
- MF-010: RLM Router Enhancement
Results
| Metric | Value |
|---|---|
| Total User Stories | 20 |
| Stories Passed | 20 (100%) |
| Total Iterations | 20 |
| Quality Checks Per Story | 4 (check, clippy, test, build) |
| Lines of Code Generated | ~6,000+ |
| Time to Complete | ~30 minutes |
| Model Used | Kimi K2.5 (Moonshot AI) |
Efficiency Comparison
| Approach | Time | Cost | Notes |
|---|---|---|---|
| Manual Development | 80 hours | $8,000 | Senior dev @ $100/hr, 50-100 LOC/day |
| opencode + subagents | 100 min | ~$11.25 | Bun runtime, Kimi K2.5 (same model) |
| codetether swarm | 29.5 min | $3.75 | Native Rust, Kimi K2.5 |
- vs Manual: 163x faster, 2133x cheaper
- vs opencode: 3.4x faster, ~3x cheaper (same Kimi K2.5 model)
Key advantages over opencode subagents (model parity):
- Native Rust binary (13ms startup vs 25-50ms Bun)
- Direct API calls vs TypeScript HTTP overhead
- PRD-driven state in files vs subagent process spawning
- ~3x fewer tokens due to reduced subagent initialization overhead
Note: Both have LLM-based compaction. The efficiency gain comes from PRD-driven architecture (state in prd.json + progress.txt) vs. spawning subprocesses with rebuilt context.
How to Replicate
# 1. Create a PRD for your feature
# 2. Run Ralph
# 3. Watch as your feature gets implemented autonomously
Why This Matters
- Proof of Capability: The agent can implement non-trivial features end-to-end
- Quality Assurance: Every story passes cargo check, clippy, test, and build
- Autonomous Operation: No human intervention during implementation
- Reproducible Process: PRD-driven development is structured and repeatable
- Self-Improvement: The agent literally improved itself
Content Types
RLM auto-detects content type for optimized processing:
| Type | Detection | Optimization |
|---|---|---|
| code | Function definitions, imports | Semantic chunking by symbols |
| logs | Timestamps, log levels | Time-based chunking |
| conversation | Chat markers, turns | Turn-based chunking |
| documents | Markdown headers, paragraphs | Section-based chunking |
Example Output
{
}
Performance: Why Rust Over Bun/TypeScript
CodeTether Agent is written in Rust for measurable performance advantages over JavaScript/TypeScript runtimes like Bun:
Benchmark Results
| Metric | CodeTether (Rust) | opencode (Bun) | Advantage |
|---|---|---|---|
| Binary Size | 12.5 MB | ~90 MB (bun + deps) | 7.2x smaller |
| Startup Time | 13 ms | 25-50 ms | 2-4x faster |
| Memory (idle) | ~15 MB | ~50-80 MB | 3-5x less |
| Memory (swarm, 10 agents) | ~45 MB | ~200+ MB | 4-5x less |
| Process Spawn | 1.5 ms | 5-10 ms | 3-7x faster |
| Cold Start (container) | ~50 ms | ~200-500 ms | 4-10x faster |
Why This Matters for Sub-Agents
- Lower Memory Per Agent: With 3-5x less memory per agent, you can run more concurrent sub-agents on the same hardware. A 4GB container can run ~80 Rust sub-agents vs ~15-20 Bun sub-agents.
- Faster Spawn Time: Sub-agents spawn in 1.5ms vs 5-10ms. For a swarm of 100 agents, that's 150ms vs 500-1000ms just in spawn overhead.
- No GC Pauses: Rust has no garbage collector. JavaScript/Bun has GC pauses that can add latency spikes of 10-50ms during high-memory operations.
- True Parallelism: Rust's tokio runtime uses OS threads with work-stealing. Bun uses a single-threaded event loop that can bottleneck on CPU-bound decomposition.
- Smaller Attack Surface: Smaller binary = fewer dependencies = smaller CVE surface. Critical for agents with shell access.
Resource Efficiency for Swarm Workloads
┌─────────────────────────────────────────────────────────────────┐
│ Memory Usage Comparison │
│ │
│ Sub-Agents CodeTether (Rust) opencode (Bun) │
│ ────────────────────────────────────────────────────────────── │
│ 1 15 MB 60 MB │
│ 5 35 MB 150 MB │
│ 10 55 MB 280 MB │
│ 25 105 MB 650 MB │
│ 50 180 MB 1200 MB │
│ 100 330 MB 2400 MB │
│ │
│ At 100 sub-agents: Rust uses 7.3x less memory │
└─────────────────────────────────────────────────────────────────┘
Real-World Impact
For a typical swarm task (e.g., "Implement feature X with tests"):
| Scenario | CodeTether | opencode (Bun) |
|---|---|---|
| Task decomposition | 50ms | 150ms |
| Spawn 5 sub-agents | 8ms | 35ms |
| Peak memory | 45 MB | 180 MB |
| Total overhead | ~60ms | ~200ms |
Result: 3.3x faster task initialization, 4x less memory, more capacity for actual AI inference.
Measured: Dogfooding Task (20 User Stories)
Actual resource usage from implementing 20 user stories autonomously:
┌─────────────────────────────────────────────────────────────────┐
│ Dogfooding Task: 20 Stories, Same Model (Kimi K2.5) │
│ │
│ Metric CodeTether opencode (estimated) │
│ ────────────────────────────────────────────────────────────── │
│ Total Time 29.5 min 100 min (3.4x slower) │
│ Wall Clock 1,770 sec 6,000 sec │
│ Iterations 20 20 │
│ Spawn Overhead 20 × 1.5ms = 30ms 20 × 7.5ms = 150ms │
│ Startup Overhead 20 × 13ms = 260ms 20 × 37ms = 740ms │
│ Peak Memory ~55 MB ~280 MB │
│ Tokens Used 500K ~1.5M (subagent init) │
│ Token Cost $3.75 ~$11.25 │
│ │
│ Total Overhead 290ms 890ms (3.1x more) │
│ Memory Efficiency 5.1x less peak RAM │
│ Cost Efficiency ~3x cheaper │
└─────────────────────────────────────────────────────────────────┘
Computation Notes:
- Spawn overhead: iterations × spawn_time (1.5ms Rust vs 7.5ms Bun avg)
- Startup overhead: iterations × startup_time (13ms Rust vs 37ms Bun avg)
- Token difference: opencode has compaction, but subagent spawns rebuild system prompt + context each time (~3x more tokens)
- Memory: Based on 10-agent swarm profile (55 MB vs 280 MB)
- Cost: Same Kimi K2.5 pricing, difference is from subagent initialization overhead
Note: opencode uses LLM-based compaction for long sessions (similar to codetether). The token difference comes from subagent process spawning overhead, not lack of context management.
Benchmark Methodology
Run benchmarks yourself:
Benchmarks performed on:
- Ubuntu 24.04, x86_64
- 48 CPU threads, 32GB RAM
- Rust 1.85, Bun 1.x
- HashiCorp Vault for secrets
Development
# Run in development mode
# Run tests
# Build release binary
# Run benchmarks
License
MIT