CodeTether Agent

A high-performance AI coding agent written in Rust. It supports the A2A (Agent-to-Agent) protocol over dual JSON-RPC and gRPC transports and includes an in-process agent message bus, a rich terminal UI, parallel swarm execution, autonomous PRD-driven development, and a local FunctionGemma tool-call router that separates reasoning from formatting.

Install

Via npx (no Rust required)

npx codetether tui
npx codetether run "explain this codebase"

Linux / macOS

curl -fsSL https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.sh | sh

Downloads the binary and the FunctionGemma model (~292 MB) for local tool-call routing. No Rust toolchain required.

# Skip FunctionGemma model download
curl -fsSL https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.sh | sh -s -- --no-functiongemma

Windows (PowerShell)

irm https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.ps1 | iex

# Skip FunctionGemma model
.\install.ps1 -NoFunctionGemma

From crates.io

cargo install codetether-agent

# Hardware acceleration for FunctionGemma:
cargo install codetether-agent --features candle-accelerate  # Apple Silicon / Intel Mac
cargo install codetether-agent --features candle-mkl         # Intel/AMD Linux (MKL)
cargo install codetether-agent --features candle-cuda        # NVIDIA GPU

From Source

git clone https://github.com/rileyseaburg/codetether-agent
cd codetether-agent
cargo build --release
# Binary at target/release/codetether

# Without FunctionGemma (smaller binary)
cargo build --release --no-default-features

Quick Start

1. Configure Vault

All API keys live in HashiCorp Vault — never in config files or env vars.

export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.your-token"

# Add a provider
vault kv put secret/codetether/providers/openrouter api_key="sk-or-v1-..."

2. Launch the TUI

codetether tui

3. Or Run a Single Prompt

codetether run "explain this codebase"

CLI

codetether tui                           # Interactive terminal UI
codetether run "prompt"                  # Single prompt
codetether run -- "/go <task>"           # Strategic relay (OKR-gated execution)
codetether run "/autochat <task>"        # Tactical relay (fast path)
codetether swarm "complex task"          # Parallel sub-agent execution
codetether swarm "complex task" --execution-mode k8s --k8s-pod-budget 4 --k8s-image <image>
codetether ralph run --prd prd.json      # Autonomous PRD-driven development
codetether ralph create-prd --feature X  # Generate a PRD template
codetether serve --port 4096             # HTTP server (A2A + cognition APIs)
codetether worker --server URL           # A2A worker mode
codetether spawn --name planner --peer http://localhost:4096/a2a  # Spawn A2A agent
codetether forage --loop --execute       # Autonomous OKR-governed work loop
codetether auth codex                    # OAuth login for OpenAI Codex
codetether auth copilot --client-id ID   # OAuth login for GitHub Copilot
codetether index --path src --json       # Build codebase index (local embeddings)
codetether okr list                      # List OKRs
codetether okr report --id <uuid>        # Show OKR or run report
codetether pr                            # Create/update pull requests
codetether models                        # List available models from all providers
codetether stats                         # Telemetry & execution statistics
codetether benchmark                     # Run model benchmark suite
codetether cleanup                       # Clean orphaned worktrees
codetether config --show                 # Show config

codetether index always generates embeddings locally (no paid API required). Tune with --embedding-model, --embedding-dimensions, --embedding-batch-size, and --embedding-input-chars.

Forage: Autonomous OKR-Governed Loop

# Scan for top opportunities
codetether forage --top 5
codetether forage --top 5 --no-s3            # Local only, skip S3

# Autonomous loop
codetether forage --loop --interval-secs 120 --top 3

# Autonomous + execute
codetether forage --loop --execute --interval-secs 120 --top 3

# Smart swarms in forage loop
codetether forage --loop --execute --execution-engine swarm --interval-secs 120 --top 3 \
  --swarm-max-subagents 8 --swarm-strategy auto --model openai-codex/gpt-5.1-codex

# Moonshot rubric: mission statements that bias prioritization
codetether forage --loop --execute --execution-engine swarm --interval-secs 120 --top 3 \
  --moonshot "Autonomous agents continuously ship measurable customer value" \
  --moonshot "Reliability first: no data loss in long-running autonomy"

# Strict moonshot gate
codetether forage --loop --execute --execution-engine swarm --interval-secs 120 --top 3 \
  --moonshot-file ./.codetether-agent/moonshots.txt \
  --moonshot-required --moonshot-min-alignment 0.25

Notes:

--execute mode auto-seeds a default mission OKR if the repository is empty so the loop can self-start.
Without --execute, forage only reports existing opportunities.
KR progress is only recorded when quality gates (cargo check, cargo test) pass.
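
The quality-gate rule in the last note can be pictured with a minimal Python sketch. The gate commands come from the note above. The function names, the `progress` dictionary, and the injectable `runner` are hypothetical and only illustrate the idea; they are not the agent's actual code.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# Gate commands named in the notes; everything else here is an assumption.
QUALITY_GATES: List[List[str]] = [["cargo", "check"], ["cargo", "test"]]


def run_gate(cmd: Sequence[str]) -> bool:
    """Run one gate command and report whether it exited with status 0."""
    try:
        return subprocess.run(list(cmd), capture_output=True).returncode == 0
    except FileNotFoundError:
        # A missing toolchain counts as a failed gate.
        return False


def record_if_gates_pass(
    progress: Dict[str, float],
    kr_id: str,
    value: float,
    runner: Callable[[Sequence[str]], bool] = run_gate,
) -> bool:
    """Store KR progress only when every quality gate passes."""
    if all(runner(cmd) for cmd in QUALITY_GATES):
        progress[kr_id] = value
        return True
    return False
```

Passing a fake `runner` makes the gating logic easy to exercise without invoking cargo.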

Security

CodeTether treats security as non-optional infrastructure, not a feature flag.

Control	Implementation
Authentication	Mandatory Bearer token on every endpoint (except `/health`). Cannot be disabled.
Audit Trail	Append-only JSON Lines log of every action — queryable by actor, action, resource, time range.
Plugin Signing	Ed25519 signatures on tool manifests. SHA-256 content hashing. Unsigned tools rejected.
Sandboxing	Resource-limited execution: max memory, max CPU seconds, network allow/deny per tool.
Secrets	All API keys stored in HashiCorp Vault — never in config files or environment variables.
K8s Self-Healing	Reconciliation loop detects unhealthy pods and triggers rolling restarts.
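
To illustrate the audit-trail row, here is a rough Python sketch of an append-only JSON Lines log that can be filtered by actor, action, resource, and time range. The field names (`ts`, `actor`, `action`, `resource`) and both function names are assumptions made for this example, not CodeTether's actual schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterator, Optional


def append_entry(log: Path, actor: str, action: str, resource: str) -> None:
    """Append one record as a single JSON line; existing lines are never rewritten."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "resource": resource,
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def query(
    log: Path,
    actor: Optional[str] = None,
    action: Optional[str] = None,
    resource: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> Iterator[dict]:
    """Yield entries that match every filter that was supplied."""
    with log.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            entry = json.loads(line)
            ts = datetime.fromisoformat(entry["ts"])
            if actor is not None and entry["actor"] != actor:
                continue
            if action is not None and entry["action"] != action:
                continue
            if resource is not None and entry["resource"] != resource:
                continue
            if since is not None and ts < since:
                continue
            if until is not None and ts > until:
                continue
            yield entry
```

Because each record is one line, the file can also be inspected with ordinary line-oriented tools.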

Features

FunctionGemma Tool Router

Your primary LLM (Claude, GPT-4o, Kimi, etc.) focuses on reasoning. A local model (FunctionGemma, 270M params) handles structured tool-call formatting via Candle inference (~5-50 ms on CPU).

Provider-agnostic — Switch models freely; tool-call behavior stays consistent.
Zero overhead — If the LLM already returns tool calls, FunctionGemma is never invoked.
Safe degradation — On any error, the original response is returned unchanged.
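
The three properties above can be sketched as a small wrapper. This is an illustration only: the response shape (`text` and `tool_calls` keys) and the `formatter` callable standing in for the local model are assumptions, not the router's real interface.

```python
from typing import Callable, List


def route_tool_calls(
    response: dict,
    formatter: Callable[[str], List[dict]],
    enabled: bool = True,
) -> dict:
    """Return a response with tool calls filled in when that is possible.

    `response` is assumed to carry "text" and "tool_calls" keys (hypothetical).
    `formatter` stands in for the local model that turns text into tool calls.
    """
    # Zero overhead: skip the formatter when the LLM already produced tool calls.
    if not enabled or response.get("tool_calls"):
        return response
    try:
        calls = formatter(response.get("text", ""))
    except Exception:
        # Safe degradation: any formatter error leaves the response untouched.
        return response
    if not calls:
        return response
    # Build a new dict so the caller's original response is not mutated.
    return {**response, "tool_calls": calls}
```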

export CODETETHER_TOOL_ROUTER_ENABLED=true
export CODETETHER_TOOL_ROUTER_MODEL_PATH="$HOME/.local/share/codetether/models/functiongemma/functiongemma-270m-it-Q8_0.gguf"
export CODETETHER_TOOL_ROUTER_TOKENIZER_PATH="$HOME/.local/share/codetether/models/functiongemma/tokenizer.json"

Variable	Default	Description
`CODETETHER_TOOL_ROUTER_ENABLED`	`false`	Activate the router
`CODETETHER_TOOL_ROUTER_MODEL_PATH`	—	Path to `.gguf` model
`CODETETHER_TOOL_ROUTER_TOKENIZER_PATH`	—	Path to `tokenizer.json`
`CODETETHER_TOOL_ROUTER_ARCH`	`gemma3`	Architecture hint
`CODETETHER_TOOL_ROUTER_DEVICE`	`auto`	`auto` / `cpu` / `cuda`
`CODETETHER_TOOL_ROUTER_MAX_TOKENS`	`512`	Max decode tokens
`CODETETHER_TOOL_ROUTER_TEMPERATURE`	`0.1`	Sampling temperature
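
As a reading aid for the table, the sketch below resolves the variables with the listed defaults. The `ToolRouterConfig` class and `load_router_config` function are hypothetical, and treating only the string `true` as enabled is an assumption about how the flag is parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "CODETETHER_TOOL_ROUTER_"


@dataclass(frozen=True)
class ToolRouterConfig:
    enabled: bool
    model_path: Optional[str]
    tokenizer_path: Optional[str]
    arch: str
    device: str
    max_tokens: int
    temperature: float


def load_router_config(env: Mapping[str, str] = os.environ) -> ToolRouterConfig:
    """Read the router variables, falling back to the defaults in the table."""
    def get(key: str, default: Optional[str] = None) -> Optional[str]:
        return env.get(PREFIX + key, default)

    return ToolRouterConfig(
        # Assumption: only the literal "true" (any case) turns the router on.
        enabled=(get("ENABLED", "false") or "").strip().lower() == "true",
        model_path=get("MODEL_PATH"),
        tokenizer_path=get("TOKENIZER_PATH"),
        arch=get("ARCH", "gemma3") or "gemma3",
        device=get("DEVICE", "auto") or "auto",
        max_tokens=int(get("MAX_TOKENS", "512") or "512"),
        temperature=float(get("TEMPERATURE", "0.1") or "0.1"),
    )
```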

RLM: Recursive Language Model

Handles content that exceeds model context windows. Loads context into a REPL, lets the LLM explore it with structured tool calls (rlm_head, rlm_tail, rlm_grep, rlm_count, rlm_slice, rlm_llm_query), and returns a synthesized answer via rlm_final.

codetether rlm "What are the main functions?" -f src/large_file.rs
cat logs/*.log | codetether rlm "Summarize the errors" --content -
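
To give a feel for the exploration tools, here is a minimal Python sketch of a context object offering head, tail, grep, count, and slice operations over loaded content. The method names mirror the `rlm_*` tool names, but the signatures and return shapes are assumptions for illustration.

```python
import re
from typing import List, Tuple


class RlmContext:
    """Holds oversized content so it can be explored piece by piece."""

    def __init__(self, content: str) -> None:
        self.lines: List[str] = content.splitlines()

    def head(self, n: int = 10) -> List[str]:
        """First n lines."""
        return self.lines[:n]

    def tail(self, n: int = 10) -> List[str]:
        """Last n lines (empty when n is not positive)."""
        return self.lines[-n:] if n > 0 else []

    def grep(self, pattern: str) -> List[Tuple[int, str]]:
        """(1-based line number, line) for every line matching the regex."""
        rx = re.compile(pattern)
        return [(i + 1, line) for i, line in enumerate(self.lines) if rx.search(line)]

    def count(self, pattern: str) -> int:
        """Number of lines matching the regex."""
        return len(self.grep(pattern))

    def slice(self, start: int, end: int) -> List[str]:
        """Lines in the half-open, 0-based range [start, end)."""
        return self.lines[start:end]
```

The point of this shape is that the model only ever sees small windows of the content, never the whole input at once.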

Local CUDA

cargo install --path . --force --features candle-cuda,functiongemma

export LOCAL_CUDA_MODEL="qwen3.5-9b"
export LOCAL_CUDA_MODEL_PATH="$HOME/models/qwen3-4b/Qwen3-4B-Q4_K_M.gguf"
export LOCAL_CUDA_TOKENIZER_PATH="$HOME/models/qwen3-4b/tokenizer.json"

codetether rlm --model local_cuda/qwen3.5-9b --file src/rlm/repl.rs --json \
  "Find all occurrences of 'async fn' in src/rlm/repl.rs"

Content Types

Type	Detection	Optimization
`code`	Function definitions, imports	Semantic chunking by symbols
`logs`	Timestamps, log levels	Time-based chunking
`conversation`	Chat markers, turns	Turn-based chunking
`documents`	Markdown headers, paragraphs	Section-based chunking
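
The detection column can be approximated with simple line-level heuristics. The regular expressions and the scoring rule below are assumptions chosen for illustration; the real detector may work quite differently.

```python
import re

# One hypothetical signal per content type, checked line by line.
_PATTERNS = {
    "logs": re.compile(
        r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}|\b(?:ERROR|WARN|INFO|DEBUG)\b"
    ),
    "code": re.compile(r"^\s*(?:fn|def|function|class|import|use|from)\b"),
    "conversation": re.compile(r"^(?:user|assistant|system|human)\s*:", re.IGNORECASE),
    "documents": re.compile(r"^#{1,6}\s+\S"),
}


def detect_content_type(text: str) -> str:
    """Return the type whose signal matches the most lines.

    Falls back to "documents" when nothing matches at all.
    """
    scores = {name: 0 for name in _PATTERNS}
    for line in text.splitlines():
        for name, rx in _PATTERNS.items():
            if rx.search(line):
                scores[name] += 1
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] > 0 else "documents"
```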

OKR-Driven Execution

CodeTether uses OKRs (Objectives and Key Results) as the bridge between business strategy and autonomous agent execution. Instead of handing agents a task and hoping for the best, you state your intent, approve a plan, and get measurable outcomes.

The `/go` Lifecycle

┌──────────────────────────────────────────────────────────────────┐
│                        /go Lifecycle                             │
│                                                                  │
│  1. You state intent                                             │
│     └─ "/go audit the bin cleaning system for Q3 readiness"      │
│                                                                  │
│  2. System reframes as OKR                                       │
│     └─ Objective + Key Results generated from your prompt        │
│                                                                  │
│  3. You approve or deny                                          │
│     └─ TUI: press A (approve) or D (deny)                        │
│     └─ CLI: y/n prompt                                           │
│                                                                  │
│  4. Autonomous relay execution                                   │
│     └─ Swarms, tools, sequential agent turns                     │
│                                                                  │
│  5. KR progress updates (per relay turn)                         │
│     └─ Key Results evaluated and persisted after each turn       │
│                                                                  │
│  6. Completion + outcome                                         │
│     └─ Final KR outcomes recorded                                │
└──────────────────────────────────────────────────────────────────┘
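
The lifecycle above can be read as a small state machine. The Python sketch below is one way to model it; the state names, the `GoRun` class, and the clamping of progress to the range 0 to 1 are assumptions for illustration and are not taken from the codebase.

```python
from enum import Enum
from typing import Dict, List


class GoState(Enum):
    DRAFT = "draft"          # steps 1-2: intent reframed as an OKR
    APPROVED = "approved"    # step 3: user approved
    DENIED = "denied"        # step 3: user denied
    RUNNING = "running"      # steps 4-5: relay turns with KR updates
    COMPLETED = "completed"  # step 6: final outcomes recorded


ALLOWED = {
    GoState.DRAFT: {GoState.APPROVED, GoState.DENIED},
    GoState.APPROVED: {GoState.RUNNING},
    GoState.RUNNING: {GoState.COMPLETED},
    GoState.DENIED: set(),
    GoState.COMPLETED: set(),
}


class GoRun:
    def __init__(self, objective: str, key_results: List[str]) -> None:
        self.objective = objective
        self.state = GoState.DRAFT
        self.progress: Dict[str, float] = {kr: 0.0 for kr in key_results}

    def _move(self, target: GoState) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {target.value}")
        self.state = target

    def decide(self, approve: bool) -> None:
        self._move(GoState.APPROVED if approve else GoState.DENIED)

    def start(self) -> None:
        self._move(GoState.RUNNING)

    def record_turn(self, kr: str, value: float) -> None:
        """Persist KR progress after a relay turn; only valid while running."""
        if self.state is not GoState.RUNNING:
            raise ValueError("KR progress is only recorded during execution")
        self.progress[kr] = min(1.0, max(0.0, value))

    def complete(self) -> None:
        self._move(GoState.COMPLETED)
```

The approval gate shows up as the only path out of `DRAFT`: a denied run has no outgoing transitions, so it can never reach execution.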

`/go` vs `/autochat`

Command	Purpose	OKR Gate	Best For
`/go`	Strategic execution	Yes — draft → approve → run	Epics, business goals, tracked outcomes
`/autochat`	Tactical execution	No — runs immediately	Quick tasks, bug fixes

OKRs naturally support long-running work with persistent state, cumulative KR progress, checkpointed relays for crash recovery, and correlation IDs (okr_id, okr_run_id, relay_id, session_id) across all audit/event entries.

codetether okr list                     # List all OKRs
codetether okr status --id <uuid>       # Detailed status
codetether okr runs --id <uuid>         # List runs
codetether okr report --id <uuid>       # Full report
codetether okr export --id <uuid>       # Export as JSON
codetether okr stats                    # Aggregate stats

Swarm: Parallel Sub-Agent Execution

Decomposes complex tasks into subtasks and executes them concurrently.

codetether swarm "Implement user auth with tests and docs"
codetether swarm "Refactor the API layer" --strategy domain --max-subagents 8
codetether swarm "Ship feature X" --execution-mode k8s --k8s-pod-budget 6 --k8s-image <image>

Strategies: auto (default), domain, data, stage, none.

Execution modes:

local (default): sub-agents run as local async tasks.
k8s: sub-agents run as isolated Kubernetes pods with deterministic collapse-based pruning/promotion.
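
For the local mode, the idea of running sub-agents as bounded concurrent async tasks can be sketched in Python with `asyncio`. The `run_swarm` function and the `worker` callable are hypothetical stand-ins; the agent itself is written in Rust on tokio, so this only mirrors the concept.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_swarm(
    subtasks: List[str],
    worker: Callable[[str], Awaitable[str]],
    max_subagents: int = 8,
) -> List[str]:
    """Run every subtask concurrently, with at most max_subagents in flight."""
    limit = asyncio.Semaphore(max_subagents)

    async def bounded(task: str) -> str:
        async with limit:
            return await worker(task)

    # gather preserves input order, so results line up with subtasks.
    return list(await asyncio.gather(*(bounded(t) for t in subtasks)))
```

A caller would drive it with something like `asyncio.run(run_swarm(tasks, worker, max_subagents=8))`.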

Ralph: Autonomous PRD-Driven Development

Give it a spec, watch it work story by story. Each iteration is a fresh agent with full tool access. Memory persists via git history, progress.txt, and the PRD file.

codetether ralph create-prd --feature "User Auth" --project-name my-app
codetether ralph run --prd prd.json --max-iterations 10

Terminal outcomes: Completed (all stories passed), MaxIterations (partial), QualityFailed (no stories passed gates).
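
One way to picture how the three terminal outcomes arise is the loop below. It is a simplified sketch: `ralph_run`, the `attempt` callable, and the retry-the-same-story policy are assumptions, and the real loop involves agents, git history, and the PRD file.

```python
from typing import Callable, List


def ralph_run(
    stories: List[str],
    attempt: Callable[[str], bool],
    max_iterations: int = 10,
) -> str:
    """Work through stories in order, one attempt per iteration.

    `attempt(story)` stands in for a fresh agent run and returns True
    when the story passes its quality gates.
    """
    pending = list(stories)
    passed: List[str] = []
    for _ in range(max_iterations):
        if not pending:
            break
        if attempt(pending[0]):
            passed.append(pending.pop(0))
    if not pending:
        return "Completed"
    return "MaxIterations" if passed else "QualityFailed"
```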

TUI

Rich terminal UI with model selector, session picker, swarm view, Ralph view, and theme hot-reload.

Slash Commands: /go, /autochat, /new, /model, /sessions, /swarm, /ralph, /rlm, /bus, /lsp, /latency, /symbols, /settings, /file, /image, /spawn, /kill, /agents, /undo, /autoapply, /network, /mcp connect|servers|tools|call, /import-codex, /keys, /help

Keyboard: Ctrl+M model selector, Ctrl+B toggle layout, Ctrl+S/F2 swarm view, Tab switch agents, Alt+j/k scroll, ? help

Providers

Provider	Default Model	Notes
`zai`	`glm-5`	Z.AI flagship — GLM-5 agentic coding (200K ctx)
`moonshotai`	`kimi-k2.5`	Excellent for coding
`github-copilot`	`claude-opus-4`	GitHub Copilot models
`openai`	`gpt-4o`	OpenAI GPT models
`openai-codex`	`gpt-5-codex`	ChatGPT subscription OAuth
`openrouter`	`stepfun/step-3.5-flash:free`	Access to many models
`google`	`gemini-2.5-pro`	Google AI
`anthropic`	`claude-sonnet-4-20250514`	Direct API
`stepfun`	`step-3.5-flash`	Chinese reasoning model
`vertex-glm`	`zai-org/glm-5-maas`	GLM-5 via Vertex AI (service account JWT)
`vertex-anthropic`	`claude-sonnet-4-20250514`	Claude via GCP Vertex AI
`bedrock`	`amazon.nova-lite-v1:0` / `us.anthropic.claude-opus-4-6-v1:0`	Amazon Bedrock Converse API
`local-cuda`	(configurable)	Local CUDA inference via Candle (Qwen, etc.)
`gemini-web`	`gemini-2.5-pro`	Google Gemini web-based (cookie auth)

All keys stored in Vault at secret/codetether/providers/<name>.
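
As an illustration of where a provider secret lives, the sketch below builds the Vault HTTP API URL for a provider. It assumes the mount is a KV version 2 engine (whose read path is `/v1/<mount>/data/<path>`) and uses the `VAULT_MOUNT` and `VAULT_SECRETS_PATH` defaults listed under Environment Variables. The function name is hypothetical.

```python
import os
from typing import Mapping


def provider_secret_url(provider: str, env: Mapping[str, str] = os.environ) -> str:
    """Build the KV v2 read URL for one provider's secret.

    Assumption: the mount is KV version 2; a KV v1 mount has no "data" segment.
    """
    addr = env["VAULT_ADDR"].rstrip("/")
    mount = env.get("VAULT_MOUNT", "secret").strip("/")
    prefix = env.get("VAULT_SECRETS_PATH", "codetether/providers").strip("/")
    return f"{addr}/v1/{mount}/data/{prefix}/{provider}"
```

A request to this URL would also need the `X-Vault-Token` header set to the value of `VAULT_TOKEN`.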

Tools

47 built-in tools across file ops (read, write, edit, multiedit, patch, glob, list, tree, file_info, head_tail, diff), code intelligence (lsp, grep, codesearch, advanced_edit), execution (bash, batch, task), web (webfetch, websearch), media (image, voice, podcast, youtube, avatar), planning (ralph, prd, okr, todo_read, todo_write, plan_enter, plan_exit), agent orchestration (agent, swarm_execute, swarm_share, relay_autochat, go, rlm), knowledge (memory, skill, mcp_bridge), and utilities (undo, question, k8s_tool, confirm_edit, confirm_multiedit).

MCP Server

CodeTether exposes 30+ tools via the Model Context Protocol over stdio. This lets AI clients (GitHub Copilot in VS Code, Claude Desktop, etc.) call CodeTether tools directly.

VS Code (Workspace-Level)

Add .vscode/mcp.json to your workspace:

{
  "servers": {
    "codetether": {
      "command": "/home/riley/.cargo/bin/codetether",
      "args": ["mcp", "serve"],
      "env": {
        "RUST_LOG": "error"
      }
    }
  }
}

Claude Desktop

Edit ~/.config/Claude/claude_desktop_config.json (on macOS, ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "codetether": {
      "command": "/path/to/codetether",
      "args": ["mcp", "serve"],
      "env": { "RUST_LOG": "error" }
    }
  }
}

For remote machines over SSH:

{
  "mcpServers": {
    "codetether": {
      "command": "ssh",
      "args": ["-T", "user@host", "cd /project && RUST_LOG=error /path/to/codetether mcp serve"]
    }
  }
}

Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.codetether]
command = "/absolute/path/to/codetether"
args = ["mcp", "serve", "/absolute/workspace/path"]

Exposed Tools (30+)

Category	Tools
File Ops	`read`, `write`, `edit`, `multiedit`, `patch`, `glob`, `list`, `tree`, `file_info`, `head_tail`, `diff`
Search	`grep`, `codesearch`, `advanced_edit`
Execution	`bash`, `batch`, `task`
Code Intelligence	`lsp` (includes diagnostics from eslint, ruff, biome, stylelint)
Web	`webfetch`, `websearch`
Agent Orchestration	`agent`, `swarm_execute`, `swarm_share`, `relay_autochat`, `go`, `rlm`
Planning	`ralph`, `prd`, `okr`, `todoread`, `todowrite`
Knowledge	`memory`, `skill`, `mcp` (bridge to other MCP servers)
Infrastructure	`k8s_tool`

codetether mcp list-tools          # List available MCP tools
codetether mcp list-tools --json   # JSON output
codetether mcp serve               # Start stdio MCP server
codetether mcp serve --bus-url URL # With agent bus integration
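
For readers writing their own client, the sketch below shows the general shape of the messages an MCP client sends over stdio: newline-delimited JSON-RPC 2.0 requests such as `tools/list` and `tools/call`. The helper names are hypothetical, and the argument schema of any particular tool is an assumption. A real session must also perform the `initialize` handshake first.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def call_tool(name: str, arguments: dict) -> str:
    """Build a tools/call request; the arguments schema depends on the tool."""
    return mcp_request("tools/call", {"name": name, "arguments": arguments})
```

Writing these lines to the stdin of `codetether mcp serve` and reading responses from its stdout is the essence of the stdio transport.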

A2A Protocol

Dual-transport Agent-to-Agent communication with a shared in-process bus:

Worker mode — Connect to the CodeTether platform and process tasks.
Server mode — Accept tasks via JSON-RPC (Axum, :4096) and gRPC (Tonic, :50051) simultaneously.
Spawn mode — Launch a standalone A2A peer that auto-registers and discovers other peers.
Bus mode — In-process pub/sub for zero-latency local agent communication.

Transports

Transport	Port	Use Case
JSON-RPC (Axum)	`4096`	REST API, SSE streams, `/.well-known/agent.json`
gRPC (Tonic)	`50051`	High-frequency A2A RPCs, streaming
In-Process Bus	—	Local sub-agents, swarm coordination
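
As a small client-side illustration of the JSON-RPC transport, the sketch below prepares an authenticated request for the agent card at `/.well-known/agent.json`. The function name is hypothetical, and the token is assumed to be the value of `CODETETHER_AUTH_TOKEN`. The request is only constructed here, not sent.

```python
import urllib.request


def agent_card_request(base_url: str, token: str) -> urllib.request.Request:
    """Prepare a GET for the agent card with the mandatory Bearer token."""
    url = base_url.rstrip("/") + "/.well-known/agent.json"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` against a running `codetether serve` instance would fetch the card.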

gRPC RPCs

RPC	Description
`SendMessage`	Submit a task/message
`SendStreamingMessage`	Submit with streaming status updates
`GetTask`	Retrieve task by ID
`CancelTask`	Cancel a running task
`TaskSubscription`	Subscribe to status updates (server-stream)
`CreateTaskPushNotificationConfig`	Register push notification endpoint
`GetTaskPushNotificationConfig`	Get push notification config
`ListTaskPushNotificationConfig`	List push configs for a task
`DeleteTaskPushNotificationConfig`	Remove a push notification config
`GetAgentCard`	Retrieve the agent's capability card

Agent Bus Topics

Topic Pattern	Semantics
`agent.{id}`	Messages to a specific agent
`task.{id}`	All updates for a task
`swarm.{id}`	Swarm-level coordination
`broadcast`	Global announcements
`results.{key}`	Shared result publication
`tools.{name}`	Tool-specific channels
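
The topic patterns can be illustrated with a toy in-process pub/sub bus. The `AgentBus` class and helper functions below are hypothetical and use exact-match topics only; the real bus (with its `BusHandle`) is a Rust component and may route differently.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


def agent_topic(agent_id: str) -> str:
    return f"agent.{agent_id}"


def task_topic(task_id: str) -> str:
    return f"task.{task_id}"


class AgentBus:
    """Minimal synchronous pub/sub keyed by exact topic name."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver to every subscriber of the topic; return how many received it."""
        handlers = list(self._subs.get(topic, ()))
        for handler in handlers:
            handler(message)
        return len(handlers)
```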

Cognition APIs

When running codetether serve, the server exposes perpetual persona swarms and an SSE event stream through these endpoints:

Method	Endpoint	Description
`POST`	`/v1/cognition/start`	Start perpetual cognition loop
`POST`	`/v1/cognition/stop`	Stop cognition loop
`GET`	`/v1/cognition/status`	Runtime status and metrics
`GET`	`/v1/cognition/stream`	SSE stream of thought events
`POST`	`/v1/swarm/personas`	Create a root persona
`POST`	`/v1/swarm/personas/{id}/spawn`	Spawn child persona
`POST`	`/v1/swarm/personas/{id}/reap`	Reap a persona
`GET`	`/v1/swarm/lineage`	Persona lineage graph
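
A client consuming `/v1/cognition/stream` needs to parse Server-Sent Events. The sketch below is a minimal parser for the standard SSE wire format (`event:` and `data:` fields, blank line as event terminator, `:` comments). The event names and payloads used by CodeTether are not specified here, so any particular values are assumptions.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Turn raw SSE lines into {"event": ..., "data": ...} dictionaries."""
    event = "message"  # default event type when no event: field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Any iterable of text lines works as input, for example the decoded lines of an open HTTP response.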

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    CodeTether Platform                      │
│                 (A2A Server at api.codetether.run)          │
└───────────────┬───────────────────────┬─────────────────────┘
                │ SSE/JSON-RPC          │ gRPC (A2A proto)
                ▼                       ▼
┌─────────────────────────────────────────────────────────────┐
│                    codetether-agent                         │
│                                                             │
│   ┌───────────────────────────────────────────────────┐     │
│   │              Agent Message Bus                    │     │
│   │   (broadcast pub/sub, topic routing, BusHandle)   │     │
│   └──┬──────────┬──────────┬──────────┬───────────────┘     │
│      │          │          │          │                      │
│   ┌──┴───┐  ┌──┴───┐  ┌──┴───┐  ┌──┴────────┐              │
│   │ A2A  │  │ Swarm│  │ Tool │  │  Provider │              │
│   │Worker│  │ Exec │  │System│  │   Layer   │              │
│   └──┬───┘  └──┬───┘  └──┬───┘  └──┬────────┘              │
│      │         │         │         │                        │
│   ┌──┴─────────┴─────────┴─────────┴──┐                     │
│   │         Agent Registry            │                     │
│   └───────────────────────────────────┘                     │
│                                                             │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐ ┌──────────┐    │
│   │JSON-RPC  │  │ gRPC     │  │ Auth     │ │ Audit    │    │
│   │(Axum)    │  │ (Tonic)  │  │ (Bearer) │ │ (JSONL)  │    │
│   │:4096     │  │ :50051   │  │ Mandatory│ │ Append   │    │
│   └──────────┘  └──────────┘  └──────────┘ └──────────┘    │
│   ┌──────────┐  ┌──────────┐  ┌──────────────────────┐     │
│   │ Sandbox  │  │ K8s Mgr  │  │  HashiCorp Vault     │     │
│   │ (Ed25519)│  │ (Deploy) │  │  (API Keys)          │     │
│   └──────────┘  └──────────┘  └──────────────────────┘     │
└─────────────────────────────────────────────────────────────┘

Configuration

~/.config/codetether-agent/config.toml:

[default]
provider = "anthropic"
model = "claude-sonnet-4-20250514"

[ui]
theme = "marketing"   # marketing, dark, light, solarized-dark, solarized-light

[session]
auto_save = true

[lsp]
[lsp.servers]
# my-ruby-lsp = { command = "ruby-lsp", args = ["--stdio"], file_extensions = ["rb"] }

[lsp.linters]
eslint = { enabled = true }
ruff = { enabled = true }
biome = { enabled = false }
stylelint = { enabled = true }

Environment Variables

Variable	Default	Description
`VAULT_ADDR`	—	Vault server address
`VAULT_TOKEN`	—	Authentication token
`VAULT_MOUNT`	`secret`	KV mount path
`VAULT_SECRETS_PATH`	`codetether/providers`	Provider secrets prefix
`CODETETHER_AUTH_TOKEN`	(auto-generated)	Bearer token for API auth
`CODETETHER_DATA_DIR`	`.codetether-agent`	Runtime data directory
`CODETETHER_GRPC_PORT`	`50051`	gRPC server port
`CODETETHER_A2A_PEERS`	—	Comma-separated peer seed URLs

Performance

Metric	Value
Startup	13 ms
Memory (idle)	~15 MB
Memory (10-agent swarm)	~55 MB
Binary size	~12.5 MB

Written in Rust with tokio — true parallelism, no GC pauses, native performance. See CHANGELOG.md for benchmark details.

Development

cargo build                  # Debug build
cargo build --release        # Release build
cargo test                   # Run tests
cargo clippy --all-features  # Lint
cargo fmt                    # Format

License

MIT

codetether-agent 4.5.2