codetether-agent 4.5.2

A2A-native AI coding agent for the CodeTether ecosystem

CodeTether Agent

A high-performance AI coding agent written in Rust. It provides A2A (Agent-to-Agent) protocol support with dual JSON-RPC + gRPC transports, an in-process agent message bus, a rich terminal UI, parallel swarm execution, autonomous PRD-driven development, and a local FunctionGemma tool-call router that separates reasoning from formatting.

Install

Via npx (no Rust required)

npx codetether tui
npx codetether run "explain this codebase"

Linux / macOS

curl -fsSL https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.sh | sh

Downloads the binary and the FunctionGemma model (~292 MB) for local tool-call routing. No Rust toolchain required.

# Skip FunctionGemma model download
curl -fsSL https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.sh | sh -s -- --no-functiongemma

Windows (PowerShell)

irm https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.ps1 | iex

# Skip FunctionGemma model
.\install.ps1 -NoFunctionGemma

From crates.io

cargo install codetether-agent

# Hardware acceleration for FunctionGemma:
cargo install codetether-agent --features candle-accelerate  # Apple Silicon / Intel Mac
cargo install codetether-agent --features candle-mkl         # Intel/AMD Linux (MKL)
cargo install codetether-agent --features candle-cuda        # NVIDIA GPU

From Source

git clone https://github.com/rileyseaburg/codetether-agent
cd codetether-agent
cargo build --release
# Binary at target/release/codetether

# Without FunctionGemma (smaller binary)
cargo build --release --no-default-features

Quick Start

1. Configure Vault

All API keys live in HashiCorp Vault — never in config files or env vars.

export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.your-token"

# Add a provider
vault kv put secret/codetether/providers/openrouter api_key="sk-or-v1-..."

2. Launch the TUI

codetether tui

3. Or Run a Single Prompt

codetether run "explain this codebase"

CLI

codetether tui                           # Interactive terminal UI
codetether run "prompt"                  # Single prompt
codetether run -- "/go <task>"           # Strategic relay (OKR-gated execution)
codetether run "/autochat <task>"        # Tactical relay (fast path)
codetether swarm "complex task"          # Parallel sub-agent execution
codetether swarm "complex task" --execution-mode k8s --k8s-pod-budget 4 --k8s-image <image>
codetether ralph run --prd prd.json      # Autonomous PRD-driven development
codetether ralph create-prd --feature X  # Generate a PRD template
codetether serve --port 4096             # HTTP server (A2A + cognition APIs)
codetether worker --server URL           # A2A worker mode
codetether spawn --name planner --peer http://localhost:4096/a2a  # Spawn A2A agent
codetether forage --loop --execute       # Autonomous OKR-governed work loop
codetether auth codex                    # OAuth login for OpenAI Codex
codetether auth copilot --client-id ID   # OAuth login for GitHub Copilot
codetether index --path src --json       # Build codebase index (local embeddings)
codetether okr list                      # List OKRs
codetether okr report --id <uuid>        # Show OKR or run report
codetether pr                            # Create/update pull requests
codetether models                        # List available models from all providers
codetether stats                         # Telemetry & execution statistics
codetether benchmark                     # Run model benchmark suite
codetether cleanup                       # Clean orphaned worktrees
codetether config --show                 # Show config

codetether index always generates embeddings locally (no paid API required). Tune with --embedding-model, --embedding-dimensions, --embedding-batch-size, and --embedding-input-chars.

Forage: Autonomous OKR-Governed Loop

# Scan for top opportunities
codetether forage --top 5
codetether forage --top 5 --no-s3            # Local only, skip S3

# Autonomous loop
codetether forage --loop --interval-secs 120 --top 3

# Autonomous + execute
codetether forage --loop --execute --interval-secs 120 --top 3

# Smart swarms in forage loop
codetether forage --loop --execute --execution-engine swarm --interval-secs 120 --top 3 \
  --swarm-max-subagents 8 --swarm-strategy auto --model openai-codex/gpt-5.1-codex

# Moonshot rubric: mission statements that bias prioritization
codetether forage --loop --execute --execution-engine swarm --interval-secs 120 --top 3 \
  --moonshot "Autonomous agents continuously ship measurable customer value" \
  --moonshot "Reliability first: no data loss in long-running autonomy"

# Strict moonshot gate
codetether forage --loop --execute --execution-engine swarm --interval-secs 120 --top 3 \
  --moonshot-file ./.codetether-agent/moonshots.txt \
  --moonshot-required --moonshot-min-alignment 0.25

Notes:

  • --execute mode auto-seeds a default mission OKR if the repository is empty so the loop can self-start.
  • Without --execute, forage only reports existing opportunities.
  • KR progress is only recorded when quality gates (cargo check, cargo test) pass.
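
The last note can be sketched as a small gate around the progress write. This is a minimal illustration, not CodeTether's implementation: the helper names gates_pass and record_progress and the in-memory progress dict are assumptions; only the two gate commands come from the note above.

```python
import subprocess
import sys
from typing import Sequence

# The two quality gates named in the note above.
CARGO_GATES = [["cargo", "check"], ["cargo", "test"]]


def gates_pass(gates: Sequence[Sequence[str]], cwd: str = ".") -> bool:
    """Run each gate command in order; stop at the first non-zero exit code."""
    for cmd in gates:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True)
        if result.returncode != 0:
            return False
    return True


def record_progress(progress: dict, kr_id: str, value: float,
                    gates: Sequence[Sequence[str]], cwd: str = ".") -> bool:
    """Hypothetical helper: store KR progress only when every gate passes."""
    if not gates_pass(gates, cwd):
        return False
    progress[kr_id] = value
    return True
```

In a Rust repository one would call record_progress(progress, "kr-1", 0.5, CARGO_GATES); any list of commands works, which also makes the gate easy to exercise without cargo installed.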

Security

CodeTether treats security as non-optional infrastructure, not a feature flag.

| Control | Implementation |
|---|---|
| Authentication | Mandatory Bearer token on every endpoint (except /health). Cannot be disabled. |
| Audit Trail | Append-only JSON Lines log of every action, queryable by actor, action, resource, time range. |
| Plugin Signing | Ed25519 signatures on tool manifests. SHA-256 content hashing. Unsigned tools rejected. |
| Sandboxing | Resource-limited execution: max memory, max CPU seconds, network allow/deny per tool. |
| Secrets | All API keys stored in HashiCorp Vault, never in config files or environment variables. |
| K8s Self-Healing | Reconciliation loop detects unhealthy pods and triggers rolling restarts. |
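
To make the audit-trail row concrete, here is a rough sketch of an append-only JSON Lines log with the four query dimensions listed above. The field names (ts, actor, action, resource) and both function names are assumptions for illustration; the actual log schema is not documented here.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Iterator, Optional


def append_entry(path: str, actor: str, action: str, resource: str,
                 ts: Optional[str] = None) -> None:
    """Append one JSON object per line; existing lines are never rewritten."""
    entry = {
        "ts": ts or datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "resource": resource,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def query(path: str, actor: Optional[str] = None, action: Optional[str] = None,
          resource: Optional[str] = None, since: Optional[str] = None,
          until: Optional[str] = None) -> Iterator[dict]:
    """Yield entries that match every filter that is not None.

    Time bounds are compared as strings, which assumes all timestamps
    share one ISO-8601 format and one UTC offset.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            e = json.loads(line)
            if actor is not None and e["actor"] != actor:
                continue
            if action is not None and e["action"] != action:
                continue
            if resource is not None and e["resource"] != resource:
                continue
            if since is not None and e["ts"] < since:
                continue
            if until is not None and e["ts"] > until:
                continue
            yield e
```

Opening the file in append mode is what keeps the log append-only at the application level; tamper resistance would need more than this sketch provides.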

Features

FunctionGemma Tool Router

Your primary LLM (Claude, GPT-4o, Kimi, etc.) focuses on reasoning. A local model (FunctionGemma, 270M params) handles structured tool-call formatting via Candle inference (~5-50ms on CPU).

  • Provider-agnostic: Switch models freely; tool-call behavior stays consistent.
  • Zero overhead: If the LLM already returns tool calls, FunctionGemma is never invoked.
  • Safe degradation: On any error, the original response is returned unchanged.

export CODETETHER_TOOL_ROUTER_ENABLED=true
export CODETETHER_TOOL_ROUTER_MODEL_PATH="$HOME/.local/share/codetether/models/functiongemma/functiongemma-270m-it-Q8_0.gguf"
export CODETETHER_TOOL_ROUTER_TOKENIZER_PATH="$HOME/.local/share/codetether/models/functiongemma/tokenizer.json"

| Variable | Default | Description |
|---|---|---|
| CODETETHER_TOOL_ROUTER_ENABLED | false | Activate the router |
| CODETETHER_TOOL_ROUTER_MODEL_PATH | (none) | Path to .gguf model |
| CODETETHER_TOOL_ROUTER_TOKENIZER_PATH | (none) | Path to tokenizer.json |
| CODETETHER_TOOL_ROUTER_ARCH | gemma3 | Architecture hint |
| CODETETHER_TOOL_ROUTER_DEVICE | auto | auto / cpu / cuda |
| CODETETHER_TOOL_ROUTER_MAX_TOKENS | 512 | Max decode tokens |
| CODETETHER_TOOL_ROUTER_TEMPERATURE | 0.1 | Sampling temperature |
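
The routing behavior described in this section (skip when tool calls already exist, fall back to the original response on any error) might look roughly like the sketch below. The response shape (a dict with content and tool_calls keys) and the formatter callable standing in for the local model are assumptions, not the router's real interface.

```python
from typing import Callable, Optional


def route(response: dict,
          formatter: Callable[[str], Optional[list]]) -> dict:
    """Return a response that carries tool calls when they can be produced.

    `formatter` is a stand-in for the local formatting model: it takes the
    LLM's text and returns a list of tool calls, or None if there are none.
    """
    # Zero overhead: the primary LLM already produced structured tool calls.
    if response.get("tool_calls"):
        return response
    try:
        calls = formatter(response.get("content", ""))
    except Exception:
        # Safe degradation: hand back the original response unchanged.
        return response
    if not calls:
        return response
    # Build a new dict so the caller's original response is not mutated.
    return {**response, "tool_calls": calls}
```

Returning the same object on the fast path and on failure makes the "unchanged" guarantee easy to check with an identity comparison.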

RLM: Recursive Language Model

Handles content that exceeds model context windows. Loads context into a REPL, lets the LLM explore it with structured tool calls (rlm_head, rlm_tail, rlm_grep, rlm_count, rlm_slice, rlm_llm_query), and returns a synthesized answer via rlm_final.

codetether rlm "What are the main functions?" -f src/large_file.rs
cat logs/*.log | codetether rlm "Summarize the errors" --content -
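
As a rough mental model of the exploration tools, the sketch below implements line-oriented head, tail, grep, count, and slice helpers over content held in memory. The exact semantics of the real rlm_* tools (line numbering, regex dialect, defaults) are not specified above, so everything here is an assumption.

```python
import re
from typing import List, Tuple


class RlmContext:
    """Holds oversized content so it can be explored piece by piece."""

    def __init__(self, content: str):
        self.lines = content.splitlines()

    def head(self, n: int = 10) -> List[str]:
        return self.lines[:n]

    def tail(self, n: int = 10) -> List[str]:
        return self.lines[-n:] if n > 0 else []

    def grep(self, pattern: str) -> List[Tuple[int, str]]:
        """Return (1-based line number, line) for every matching line."""
        rx = re.compile(pattern)
        return [(i + 1, line) for i, line in enumerate(self.lines)
                if rx.search(line)]

    def count(self, pattern: str) -> int:
        """Number of lines that match the pattern."""
        return len(self.grep(pattern))

    def slice(self, start: int, end: int) -> List[str]:
        """Lines start..end, 1-based and inclusive."""
        return self.lines[start - 1:end]
```

The point of the design is that the model only ever sees small, targeted excerpts, so the full content never has to fit in its context window.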

Local CUDA

cargo install --path . --force --features candle-cuda,functiongemma

export LOCAL_CUDA_MODEL="qwen3.5-9b"
export LOCAL_CUDA_MODEL_PATH="$HOME/models/qwen3-4b/Qwen3-4B-Q4_K_M.gguf"
export LOCAL_CUDA_TOKENIZER_PATH="$HOME/models/qwen3-4b/tokenizer.json"

codetether rlm --model local_cuda/qwen3.5-9b --file src/rlm/repl.rs --json \
  "Find all occurrences of 'async fn' in src/rlm/repl.rs"

Content Types

| Type | Detection | Optimization |
|---|---|---|
| code | Function definitions, imports | Semantic chunking by symbols |
| logs | Timestamps, log levels | Time-based chunking |
| conversation | Chat markers, turns | Turn-based chunking |
| documents | Markdown headers, paragraphs | Section-based chunking |
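
One plausible way to implement the detection column is to count signal matches per type and pick the highest score. The regular expressions below are illustrative guesses at those signals, and falling back to documents when nothing matches is also an assumption.

```python
import re

# Illustrative signals only; the real detector's heuristics are not documented here.
_RULES = [
    ("logs", re.compile(
        r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}.*\b(?:DEBUG|INFO|WARN|ERROR)\b",
        re.M)),
    ("conversation", re.compile(r"^(?:user|assistant|system):", re.M | re.I)),
    ("code", re.compile(r"^\s*(?:fn|def|class|import|use|function)\b", re.M)),
    ("documents", re.compile(r"^#{1,6} \S", re.M)),
]


def detect(content: str) -> str:
    """Return the content type whose signals match most often."""
    scores = {name: len(rx.findall(content)) for name, rx in _RULES}
    best = max(scores, key=scores.get)
    # With no signal at all, treat the input as a plain document.
    return best if scores[best] > 0 else "documents"
```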

OKR-Driven Execution

CodeTether uses OKRs (Objectives and Key Results) as the bridge between business strategy and autonomous agent execution. Instead of handing agents a task and hoping for the best, you state your intent, approve a plan, and get measurable outcomes.

The /go Lifecycle

┌──────────────────────────────────────────────────────────────────┐
│                        /go Lifecycle                             │
│                                                                  │
│  1. You state intent                                             │
│     └─ "/go audit the bin cleaning system for Q3 readiness"      │
│                                                                  │
│  2. System reframes as OKR                                       │
│     └─ Objective + Key Results generated from your prompt        │
│                                                                  │
│  3. You approve or deny                                          │
│     └─ TUI: press A (approve) or D (deny)                        │
│     └─ CLI: y/n prompt                                           │
│                                                                  │
│  4. Autonomous relay execution                                   │
│     └─ Swarms, tools, sequential agent turns                     │
│                                                                  │
│  5. KR progress updates (per relay turn)                         │
│     └─ Key Results evaluated and persisted after each turn       │
│                                                                  │
│  6. Completion + outcome                                         │
│     └─ Final KR outcomes recorded                                │
└──────────────────────────────────────────────────────────────────┘
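
The lifecycle above can be read as a small state machine. The sketch below encodes it with hypothetical state and event names (draft, approve, deny, start, turn, finish) and treats KR progress as cumulative, so a later turn never lowers a value; none of these names are taken from the codebase.

```python
from enum import Enum
from typing import Dict, Iterable, Optional


class State(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    DENIED = "denied"
    RUNNING = "running"
    COMPLETED = "completed"


# Allowed (state, event) pairs; anything else is rejected.
TRANSITIONS = {
    (State.DRAFT, "approve"): State.APPROVED,
    (State.DRAFT, "deny"): State.DENIED,
    (State.APPROVED, "start"): State.RUNNING,
    (State.RUNNING, "turn"): State.RUNNING,
    (State.RUNNING, "finish"): State.COMPLETED,
}


class OkrRun:
    def __init__(self, objective: str, key_results: Iterable[str]):
        self.objective = objective
        self.progress: Dict[str, float] = {kr: 0.0 for kr in key_results}
        self.state = State.DRAFT

    def apply(self, event: str,
              updates: Optional[Dict[str, float]] = None) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]
        if event == "turn" and updates:
            # Step 5: KR progress is updated after each relay turn.
            for kr, value in updates.items():
                self.progress[kr] = max(self.progress.get(kr, 0.0), value)
        return self.state
```

Because execution can only start from the approved state, the approval gate in step 3 cannot be bypassed by calling start on a draft.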

/go vs /autochat

| Command | Purpose | OKR Gate | Best For |
|---|---|---|---|
| /go | Strategic execution | Yes: draft → approve → run | Epics, business goals, tracked outcomes |
| /autochat | Tactical execution | No: runs immediately | Quick tasks, bug fixes |

OKRs naturally support long-running work with persistent state, cumulative KR progress, checkpointed relays for crash recovery, and correlation IDs (okr_id, okr_run_id, relay_id, session_id) across all audit/event entries.

codetether okr list                     # List all OKRs
codetether okr status --id <uuid>       # Detailed status
codetether okr runs --id <uuid>         # List runs
codetether okr report --id <uuid>       # Full report
codetether okr export --id <uuid>       # Export as JSON
codetether okr stats                    # Aggregate stats

Swarm: Parallel Sub-Agent Execution

Decomposes complex tasks into subtasks and executes them concurrently.

codetether swarm "Implement user auth with tests and docs"
codetether swarm "Refactor the API layer" --strategy domain --max-subagents 8
codetether swarm "Ship feature X" --execution-mode k8s --k8s-pod-budget 6 --k8s-image <image>

Strategies: auto (default), domain, data, stage, none.

Execution modes:

  • local (default): sub-agents run as local async tasks.
  • k8s: sub-agents run as isolated Kubernetes pods with deterministic collapse-based pruning/promotion.
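
For the local mode, the idea of running sub-agents as concurrent async tasks under a cap can be sketched with asyncio. This mirrors the concept only: CodeTether itself is written in Rust on tokio, and run_swarm and the worker callable are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_swarm(subtasks: Sequence[str],
                    worker: Callable[[str], Awaitable[str]],
                    max_subagents: int) -> List[str]:
    """Run one worker per subtask, at most `max_subagents` at a time."""
    sem = asyncio.Semaphore(max_subagents)

    async def guarded(task: str) -> str:
        async with sem:
            return await worker(task)

    # gather returns results in the same order as the subtasks.
    return await asyncio.gather(*(guarded(t) for t in subtasks))
```

A call such as asyncio.run(run_swarm(["impl", "tests", "docs"], worker, max_subagents=8)) corresponds loosely to passing --max-subagents 8 on the command line.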

Ralph: Autonomous PRD-Driven Development

Give it a spec, watch it work story by story. Each iteration is a fresh agent with full tool access. Memory persists via git history, progress.txt, and the PRD file.

codetether ralph create-prd --feature "User Auth" --project-name my-app
codetether ralph run --prd prd.json --max-iterations 10

Terminal outcomes: Completed (all stories passed), MaxIterations (partial), QualityFailed (no stories passed gates).
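
The three terminal outcomes follow from a simple loop over stories. The sketch below is an assumption about that control flow: ralph_run and attempt_story are hypothetical names, and attempt_story stands in for "spawn a fresh agent and run the quality gates".

```python
from typing import Callable, List, Sequence, Tuple


def ralph_run(stories: Sequence[str],
              attempt_story: Callable[[str], bool],
              max_iterations: int) -> Tuple[str, List[str]]:
    """Work through stories in order; each iteration attempts one story.

    Returns the terminal outcome and the list of stories that passed.
    """
    pending = list(stories)
    passed: List[str] = []
    for _ in range(max_iterations):
        if not pending:
            break
        # A failed story stays at the front and is retried next iteration.
        if attempt_story(pending[0]):
            passed.append(pending.pop(0))
    if not pending:
        return "Completed", passed
    if not passed:
        return "QualityFailed", passed
    return "MaxIterations", passed
```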

TUI

Rich terminal UI with model selector, session picker, swarm view, Ralph view, and theme hot-reload.

Slash Commands: /go, /autochat, /new, /model, /sessions, /swarm, /ralph, /rlm, /bus, /lsp, /latency, /symbols, /settings, /file, /image, /spawn, /kill, /agents, /undo, /autoapply, /network, /mcp connect|servers|tools|call, /import-codex, /keys, /help

Keyboard: Ctrl+M model selector, Ctrl+B toggle layout, Ctrl+S/F2 swarm view, Tab switch agents, Alt+j/k scroll, ? help

Providers

| Provider | Default Model | Notes |
|---|---|---|
| zai | glm-5 | Z.AI flagship: GLM-5 agentic coding (200K ctx) |
| moonshotai | kimi-k2.5 | Excellent for coding |
| github-copilot | claude-opus-4 | GitHub Copilot models |
| openai | gpt-4o | OpenAI GPT models |
| openai-codex | gpt-5-codex | ChatGPT subscription OAuth |
| openrouter | stepfun/step-3.5-flash:free | Access to many models |
| google | gemini-2.5-pro | Google AI |
| anthropic | claude-sonnet-4-20250514 | Direct API |
| stepfun | step-3.5-flash | Chinese reasoning model |
| vertex-glm | zai-org/glm-5-maas | GLM-5 via Vertex AI (service account JWT) |
| vertex-anthropic | claude-sonnet-4-20250514 | Claude via GCP Vertex AI |
| bedrock | amazon.nova-lite-v1:0 / us.anthropic.claude-opus-4-6-v1:0 | Amazon Bedrock Converse API |
| local-cuda | (configurable) | Local CUDA inference via Candle (Qwen, etc.) |
| gemini-web | gemini-2.5-pro | Google Gemini web-based (cookie auth) |

All keys stored in Vault at secret/codetether/providers/<name>.

Tools

47 built-in tools across file ops (read, write, edit, multiedit, patch, glob, list, tree, file_info, head_tail, diff), code intelligence (lsp, grep, codesearch, advanced_edit), execution (bash, batch, task), web (webfetch, websearch), media (image, voice, podcast, youtube, avatar), planning (ralph, prd, okr, todo_read, todo_write, plan_enter, plan_exit), agent orchestration (agent, swarm_execute, swarm_share, relay_autochat, go, rlm), knowledge (memory, skill, mcp_bridge), and utilities (undo, question, k8s_tool, confirm_edit, confirm_multiedit).

MCP Server

CodeTether exposes 30+ tools via the Model Context Protocol over stdio. This lets AI clients (GitHub Copilot in VS Code, Claude Desktop, etc.) call CodeTether tools directly.

VS Code (Workspace-Level)

Add .vscode/mcp.json to your workspace:

{
  "servers": {
    "codetether": {
      "command": "/path/to/codetether",
      "args": ["mcp", "serve"],
      "env": {
        "RUST_LOG": "error"
      }
    }
  }
}

Claude Desktop

Edit ~/.config/Claude/claude_desktop_config.json (on macOS, ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "codetether": {
      "command": "/path/to/codetether",
      "args": ["mcp", "serve"],
      "env": { "RUST_LOG": "error" }
    }
  }
}

For remote machines over SSH:

{
  "mcpServers": {
    "codetether": {
      "command": "ssh",
      "args": ["-T", "user@host", "cd /project && RUST_LOG=error /path/to/codetether mcp serve"]
    }
  }
}

Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.codetether]
command = "/absolute/path/to/codetether"
args = ["mcp", "serve", "/absolute/workspace/path"]

Exposed Tools (30+)

| Category | Tools |
|---|---|
| File Ops | read, write, edit, multiedit, patch, glob, list, tree, file_info, head_tail, diff |
| Search | grep, codesearch, advanced_edit |
| Execution | bash, batch, task |
| Code Intelligence | lsp (includes diagnostics from eslint, ruff, biome, stylelint) |
| Web | webfetch, websearch |
| Agent Orchestration | agent, swarm_execute, swarm_share, relay_autochat, go, rlm |
| Planning | ralph, prd, okr, todoread, todowrite |
| Knowledge | memory, skill, mcp (bridge to other MCP servers) |
| Infrastructure | k8s_tool |

codetether mcp list-tools          # List available MCP tools
codetether mcp list-tools --json   # JSON output
codetether mcp serve               # Start stdio MCP server
codetether mcp serve --bus-url URL # With agent bus integration
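
For readers writing their own client, the messages an MCP client sends over stdio are newline-delimited JSON-RPC 2.0 objects. The sketch below builds the usual opening sequence (initialize, the initialized notification, tools/list, tools/call). The method names come from the MCP specification; the client name, the protocol version string chosen, and the path argument passed to the read tool are assumptions, since the tool schemas are not listed here.

```python
import json
from typing import Any, Dict


def frame(msg: Dict[str, Any]) -> bytes:
    """Serialize one message as a single line, as the stdio transport expects."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(req_id: int = 1) -> Dict[str, Any]:
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed; use what your client supports
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def initialized_notification() -> Dict[str, Any]:
    # Notifications carry no id and expect no response.
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tools_list_request(req_id: int = 2) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": req_id, "method": "tools/list"}


def tools_call_request(name: str, arguments: Dict[str, Any],
                       req_id: int = 3) -> Dict[str, Any]:
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
```

Each framed message would be written to the stdin of a codetether mcp serve process, with responses read line by line from its stdout.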

A2A Protocol

Dual-transport Agent-to-Agent communication with a shared in-process bus:

  • Worker mode — Connect to the CodeTether platform and process tasks.
  • Server mode — Accept tasks via JSON-RPC (Axum, :4096) and gRPC (Tonic, :50051) simultaneously.
  • Spawn mode — Launch a standalone A2A peer that auto-registers and discovers other peers.
  • Bus mode — In-process pub/sub for zero-latency local agent communication.

Transports

| Transport | Port | Use Case |
|---|---|---|
| JSON-RPC (Axum) | 4096 | REST API, SSE streams, /.well-known/agent.json |
| gRPC (Tonic) | 50051 | High-frequency A2A RPCs, streaming |
| In-Process Bus | (none) | Local sub-agents, swarm coordination |
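
Since every HTTP endpoint except /health requires a Bearer token (see Security), a client has to attach it when fetching the agent card or calling the JSON-RPC transport. The sketch below only constructs the request; a2a_request is a hypothetical helper and the base URL assumes a local codetether serve on the default port.

```python
import urllib.request
from typing import Optional


def a2a_request(base_url: str, path: str,
                token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding the Bearer token for every path but /health."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    if path != "/health":
        if not token:
            raise ValueError("a bearer token is required for " + path)
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

Passing the result to urllib.request.urlopen would perform the call, for example against /.well-known/agent.json on http://localhost:4096.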

gRPC RPCs

| RPC | Description |
|---|---|
| SendMessage | Submit a task/message |
| SendStreamingMessage | Submit with streaming status updates |
| GetTask | Retrieve task by ID |
| CancelTask | Cancel a running task |
| TaskSubscription | Subscribe to status updates (server-stream) |
| CreateTaskPushNotificationConfig | Register push notification endpoint |
| GetTaskPushNotificationConfig | Get push notification config |
| ListTaskPushNotificationConfig | List push configs for a task |
| DeleteTaskPushNotificationConfig | Remove a push notification config |
| GetAgentCard | Retrieve the agent's capability card |

Agent Bus Topics

| Topic Pattern | Semantics |
|---|---|
| agent.{id} | Messages to a specific agent |
| task.{id} | All updates for a task |
| swarm.{id} | Swarm-level coordination |
| broadcast | Global announcements |
| results.{key} | Shared result publication |
| tools.{name} | Tool-specific channels |
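
The topic patterns above suggest a plain topic-keyed publish/subscribe design. Here is a minimal in-process sketch with exact-match routing; the Bus class and its methods are illustrative and say nothing about the real BusHandle API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[str, Any], None]


class Bus:
    """Exact-match topic routing for handlers living in the same process."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver to every subscriber of `topic`; return how many were called."""
        handlers = list(self._subs.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


def agent_topic(agent_id: str) -> str:
    return f"agent.{agent_id}"


def task_topic(task_id: str) -> str:
    return f"task.{task_id}"
```

Delivery is a direct function call, which is why an in-process bus adds no network or serialization cost for local sub-agents.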

Cognition APIs

When running codetether serve, the server also exposes perpetual persona swarms with an SSE event stream:

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/cognition/start | Start perpetual cognition loop |
| POST | /v1/cognition/stop | Stop cognition loop |
| GET | /v1/cognition/status | Runtime status and metrics |
| GET | /v1/cognition/stream | SSE stream of thought events |
| POST | /v1/swarm/personas | Create a root persona |
| POST | /v1/swarm/personas/{id}/spawn | Spawn child persona |
| POST | /v1/swarm/personas/{id}/reap | Reap a persona |
| GET | /v1/swarm/lineage | Persona lineage graph |
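
A client consuming /v1/cognition/stream needs to split the response into Server-Sent Events. The parser below handles the generic SSE wire format (event and data fields, comment lines, blank-line delimiters); the event name and payload used in any example are placeholders, because the shape of thought events is not described here.

```python
from typing import Dict, List


def parse_sse(stream_text: str) -> List[Dict[str, str]]:
    """Split SSE text into events; an event is dispatched at each blank line."""
    events: List[Dict[str, str]] = []
    event, data = "message", []
    for line in stream_text.splitlines():
        if line == "":
            if data:
                events.append({"event": event, "data": "\n".join(data)})
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    return events
```

In a real client the text would arrive incrementally, so the same logic would run over a line iterator rather than a complete string.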

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    CodeTether Platform                      │
│                 (A2A Server at api.codetether.run)          │
└───────────────┬───────────────────────┬─────────────────────┘
                │ SSE/JSON-RPC          │ gRPC (A2A proto)
                ▼                       ▼
┌─────────────────────────────────────────────────────────────┐
│                    codetether-agent                         │
│                                                             │
│   ┌───────────────────────────────────────────────────┐     │
│   │              Agent Message Bus                    │     │
│   │   (broadcast pub/sub, topic routing, BusHandle)   │     │
│   └──┬──────────┬──────────┬──────────┬───────────────┘     │
│      │          │          │          │                      │
│   ┌──┴───┐  ┌──┴───┐  ┌──┴───┐  ┌──┴────────┐              │
│   │ A2A  │  │ Swarm│  │ Tool │  │  Provider │              │
│   │Worker│  │ Exec │  │System│  │   Layer   │              │
│   └──┬───┘  └──┬───┘  └──┬───┘  └──┬────────┘              │
│      │         │         │         │                        │
│   ┌──┴─────────┴─────────┴─────────┴──┐                     │
│   │         Agent Registry            │                     │
│   └───────────────────────────────────┘                     │
│                                                             │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐ ┌──────────┐    │
│   │JSON-RPC  │  │ gRPC     │  │ Auth     │ │ Audit    │    │
│   │(Axum)    │  │ (Tonic)  │  │ (Bearer) │ │ (JSONL)  │    │
│   │:4096     │  │ :50051   │  │ Mandatory│ │ Append   │    │
│   └──────────┘  └──────────┘  └──────────┘ └──────────┘    │
│   ┌──────────┐  ┌──────────┐  ┌──────────────────────┐     │
│   │ Sandbox  │  │ K8s Mgr  │  │  HashiCorp Vault     │     │
│   │ (Ed25519)│  │ (Deploy) │  │  (API Keys)          │     │
│   └──────────┘  └──────────┘  └──────────────────────┘     │
└─────────────────────────────────────────────────────────────┘

Configuration

~/.config/codetether-agent/config.toml:

[default]
provider = "anthropic"
model = "claude-sonnet-4-20250514"

[ui]
theme = "marketing"   # marketing, dark, light, solarized-dark, solarized-light

[session]
auto_save = true

[lsp]
[lsp.servers]
# my-ruby-lsp = { command = "ruby-lsp", args = ["--stdio"], file_extensions = ["rb"] }

[lsp.linters]
eslint = { enabled = true }
ruff = { enabled = true }
biome = { enabled = false }
stylelint = { enabled = true }

Environment Variables

| Variable | Default | Description |
|---|---|---|
| VAULT_ADDR | (none) | Vault server address |
| VAULT_TOKEN | (none) | Authentication token |
| VAULT_MOUNT | secret | KV mount path |
| VAULT_SECRETS_PATH | codetether/providers | Provider secrets prefix |
| CODETETHER_AUTH_TOKEN | (auto-generated) | Bearer token for API auth |
| CODETETHER_DATA_DIR | .codetether-agent | Runtime data directory |
| CODETETHER_GRPC_PORT | 50051 | gRPC server port |
| CODETETHER_A2A_PEERS | (none) | Comma-separated peer seed URLs |
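
To show how the Vault variables combine, the sketch below derives the HTTP read URL for one provider's secret. It assumes the mount is a KV version 2 engine (where reads go through /v1/<mount>/data/<path>); provider_secret_url is a hypothetical helper, not part of the CLI.

```python
import os
from typing import Mapping, Optional


def provider_secret_url(provider: str,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Build the Vault KV v2 read URL for a provider, applying the defaults above."""
    env = os.environ if env is None else env
    addr = env["VAULT_ADDR"].rstrip("/")
    mount = env.get("VAULT_MOUNT", "secret")
    prefix = env.get("VAULT_SECRETS_PATH", "codetether/providers").strip("/")
    return f"{addr}/v1/{mount}/data/{prefix}/{provider}"
```

With the Quick Start values, the openrouter key written by vault kv put secret/codetether/providers/openrouter would be read from the URL this function returns, sending VAULT_TOKEN in the X-Vault-Token header.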

Performance

| Metric | Value |
|---|---|
| Startup | 13ms |
| Memory (idle) | ~15 MB |
| Memory (10-agent swarm) | ~55 MB |
| Binary size | ~12.5 MB |

Written in Rust with tokio — true parallelism, no GC pauses, native performance. See CHANGELOG.md for benchmark details.

Development

cargo build                  # Debug build
cargo build --release        # Release build
cargo test                   # Run tests
cargo clippy --all-features  # Lint
cargo fmt                    # Format

License

MIT