CodeTether Agent
A high-performance AI coding agent written in Rust. It offers A2A (Agent-to-Agent) protocol support with dual JSON-RPC + gRPC transports, an in-process agent message bus, a rich terminal UI, parallel swarm execution, autonomous PRD-driven development, and a local FunctionGemma tool-call router that separates reasoning from formatting.
Install
Via npx (no Rust required)
Linux / macOS
Downloads the binary and the FunctionGemma model (~292 MB) for local tool-call routing. No Rust toolchain required. The installer accepts a flag to skip the FunctionGemma model download (on Windows, `-NoFunctionGemma`).
Windows (PowerShell)
```powershell
irm https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.ps1 | iex

# Skip FunctionGemma model
.\install.ps1 -NoFunctionGemma
```
From crates.io
Install with `cargo install`; optional Cargo features enable hardware acceleration for FunctionGemma.
From Source
```bash
git clone https://github.com/rileyseaburg/codetether-agent
cd codetether-agent
cargo build --release
# Binary at target/release/codetether
```
Building without FunctionGemma produces a smaller binary.
Quick Start
1. Configure Vault
All API keys live in HashiCorp Vault — never in config files or env vars.
```bash
# Add a provider key (secret path per the Providers section; the field name is illustrative)
vault kv put secret/codetether/providers/anthropic api_key=sk-ant-...
```
2. Launch the TUI
3. Or Run a Single Prompt
CLI
`codetether index` always generates embeddings locally (no paid API required). Tune with `--embedding-model`, `--embedding-dimensions`, `--embedding-batch-size`, and `--embedding-input-chars`.
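For example, tuning the local embedder with the documented flags (the model name and values here are illustrative, not documented defaults):

```bash
codetether index \
  --embedding-model all-MiniLM-L6-v2 \
  --embedding-dimensions 384 \
  --embedding-batch-size 64
```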
Forage: Autonomous OKR-Governed Loop
Forage supports several modes:
- Scan for top opportunities
- Run the autonomous loop
- Run the autonomous loop with execution (`--execute`)
- Use smart swarms in the forage loop
- Apply a moonshot rubric: mission statements that bias prioritization
- Enforce a strict moonshot gate
Notes:
- `--execute` auto-seeds a default mission OKR if the repository is empty so the loop can self-start.
- Without `--execute`, forage only reports existing opportunities.
- KR progress is only recorded when quality gates (`cargo check`, `cargo test`) pass.
Security
CodeTether treats security as non-optional infrastructure, not a feature flag.
| Control | Implementation |
|---|---|
| Authentication | Mandatory Bearer token on every endpoint (except /health). Cannot be disabled. |
| Audit Trail | Append-only JSON Lines log of every action — queryable by actor, action, resource, time range. |
| Plugin Signing | Ed25519 signatures on tool manifests. SHA-256 content hashing. Unsigned tools rejected. |
| Sandboxing | Resource-limited execution: max memory, max CPU seconds, network allow/deny per tool. |
| Secrets | All API keys stored in HashiCorp Vault — never in config files or environment variables. |
| K8s Self-Healing | Reconciliation loop detects unhealthy pods and triggers rolling restarts. |
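Because the audit trail is plain JSON Lines, it can be queried with standard shell tools. A sketch with illustrative field names (`ts`, `actor`, `action`, `resource`, matching the queryable dimensions above, not a documented schema):

```shell
# Build a tiny sample log in an append-only JSONL shape
cat > /tmp/audit.jsonl <<'EOF'
{"ts":"2025-01-01T00:00:00Z","actor":"agent-1","action":"file.write","resource":"src/main.rs"}
{"ts":"2025-01-01T00:00:05Z","actor":"agent-2","action":"bash","resource":"cargo test"}
EOF

# Filter entries by actor
grep '"actor":"agent-1"' /tmp/audit.jsonl
```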
Features
FunctionGemma Tool Router
Your primary LLM (Claude, GPT-4o, Kimi, etc.) focuses on reasoning. A local model (FunctionGemma, 270M params) handles structured tool-call formatting via Candle inference (~5-50ms on CPU).
- Provider-agnostic — Switch models freely; tool-call behavior stays consistent.
- Zero overhead — If the LLM already returns tool calls, FunctionGemma is never invoked.
- Safe degradation — On any error, the original response is returned unchanged.
| Variable | Default | Description |
|---|---|---|
| `CODETETHER_TOOL_ROUTER_ENABLED` | `false` | Activate the router |
| `CODETETHER_TOOL_ROUTER_MODEL_PATH` | — | Path to `.gguf` model |
| `CODETETHER_TOOL_ROUTER_TOKENIZER_PATH` | — | Path to `tokenizer.json` |
| `CODETETHER_TOOL_ROUTER_ARCH` | `gemma3` | Architecture hint |
| `CODETETHER_TOOL_ROUTER_DEVICE` | `auto` | `auto` / `cpu` / `cuda` |
| `CODETETHER_TOOL_ROUTER_MAX_TOKENS` | `512` | Max decode tokens |
| `CODETETHER_TOOL_ROUTER_TEMPERATURE` | `0.1` | Sampling temperature |
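Enabling the router is just a matter of exporting the variables above before launching the agent; the model and tokenizer paths here are illustrative:

```shell
export CODETETHER_TOOL_ROUTER_ENABLED=true
export CODETETHER_TOOL_ROUTER_MODEL_PATH="$HOME/.cache/codetether/functiongemma.gguf"
export CODETETHER_TOOL_ROUTER_TOKENIZER_PATH="$HOME/.cache/codetether/tokenizer.json"
export CODETETHER_TOOL_ROUTER_DEVICE=auto
```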
RLM: Recursive Language Model
Handles content that exceeds model context windows. Loads context into a REPL, lets the LLM explore it with structured tool calls (rlm_head, rlm_tail, rlm_grep, rlm_count, rlm_slice, rlm_llm_query), and returns a synthesized answer via rlm_final.
Local CUDA
Content Types
| Type | Detection | Optimization |
|---|---|---|
| `code` | Function definitions, imports | Semantic chunking by symbols |
| `logs` | Timestamps, log levels | Time-based chunking |
| `conversation` | Chat markers, turns | Turn-based chunking |
| `documents` | Markdown headers, paragraphs | Section-based chunking |
OKR-Driven Execution
CodeTether uses OKRs (Objectives and Key Results) as the bridge between business strategy and autonomous agent execution. Instead of handing agents a task and hoping for the best, you state your intent, approve a plan, and get measurable outcomes.
The /go Lifecycle
```
┌──────────────────────────────────────────────────────────────────┐
│                        /go Lifecycle                             │
│                                                                  │
│  1. You state intent                                             │
│     └─ "/go audit the bin cleaning system for Q3 readiness"      │
│                                                                  │
│  2. System reframes as OKR                                       │
│     └─ Objective + Key Results generated from your prompt        │
│                                                                  │
│  3. You approve or deny                                          │
│     └─ TUI: press A (approve) or D (deny)                        │
│     └─ CLI: y/n prompt                                           │
│                                                                  │
│  4. Autonomous relay execution                                   │
│     └─ Swarms, tools, sequential agent turns                     │
│                                                                  │
│  5. KR progress updates (per relay turn)                         │
│     └─ Key Results evaluated and persisted after each turn       │
│                                                                  │
│  6. Completion + outcome                                         │
│     └─ Final KR outcomes recorded                                │
└──────────────────────────────────────────────────────────────────┘
```
/go vs /autochat
| Command | Purpose | OKR Gate | Best For |
|---|---|---|---|
| `/go` | Strategic execution | Yes — draft → approve → run | Epics, business goals, tracked outcomes |
| `/autochat` | Tactical execution | No — runs immediately | Quick tasks, bug fixes |
OKRs naturally support long-running work with persistent state, cumulative KR progress, checkpointed relays for crash recovery, and correlation IDs (okr_id, okr_run_id, relay_id, session_id) across all audit/event entries.
Swarm: Parallel Sub-Agent Execution
Decomposes complex tasks into subtasks and executes them concurrently.
Strategies: `auto` (default), `domain`, `data`, `stage`, `none`.
Execution modes:
- `local` (default): sub-agents run as local async tasks.
- `k8s`: sub-agents run as isolated Kubernetes pods with deterministic collapse-based pruning/promotion.
Ralph: Autonomous PRD-Driven Development
Give it a spec and watch it work story by story. Each iteration is a fresh agent with full tool access. Memory persists via git history, `progress.txt`, and the PRD file.
Terminal outcomes: `Completed` (all stories passed), `MaxIterations` (partial), `QualityFailed` (no stories passed gates).
TUI
Rich terminal UI with model selector, session picker, swarm view, Ralph view, and theme hot-reload.
Slash Commands: `/go`, `/autochat`, `/new`, `/model`, `/sessions`, `/swarm`, `/ralph`, `/rlm`, `/bus`, `/lsp`, `/latency`, `/symbols`, `/settings`, `/file`, `/image`, `/spawn`, `/kill`, `/agents`, `/undo`, `/autoapply`, `/network`, `/mcp connect|servers|tools|call`, `/import-codex`, `/keys`, `/help`
Keyboard: `Ctrl+M` model selector, `Ctrl+B` toggle layout, `Ctrl+S`/`F2` swarm view, `Tab` switch agents, `Alt+j`/`k` scroll, `?` help
Providers
| Provider | Default Model | Notes |
|---|---|---|
| `zai` | `glm-5` | Z.AI flagship — GLM-5 agentic coding (200K ctx) |
| `moonshotai` | `kimi-k2.5` | Excellent for coding |
| `github-copilot` | `claude-opus-4` | GitHub Copilot models |
| `openai` | `gpt-4o` | OpenAI GPT models |
| `openai-codex` | `gpt-5-codex` | ChatGPT subscription OAuth |
| `openrouter` | `stepfun/step-3.5-flash:free` | Access to many models |
| `google` | `gemini-2.5-pro` | Google AI |
| `anthropic` | `claude-sonnet-4-20250514` | Direct API |
| `stepfun` | `step-3.5-flash` | Chinese reasoning model |
| `vertex-glm` | `zai-org/glm-5-maas` | GLM-5 via Vertex AI (service account JWT) |
| `vertex-anthropic` | `claude-sonnet-4-20250514` | Claude via GCP Vertex AI |
| `bedrock` | `amazon.nova-lite-v1:0` / `us.anthropic.claude-opus-4-6-v1:0` | Amazon Bedrock Converse API |
| `local-cuda` | (configurable) | Local CUDA inference via Candle (Qwen, etc.) |
| `gemini-web` | `gemini-2.5-pro` | Google Gemini web-based (cookie auth) |
All keys are stored in Vault at `secret/codetether/providers/<name>`.
Tools
47 built-in tools across file ops (read, write, edit, multiedit, patch, glob, list, tree, file_info, head_tail, diff), code intelligence (lsp, grep, codesearch, advanced_edit), execution (bash, batch, task), web (webfetch, websearch), media (image, voice, podcast, youtube, avatar), planning (ralph, prd, okr, todo_read, todo_write, plan_enter, plan_exit), agent orchestration (agent, swarm_execute, swarm_share, relay_autochat, go, rlm), knowledge (memory, skill, mcp_bridge), and utilities (undo, question, k8s_tool, confirm_edit, confirm_multiedit).
MCP Server
CodeTether exposes 30+ tools via the Model Context Protocol over stdio. This lets AI clients (GitHub Copilot in VS Code, Claude Desktop, etc.) call CodeTether tools directly.
VS Code (Workspace-Level)
Add .vscode/mcp.json to your workspace:
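A minimal sketch, assuming `codetether` is on your `PATH` and reusing the `mcp serve <workspace>` invocation shown for the Codex CLI (VS Code reads `servers` entries from `.vscode/mcp.json`):

```json
{
  "servers": {
    "codetether": {
      "type": "stdio",
      "command": "codetether",
      "args": ["mcp", "serve", "${workspaceFolder}"]
    }
  }
}
```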
Claude Desktop
Edit ~/.config/Claude/claude_desktop_config.json:
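A sketch, assuming an absolute binary path (Claude Desktop uses the standard `mcpServers` map):

```json
{
  "mcpServers": {
    "codetether": {
      "command": "/absolute/path/to/codetether",
      "args": ["mcp", "serve", "/absolute/workspace/path"]
    }
  }
}
```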
For remote machines over SSH:
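One way to do this is to wrap the same invocation in `ssh` (host and paths are placeholders):

```json
{
  "mcpServers": {
    "codetether-remote": {
      "command": "ssh",
      "args": ["user@remote-host", "/absolute/path/to/codetether", "mcp", "serve", "/absolute/workspace/path"]
    }
  }
}
```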
Codex CLI
Add to ~/.codex/config.toml:
```toml
[mcp_servers.codetether]
command = "/absolute/path/to/codetether"
args = ["mcp", "serve", "/absolute/workspace/path"]
```
Exposed Tools (30+)
| Category | Tools |
|---|---|
| File Ops | read, write, edit, multiedit, patch, glob, list, tree, file_info, head_tail, diff |
| Search | grep, codesearch, advanced_edit |
| Execution | bash, batch, task |
| Code Intelligence | lsp (includes diagnostics from eslint, ruff, biome, stylelint) |
| Web | webfetch, websearch |
| Agent Orchestration | agent, swarm_execute, swarm_share, relay_autochat, go, rlm |
| Planning | ralph, prd, okr, todo_read, todo_write |
| Knowledge | memory, skill, mcp (bridge to other MCP servers) |
| Infrastructure | k8s_tool |
A2A Protocol
Dual-transport Agent-to-Agent communication with a shared in-process bus:
- Worker mode — Connect to the CodeTether platform and process tasks.
- Server mode — Accept tasks via JSON-RPC (Axum, `:4096`) and gRPC (Tonic, `:50051`) simultaneously.
- Spawn mode — Launch a standalone A2A peer that auto-registers and discovers other peers.
- Bus mode — In-process pub/sub for zero-latency local agent communication.
Transports
| Transport | Port | Use Case |
|---|---|---|
| JSON-RPC (Axum) | `4096` | REST API, SSE streams, `/.well-known/agent.json` |
| gRPC (Tonic) | `50051` | High-frequency A2A RPCs, streaming |
| In-Process Bus | — | Local sub-agents, swarm coordination |
gRPC RPCs
| RPC | Description |
|---|---|
| `SendMessage` | Submit a task/message |
| `SendStreamingMessage` | Submit with streaming status updates |
| `GetTask` | Retrieve task by ID |
| `CancelTask` | Cancel a running task |
| `TaskSubscription` | Subscribe to status updates (server-stream) |
| `CreateTaskPushNotificationConfig` | Register push notification endpoint |
| `GetTaskPushNotificationConfig` | Get push notification config |
| `ListTaskPushNotificationConfig` | List push configs for a task |
| `DeleteTaskPushNotificationConfig` | Remove a push notification config |
| `GetAgentCard` | Retrieve the agent's capability card |
Agent Bus Topics
| Topic Pattern | Semantics |
|---|---|
| `agent.{id}` | Messages to a specific agent |
| `task.{id}` | All updates for a task |
| `swarm.{id}` | Swarm-level coordination |
| `broadcast` | Global announcements |
| `results.{key}` | Shared result publication |
| `tools.{name}` | Tool-specific channels |
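The patterns are plain dotted prefixes, so routing is easy to reason about. A purely illustrative shell sketch of the dispatch logic (not the Rust bus implementation):

```shell
# Illustrative topic dispatch mirroring the table above
route() {
  case "$1" in
    agent.*)   echo "agent ${1#agent.}" ;;
    task.*)    echo "task ${1#task.}" ;;
    swarm.*)   echo "swarm ${1#swarm.}" ;;
    results.*) echo "results ${1#results.}" ;;
    tools.*)   echo "tool ${1#tools.}" ;;
    broadcast) echo "broadcast" ;;
    *)         echo "unknown" ;;
  esac
}

route agent.42   # → agent 42
route broadcast  # → broadcast
```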
Cognition APIs
When running `codetether serve`, CodeTether exposes perpetual persona swarms with an SSE event stream:
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/cognition/start` | Start perpetual cognition loop |
| POST | `/v1/cognition/stop` | Stop cognition loop |
| GET | `/v1/cognition/status` | Runtime status and metrics |
| GET | `/v1/cognition/stream` | SSE stream of thought events |
| POST | `/v1/swarm/personas` | Create a root persona |
| POST | `/v1/swarm/personas/{id}/spawn` | Spawn child persona |
| POST | `/v1/swarm/personas/{id}/reap` | Reap a persona |
| GET | `/v1/swarm/lineage` | Persona lineage graph |
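A sketch of consuming the thought stream with `curl`, assuming the server listens on the JSON-RPC port (`:4096`) and using the mandatory Bearer token:

```bash
curl -N \
  -H "Authorization: Bearer $CODETETHER_AUTH_TOKEN" \
  http://localhost:4096/v1/cognition/stream
```

`-N` disables curl's output buffering so SSE events appear as they arrive.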
Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                  CodeTether Platform                        │
│            (A2A Server at api.codetether.run)               │
└───────────────┬───────────────────────┬─────────────────────┘
                │ SSE/JSON-RPC          │ gRPC (A2A proto)
                ▼                       ▼
┌─────────────────────────────────────────────────────────────┐
│                   codetether-agent                          │
│                                                             │
│   ┌───────────────────────────────────────────────────┐     │
│   │              Agent Message Bus                    │     │
│   │   (broadcast pub/sub, topic routing, BusHandle)   │     │
│   └──┬──────────┬──────────┬──────────┬───────────────┘     │
│      │          │          │          │                     │
│   ┌──┴───┐  ┌──┴───┐  ┌──┴───┐  ┌──┴────────┐               │
│   │ A2A  │  │ Swarm│  │ Tool │  │ Provider  │               │
│   │Worker│  │ Exec │  │System│  │ Layer     │               │
│   └──┬───┘  └──┬───┘  └──┬───┘  └──┬────────┘               │
│      │         │         │         │                        │
│   ┌──┴─────────┴─────────┴─────────┴──┐                     │
│   │          Agent Registry           │                     │
│   └───────────────────────────────────┘                     │
│                                                             │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐        │
│  │JSON-RPC  │ │ gRPC     │ │ Auth     │ │ Audit    │        │
│  │(Axum)    │ │ (Tonic)  │ │ (Bearer) │ │ (JSONL)  │        │
│  │:4096     │ │ :50051   │ │ Mandatory│ │ Append   │        │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘        │
│  ┌──────────┐ ┌──────────┐ ┌──────────────────────┐         │
│  │ Sandbox  │ │ K8s Mgr  │ │  HashiCorp Vault     │         │
│  │ (Ed25519)│ │ (Deploy) │ │  (API Keys)          │         │
│  └──────────┘ └──────────┘ └──────────────────────┘         │
└─────────────────────────────────────────────────────────────┘
```
Configuration
`~/.config/codetether-agent/config.toml`:
```toml
[]
= "anthropic"
= "claude-sonnet-4-20250514"

[]
= "marketing" # marketing, dark, light, solarized-dark, solarized-light

[]
= true

[]

[]
# my-ruby-lsp = { command = "ruby-lsp", args = ["--stdio"], file_extensions = ["rb"] }

[]
= { = true }
= { = true }
= { = false }
= { = true }
```
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `VAULT_ADDR` | — | Vault server address |
| `VAULT_TOKEN` | — | Authentication token |
| `VAULT_MOUNT` | `secret` | KV mount path |
| `VAULT_SECRETS_PATH` | `codetether/providers` | Provider secrets prefix |
| `CODETETHER_AUTH_TOKEN` | (auto-generated) | Bearer token for API auth |
| `CODETETHER_DATA_DIR` | `.codetether-agent` | Runtime data directory |
| `CODETETHER_GRPC_PORT` | `50051` | gRPC server port |
| `CODETETHER_A2A_PEERS` | — | Comma-separated peer seed URLs |
Performance
| Metric | Value |
|---|---|
| Startup | 13ms |
| Memory (idle) | ~15 MB |
| Memory (10-agent swarm) | ~55 MB |
| Binary size | ~12.5 MB |
Written in Rust with tokio — true parallelism, no GC pauses, native performance. See CHANGELOG.md for benchmark details.
Development
License
MIT