CodeTether Agent
A high-performance AI coding agent written in Rust. It offers A2A (Agent-to-Agent) protocol support with dual JSON-RPC + gRPC transports, an in-process agent message bus, a rich terminal UI, parallel swarm execution, autonomous PRD-driven development, a local FunctionGemma tool-call router, and derived context per turn: the canonical chat history stays append-only while the LLM sees a compressed, paired, and repaired ephemeral context every turn.
Install
Via npx (no Rust required)
Linux / macOS
The install script downloads the `codetether` binary; no Rust toolchain is required. The FunctionGemma model (~292 MB) is an optional add-on.
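A sketch of the one-line installer, assuming an `install.sh` published at the same repository path as the Windows `install.ps1` shown below (the script name is an assumption):

```shell
# Assumed: install.sh lives alongside install.ps1 in the repo
curl -fsSL https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.sh | sh
```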
Windows (PowerShell)
```powershell
irm https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.ps1 | iex

# To also install FunctionGemma, run the script with -FunctionGemma
iwr https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.ps1 -OutFile install.ps1
.\install.ps1 -FunctionGemma
```
From crates.io
Install with cargo; optional build features enable hardware acceleration for FunctionGemma.
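A hedged sketch of the crates.io install; the crate name and the CUDA feature name are assumptions, so verify them on crates.io before copying:

```shell
# Crate name assumed to match the binary — verify on crates.io
cargo install codetether
# With hardware acceleration for FunctionGemma (feature name assumed):
cargo install codetether --features cuda
```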
From Source
Build with cargo; the release binary lands at `target/release/codetether`. Building without FunctionGemma yields a smaller binary.
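A from-source build sketch; the repository URL is taken from the install-script URLs above, while the feature layout for omitting FunctionGemma is an assumption:

```shell
git clone https://github.com/rileyseaburg/codetether-agent
cd codetether-agent
cargo build --release   # binary at target/release/codetether
# Without FunctionGemma for a smaller binary (feature layout assumed):
cargo build --release --no-default-features
```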
Quick Start
1. Configure Vault
All API keys live in HashiCorp Vault — never in config files or env vars.
Provider keys are added through the Vault CLI under `secret/codetether/providers/<name>`.
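A minimal sketch of the Vault setup, using the address/token variables and secrets path documented in the Environment Variables section; the `api_key` field name is an assumption:

```shell
# Point the CLI and the agent at Vault
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="<token>"
# Add a provider key under the documented path (field name `api_key` assumed)
vault kv put secret/codetether/providers/anthropic api_key="sk-ant-..."
```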
2. Launch the TUI
3. Or Run a Single Prompt
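A hedged sketch of the two launch modes; both the no-argument TUI default and the one-shot subcommand name are assumptions:

```shell
codetether                            # launch the TUI (assumed default behavior)
codetether run "explain src/main.rs"  # one-shot prompt (subcommand name assumed)
```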
CLI
codetether index always generates embeddings locally (no paid API required). Tune with --embedding-model, --embedding-dimensions, --embedding-batch-size, and --embedding-input-chars.
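An indexing sketch using the documented flags; the specific values are illustrative, not recommendations:

```shell
# Local embeddings only — no paid API involved
codetether index --embedding-batch-size 32 --embedding-dimensions 384
```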
Additional Command Families
- Auth workflows
- MCP client mode
- Oracle maintenance
- Moltbook
Forage: Autonomous OKR-Governed Loop
- Scan for top opportunities
- Autonomous loop
- Autonomous + execute
- Smart swarms in forage loop
- Moonshot rubric: mission statements that bias prioritization
- Strict moonshot gate
Notes:
- `--execute` mode auto-seeds a default mission OKR if the repository is empty so the loop can self-start.
- Without `--execute`, forage only reports existing opportunities.
- KR progress is only recorded when quality gates (`cargo check`, `cargo test`) pass.
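A usage sketch; the `forage` subcommand form is assumed from the section title, while `--execute` and the quality gates come from the notes above:

```shell
codetether forage             # report existing opportunities only
codetether forage --execute   # autonomous loop: seed OKR, run, gate on cargo check/test
```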
Security
CodeTether treats security as non-optional infrastructure, not a feature flag.
| Control | Implementation |
|---|---|
| Authentication | Mandatory Bearer token on every endpoint (except /health). Cannot be disabled. |
| Audit Trail | Append-only JSON Lines log of every action — queryable by actor, action, resource, time range. |
| Plugin Signing | Ed25519 signatures on tool manifests. SHA-256 content hashing. Unsigned tools rejected. |
| Sandboxing | Resource-limited execution: max memory, max CPU seconds, network allow/deny per tool. |
| Secrets | All API keys stored in HashiCorp Vault — never in config files or environment variables. |
| K8s Self-Healing | Reconciliation loop detects unhealthy pods and triggers rolling restarts. |
Features
Derived Context: Append-Only History + Ephemeral LLM Context
Every turn the agent derives a fresh DerivedContext from the canonical session.messages — the transcript stays append-only and the LLM gets a compressed, paired, and repaired view. This separation means:
- True history — `session.messages` is never rewritten by compression, so `/undo`, `/fork`, and session recall see every original turn.
- Compression safety — Context-window enforcement runs on a clone, not the source of truth.
- Tool-call pairing repair — Orphaned tool calls get synthetic placeholders so the provider never sees dangling `assistant.tool_calls` without matching `tool` results.
- Policy-driven resets — Lu et al. reset-to-(prompt, summary) when estimated tokens exceed a threshold, via the `DerivePolicy::Reset` policy.
- Mid-session recall — The `session_recall` tool recovers details from the canonical history that the compressor may have dropped from the derived context.
The derivation pipeline: clone → compress last oversized message → experimental pairing → adaptive budget cascade → orphan repair → `DerivedContext { messages, compressed, origin_len }`.
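A minimal sketch of the shape of this pipeline. Types, the compression heuristic, and the repair check are deliberately simplified stand-ins for the real `DerivePolicy` machinery; only the clone/compress/repair structure mirrors the description above:

```rust
#[derive(Clone, Debug)]
struct Message {
    role: &'static str,
    text: String,
}

#[derive(Debug)]
struct DerivedContext {
    messages: Vec<Message>,
    compressed: bool,
    origin_len: usize,
}

fn derive(history: &[Message], budget_chars: usize) -> DerivedContext {
    let origin_len = history.len();
    // Step 1: clone — the canonical transcript is never mutated.
    let mut msgs: Vec<Message> = history.to_vec();
    let mut compressed = false;
    // Step 2: compress the last oversized message (toy truncation heuristic).
    if let Some(last) = msgs.last_mut() {
        if last.text.len() > budget_chars {
            last.text.truncate(budget_chars);
            last.text.push_str(" …[compressed]");
            compressed = true;
        }
    }
    // Step 3: orphan repair — an assistant tool call with no tool result gets
    // a synthetic placeholder so the provider never sees a dangling call.
    let has_call = msgs.iter().any(|m| m.role == "assistant_tool_call");
    let has_result = msgs.iter().any(|m| m.role == "tool");
    if has_call && !has_result {
        msgs.push(Message { role: "tool", text: "[synthetic placeholder]".into() });
    }
    DerivedContext { messages: msgs, compressed, origin_len }
}
```

The key property is that `derive` takes `&[Message]` and works on a copy, so `/undo` and `/fork` always see the untouched original turns.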
Optional environment variables control history persistence.
FunctionGemma Tool Router
Your primary LLM (Claude, GPT-4o, Kimi, etc.) focuses on reasoning. A local model (FunctionGemma, 270M params) handles structured tool-call formatting via Candle inference (~5-50ms on CPU).
- Provider-agnostic — Switch models freely; tool-call behavior stays consistent.
- Zero overhead — If the LLM already returns tool calls, FunctionGemma is never invoked.
- Safe degradation — On any error, the original response is returned unchanged.
| Variable | Default | Description |
|---|---|---|
| `CODETETHER_TOOL_ROUTER_ENABLED` | `false` | Activate the router |
| `CODETETHER_TOOL_ROUTER_MODEL_PATH` | — | Path to `.gguf` model |
| `CODETETHER_TOOL_ROUTER_TOKENIZER_PATH` | — | Path to `tokenizer.json` |
| `CODETETHER_TOOL_ROUTER_ARCH` | `gemma3` | Architecture hint |
| `CODETETHER_TOOL_ROUTER_DEVICE` | `auto` | `auto` / `cpu` / `cuda` |
| `CODETETHER_TOOL_ROUTER_MAX_TOKENS` | `512` | Max decode tokens |
| `CODETETHER_TOOL_ROUTER_TEMPERATURE` | `0.1` | Sampling temperature |
RLM: Recursive Language Model
Handles content that exceeds model context windows. Loads context into a REPL, lets the LLM explore it with structured tool calls (rlm_head, rlm_tail, rlm_grep, rlm_count, rlm_slice, rlm_llm_query), and returns a synthesized answer via rlm_final.
Local CUDA
Content Types
| Type | Detection | Optimization |
|---|---|---|
| `code` | Function definitions, imports | Semantic chunking by symbols |
| `logs` | Timestamps, log levels | Time-based chunking |
| `conversation` | Chat markers, turns | Turn-based chunking |
| `documents` | Markdown headers, paragraphs | Section-based chunking |
OKR-Driven Execution
CodeTether uses OKRs (Objectives and Key Results) as the bridge between business strategy and autonomous agent execution. Instead of handing agents a task and hoping for the best, you state your intent, approve a plan, and get measurable outcomes.
The /go Lifecycle
┌──────────────────────────────────────────────────────────────────┐
│ /go Lifecycle │
│ │
│ 1. You state intent │
│ └─ "/go audit the bin cleaning system for Q3 readiness" │
│ │
│ 2. System reframes as OKR │
│ └─ Objective + Key Results generated from your prompt │
│ │
│ 3. You approve or deny │
│ └─ TUI: press A (approve) or D (deny) │
│ └─ CLI: y/n prompt │
│ │
│ 4. Autonomous relay execution │
│ └─ Swarms, tools, sequential agent turns │
│ │
│ 5. KR progress updates (per relay turn) │
│ └─ Key Results evaluated and persisted after each turn │
│ │
│ 6. Completion + outcome │
│ └─ Final KR outcomes recorded │
└──────────────────────────────────────────────────────────────────┘
/go vs /autochat
| Command | Purpose | OKR Gate | Best For |
|---|---|---|---|
| `/go` | Strategic execution | Yes — draft → approve → run | Epics, business goals, tracked outcomes |
| `/autochat` | Tactical execution | No — runs immediately | Quick tasks, bug fixes |
OKRs naturally support long-running work with persistent state, cumulative KR progress, checkpointed relays for crash recovery, and correlation IDs (okr_id, okr_run_id, relay_id, session_id) across all audit/event entries.
Session Management
The TUI provides first-class session lifecycle commands:
| Command | Purpose |
|---|---|
| `/ask <question>` | Ask a one-off question without adding to session history |
| `/undo [N]` | Remove the last N user/assistant/tool turns from the session |
| `/fork [N]` | Create a child session from the current state (optionally at turn N) |
| `/audit` | Open the audit view to inspect action history |
Sessions remain append-only — /undo and /fork operate on the canonical transcript, while the LLM always receives a freshly derived ephemeral context per turn.
Swarm: Parallel Sub-Agent Execution
Decomposes complex tasks into subtasks and executes them concurrently.
Strategies: auto (default), domain, data, stage, none.
Execution modes:
- `local` (default): sub-agents run as local async tasks.
- `k8s`: sub-agents run as isolated Kubernetes pods with deterministic collapse-based pruning/promotion.
Ralph: Autonomous PRD-Driven Development
Give it a spec, watch it work story by story. Each iteration is a fresh agent with full tool access. Memory persists via git history, progress.txt, and the PRD file.
Terminal outcomes: Completed (all stories passed), MaxIterations (partial), QualityFailed (no stories passed gates).
Oracle: Deterministic Validation Utilities
Validate structured answers against source material and sync oracle traces to remote storage.
Moltbook
Moltbook is the social network integration for agents. The CLI supports registration, claim status, profile management, posting, introductions, heartbeat/feed checks, comments, and search.
TUI
Rich terminal UI with model selector, session picker, swarm view, Ralph view, audit view, and theme hot-reload.
Slash Commands: /go, /autochat, /ask, /new, /model, /sessions, /swarm, /ralph, /rlm, /bus, /lsp, /latency, /symbols, /settings, /file, /image, /spawn, /kill, /agents, /undo, /fork, /audit, /autoapply, /network, /mcp connect|servers|tools|call, /import-codex, /keys, /help
Keyboard: Ctrl+M model selector, Ctrl+B toggle layout, Ctrl+S/F2 swarm view, Tab switch agents, Alt+j/k scroll, ? help
Providers
| Provider | Default Model | Notes |
|---|---|---|
| `zai` | `glm-5` | Z.AI flagship — GLM-5 agentic coding (200K ctx) |
| `moonshotai` | `kimi-k2.5` | Excellent for coding |
| `github-copilot` | `claude-opus-4` | GitHub Copilot models |
| `openai` | `gpt-4o` | OpenAI GPT models |
| `openai-codex` | `gpt-5-codex` | ChatGPT subscription OAuth |
| `openrouter` | `stepfun/step-3.5-flash:free` | Access to many models |
| `google` | `gemini-2.5-pro` | Google AI |
| `anthropic` | `claude-sonnet-4-20250514` | Direct API |
| `stepfun` | `step-3.5-flash` | Chinese reasoning model |
| `vertex-glm` | `zai-org/glm-5-maas` | GLM-5 via Vertex AI (service account JWT) |
| `vertex-anthropic` | `claude-sonnet-4-20250514` | Claude via GCP Vertex AI |
| `bedrock` | `amazon.nova-lite-v1:0` / `us.anthropic.claude-opus-4-6-v1:0` | Amazon Bedrock Converse API |
| `local-cuda` | (configurable) | Local CUDA inference via Candle (Qwen, etc.) |
| `gemini-web` | `gemini-2.5-pro` | Google Gemini web-based (cookie auth) |
All keys stored in Vault at secret/codetether/providers/<name>.
Tools
50+ built-in tools include file ops (read, write, edit, multiedit, apply_patch, glob, list, tree, fileinfo, headtail, diff), code intelligence (lsp, grep, codesearch, advanced_edit), execution (bash, batch, task), browser automation (browserctl), web (webfetch, websearch), media (image, voice, podcast, youtube, avatar), planning (ralph, prd, okr, todoread, todowrite, plan_enter, plan_exit), session and safety (context_reset, context_browse, session_recall, session_task, undo, question, confirm_edit, confirm_multiedit), agent orchestration (agent, swarm_execute, swarm_share, relay_autochat, go, rlm), knowledge (memory, skill, mcp), and infrastructure (kubernetes). Provider-backed agent registries also expose the LLM-routed search tool. Compatibility aliases patch, file_info, head_tail, todo_read, todo_write, mcp_bridge, and k8s_tool remain accepted.
MCP Server
CodeTether exposes 30+ tools via the Model Context Protocol over stdio. This lets AI clients (GitHub Copilot in VS Code, Claude Desktop, etc.) call CodeTether tools directly.
VS Code (Workspace-Level)
Add .vscode/mcp.json to your workspace:
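A minimal sketch of the workspace config; the server name is arbitrary and the absolute paths are placeholders, with `args` mirroring the Codex CLI example later in this section:

```json
{
  "servers": {
    "codetether": {
      "type": "stdio",
      "command": "/absolute/path/to/codetether",
      "args": ["mcp", "serve", "/absolute/workspace/path"]
    }
  }
}
```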
Claude Desktop
Edit ~/.config/Claude/claude_desktop_config.json:
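A sketch of the Claude Desktop entry under its standard `mcpServers` key; the paths are placeholders:

```json
{
  "mcpServers": {
    "codetether": {
      "command": "/absolute/path/to/codetether",
      "args": ["mcp", "serve", "/absolute/workspace/path"]
    }
  }
}
```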
For remote machines over SSH:
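One way to reach a remote binary is to make `ssh` the command and pass the remote invocation as arguments — a sketch with placeholder host and paths (assumes passwordless key auth, since MCP clients can't answer an ssh prompt):

```json
{
  "mcpServers": {
    "codetether-remote": {
      "command": "ssh",
      "args": [
        "user@remote-host",
        "/absolute/path/to/codetether", "mcp", "serve", "/absolute/workspace/path"
      ]
    }
  }
}
```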
Codex CLI
Add to ~/.codex/config.toml:
```toml
# Section/key names reconstructed from Codex CLI's MCP server config format
[mcp_servers.codetether]
command = "/absolute/path/to/codetether"
args = ["mcp", "serve", "/absolute/workspace/path"]
```
Exposed Tools (30+)
| Category | Tools |
|---|---|
| File Ops | read, write, edit, multiedit, apply_patch, glob, list, tree, fileinfo, headtail, diff |
| Search | grep, codesearch, advanced_edit |
| Execution | bash, batch, task |
| Browser | browserctl |
| Code Intelligence | lsp (includes diagnostics from eslint, ruff, biome, stylelint) |
| Web | webfetch, websearch |
| Media | image, voice, podcast, youtube, avatar |
| Session & Safety | context_reset, context_browse, session_recall, session_task, undo, question, confirm_edit, confirm_multiedit |
| Agent Orchestration | agent, swarm_execute, swarm_share, relay_autochat, go, rlm |
| Planning | ralph, prd, okr, todoread, todowrite |
| Knowledge | memory, skill, mcp (bridge to other MCP servers) |
| Infrastructure | kubernetes |
The MCP registry also accepts compatibility aliases patch, file_info, head_tail, todo_read, todo_write, mcp_bridge, and k8s_tool.
A2A Protocol
Dual-transport Agent-to-Agent communication with a shared in-process bus:
- Worker mode — Connect to the CodeTether platform and process tasks.
- Server mode — Accept tasks via JSON-RPC (Axum, `:4096`) and gRPC (Tonic, `:50051`) simultaneously.
- Spawn mode — Launch a standalone A2A peer that auto-registers and discovers other peers.
- Bus mode — In-process pub/sub for zero-latency local agent communication.
Transports
| Transport | Port | Use Case |
|---|---|---|
| JSON-RPC (Axum) | `4096` | REST API, SSE streams, `/.well-known/agent.json` |
| gRPC (Tonic) | `50051` | High-frequency A2A RPCs, streaming |
| In-Process Bus | — | Local sub-agents, swarm coordination |
gRPC RPCs
| RPC | Description |
|---|---|
| `SendMessage` | Submit a task/message |
| `SendStreamingMessage` | Submit with streaming status updates |
| `GetTask` | Retrieve task by ID |
| `CancelTask` | Cancel a running task |
| `TaskSubscription` | Subscribe to status updates (server-stream) |
| `CreateTaskPushNotificationConfig` | Register push notification endpoint |
| `GetTaskPushNotificationConfig` | Get push notification config |
| `ListTaskPushNotificationConfig` | List push configs for a task |
| `DeleteTaskPushNotificationConfig` | Remove a push notification config |
| `GetAgentCard` | Retrieve the agent's capability card |
Agent Bus Topics
| Topic Pattern | Semantics |
|---|---|
| `agent.{id}` | Messages to a specific agent |
| `task.{id}` | All updates for a task |
| `swarm.{id}` | Swarm-level coordination |
| `broadcast` | Global announcements |
| `results.{key}` | Shared result publication |
| `tools.{name}` | Tool-specific channels |
Cognition APIs
When running `codetether serve`, perpetual persona swarms are exposed through the following endpoints, with an SSE event stream:
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/cognition/start` | Start perpetual cognition loop |
| `POST` | `/v1/cognition/stop` | Stop cognition loop |
| `GET` | `/v1/cognition/status` | Runtime status and metrics |
| `GET` | `/v1/cognition/stream` | SSE stream of thought events |
| `POST` | `/v1/swarm/personas` | Create a root persona |
| `POST` | `/v1/swarm/personas/{id}/spawn` | Spawn child persona |
| `POST` | `/v1/swarm/personas/{id}/reap` | Reap a persona |
| `GET` | `/v1/swarm/lineage` | Persona lineage graph |
Architecture
┌─────────────────────────────────────────────────────────────┐
│ CodeTether Platform │
│ (A2A Server at api.codetether.run) │
└───────────────┬───────────────────────┬─────────────────────┘
│ SSE/JSON-RPC │ gRPC (A2A proto)
▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ codetether-agent │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Agent Message Bus │ │
│ │ (broadcast pub/sub, topic routing, BusHandle) │ │
│ └──┬──────────┬──────────┬──────────┬───────────────┘ │
│ │ │ │ │ │
┌──┴───┐ ┌──┴───┐ ┌──┴───┐ ┌──┴────────┐ ┌────────┐ │
│ A2A │ │ Swarm│ │ Tool │ │ Provider │ │Derived │ │
│Worker│ │ Exec │ │System│ │ Layer │ │Context │ │
└──┬───┘ └──┬───┘ └──┬───┘ └──┬────────┘ └────────┘ │
│ │ │ │ │ │
│ ┌──┴─────────┴─────────┴─────────┴──┐ │
│ │ Agent Registry │ │
│ └───────────────────────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │JSON-RPC │ │ gRPC │ │ Auth │ │ Audit │ │
│ │(Axum) │ │ (Tonic) │ │ (Bearer) │ │ (JSONL) │ │
│ │:4096 │ │ :50051 │ │ Mandatory│ │ Append │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────────┐ │
│ │ Sandbox │ │ K8s Mgr │ │ HashiCorp Vault │ │
│ │ (Ed25519)│ │ (Deploy) │ │ (API Keys) │ │
│ └──────────┘ └──────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Configuration
~/.config/codetether-agent/config.toml:
```toml
# NOTE: section/key names marked (assumed) are illustrative placeholders;
# values and inline comments are verbatim — check the shipped example config.
[provider]                          # (assumed)
name = "anthropic"                  # (assumed key)
model = "claude-sonnet-4-20250514"  # (assumed key)

[tui]                               # (assumed)
theme = "marketing"  # marketing, dark, light, solarized-dark, solarized-light

[autoapply]                         # (assumed) — a single boolean toggle
enabled = true                      # (assumed key)

[swarm]                             # (assumed) — empty section

[lsp]                               # (assumed) — custom language servers
# my-ruby-lsp = { command = "ruby-lsp", args = ["--stdio"], file_extensions = ["rb"] }

[tools]                             # (assumed) — per-tool enable flags
# four entries of the form  <tool> = { enabled = true }  (one of them false)
```
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `VAULT_ADDR` | — | Vault server address |
| `VAULT_TOKEN` | — | Authentication token |
| `VAULT_MOUNT` | `secret` | KV mount path |
| `VAULT_SECRETS_PATH` | `codetether/providers` | Provider secrets prefix |
| `CODETETHER_AUTH_TOKEN` | (auto-generated) | Bearer token for API auth |
| `CODETETHER_DATA_DIR` | `.codetether-agent` | Runtime data directory |
| `CODETETHER_GRPC_PORT` | `50051` | gRPC server port |
| `CODETETHER_A2A_PEERS` | — | Comma-separated peer seed URLs |
Performance
| Metric | Value |
|---|---|
| Startup | 13ms |
| Memory (idle) | ~15 MB |
| Memory (10-agent swarm) | ~55 MB |
| Binary size | ~12.5 MB |
Written in Rust with tokio — true parallelism, no GC pauses, native performance. See CHANGELOG.md for benchmark details.
Development
License
MIT