# CodeTether Agent

[![Crates.io](https://img.shields.io/crates/v/codetether-agent.svg)](https://crates.io/crates/codetether-agent)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub Release](https://img.shields.io/github/v/release/rileyseaburg/codetether-agent)](https://github.com/rileyseaburg/codetether-agent/releases)

**Linux binary (v0.1.5):** [direct](https://github.com/rileyseaburg/codetether-agent/releases/download/v0.1.5/codetether-v0.1.5-x86_64-unknown-linux-gnu) | [tar.gz](https://github.com/rileyseaburg/codetether-agent/releases/download/v0.1.5/codetether-v0.1.5-x86_64-unknown-linux-gnu.tar.gz) | [SHA256SUMS](https://github.com/rileyseaburg/codetether-agent/releases/download/v0.1.5/SHA256SUMS-v0.1.5.txt)

A high-performance AI coding agent with first-class A2A (Agent-to-Agent) protocol support, written in Rust. Features a rich terminal UI with dedicated views for swarm orchestration and autonomous PRD-driven development. Part of the CodeTether ecosystem.

![CodeTether TUI](docs/tui-screenshot.png)

## What's New in v0.1.5

- **Perpetual Persona Swarms (Phase 0)** — Always-on cognition runtime with persona lineage, SSE event stream, and control APIs.
- **Bedrock Provider** — Native Amazon Bedrock Converse API support (including region-aware configuration).
- **Provider Model Discovery** — Added default model catalogs for OpenAI-compatible providers (`cerebras`, `novita`, `minimax`).
- **Worker API Alignment** — Updated worker registration, task, and heartbeat paths to the `/v1/opencode/*` namespace.
- **Model ID Translation Fix** — Preserves model IDs that use `:` for version suffixes (for example `amazon.nova-micro-v1:0`).

See [full release notes](https://github.com/rileyseaburg/codetether-agent/releases/tag/v0.1.5).

## Features

- **A2A-Native**: Built from the ground up for the A2A protocol; works as a worker agent for the CodeTether platform
- **AI-Powered Coding**: Intelligent code assistance using multiple AI providers (OpenAI, Anthropic, Google, Moonshot, GitHub Copilot, etc.)
- **Swarm Execution**: Parallel sub-agent execution with real-time per-agent event streaming and dedicated TUI detail view
- **Ralph Loop**: Autonomous PRD-driven development with dedicated TUI view — give it a spec, watch it work story by story
- **Interactive TUI**: Rich terminal interface with webview layout, model selector, session picker, swarm view, and Ralph view
- **RLM Processing**: Handle context larger than model windows via recursive language model approach
- **Secure Secrets**: All API keys are loaded exclusively from HashiCorp Vault; no secrets in environment variables
- **27+ Tools**: Comprehensive tool system for file ops, LSP, code search, web fetch, and more
- **Session Management**: Persistent session history with git-aware storage
- **High Performance**: Written in Rust — 13ms startup, <20MB idle memory, true parallelism via tokio

## Installation

### One-Click Install (Recommended)

```bash
curl -fsSL https://raw.githubusercontent.com/rileyseaburg/codetether-agent/main/install.sh | sh
```

No Rust toolchain required. Downloads the latest pre-built binary and installs to `/usr/local/bin` (or `~/.local/bin`).

### From crates.io

```bash
cargo install codetether-agent
```

This installs the `codetether` binary to `~/.cargo/bin/`.

### From GitHub Releases

Download pre-built binaries from [GitHub Releases](https://github.com/rileyseaburg/codetether-agent/releases).

### From Source

```bash
git clone https://github.com/rileyseaburg/codetether-agent
cd codetether-agent
cargo build --release
# Binary at target/release/codetether
```

## Crash Reporting (Opt-In)

CodeTether can automatically capture catastrophic crashes (panic message, location, stack trace, version, OS/arch, and command) and send them to a remote endpoint on next startup.

- Disabled by default.
- On first interactive TUI run, CodeTether asks for explicit consent.
- No source files or API keys are included.
- Reports are queued locally in the data directory under `crash-reports/` before upload.
- Uploads use a versioned schema envelope (`codetether.crash.v1`) with legacy fallback for older endpoints.

Enable:

```bash
codetether config --set telemetry.crash_reporting=true
```

Disable:

```bash
codetether config --set telemetry.crash_reporting=false
```

Set a custom endpoint:

```bash
codetether config --set telemetry.crash_report_endpoint=https://your-endpoint.example.com/crashes
```

If your crash endpoint requires authentication, set one of these environment variables:

```bash
export CODETETHER_CRASH_REPORT_AUTH_TOKEN="your-bearer-token"
# or
export CODETETHER_CRASH_REPORT_API_KEY="your-api-key"
```

## Quick Start

### 1. Configure HashiCorp Vault

All API keys are stored in HashiCorp Vault for security. Set up your Vault connection:

```bash
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.your-token"
```

Store your provider API keys in Vault:

```bash
# Moonshot AI (default provider)
vault kv put secret/codetether/providers/moonshotai api_key="sk-..."

# OpenRouter (access to many models)
vault kv put secret/codetether/providers/openrouter api_key="sk-or-v1-..."

# Google AI
vault kv put secret/codetether/providers/google api_key="AIza..."

# Anthropic (or via Azure)
vault kv put secret/codetether/providers/anthropic api_key="sk-ant-..." base_url="https://api.anthropic.com"

# Azure Anthropic
vault kv put secret/codetether/providers/anthropic api_key="..." base_url="https://your-endpoint.azure.com/anthropic/v1"

# StepFun
vault kv put secret/codetether/providers/stepfun api_key="..."

# Zhipu AI
vault kv put secret/codetether/providers/zhipuai api_key="..." base_url="https://api.z.ai/api/paas/v4"
```

### If You See "No providers available"

This means CodeTether can run, but it cannot find any API keys in Vault.

Use this copy/paste checklist:

```bash
# 1) Set Vault connection details (replace with your real values)
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.your-token"
export VAULT_MOUNT="secret"
export VAULT_SECRETS_PATH="codetether/providers"

# 2) Add one provider key (example: OpenRouter)
vault kv put secret/codetether/providers/openrouter api_key="sk-or-v1-..."

# 3) Verify the key exists
vault kv list secret/codetether/providers
vault kv get secret/codetether/providers/openrouter

# 4) Test CodeTether
codetether run --model openrouter/stepfun/step-3.5-flash:free "hello"
```

If you are logged in as `root`, do not use `sudo` in install commands.

For worker/service setups, make sure the same `VAULT_*` variables are present in your service environment (for example `/etc/default/codetether-agent`) before restarting.
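
As a sketch, such an environment file might contain (all values are placeholders for your deployment):

```bash
# /etc/default/codetether-agent (example values only)
VAULT_ADDR="https://vault.example.com:8200"
VAULT_TOKEN="hvs.your-token"
VAULT_MOUNT="secret"
VAULT_SECRETS_PATH="codetether/providers"
```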

### Supported Providers

| Provider | Default Model | Notes |
|----------|---------------|-------|
| `moonshotai` | `kimi-k2.5` | **Default** - excellent for coding |
| `github-copilot` | `claude-opus-4` | GitHub Copilot models (Claude, GPT, Gemini) |
| `openrouter` | `stepfun/step-3.5-flash:free` | Access to many models |
| `google` | `gemini-2.5-pro` | Google AI |
| `anthropic` | `claude-sonnet-4-20250514` | Direct or via Azure |
| `stepfun` | `step-3.5-flash` | Chinese reasoning model |

### 2. Connect to CodeTether Platform

```bash
# Connect as a worker to the CodeTether A2A server
codetether worker --server https://api.codetether.run --codebases /path/to/project

# Or with authentication
codetether worker --server https://api.codetether.run --codebases /path/to/project --token your-worker-token

# Or use the one-command deploy script (from repo root)
./deploy-worker.sh --codebases /path/to/project
```

### 3. Or Use Interactive Mode

```bash
# Start the TUI in current directory
codetether tui

# Start in a specific project
codetether tui /path/to/project
```

## CLI Quick Reference

```bash
# Interactive TUI (like opencode)
codetether tui

# Chat mode (no tools)
codetether run "explain this code"

# Swarm mode - parallel sub-agents for complex tasks
codetether swarm "implement feature X with tests"

# Ralph - autonomous PRD-driven development
codetether ralph run --prd prd.json

# Generate a PRD template
codetether ralph create-prd --feature "My Feature" --project-name "my-app"

# Start HTTP server
codetether serve --port 4096

# Show config
codetether config --show
```

## Usage

### Default Mode: A2A Worker

By default, `codetether` runs as an A2A worker that connects to the CodeTether platform:

```bash
# Connect to CodeTether platform
codetether --server https://api.codetether.run

# With custom worker name
codetether --server https://api.codetether.run --name "my-dev-machine"
```

Environment variables:
- `CODETETHER_SERVER` - A2A server URL
- `CODETETHER_TOKEN` - Authentication token
- `CODETETHER_WORKER_NAME` - Worker name
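
For example, a worker can be configured entirely through these variables (values are placeholders):

```bash
export CODETETHER_SERVER="https://api.codetether.run"
export CODETETHER_TOKEN="your-worker-token"
export CODETETHER_WORKER_NAME="my-dev-machine"

# With the variables set, the default invocation connects as a worker
codetether
```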

### Interactive TUI

```bash
codetether tui
```

The TUI provides:
- **Webview layout**: Dashboard with sidebar, chat, and inspector (`/webview` or `Ctrl+B`)
- **Model selector**: Browse and pick models at runtime (`/model` or `Ctrl+M`)
- **Swarm view**: `/swarm <task>` with real-time per-agent progress, tool calls, and detail view (`Enter` on a subtask)
- **Ralph view**: `/ralph [prd.json]` with per-story progress, quality gates, and sub-agent activity
- **Session management**: `/sessions` picker, `/resume`, `/new`
- **Real-time tool streaming**: See tool calls as they execute
- **Theme support**: Customizable colors via config with hot-reload

### TUI Slash Commands

| Command | Description |
|---------|-------------|
| `/swarm <task>` | Run task in parallel swarm mode |
| `/ralph [path]` | Start autonomous PRD loop (default: `prd.json`) |
| `/model [name]` | Open model picker or set model directly |
| `/sessions` | Open session picker to resume a previous session |
| `/resume [id]` | Resume most recent or specific session |
| `/new` | Start a fresh session |
| `/webview` | Switch to dashboard layout |
| `/classic` | Switch to single-pane layout |
| `/inspector` | Toggle inspector pane |
| `/refresh` | Refresh workspace and session cache |
| `/view` | Toggle swarm view |

### TUI Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Ctrl+M` | Open model selector |
| `Ctrl+B` | Toggle webview/classic layout |
| `Ctrl+S` / `F2` | Toggle swarm view |
| `F3` | Toggle inspector pane |
| `Tab` | Switch between build/plan agents |
| `Alt+j/k` | Scroll down/up |
| `Alt+u/d` | Half-page scroll |
| `Ctrl+R` | Search command history |
| `?` | Toggle help overlay |

### Non-Interactive Mode (Chat - No Tools)

```bash
# Run a single prompt (chat only, no file editing tools)
codetether run "explain how this codebase works"

# Continue from last session
codetether run --continue "add tests for the new feature"

# Use a specific model
codetether run --model openrouter/stepfun/step-3.5-flash:free "explain this code"
```

**Note:** `codetether run` is chat-only mode without tools. For coding tasks, use `swarm` or `ralph`.

### HTTP Server

```bash
# Start the API server
codetether serve --port 4096
```

### Configuration Management

```bash
# Show current config
codetether config --show

# Initialize default config
codetether config --init
```

## Configuration

Configuration is stored in `~/.config/codetether-agent/config.toml`:

```toml
[default]
provider = "anthropic"
model = "claude-sonnet-4-20250514"

[a2a]
enabled = true
auto_connect = true

[ui]
theme = "dark"

[session]
auto_save = true
```

**Note:** API keys are NOT stored in config files. They must be stored in HashiCorp Vault.

## HashiCorp Vault Setup

### Vault Secret Structure

```
secret/codetether/providers/
├── openai       → { "api_key": "sk-...", "organization": "org-..." }
├── anthropic    → { "api_key": "sk-ant-..." }
├── google       → { "api_key": "AIza..." }
├── deepseek     → { "api_key": "..." }
└── ...
```
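
To verify that a provider entry matches this layout, standard Vault CLI reads work as usual:

```bash
# List provider entries, then read one field from a single entry
vault kv list secret/codetether/providers
vault kv get -field=api_key secret/codetether/providers/anthropic
```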

### Environment Variables

| Variable | Description |
|----------|-------------|
| `VAULT_ADDR` | Vault server address (e.g., `https://vault.example.com:8200`) |
| `VAULT_TOKEN` | Vault authentication token |
| `VAULT_MOUNT` | KV secrets engine mount path (default: `secret`) |
| `VAULT_SECRETS_PATH` | Path prefix for provider secrets (default: `codetether/providers`) |
| `CODETETHER_DEFAULT_MODEL` | Default model to use (e.g., `moonshotai/kimi-k2.5`) |
| `CODETETHER_SERVER` | A2A server URL |
| `CODETETHER_TOKEN` | Authentication token |
| `CODETETHER_WORKER_NAME` | Worker name |
| `CODETETHER_COGNITION_ENABLED` | Enable perpetual cognition runtime (`true`/`false`, default: `true`) |
| `CODETETHER_COGNITION_AUTO_START` | Auto-start cognition loop on `serve` startup (default: `true`) |
| `CODETETHER_COGNITION_LOOP_INTERVAL_MS` | Loop interval in milliseconds (default: `2000`) |
| `CODETETHER_COGNITION_MAX_SPAWN_DEPTH` | Max persona lineage depth (default: `4`) |
| `CODETETHER_COGNITION_MAX_BRANCHING_FACTOR` | Max active children per persona (default: `4`) |
| `CODETETHER_COGNITION_MAX_EVENTS` | In-memory event buffer size (default: `2000`) |
| `CODETETHER_COGNITION_MAX_SNAPSHOTS` | In-memory snapshot buffer size (default: `128`) |
| `CODETETHER_COGNITION_THINKER_ENABLED` | Enable model-backed thought generation (`true`/`false`, default: `true`) |
| `CODETETHER_COGNITION_THINKER_BACKEND` | Thinker backend: `openai_compat` or `candle` (default: `openai_compat`) |
| `CODETETHER_COGNITION_THINKER_BASE_URL` | OpenAI-compatible base URL for thinker model (default: `http://127.0.0.1:11434/v1`) |
| `CODETETHER_COGNITION_THINKER_MODEL` | Model id for thought generation (default: `qwen2.5:3b-instruct`) |
| `CODETETHER_COGNITION_THINKER_API_KEY` | Optional API key for thinker endpoint |
| `CODETETHER_COGNITION_THINKER_TEMPERATURE` | Thinker temperature (default: `0.2`) |
| `CODETETHER_COGNITION_THINKER_TOP_P` | Optional thinker top-p |
| `CODETETHER_COGNITION_THINKER_MAX_TOKENS` | Max generated tokens per thought step (default: `256`) |
| `CODETETHER_COGNITION_THINKER_TIMEOUT_MS` | Thinker request timeout in ms (default: `12000`) |
| `CODETETHER_COGNITION_THINKER_CANDLE_MODEL_PATH` | GGUF model path for in-process Candle inference |
| `CODETETHER_COGNITION_THINKER_CANDLE_TOKENIZER_PATH` | `tokenizer.json` path used by Candle backend |
| `CODETETHER_COGNITION_THINKER_CANDLE_ARCH` | Candle model architecture (`llama` or `qwen2`, default: auto from GGUF metadata) |
| `CODETETHER_COGNITION_THINKER_CANDLE_DEVICE` | Candle device selection: `auto`, `cpu`, or `cuda` (default: `auto`) |
| `CODETETHER_COGNITION_THINKER_CANDLE_CUDA_ORDINAL` | CUDA device ordinal when using `cuda` (default: `0`) |
| `CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_PENALTY` | Candle repetition penalty (default: `1.1`) |
| `CODETETHER_COGNITION_THINKER_CANDLE_REPEAT_LAST_N` | Token window for repetition penalty (default: `64`) |
| `CODETETHER_COGNITION_THINKER_CANDLE_SEED` | Base sampling seed for Candle thinker (default: `42`) |

GPU execution requires building with `--features candle-cuda` (or `candle-cudnn`).
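
As a sketch, switching the thinker to in-process Candle inference on a GPU combines these variables roughly as follows (the model and tokenizer paths are placeholders):

```bash
# Assumes a binary built with --features candle-cuda
export CODETETHER_COGNITION_THINKER_BACKEND="candle"
export CODETETHER_COGNITION_THINKER_CANDLE_MODEL_PATH="/models/qwen2.5-3b-instruct.gguf"
export CODETETHER_COGNITION_THINKER_CANDLE_TOKENIZER_PATH="/models/tokenizer.json"
export CODETETHER_COGNITION_THINKER_CANDLE_DEVICE="cuda"
codetether serve --port 4096
```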

### Using Vault Agent

For production, use Vault Agent for automatic token renewal:

```hcl
# vault-agent.hcl
vault {
  address = "https://vault.example.com:8200"
}

auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "codetether-agent"
    }
  }

  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}
```
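
With that config in place, a sketch of the startup sequence: start Vault Agent, hand its sink token to CodeTether, then launch the worker.

```bash
# Vault Agent authenticates and keeps a renewed token in the sink file
vault agent -config=vault-agent.hcl &

# Export the sink token through the standard VAULT_TOKEN variable
export VAULT_TOKEN="$(cat /tmp/vault-token)"
codetether worker --server https://api.codetether.run --codebases /path/to/project
```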

## Agents

### Build Agent

Full access to development tools. Can read, write, edit files and execute commands.

### Plan Agent

Read-only access for analysis and exploration. Perfect for understanding codebases before making changes.

### Explore Agent

Specialized for code navigation and discovery.

## Tools

CodeTether Agent includes 27+ tools for comprehensive development automation:

### File Operations
| Tool | Description |
|------|-------------|
| `read_file` | Read file contents |
| `write_file` | Write content to files |
| `list_dir` | List directory contents |
| `glob` | Find files by pattern |
| `edit` | Apply search/replace patches |
| `multiedit` | Batch edits across multiple files |
| `apply_patch` | Apply unified diff patches |

### Code Intelligence
| Tool | Description |
|------|-------------|
| `lsp` | Language Server Protocol operations (definition, references, hover, completion) |
| `grep` | Search file contents with regex |
| `codesearch` | Semantic code search |

### Execution
| Tool | Description |
|------|-------------|
| `bash` | Execute shell commands |
| `batch` | Run multiple tool calls in parallel |
| `task` | Background task execution |

### Web & External
| Tool | Description |
|------|-------------|
| `webfetch` | Fetch web pages with smart extraction |
| `websearch` | Search the web for information |

### Agent Orchestration
| Tool | Description |
|------|-------------|
| `ralph` | Autonomous PRD-driven agent loop |
| `rlm` | Recursive Language Model for large contexts |
| `prd` | Generate and manage PRD documents |
| `plan_enter`/`plan_exit` | Switch to planning mode |
| `question` | Ask clarifying questions |
| `skill` | Execute learned skills |
| `todo_read`/`todo_write` | Track task progress |

## A2A Protocol

CodeTether Agent is built for the A2A (Agent-to-Agent) protocol:

- **Worker Mode** (default): Connect to the CodeTether platform and process tasks
- **Server Mode**: Accept tasks from other agents (`codetether serve`)
- **Client Mode**: Dispatch tasks to other A2A agents

### AgentCard

When running as a server, the agent exposes its capabilities via `/.well-known/agent.json`:

```json
{
  "name": "CodeTether Agent",
  "description": "A2A-native AI coding agent",
  "version": "0.1.0",
  "skills": [
    { "id": "code-generation", "name": "Code Generation" },
    { "id": "code-review", "name": "Code Review" },
    { "id": "debugging", "name": "Debugging" }
  ]
}
```
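
For example, against a local `codetether serve --port 4096` instance, the card can be fetched directly:

```bash
curl -s http://localhost:4096/.well-known/agent.json | jq .
```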

### Perpetual Persona Swarms API (Phase 0)

When running `codetether serve`, the agent also exposes cognition + swarm control APIs:

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/v1/cognition/start` | Start perpetual cognition loop |
| `POST` | `/v1/cognition/stop` | Stop cognition loop |
| `GET` | `/v1/cognition/status` | Runtime status and buffer metrics |
| `GET` | `/v1/cognition/stream` | SSE stream of thought events |
| `GET` | `/v1/cognition/snapshots/latest` | Latest compressed memory snapshot |
| `POST` | `/v1/swarm/personas` | Create a root persona |
| `POST` | `/v1/swarm/personas/{id}/spawn` | Spawn child persona |
| `POST` | `/v1/swarm/personas/{id}/reap` | Reap a persona (optional cascade) |
| `GET` | `/v1/swarm/lineage` | Current persona lineage graph |

`/v1/cognition/start` auto-seeds a default `root-thinker` persona when no personas exist, unless a `seed_persona` is provided.

See `docs/perpetual_persona_swarms.md` for request/response contracts.
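
As a quick sketch against a local `codetether serve --port 4096` instance (request and response bodies are defined in the contract doc above):

```bash
# Start the cognition loop (auto-seeds a root-thinker persona if none exist)
curl -s -X POST http://localhost:4096/v1/cognition/start

# Inspect runtime status and buffer metrics
curl -s http://localhost:4096/v1/cognition/status | jq .

# Follow the live SSE stream of thought events
curl -N http://localhost:4096/v1/cognition/stream
```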

### CUDA Build/Deploy Helpers

From `codetether-agent/`:

- `make build-cuda` - Build a CUDA-enabled binary locally.
- `make deploy-spike2-cuda` - Sync source to `spike2`, build with `--features candle-cuda`, install, and restart service.
- `make status-spike2-cuda` - Check service status, active Candle device config, and GPU usage on `spike2`.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   CodeTether Platform                   │
│                  (A2A Server at api.codetether.run)     │
└────────────────────────┬────────────────────────────────┘
                         │ SSE/JSON-RPC
┌─────────────────────────────────────────────────────────┐
│                   codetether-agent                      │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│   │ A2A     │  │ Agent   │  │ Tool    │  │ Provider│    │
│   │ Worker  │  │ System  │  │ System  │  │ Layer   │    │
│   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘    │
│        │            │            │            │         │
│        └────────────┴────────────┴────────────┘         │
│                          │                              │
│   ┌──────────────────────┴──────────────────────────┐   │
│   │              HashiCorp Vault                    │   │
│   │         (API Keys & Secrets)                    │   │
│   └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```

## Swarm: Parallel Sub-Agent Execution

The `swarm` command decomposes complex tasks into parallelizable subtasks and executes them concurrently:

```bash
# Execute a complex task with parallel sub-agents (uses CODETETHER_DEFAULT_MODEL or defaults to moonshotai/kimi-k2.5)
codetether swarm "Implement user authentication with tests and documentation"

# Specify a model explicitly
codetether swarm "Implement feature X" --model moonshotai/kimi-k2.5

# Control parallelism and strategy
codetether swarm "Refactor the API layer" --strategy domain --max-subagents 8

# Generate JSON output
codetether swarm "Analyze codebase" --json
```

### Decomposition Strategies

| Strategy | Description |
|----------|-------------|
| `auto` | LLM-driven automatic decomposition (default) |
| `domain` | Split by domain expertise (frontend, backend, etc.) |
| `data` | Split by data partitions |
| `stage` | Split by pipeline stages (analyze → implement → test) |
| `none` | Execute as single task |

## RLM: Recursive Language Model Processing

The `rlm` command handles large contexts that exceed model context windows using the Recursive Language Model approach:

```bash
# Analyze a large source file
codetether rlm "What are the main functions?" -f src/large_file.rs

# Analyze multiple files
codetether rlm "Find all error handling patterns" -f src/*.rs

# Analyze stdin content
cat logs/*.log | codetether rlm "Summarize the errors" --content -

# JSON output for programmatic use
codetether rlm "List all TODO comments" -f src/**/*.rs --json
```

### How RLM Works

Based on the "Recursive Language Model" paper approach:

1. **Context Loading**: Large content is loaded into a REPL-like environment
2. **LLM Analysis**: The LLM writes code to explore the context (head, tail, grep, etc.)
3. **Sub-LM Calls**: The LLM can call `llm_query()` for semantic sub-questions
4. **FINAL Answer**: After 1-5 iterations, the LLM returns a synthesized answer

### RLM Commands (Internal REPL)

| Command | Description |
|---------|-------------|
| `head(n)` | First n lines of context |
| `tail(n)` | Last n lines of context |
| `grep("pattern")` | Search for regex pattern |
| `count("pattern")` | Count pattern occurrences |
| `llm_query("question")` | Ask semantic sub-question |
| `FINAL("answer")` | Return final answer |

## Ralph: Autonomous PRD-Driven Agent Loop

Ralph is an autonomous agent loop that implements features from a structured PRD (Product Requirements Document). Each iteration is a fresh agent instance with clean context, while memory persists via git history, progress.txt, and the PRD itself.

```bash
# Create a new PRD template
codetether ralph create-prd --feature "User Authentication" --project-name "my-app"

# Run Ralph to implement the PRD (note: -p or --prd is required for custom PRD path)
codetether ralph run --prd prd.json --model "moonshotai/kimi-k2.5" --max-iterations 10

# Or using short flags
codetether ralph run -p my-feature-prd.json -m "moonshotai/kimi-k2.5"

# Check status
codetether ralph status --prd prd.json
```

### How Ralph Works

1. **Load PRD**: Read user stories with acceptance criteria, priorities, and dependencies
2. **Select Story**: Pick the highest-priority incomplete story with satisfied dependencies
3. **Implement**: The AI agent has full tool access to read, write, edit, and execute
4. **Quality Check**: Run all quality checks (cargo check, clippy, test, build)
5. **Mark Complete**: Update PRD with pass/fail status
6. **Repeat**: Continue until all stories pass or max iterations reached

### PRD Structure

```json
{
  "project": "my-app",
  "feature": "User Authentication",
  "branch": "feature/user-auth",
  "quality_checks": {
    "typecheck": "cargo check",
    "lint": "cargo clippy",
    "test": "cargo test",
    "build": "cargo build --release"
  },
  "user_stories": [
    {
      "id": "US-001",
      "title": "Login endpoint",
      "description": "Implement POST /auth/login",
      "acceptance_criteria": ["Returns JWT on success", "Returns 401 on failure"],
      "priority": 1,
      "complexity": 2,
      "depends_on": [],
      "passes": false
    }
  ]
}
```

### Memory Across Iterations

Ralph maintains memory across iterations without context window bloat:

| Memory Source | Purpose |
|---------------|---------|
| **Git history** | Commits from previous iterations show what changed |
| **progress.txt** | Agent writes learnings, blockers, and context |
| **prd.json** | Tracks which stories pass/fail |
| **Quality checks** | Error output guides next iteration |

## Dogfooding: Self-Implementing Agent

This project demonstrates true **dogfooding**—using the agent to build its own features.

### What We Accomplished

Using `ralph` and `swarm`, the agent autonomously implemented:

**LSP Client Implementation (10 stories)**:
- US-001: LSP Transport Layer - stdio implementation
- US-002: JSON-RPC Message Framework
- US-003: LSP Initialize Handshake
- US-004: Text Document Synchronization - didOpen
- US-005: Text Document Synchronization - didChange
- US-006: Text Document Completion
- US-007: Text Document Hover
- US-008: Text Document Definition
- US-009: LSP Shutdown and Exit
- US-010: LSP Client Configuration and Server Management

**Missing Features (10 stories)**:
- MF-001: External Directory Tool
- MF-002: RLM Pool - Connection Pooling
- MF-003: Truncation Utilities
- MF-004: LSP Full Integration - Server Management
- MF-005: LSP Transport - stdio Communication
- MF-006: LSP Requests - textDocument/definition
- MF-007: LSP Requests - textDocument/references
- MF-008: LSP Requests - textDocument/hover
- MF-009: LSP Requests - textDocument/completion
- MF-010: RLM Router Enhancement

### Results

| Metric | Value |
|--------|-------|
| **Total User Stories** | 20 |
| **Stories Passed** | 20 (100%) |
| **Total Iterations** | 20 |
| **Quality Checks Per Story** | 4 (check, clippy, test, build) |
| **Lines of Code Generated** | ~6,000+ |
| **Time to Complete** | ~30 minutes |
| **Model Used** | Kimi K2.5 (Moonshot AI) |

### Efficiency Comparison

| Approach | Time | Cost | Notes |
|----------|------|------|-------|
| **Manual Development** | 80 hours | $8,000 | Senior dev @ $100/hr, 50-100 LOC/day |
| **opencode + subagents** | 100 min | ~$11.25 | Bun runtime, Kimi K2.5 (same model) |
| **codetether swarm** | 29.5 min | $3.75 | Native Rust, Kimi K2.5 |

- **vs Manual**: 163x faster, 2133x cheaper
- **vs opencode**: 3.4x faster, ~3x cheaper (same Kimi K2.5 model)

Key advantages over opencode subagents (model parity):
- Native Rust binary (13ms startup vs 25-50ms Bun)
- Direct API calls vs TypeScript HTTP overhead
- PRD-driven state in files vs subagent process spawning
- ~3x fewer tokens due to reduced subagent initialization overhead

**Note**: Both have LLM-based compaction. The efficiency gain comes from PRD-driven architecture (state in prd.json + progress.txt) vs. spawning subprocesses with rebuilt context.

### How to Replicate

```bash
# 1. Create a PRD for your feature
cat > prd.json << 'EOF'
{
  "project": "my-project",
  "feature": "My Feature",
  "quality_checks": {
    "typecheck": "cargo check",
    "test": "cargo test",
    "lint": "cargo clippy",
    "build": "cargo build --release"
  },
  "user_stories": [
    {
      "id": "US-001",
      "title": "First Story",
      "description": "Implement the first piece",
      "acceptance_criteria": ["Compiles", "Tests pass"],
      "priority": 1,
      "depends_on": [],
      "passes": false
    }
  ]
}
EOF

# 2. Run Ralph
codetether ralph run -p prd.json -m "kimi-k2.5" --max-iterations 10

# 3. Watch as your feature gets implemented autonomously
```

### Why This Matters

1. **Proof of Capability**: The agent can implement non-trivial features end-to-end
2. **Quality Assurance**: Every story passes cargo check, clippy, test, and build
3. **Autonomous Operation**: No human intervention during implementation
4. **Reproducible Process**: PRD-driven development is structured and repeatable
5. **Self-Improvement**: The agent literally improved itself

### Content Types

RLM auto-detects content type for optimized processing:

| Type | Detection | Optimization |
|------|-----------|--------------|
| `code` | Function definitions, imports | Semantic chunking by symbols |
| `logs` | Timestamps, log levels | Time-based chunking |
| `conversation` | Chat markers, turns | Turn-based chunking |
| `documents` | Markdown headers, paragraphs | Section-based chunking |

### Example Output

```bash
$ codetether rlm "What are the 3 main functions?" -f src/chunker.rs --json
{
  "answer": "The 3 main functions are: 1) chunk_content() - splits content...",
  "iterations": 1,
  "sub_queries": 0,
  "stats": {
    "input_tokens": 322,
    "output_tokens": 235,
    "elapsed_ms": 10982
  }
}
```

## Performance: Why Rust Over Bun/TypeScript

CodeTether Agent is written in Rust for measurable performance advantages over JavaScript/TypeScript runtimes like Bun:

### Benchmark Results

| Metric | CodeTether (Rust) | opencode (Bun) | Advantage |
|--------|-------------------|----------------|-----------|
| **Binary Size** | 12.5 MB | ~90 MB (bun + deps) | **7.2x smaller** |
| **Startup Time** | 13 ms | 25-50 ms | **2-4x faster** |
| **Memory (idle)** | ~15 MB | ~50-80 MB | **3-5x less** |
| **Memory (swarm, 10 agents)** | ~45 MB | ~200+ MB | **4-5x less** |
| **Process Spawn** | 1.5 ms | 5-10 ms | **3-7x faster** |
| **Cold Start (container)** | ~50 ms | ~200-500 ms | **4-10x faster** |

### Why This Matters for Sub-Agents

1. **Lower Memory Per Agent**: With 3-5x less memory per agent, you can run more concurrent sub-agents on the same hardware. A 4GB container can run ~80 Rust sub-agents vs ~15-20 Bun sub-agents.

2. **Faster Spawn Time**: Sub-agents spawn in 1.5ms vs 5-10ms. For a swarm of 100 agents, that's 150ms vs 500-1000ms just in spawn overhead.

3. **No GC Pauses**: Rust has no garbage collector. JavaScript/Bun has GC pauses that can add latency spikes of 10-50ms during high-memory operations.

4. **True Parallelism**: Rust's tokio runtime uses OS threads with work-stealing. Bun uses a single-threaded event loop that can bottleneck on CPU-bound decomposition.

5. **Smaller Attack Surface**: Smaller binary = fewer dependencies = smaller CVE surface. Critical for agents with shell access.

### Resource Efficiency for Swarm Workloads

```
┌─────────────────────────────────────────────────────────────────┐
│                    Memory Usage Comparison                      │
│                                                                 │
│  Sub-Agents    CodeTether (Rust)       opencode (Bun)           │
│  ────────────────────────────────────────────────────────────── │
│       1            15 MB                   60 MB                │
│       5            35 MB                  150 MB                │
│      10            55 MB                  280 MB                │
│      25           105 MB                  650 MB                │
│      50           180 MB                 1200 MB                │
│     100           330 MB                 2400 MB                │
│                                                                 │
│  At 100 sub-agents: Rust uses 7.3x less memory                  │
└─────────────────────────────────────────────────────────────────┘
```

### Real-World Impact

For a typical swarm task (e.g., "Implement feature X with tests"):

| Scenario | CodeTether | opencode (Bun) |
|----------|------------|----------------|
| Task decomposition | 50ms | 150ms |
| Spawn 5 sub-agents | 8ms | 35ms |
| Peak memory | 45 MB | 180 MB |
| Total overhead | ~60ms | ~200ms |

**Result**: 3.3x faster task initialization, 4x less memory, more capacity for actual AI inference.

### Measured: Dogfooding Task (20 User Stories)

Actual resource usage from implementing 20 user stories autonomously:

```
┌─────────────────────────────────────────────────────────────────┐
│           Dogfooding Task: 20 Stories, Same Model (Kimi K2.5)   │
│                                                                 │
│  Metric              CodeTether           opencode (estimated)  │
│  ────────────────────────────────────────────────────────────── │
│  Total Time          29.5 min             100 min (3.4x slower) │
│  Wall Clock          1,770 sec            6,000 sec             │
│  Iterations          20                   20                    │
│  Spawn Overhead      20 × 1.5ms = 30ms    20 × 7.5ms = 150ms    │
│  Startup Overhead    20 × 13ms = 260ms    20 × 37ms = 740ms     │
│  Peak Memory         ~55 MB               ~280 MB               │
│  Tokens Used         500K                 ~1.5M (subagent init) │
│  Token Cost          $3.75                ~$11.25               │
│                                                                 │
│  Total Overhead      290ms                890ms (3.1x more)     │
│  Memory Efficiency   5.1x less peak RAM                         │
│  Cost Efficiency     ~3x cheaper                                │
└─────────────────────────────────────────────────────────────────┘
```

**Computation Notes**:
- Spawn overhead: `iterations × spawn_time` (1.5ms Rust vs 7.5ms Bun avg)
- Startup overhead: `iterations × startup_time` (13ms Rust vs 37ms Bun avg)
- Token difference: opencode has compaction, but subagent spawns rebuild system prompt + context each time (~3x more tokens)
- Memory: Based on 10-agent swarm profile (55 MB vs 280 MB)
- Cost: Same Kimi K2.5 pricing, difference is from subagent initialization overhead

**Note**: opencode uses LLM-based compaction for long sessions (similar to codetether). The token difference comes from subagent process spawning overhead, not lack of context management.

### Benchmark Methodology

Run benchmarks yourself:

```bash
./script/benchmark.sh
```

Benchmarks performed on:
- Ubuntu 24.04, x86_64
- 48 CPU threads, 32GB RAM
- Rust 1.85, Bun 1.x
- HashiCorp Vault for secrets

## Development

```bash
# Run in development mode
cargo run -- --server http://localhost:8080

# Run tests
cargo test

# Build release binary
cargo build --release

# Run benchmarks
./script/benchmark.sh
```

## License

MIT