battlecommand-forge 0.2.0

Quality-first AI coding army: single Rust binary that generates production-grade projects via a 9-stage TDD pipeline with a complexity-scaled quality gate (up to 9.2/10)
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
# BattleCommand Forge

[![CI](https://github.com/mrdushidush/battle-command-forge/actions/workflows/ci.yml/badge.svg)](https://github.com/mrdushidush/battle-command-forge/actions/workflows/ci.yml)
[![Crates.io](https://img.shields.io/crates/v/battlecommand-forge.svg)](https://crates.io/crates/battlecommand-forge)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
[![MSRV](https://img.shields.io/badge/rustc-1.95%2B-orange.svg)](Cargo.toml)

> **Status: stable port at v0.1.0 (April 2026).** Battle-command-forge is the quality-first code-generation branch of an AI-agent project family I've been building since January 2026. This release is a public port of internal pipeline work — the code is field-tested (86/86 unit tests, plus the 10-mission stress suite documented below) but active feature development has slowed while I focus on sibling projects in the same family. Issues and PRs remain welcome. Related: **[claudette](https://github.com/mrdushidush/claudette)** (local-first personal assistant, on crates.io) · **[ABCC](https://github.com/mrdushidush/agent-battle-command-center)** (the godfather — RTS-style TUI for agent orchestration).

**Quality-first AI coding army.** A pure Rust single binary (~3.7 MB) that generates production-grade code using a 9-stage quality pipeline with TDD enforcement, multi-file extraction, surgical fix-pass retry, and a complexity-scaled quality gate.

Give it a mission. Get back a tested, reviewed, production-ready project.

```
$ battlecommand-forge mission "Build a FastAPI CRUD API for managing books with SQLite" --auto

[ROUTER]    Complexity: C4 (moderate) — CRUD + SQLite + FastAPI
[ARCHITECT] Spec: 6 files, 77 lines, TDD plan with 12 test cases
[TESTER]    24 tests written (pytest-asyncio, Pydantic v2 fixtures)
[CODER]     Generating 6 files (single-shot, 80B model)...
[VERIFIER]  venv created, deps installed, ruff clean, 22/24 tests pass
[SECURITY]  OWASP review: no critical issues, score 8.5/10
[CRITIQUE]  DEV=9.0 ARCH=8.5 TEST=8.0 SEC=8.5 DOCS=7.5 → avg 8.3
[CTO]       Approved — coherent, well-structured, ships as-is
[GATE]      Score: 9.1/10 (threshold: 9.2) — FIX ROUND 1
[FIX]       Surgical fix: 1 file (models.py — dual Base bug)
[VERIFIER]  24/24 tests pass
[GATE]      Score: 9.4/10 — SHIPPED ✅

Output: output/fastapi_books_crud/
```

---

## Table of Contents

- [30-Second Quick Start](#30-second-quick-start)
- [Installation](#installation)
- [Your First Mission](#your-first-mission)
- [CLI Commands](#cli-commands)
- [Interactive TUI](#interactive-tui)
- [Presets](#presets)
- [Configuration](#configuration)
- [Environment Variables](#environment-variables)
- [Common Workflows](#common-workflows)
- [How the Pipeline Works](#how-the-pipeline-works)
- [Troubleshooting](#troubleshooting)
- [Architecture](#architecture)

---

## 30-Second Quick Start

```bash
# 1. Build
cargo build --release

# 2. Start Ollama (if not running)
ollama serve &

# 3. Pull a model (7B for fast start — upgrade later)
ollama pull qwen2.5-coder:7b

# 4. Run your first mission
./target/release/battlecommand-forge mission "Create a Python CLI that converts CSV to JSON" --preset fast --auto
```

That's it. Your generated project is in `output/`.

---

## Installation

### Prerequisites

| Requirement | Version | Why |
|-------------|---------|-----|
| **Rust** | 1.91+ | Building the binary |
| **Ollama** | Latest | Running local models |
| **Python** | 3.10+ | Generated code + verifier (creates venvs) |

### Build from Source

```bash
git clone https://github.com/mrdushidush/battle-command-forge.git
cd battle-command-forge
cargo build --release
```

The binary is at `./target/release/battlecommand-forge` (~3.7 MB with LTO + strip).

**Optional:** Copy it to your PATH:

```bash
cp target/release/battlecommand-forge /usr/local/bin/bcf
```

Now you can use `bcf` from anywhere.

### Pull Models

The preset you choose determines which models you need:

```bash
# Fast preset ($0, needs 8GB RAM)
ollama pull qwen2.5-coder:7b

# Balanced preset ($0, needs 20GB RAM)
ollama pull qwen2.5-coder:32b

# Premium preset (best quality, needs 48GB+ VRAM for local models)
ollama pull qwen2.5-coder:32b          # architect
ollama pull qwen3-coder-next:q8_0      # coder (80B)
ollama pull qwen3-coder:30b-a3b-q8_0   # critic
```

Premium also uses cloud APIs — set these if you want the best results:

```bash
# Add to your ~/.zshrc or ~/.bashrc
export ANTHROPIC_API_KEY=sk-ant-...    # Claude Opus/Sonnet (~$0.20-0.30/mission)
export XAI_API_KEY=xai-...             # Grok architect (optional, ~$0.10/mission)
```

### Verify Installation

```bash
battlecommand-forge status
```

```
BattleCommand Forge v0.1.0
Modules: 30 | Pipeline: 9-stage | Gate: 8.0-9.2/10 (scaled) | Fix rounds: 5

Ollama: connected (12 models)
Claude: configured
GitHub: gh authenticated
Workspaces: 47
Total missions: 47 | Avg score: 7.8/10
Total cost: $4.2100
```

---

## Your First Mission

### Example 1: Simple CLI Tool (C3, ~$0)

```bash
battlecommand-forge mission "Build a Python CLI that converts CSV files to JSON, supporting custom delimiters and output formatting" --preset fast --auto
```

**What happens:**
1. Router scores this as C3 (low complexity)
2. Architect writes a concise spec with 4 files
3. Tester writes 8 test cases
4. Coder generates all files in one shot
5. Verifier creates a venv, runs ruff + pytest
6. If tests fail, surgical fix targets the exact broken file
7. Output lands in `output/csv_to_json_cli/`

**Your output directory:**
```
output/csv_to_json_cli/
├── main.py              # CLI entry point (click/argparse)
├── converter.py         # Core conversion logic
├── tests/
│   └── test_converter.py  # 8 test cases
├── requirements.txt
└── pyproject.toml
```

### Example 2: REST API (C5, ~$0.30)

```bash
battlecommand-forge mission "Build a FastAPI REST API for a todo app with SQLite, full CRUD, due dates, priority levels, and filtering" --preset premium --auto
```

### Example 3: Complex Auth Service (C8, ~$0.50)

```bash
battlecommand-forge mission "Build a FastAPI auth service with JWT access+refresh tokens, RBAC (admin/user/viewer), bcrypt passwords, SQLAlchemy async, rate limiting, and CSRF protection" --preset premium --auto
```

### Example 4: Edit Existing Code

```bash
# Point at an existing project and describe what to change
battlecommand-forge edit --path ./my-project "Add pagination to all list endpoints, with page/per_page query params and Link headers"
```

### Example 5: Custom Output Directory

```bash
battlecommand-forge mission "Build a URL shortener" --auto -o ~/projects/url-shortener
```

### Example 6: Use a GitHub Repo as Context

```bash
battlecommand-forge mission "Add WebSocket support for real-time notifications" --repo https://github.com/user/my-api --auto
```

### Example 7: Voice Announcements (macOS)

```bash
battlecommand-forge mission "Build a health check API" --voice --auto
# Announces: "Mission started..." "Quality gate passed..." "Mission complete."
```

---

## CLI Commands

### `mission` — Run the full pipeline

```bash
battlecommand-forge mission "<prompt>" [options]
```

| Flag | Default | Description |
|------|---------|-------------|
| `--preset` | `premium` | Model preset: `fast`, `balanced`, `premium` |
| `--auto` | off | Skip human approval, auto-continue fix rounds |
| `-o, --output` | auto | Custom output directory |
| `--repo` | — | Clone a GitHub repo as context |
| `--path` | — | Use a local directory as context |
| `--voice` | off | macOS TTS announcements |
| `--architect-model` | — | Override architect model |
| `--tester-model` | — | Override tester model |
| `--coder-model` | — | Override coder model |
| `--reviewer-model` | — | Override security + critique + CTO |

**Examples:**

```bash
# Fully automatic, premium quality
battlecommand-forge mission "Build a markdown blog engine" --preset premium --auto

# Fast iteration, local only, no API costs
battlecommand-forge mission "Build a todo CLI" --preset fast --auto

# Manual mode — approve each stage
battlecommand-forge mission "Build an auth service"

# Override just the coder model
battlecommand-forge mission "Build a chat server" --auto --coder-model claude-sonnet-4-6
```

### `chat` — CTO Research Chat (CLI)

```bash
battlecommand-forge chat
```

Interactive REPL where you chat with the CTO agent to plan missions before launching. Supports tool calling (web search, file reading, directory listing), conversation history, and launching missions directly from chat.

```
$ battlecommand-forge chat --preset premium

BattleCommand Forge — CTO Chat (claude-sonnet-4-6)
Plan your mission, research architecture, or ask anything.
Type /mission <prompt> to launch. /clear to reset. /quit to exit.

> What's the best database for a real-time chat app?
  [web_search: "best database real-time chat 2026"]
  [web_search → Redis for pub/sub + message queue, PostgreSQL for persistence...]

For a real-time chat app, I recommend a dual-database approach:
- **Redis** for pub/sub messaging, presence tracking, and room state
- **PostgreSQL** (async via SQLAlchemy) for message history and user accounts
...

> /mission Build a WebSocket chat server with Redis pub/sub, PostgreSQL message history, FastAPI, rooms, and user presence

Launching mission: Build a WebSocket chat server...
[ROUTER] Complexity: C7 (high)
...
```

**Chat commands:**

| Command | Action |
|---------|--------|
| `/mission <prompt>` | Launch a mission from chat |
| `/clear` | Clear conversation history |
| `/compress` | Compact long history |
| `/help` | List commands |
| `/quit` | Exit chat |

### `tui` — Interactive Terminal UI

```bash
battlecommand-forge tui
```

Full-featured 6-tab interface with CTO research chat, model picker, hardware monitoring, and 15 slash commands. See [Interactive TUI](#interactive-tui) for details.

### `edit` — Modify Existing Code

```bash
battlecommand-forge edit --path <directory> "<what to change>"
```

```bash
# Add tests to an existing project
battlecommand-forge edit --path ./my-api "Add comprehensive pytest tests for all endpoints"

# Refactor a module
battlecommand-forge edit --path ./my-lib "Refactor the parser module to use the visitor pattern"
```

### `verify` — Run Quality Checks

```bash
battlecommand-forge verify --path <directory> [--lang python]
```

Creates a venv, installs deps, runs ruff (linting) and pytest. Shows score, test results, lint issues, and secret detection.

```bash
# Verify a generated project
battlecommand-forge verify --path output/todo_crud

# Verify your own project
battlecommand-forge verify --path ~/projects/my-api --lang python
```

### `models` — List, Benchmark, and Compare

```bash
# List all Ollama models
battlecommand-forge models list

# Benchmark a specific model (speed + quality test)
battlecommand-forge models benchmark qwen2.5-coder:32b

# Show preset configurations
battlecommand-forge models presets
```

### `stress` — Run the Stress Test Suite

```bash
# Run all 21 graded tasks (C4-C9)
battlecommand-forge stress --tasks 21 --preset premium

# Quick smoke test (5 tasks)
battlecommand-forge stress --tasks 5 --preset fast
```

### `status` — System Health Check

```bash
battlecommand-forge status
```

Shows Ollama connection, API key status, workspace count, mission history, and total cost.

### `report` — View Pipeline Reports

```bash
# List all reports
battlecommand-forge report list

# Show the latest report (detailed breakdown)
battlecommand-forge report show
```

### `audit` — View Audit Log

```bash
# Last 20 entries
battlecommand-forge audit --limit 20
```

### `settings` — Model Configuration

```bash
# Show resolved config for a preset
battlecommand-forge settings show --preset premium

# Generate default .battlecommand/models.toml
battlecommand-forge settings init
```

### `github` — Push and Create PRs

```bash
# Check if gh CLI is available
battlecommand-forge github check

# Push a workspace
battlecommand-forge github push --workspace output/my_project --branch main

# Create a PR
battlecommand-forge github create-pr --workspace output/my_project --title "Add books API" --base main
```

### `hw` — Hardware Metrics

```bash
battlecommand-forge hw
```

Shows CPU, RAM, Ollama status, and VRAM usage.

---

## Interactive TUI

Launch with:

```bash
battlecommand-forge tui
```

### 6 Tabs

| Tab | Key | What's There |
|-----|:---:|-------------|
| **Chat** | `1` | CTO research agent with 10 tools |
| **Queue** | `2` | Mission queue and status |
| **Models** | `3` | Available Ollama models |
| **Code** | `4` | Generated code viewer |
| **HW** | `5` | Live hardware metrics |
| **Log** | `6` | Pipeline activity log |

### Chat with the CTO Agent

The Chat tab connects you to an AI CTO agent with tool access. It can search the web, read files, list directories, and launch missions — all from the chat interface.

```
> What's the best way to structure a FastAPI app with SQLAlchemy async?

CTO: For a production FastAPI + async SQLAlchemy setup, I recommend...
[web_search: "FastAPI async SQLAlchemy best practices 2026"]
...here's the recommended structure:
  app/
    main.py         # FastAPI app + middleware
    database.py     # Single AsyncEngine + AsyncSession factory
    models.py       # SQLAlchemy ORM models (import Base from database)
    schemas.py      # Pydantic v2 models (UserRead, UserCreate)
    routes/         # One file per resource
    dependencies.py # get_db session dependency
...

> /mission Build that FastAPI app with user CRUD and SQLAlchemy async

[Mission launched: Build that FastAPI app...]
```

**CTO Tools (10):**

| Tool | What It Does |
|------|-------------|
| `web_search` | Search the web (Brave Search or DuckDuckGo fallback) |
| `web_fetch` | Fetch and read a web page |
| `read_file` | Read any local file |
| `list_files` | List directory contents |
| `run_mission` | Launch a mission from chat |
| `refine_prompt` | Improve a mission prompt before running |
| `verify_project` | Run verifier on a project directory |
| `list_reports` | List pipeline reports |
| `open_browser` | Open a URL in the default browser |
| `status` | Show system status |

### Slash Commands

Type these in the Chat input:

| Command | What It Does |
|---------|-------------|
| `/mission <prompt>` | Launch a mission |
| `/verify [path]` | Run verifier (default: latest output) |
| `/report [list\|show]` | View pipeline reports |
| `/audit [n]` | Show audit log (default: 10) |
| `/preset <name>` | Switch preset (fast/balanced/premium) |
| `/cost` | Show total API cost |
| `/settings` | Open model picker overlay |
| `/clear` | Clear chat + CTO history |
| `/compress` | Compact CTO conversation history |
| `/models` | Switch to Models tab |
| `/hw` | Switch to Hardware tab |
| `/status` | Show workspace/system info |
| `/snake` | Play snake! |
| `/space` | Play Space Invaders! |
| `/help` | List all commands |

### Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `1`-`6` | Switch tabs (when input is empty) |
| `Tab` | Cycle through tabs |
| `PgUp` / `PgDn` | Scroll chat ±20 lines |
| `Up` / `Down` | Scroll chat ±3 lines (when input is empty) |
| `Home` / `End` | Scroll to top/bottom or cursor start/end |
| `Esc` | Clear input or quit |
| `Ctrl+C` | Quit |

---

## Presets

Presets control which models are used for each pipeline role.

### Fast — $0, Runs Anywhere

```bash
battlecommand-forge mission "..." --preset fast
```

All roles use `qwen2.5-coder:7b`. Needs ~8GB RAM. Best for quick iteration and testing.

### Balanced — $0, Better Quality

```bash
battlecommand-forge mission "..." --preset balanced
```

Uses `qwen2.5-coder:32b` for architect/coder. Needs ~20GB RAM. Good balance of speed and quality.

### Premium — ~$0.30/mission, Best Quality

```bash
battlecommand-forge mission "..." --preset premium
```

Dream Team v3: local 32B architect + Opus tester + local 80B coder + Sonnet fix/CTO + local 30B critics. Needs 48GB+ VRAM and `ANTHROPIC_API_KEY`.

| Role | Model | Provider | Cost |
|------|-------|----------|:----:|
| Architect | qwen2.5-coder:32b | Local | $0 |
| Tester | claude-opus-4-6 | Anthropic | ~$0.20 |
| Coder | qwen3-coder-next:q8_0 (80B) | Local | $0 |
| Fix Coder | claude-sonnet-4-6 | Anthropic | ~$0.05 |
| Security | qwen3-coder:30b-a3b-q8_0 | Local | $0 |
| Critique | qwen3-coder:30b-a3b-q8_0 | Local | $0 |
| CTO | claude-sonnet-4-6 | Anthropic | ~$0.05 |

**Upgrade to full Dream Team v3** — add Grok reasoning architect for complex missions (C7+):

```bash
export XAI_API_KEY=xai-...
battlecommand-forge mission "..." --preset premium --architect-model grok-4.20-reasoning --auto
```

### Comparing Presets

| | Fast | Balanced | Premium |
|---|:---:|:---:|:---:|
| Avg score | 6.5/10 | 7.2/10 | 8.5/10 |
| Tests passing | ~40% | ~60% | ~85% |
| Cost per mission | $0 | $0 | ~$0.30 |
| Min RAM/VRAM | 8 GB | 20 GB | 48 GB |
| Time per mission | 3-8 min | 8-15 min | 8-12 min |

---

## Configuration

### Priority Order

Configuration resolves in this order (last wins):

```
Preset defaults → Environment variables → .battlecommand/models.toml → CLI flags
```

### Config File

Generate the default config:

```bash
battlecommand-forge settings init
```

This creates `.battlecommand/models.toml`:

```toml
# BattleCommand Forge — Model Configuration
preset = "premium"

# Uncomment to customize any role:

# [architect]
# model = "qwen2.5-coder:32b"
# context_size = 32768
# max_predict = 4096

# [tester]
# model = "claude-opus-4-6"
# context_size = 200000
# max_predict = 8192

# [coder]
# model = "qwen3-coder-next:q8_0"
# context_size = 65536
# max_predict = 32768

# [fix_coder]
# model = "claude-sonnet-4-6"

# [security]
# model = "qwen3-coder:30b-a3b-q8_0"

# [critique]
# model = "qwen3-coder:30b-a3b-q8_0"

# [cto]
# model = "claude-sonnet-4-6"
```

### Per-Role Options

Each role section supports:

| Field | Default | Description |
|-------|---------|-------------|
| `model` | from preset | Model name (e.g. `claude-sonnet-4-6`, `qwen2.5-coder:32b`) |
| `context_size` | 32768 | Context window in tokens |
| `max_predict` | 8192 | Max output tokens |

Provider is auto-detected: `claude-*` and `grok-*` models → cloud, everything else → local (Ollama).

### CLI Overrides (Highest Priority)

```bash
# Override a single role
battlecommand-forge mission "..." --coder-model claude-sonnet-4-6

# Override all reviewers at once
battlecommand-forge mission "..." --reviewer-model claude-sonnet-4-6

# Mix and match
battlecommand-forge mission "..." \
  --architect-model grok-4.20-reasoning \
  --tester-model claude-opus-4-6 \
  --coder-model qwen3-coder-next:q8_0 \
  --reviewer-model claude-sonnet-4-6
```

---

## Environment Variables

### Required (for cloud models)

```bash
export ANTHROPIC_API_KEY=sk-ant-...    # Claude Opus/Sonnet
export XAI_API_KEY=xai-...             # Grok models
```

### Optional

| Variable | Example | Description |
|----------|---------|-------------|
| `OLLAMA_HOST` | `192.168.1.100:11434` | Remote Ollama URL |
| `BRAVE_API_KEY` | `BSA...` | CTO web search (falls back to DuckDuckGo) |
| `ARCHITECT_MODEL` | `grok-4.20-reasoning` | Override architect |
| `TESTER_MODEL` | `claude-opus-4-6` | Override tester |
| `CODER_MODEL` | `qwen3-coder-next:q8_0` | Override coder |
| `FIX_CODER_MODEL` | `claude-sonnet-4-6` | Override fix coder |
| `SECURITY_MODEL` | `qwen3-coder:30b-a3b-q8_0` | Override security reviewer |
| `CRITIQUE_MODEL` | `qwen3-coder:30b-a3b-q8_0` | Override critique panel |
| `CTO_MODEL` | `claude-sonnet-4-6` | Override CTO |
| `REVIEWER_MODEL` | `claude-sonnet-4-6` | Override security + critique + CTO together |

You can also use a `.env` file in the project root — it's loaded automatically.

### Remote Ollama (Cloud GPU)

Run models on a remote machine with a beefy GPU:

```bash
# On the remote machine (H200, A100, etc.):
bash scripts/h200-setup.sh

# From your Mac:
OLLAMA_HOST=192.168.1.100:11434 battlecommand-forge mission "..." --preset premium --auto
```

---

## Common Workflows

### Workflow 1: Quick Prototype → Polish

```bash
# 1. Fast draft to see the shape
battlecommand-forge mission "Build a URL shortener with FastAPI" --preset fast --auto

# 2. Check what needs fixing
battlecommand-forge verify --path output/url_shortener

# 3. Re-run with premium for production quality
battlecommand-forge mission "Build a URL shortener with FastAPI" --preset premium --auto
```

### Workflow 2: Research → Build

```bash
# 1. Launch TUI
battlecommand-forge tui

# 2. Chat with CTO to research architecture
> What are the best practices for WebSocket chat servers in Python?
> What database should I use for real-time message storage?

# 3. Launch mission from chat when ready
> /mission Build a WebSocket chat server with rooms, user presence, message history in Redis, and FastAPI HTTP endpoints for room management
```

### Workflow 3: Iterate on Existing Code

```bash
# 1. Generate initial project
battlecommand-forge mission "Build a config parser library" --auto -o ~/projects/config-lib

# 2. Add features with edit mode
battlecommand-forge edit --path ~/projects/config-lib "Add YAML support alongside TOML and JSON"
battlecommand-forge edit --path ~/projects/config-lib "Add schema validation with descriptive error messages"

# 3. Verify after each edit
battlecommand-forge verify --path ~/projects/config-lib
```

### Workflow 4: Stress Test Your Pipeline

```bash
# Run graded tasks to benchmark your model setup
battlecommand-forge stress --tasks 10 --preset premium

# Check reports
battlecommand-forge report list
```

### Workflow 5: Ship to GitHub

```bash
# 1. Generate project
battlecommand-forge mission "Build a markdown blog engine" --auto

# 2. Push to GitHub
battlecommand-forge github push --workspace output/markdown_blog --branch main

# 3. Create a PR
battlecommand-forge github create-pr \
  --workspace output/markdown_blog \
  --title "feat: markdown blog engine" \
  --base main
```

---

## How the Pipeline Works

```
 ┌─────────┐   ┌───────────┐   ┌────────┐   ┌───────┐   ┌──────────┐
 │ ROUTER  │──▶│ ARCHITECT │──▶│ TESTER │──▶│ CODER │──▶│ VERIFIER │
 │  C1-C10 │   │  ADR+spec │   │  TDD   │   │ code  │   │ venv+test│
 └─────────┘   └───────────┘   └────────┘   └───────┘   └──────────┘
                                                               │
                  ┌──────────────────────────────────────────────┘
                  ▼
           ┌──────────┐   ┌──────────┐   ┌─────┐   ┌──────────────┐
           │ SECURITY │──▶│ CRITIQUE │──▶│ CTO │──▶│ QUALITY GATE │
           │  OWASP   │   │ 5 scores │   │ ok? │   │  ship/fix?   │
           └──────────┘   └──────────┘   └─────┘   └──────────────┘
                                                          │
                                                    ┌─────┴─────┐
                                                    │           │
                                                  PASS        FAIL
                                                    │           │
                                                  SHIP    ┌─────▼─────┐
                                                    ✅     │ SURGICAL  │
                                                          │ FIX LOOP  │──▶ back to CODER
                                                          │ (≤5 rounds)│
                                                          └───────────┘
```

### Stage Details

| # | Stage | What It Does | Model |
|---|-------|-------------|-------|
| 1 | **Router** | Scores complexity C1-C10 (rules + AI) | Local small |
| 2 | **Architect** | Writes ADR, file manifest, TDD test plan | Local 32B |
| 3 | **Tester** | Writes complete test suite BEFORE code | Opus ($0.20) |
| 4 | **Coder** | Generates all files in single shot | Local 80B |
| 5 | **Verifier** | Creates venv, pip install, ruff, pytest | — |
| 6 | **Security** | OWASP Top 10 review | Local/Sonnet |
| 7 | **Critique** | 5-in-1 scoring: DEV/ARCH/TEST/SEC/DOCS | Local/Sonnet |
| 8 | **CTO** | Mission-level coherence check | Sonnet ($0.05) |
| 9 | **Gate** | Ships if `critique*0.4 + verifier*0.6 ≥ threshold` | — |

### Surgical Fix Loop

When the quality gate fails, the pipeline doesn't regenerate everything. It:

1. **Traces imports** — finds exactly which files are broken (NameError, ImportError, etc.)
2. **Fixes surgically** — each broken file gets its own LLM call with specific error context
3. **Preserves clean code** — files that pass stay untouched
4. **Tracks progress** — if score declines 2+ rounds in a row, restores the best round's files
5. **Limits scope** — fix rounds fix bugs only, never add features (prevents regression)

### Quality Gate Thresholds

| Complexity | Threshold | Why |
|:----------:|:---------:|-----|
| C1-C6 | 9.2/10 | Simple tasks should be near-perfect |
| C7-C8 | 8.5/10 | Complex but achievable with fix rounds |
| C9-C10 | 8.0/10 | Very complex — reward functional code |

---

## Troubleshooting

### "Ollama: not running"

```bash
# Start Ollama
ollama serve

# Or check if it's running
curl http://localhost:11434/api/tags
```

### "No models available"

```bash
# Pull the model you need
ollama pull qwen2.5-coder:7b

# List what's installed
ollama list
```

### Tests always fail / score stuck below 7

This usually means the tester model is too weak. The #1 improvement is using Opus as the tester:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
battlecommand-forge mission "..." --preset premium --auto
```

Opus writes correct test fixtures on the first try (~$0.20). Local 32B testers write tests that never run.

### "Connection refused" with remote Ollama

```bash
# Make sure Ollama is listening on all interfaces (not just localhost)
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Test from your Mac
curl http://<remote-ip>:11434/api/tags
```

### Build fails with old Rust

```bash
rustup update
# Needs Rust 1.91+
```

### High API costs

Switch to a cheaper preset or use all-local models:

```bash
# $0 — all local
battlecommand-forge mission "..." --preset fast --auto

# Or just make the coder local, keep cloud reviews
battlecommand-forge mission "..." --preset premium --coder-model qwen2.5-coder:32b --auto
```

Check your spend anytime:

```bash
battlecommand-forge status   # shows total cost
# Or in TUI: /cost
```

### Generated code has import errors

This is the most common issue. The pipeline handles it automatically via surgical fix rounds, but if you see it in the final output:

```bash
# Re-verify to see exact errors
battlecommand-forge verify --path output/my_project

# The error patterns are saved for future runs
cat .battlecommand/failure_patterns.md
```

---

## Architecture

### 30 Modules, 14,000+ Lines of Rust

| Module | Purpose |
|--------|---------|
| `mission.rs` | 9-stage pipeline orchestration + surgical fix loop |
| `tui.rs` | 6-tab interactive TUI with CTO chat + 15 slash commands |
| `llm.rs` | Claude API + Ollama + Grok client + streaming + tool calling |
| `cto.rs` | CTO agent with 10 tools (web search, file read, verify, etc.) |
| `verifier.rs` | Venv creation + pip install + ruff + pytest |
| `codegen.rs` | Multi-file extraction from LLM output |
| `model_config.rs` | Per-role model config (preset → env → TOML → CLI) |
| `model_picker.rs` | Interactive model selection UI overlay |
| `router.rs` | Dual complexity scoring (rules + AI) |
| `editor.rs` | Edit existing codebases via LLM |
| `sandbox.rs` | Sandboxed execution, timeouts, env stripping |
| `memory.rs` | Learnings + few-shot examples + context injection |
| `enterprise.rs` | Audit logging, cost tracking, RBAC |
| `report.rs` | Pipeline report generation + viewer |
| `hardware.rs` | CPU/RAM/VRAM/Ollama monitoring |
| `models.rs` | Model listing, benchmarking, VRAM estimation |
| `workspace.rs` | Isolated git workspaces per mission |
| `swebench.rs` | SWE-bench evaluation: ReAct agent loop, dataset handling |
| `swebench_tools.rs` | 7 ReAct tools: read_file, grep, list_dir, run_command, write, edit, submit |
| `swebench_eval.rs` | SWE-bench report generation with per-repo breakdown |
| `benchmark.rs` | Multi-model benchmark framework (5 graded missions) |
| `swarm.rs` | Swarm mode: planner→coder→QA iteration with best-version selection |
| `custom_commands.rs` | User-defined commands from `.battlecommand/commands/*.md` |
| `stress.rs` | 21-task stress test suite (C4-C9) |
| `snake.rs` | Easter egg snake game |
| `space.rs` | Easter egg Space Invaders game |
| `db.rs` | Mission history (JSON file-based) |
| `context.rs` | Context compaction at 95% capacity |
| `github.rs` | GitHub push/PR via gh CLI |
| `voice.rs` | macOS TTS announcements |

### Key Design Decisions

- **Pure Rust, no Python bridge** — single binary, no runtime deps
- **TDD-first pipeline** — tests are written BEFORE code, not after
- **Surgical fixes over regeneration** — fix rounds target only broken files, preserving working code
- **Mix-and-match models** — each pipeline role can use a different model (local or cloud)
- **Venv per project** — generated code runs in isolated environments
- **Streaming everything** — architect/tester/coder output streams live to terminal
- **Quality gate is intentionally high** — pushes models to produce production-grade output

---

## License

Apache-2.0. See [LICENSE](LICENSE).