binnacle 0.0.1-alpha.4

# Binnacle

Binnacle is a Rust-based CLI application for AI agents and humans alike to track the state of a project, inspired by [steveyegge/beads](https://github.com/steveyegge/beads).

## Design Philosophy

- **JSON-first output** with `-H` flag for human-readable format
- **Minimal by default**, configurable for power users  
- **No repo pollution** - external storage by default (configurable)
- **Test-aware** - first-class test nodes linked to tasks
- **No commit message requirements** - explicit linking instead

## Entity Types

| Type | ID Format | Description |
|------|-----------|-------------|
| Task | `bn-xxxx` | Work item with status, priority, dependencies |
| Test | `bnt-xxxx` | Test node with command, linked to tasks |

## Storage

Default: External storage (doesn't touch repo)

```
~/.local/share/binnacle/<repo-hash>/
├── tasks.jsonl       # Tasks and test nodes (append-only)
├── commits.jsonl     # Commit-to-task links
├── test-results.jsonl # Test run history
├── cache.db          # SQLite index
└── config.toml       # Local settings
```

Configurable alternatives:
- Orphan branch (`binnacle-data`)
- Git notes (`refs/notes/binnacle`)

## CLI Commands

### Core
```bash
bn                              # Status summary (JSON)
bn -H                           # Human-readable status
bn init                         # Initialize for this repo
bn orient                       # Onboarding for AI agents (auto-inits)
```

### Tasks (noun-verb)
```bash
bn task create "Title" [-p N] [-t tag] [-a user]
bn task list [--status X] [--priority N] [--tag T]
bn task show bn-a1b2
bn task update bn-a1b2 [--title|--desc|--priority|--status|...]
bn task close bn-a1b2 [--reason "..."]
bn task reopen bn-a1b2
bn task delete bn-a1b2
```

### Links (Dependencies & Relationships)
```bash
bn link add <source> <target> --type depends_on  # source depends on target
bn link rm <source> <target>
bn link list bn-a1b2            # Show links for an entity
```

### Queries
```bash
bn ready                        # Tasks with no open blockers
bn blocked                      # Tasks waiting on dependencies
```

### Tests
```bash
bn test create "Name" --cmd "cargo test foo" [--dir "."] [--task bn-a1b2]
bn test list [--task bn-a1b2]
bn test show bnt-xxxx
bn test link bnt-xxxx bn-a1b2
bn test unlink bnt-xxxx bn-a1b2
bn test run [bnt-xxxx | --task bn-a1b2 | --all | --failed]
```

### Commit Tracking
```bash
bn commit link <sha> bn-a1b2    # Associate commit with task
bn commit list bn-a1b2          # Show commits linked to task
```

### Maintenance
```bash
bn doctor                       # Health check, detect issues
bn log [bn-a1b2]               # Audit trail of changes
bn compact                      # Summarize old closed tasks
bn sync                         # Push/pull (when sharing enabled)
```

### Configuration
```bash
bn config get <key>
bn config set <key> <value>
bn config list
```

### MCP Server
```bash
bn mcp serve                    # Start stdio MCP server
bn mcp manifest                 # Output tool definitions
```

## Data Schemas

### Task
```json
{
  "id": "bn-a1b2",
  "type": "task",
  "title": "Implement auth middleware",
  "description": "Add JWT validation to all API endpoints",
  "priority": 1,
  "status": "in_progress",
  "parent": null,
  "tags": ["backend", "security"],
  "assignee": "agent-claude",
  "depends_on": ["bn-c3d4"],
  "created_at": "2026-01-19T10:00:00Z",
  "updated_at": "2026-01-19T14:30:00Z",
  "closed_at": null,
  "closed_reason": null
}
```

### Test Node
```json
{
  "id": "bnt-0001",
  "type": "test",
  "name": "JWT validation tests",
  "command": "cargo test auth::jwt_validation",
  "working_dir": ".",
  "pattern": "tests/auth_test.rs::test_jwt_*",
  "linked_tasks": ["bn-a1b2"],
  "created_at": "2026-01-19T10:00:00Z"
}
```

## Status Flow

```
pending → in_progress → done
                     ↘ blocked
                     ↘ cancelled
done → reopened → in_progress → done
```

## Regression Detection

1. `bn test run` executes tests
2. On failure: look up linked tasks
3. If linked task is closed → auto-reopen with status `reopened`
4. Add note with failure context and commits since close

## MCP Integration

### Tools
All CLI operations exposed as MCP tools: `bn_task_create`, `bn_task_list`, `bn_ready`, `bn_test_run`, `bn_milestone_*`, `bn_link_*`, `bn_search_link`, etc.

### Resources
- `binnacle://tasks` - All tasks (subscribable)
- `binnacle://ready` - Currently actionable tasks

### Prompts
- `start_work` - Begin working on a task
- `finish_work` - Complete current task properly
- `triage_regression` - Investigate a test failure
- `plan_feature` - Break down a feature into tasks
- `status_report` - Generate summary of current state

---

# Implementation Plan

## Testing Strategy

**CRITICAL: Each phase must have comprehensive unit and integration tests before moving to the next phase.**

### Testing Philosophy
1. **Test-first where possible** - Write tests before implementation for core logic
2. **Unit tests for each phase** - No phase is complete until tests pass
3. **Integration tests** - CLI behavior tests using temp directories
4. **Property-based tests** - For ID generation, serialization round-trips
5. **Dogfooding** - Use binnacle to track binnacle development once Phase 3 is ready

### Test Organization
```
tests/
├── unit/
│   ├── models_test.rs        # Task, Test, Commit serialization
│   ├── storage_test.rs       # JSONL, SQLite operations
│   ├── id_generation_test.rs # Hash ID properties
│   ├── dependency_test.rs    # Graph, cycle detection
│   └── regression_test.rs    # Reopen logic
├── integration/
│   ├── cli_task_test.rs      # Task CRUD via CLI
│   ├── cli_test_test.rs      # Test node operations
│   ├── cli_dep_test.rs       # Dependency commands
│   └── cli_mcp_test.rs       # MCP server tests
└── fixtures/
    ├── test_pass.sh          # Always exits 0
    ├── test_fail.sh          # Always exits 1
    └── sample_tasks.jsonl    # Pre-populated test data
```

---

## Phase 0: Project Setup ✅

**Goal:** Rust project scaffolding, basic structure

### Deliverables
- [x] `Cargo.toml` with dependencies (clap, serde, rusqlite, chrono, sha2, dirs, thiserror)
- [x] Project structure:
  ```
  src/
  ├── main.rs
  ├── lib.rs
  ├── cli/
  ├── models/
  ├── storage/
  ├── commands/
  └── mcp/
  ```
- [x] Test utilities module (assert_cmd, predicates, tempfile in dev-dependencies)

### Tests
- [x] Smoke test: `bn --version`, `bn --help` (7 integration tests + 9 unit tests)

---

## Phase 1: Core Task Management ✅

**Goal:** Basic task CRUD with JSON output

### Deliverables
- [x] Task model with all fields
- [x] Hash-based ID generation (`bn-xxxx`)
- [x] Storage layer (JSONL + SQLite cache)
- [x] Commands: `init`, `task create/list/show/update/close/delete`
- [x] Output: JSON default, `-H` for human-readable

### Unit Tests
- [x] ID generation: uniqueness, format validation
- [x] Task model: serialization round-trip
- [x] Storage: JSONL append/read, SQLite rebuild
- [x] Field validation: priority 0-4, status enum

### Integration Tests
- [x] `bn init` creates directory structure
- [x] Full CRUD round-trip
- [x] Filtering by status, priority, tags
- [x] `-H` flag output difference

### Test Summary (Phase 1)
- 24 unit tests (models, storage, commands)
- 23 CLI integration tests (task CRUD, filtering, output formats)
- 7 smoke tests (version, help, basic output)

---

## Phase 2: Dependencies & Queries ✅

**Goal:** Task relationships, smart queries

### Deliverables
- [x] Dependency graph storage
- [x] Cycle detection
- [x] Commands: `dep add/rm/show`, `ready`, `blocked`

### Unit Tests
- [x] Cycle detection (A→B→C→A fails)
- [x] Transitive blocking calculation
- [x] Self-dependency rejection

### Integration Tests
- [x] `bn dep add` creates relationship, rejects cycles
- [x] `bn ready` and `bn blocked` correctness

### Test Summary
- 41 unit tests (models, storage, commands including 12 new dependency tests)
- 35 CLI integration tests (task CRUD, deps, queries, output formats)
- 7 smoke tests (version, help, basic output)

---

## Phase 3: Test Nodes ✅

**Goal:** First-class test tracking with regression detection

### Deliverables
- [x] Test model (`bnt-xxxx`)
- [x] Test results storage
- [x] Commands: `test create/list/show/link/unlink/run`
- [x] Regression detection: auto-reopen closed tasks on failure

### Unit Tests
- [x] Test model serialization
- [x] Command execution, exit code capture
- [x] Regression detection logic

### Integration Tests
- [x] Test node CRUD
- [x] `bn test run` execution
- [x] Regression: closed task reopens on failure

### Test Summary (Phase 3)
- 51 unit tests (models, storage, commands including 11 new test node tests)
- 64 CLI integration tests (35 task + 29 test node tests)
- 7 smoke tests (version, help, basic output)
- **Total: 122 tests**

---

## Phase 4: Commit Tracking ✅

**Goal:** Explicit commit-to-task links

### Deliverables
- [x] Commit link model
- [x] Commands: `commit link/unlink/list`
- [x] Regression context includes commits since close

### Unit Tests
- [x] Commit link serialization, SHA validation
- [x] Lookup by task and by commit

### Integration Tests
- [x] Link/unlink operations
- [x] `bn commit list` output
- [x] Regression context includes commits

### Test Summary (Phase 4)
- 72 unit tests (models, storage, commands including 18 new commit tracking tests)
- 82 CLI integration tests (35 task + 29 test + 18 commit tests)
- 7 smoke tests (version, help, basic output)
- **Total: 161 tests**

---

## Phase 5: Maintenance Commands ✅

**Goal:** Health, history, compaction

### Deliverables
- [x] Commands: `doctor`, `log`, `compact`, `config`

### Unit Tests
- [x] Doctor checks: orphans, consistency
- [x] Compact logic: summarization preserves key info
- [x] Config get/set/list operations
- [x] Log filtering and entry retrieval

### Integration Tests
- [x] `bn doctor` detects known issues
- [x] `bn log` chronological output
- [x] `bn config` get/set/list
- [x] `bn compact` preserves data integrity

### Test Summary (Phase 5)
- 94 unit tests (models, storage, commands including 26 new maintenance tests)
- 109 CLI integration tests (35 task + 29 test + 18 commit + 27 maintenance tests)
- 7 smoke tests (version, help, basic output)
- **Total: 210 tests**

---

## Phase 6: MCP Server ✅

**Goal:** Expose binnacle as MCP tools for agents

### Deliverables
- [x] MCP server (`bn mcp serve`)
- [x] All operations as MCP tools (38 tools with proper JSON schemas)
- [x] Resources (`binnacle://tasks`, `binnacle://ready`, `binnacle://blocked`, `binnacle://status`)
- [x] Prompts (`start_work`, `finish_work`, `triage_regression`, `plan_feature`, `status_report`)
- [x] Milestone tools (`bn_milestone_*`)
- [x] Search tools (`bn_search_link`)

### Unit Tests
- [x] Tool handlers return correct schema
- [x] Invalid requests rejected gracefully
- [x] Resource reading works correctly
- [x] Prompt generation works correctly

### Integration Tests
- [x] Server starts, responds to `initialize`
- [x] `bn mcp manifest` outputs valid JSON with tools, resources, prompts
- [x] All tool schemas validated

### Test Summary (Phase 6)
- 114 unit tests (models, storage, commands, mcp including 20 new MCP tests)
- 132 CLI integration tests (35 task + 29 test + 18 commit + 27 maintenance + 16 MCP + 7 smoke)
- **Total: 246 tests**

---

## Phase 7: Agent Onboarding (`bn orient`) ✅

**Goal:** Self-documenting tooling that helps AI agents discover and use binnacle

### Motivation
Rather than requiring manual AGENTS.md maintenance, binnacle can bootstrap agent awareness automatically. When `bn init` runs, it adds a short blurb to AGENTS.md pointing agents to `bn orient`. The `bn orient` command then serves as the canonical, always-up-to-date source of truth for how to interact with binnacle.

### Deliverables
- [x] Update `bn init` to:
  - Create or append to AGENTS.md with a `## Task Tracking` section
  - Blurb explains the project uses **bn** (binnacle) for issue/test tracking
  - Instructs agents to run `bn orient` to get oriented
  - Skip if AGENTS.md already contains `bn orient` reference
- [x] `bn orient` command that:
  - Initializes binnacle for the repo if not already initialized
  - Outputs a brief summary of current project state (tasks, ready items, blocked items)
  - Explains binnacle's purpose and key commands
  - Returns JSON by default, human-readable with `-H`

### AGENTS.md Blurb (added by `bn init`)

```markdown
## Task Tracking

This project uses **binnacle** (`bn`) for issue and test tracking.

To get oriented, run:
```bash
bn orient
```
```

### Example `bn orient` Output

```bash
$ bn orient -H
Binnacle - AI agent task tracker

This project uses binnacle (bn) for issue and test tracking.

Current State:
  Total tasks: 42
  Ready: 3 (bn-a1b2, bn-c3d4, bn-e5f6)
  Blocked: 2
  In progress: 1

Key Commands:
  bn              Status summary (JSON, use -H for human-readable)
  bn ready        Show tasks ready to work on
  bn task list    List all tasks
  bn show X       Show any entity by ID (works for bn-/bnt-/bnq- IDs)
  bn test run     Run linked tests

Run 'bn --help' for full command reference.
```

### Unit Tests
- [x] `bn init` creates AGENTS.md if missing
- [x] `bn init` appends to existing AGENTS.md
- [x] `bn init` skips if `bn orient` already mentioned
- [x] Orient output includes current task counts
- [x] Orient auto-initializes repo if needed

### Integration Tests
- [x] `bn init` in fresh repo creates AGENTS.md with blurb
- [x] `bn init` with existing AGENTS.md appends section
- [x] `bn init` is idempotent (doesn't duplicate blurb)
- [x] `bn orient` works in uninitialized repo (auto-inits)
- [x] `bn orient -H` produces human-readable output

### Test Summary (Phase 7)
- 132 unit tests (models, storage, commands, mcp including 9 new orient/init tests)
- 138 CLI integration tests (35 task + 29 test + 18 commit + 27 maintenance + 16 MCP + 15 orient + 7 smoke - 9 overlap correction = tests counted)
- **Total: 270 tests**

---

## Phase 8: Alternative Storage Backends ✅

**Goal:** Orphan branch and git notes support

### Deliverables
- [x] Orphan branch backend (`binnacle-data` branch)
- [x] Git notes backend
- [x] `bn sync` for shared mode
- [x] Migration between backends

### Implemented: Orphan Branch Backend

The orphan branch backend stores binnacle data in a git orphan branch named `binnacle-data`. 
This keeps all data within the repository without polluting the main branch or working tree.

**Key Features:**
- Uses git plumbing commands (hash-object, mktree, commit-tree, update-ref)
- Never modifies the working tree
- Full commit history for data changes
- Data persists across clones when branch is pushed

**Architecture:**
- `StorageBackend` trait abstracts storage operations
- `OrphanBranchBackend` implements the trait using git commands
- Files stored: `tasks.jsonl`, `commits.jsonl`, `test-results.jsonl`

**Test Summary (Phase 8 - Orphan Branch):**
- 8 unit tests (backend initialization, read/write, branch management)
- 9 integration tests (data persistence, commit history, no working tree pollution)

### Implemented: Git Notes Backend

The git notes backend stores binnacle data in git notes at `refs/notes/binnacle`.
Each JSONL file is stored as a separate note attached to a deterministic blob object.

**Key Features:**
- Uses `git notes` commands for storage
- Each file stored as a separate note with `binnacle:` prefix
- Data persists across clones when notes are pushed
- Supports the same operations as other backends

### Implemented: Sync Command

The `bn sync` command enables pushing/pulling binnacle data with remotes.

**Key Features:**
- Works with orphan-branch backend
- Supports `--push` only or `--pull` only modes
- Configurable remote name (default: origin)

### Implemented: Storage Migration

The `bn system migrate` command enables switching between backends.

**Key Features:**
- Migrates data from current (file) backend to orphan-branch or git-notes
- Supports `--dry-run` to preview changes
- Validates target backend type

**Test Summary (Phase 8 - Complete):**
- 8 unit tests for orphan branch backend
- 9 integration tests for orphan branch
- 5 integration tests for migration (file→orphan, file→git-notes)
- Sync command tested via manual verification

---

## Phase 9: CI/CD Pipeline ✅

**Goal:** Automated testing and quality checks

### Deliverables
- [x] GitHub Actions workflow (`.github/workflows/ci.yml`)
- [x] `cargo test` on push/PR
- [x] `cargo clippy` linting
- [x] `cargo fmt --check` formatting verification
- [x] Release workflow for tagged versions (`.github/workflows/release.yml`)

### Workflows Created

**CI Workflow (`.github/workflows/ci.yml`):**
- Runs on push/PR to main/master branches
- Three parallel jobs: test, clippy, fmt
- Uses cargo caching for faster builds
- Fails on any clippy warnings (`-D warnings`)

**Release Workflow (`.github/workflows/release.yml`):**
- Triggered by version tags (e.g., `v0.1.0`)
- Builds binaries for multiple platforms:
  - Linux x86_64
  - macOS x86_64 and ARM64
  - Windows x86_64
- Creates GitHub Release with auto-generated notes
- Uploads platform-specific archives

---

## Phase 10: Crates.io Publish Workflow (Alpha) ✅

**Goal:** Publish alpha releases to crates.io via GitHub Actions with safety checks.

### Deliverables
- [x] GitHub Actions workflow (`.github/workflows/cd.yml`) triggered by releases (version tags)
- [x] Preflight validation job: `cargo fmt --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --all-features`
- [x] Publish job guarded by preflight success and `CARGO_REGISTRY_TOKEN` secret
- [x] Dry-run step (`cargo publish --dry-run`) before actual publish
- [x] Tag format documented in README (alpha versioning policy)

### Workflow Outline
1. On tag push matching `v*`.
2. Run preflight checks and `cargo publish --dry-run`.
3. If all checks pass, run `cargo publish` with the registry token.
4. Fail fast on any step; no retries on publish.

---

## Phase 11: Queue Nodes ✅

**Goal:** Agent task prioritization through a work queue

### Motivation

Introduce a **Queue** entity type (`bnq-xxxx`) that serves as a work pool for agent task prioritization. Operators can add tasks to the queue to signal they should be worked on first.

### Deliverables
- [x] Queue model (`bnq-xxxx` IDs)
- [x] Commands: `bn queue create/show/delete`
- [x] `queued` link type for task-to-queue membership
- [x] `bn ready` integration (queued tasks sorted first)
- [x] `bn orient` shows queue info
- [x] Auto-removal of `queued` links when tasks are closed
- [x] MCP tools: `bn_queue_create`, `bn_queue_show`, `bn_queue_delete`
- [x] MCP resource: `binnacle://queue`
- [x] MCP prompt: `prioritize_work`
- [x] GUI support (queue node rendering, visual styles)
- [x] Documentation updates

### Key Design Decisions

1. **Single global queue** - One queue per repo, keeping the model simple
2. **Link-based membership** - Uses `bn link add/rm` with `--type queued`
3. **Unordered membership** - Tasks sorted by priority (0-4), not explicit position
4. **Auto-removal on close** - Completed tasks automatically leave the queue

### Commands

```bash
bn queue create "Title" [--description "..."]   # Create queue (one per repo)
bn queue show                                   # Show queue and its tasks
bn queue delete                                 # Delete queue (removes all queue links)
bn link add bn-xxxx bnq-xxxx --type queued      # Add task to queue
bn link rm bn-xxxx bnq-xxxx                     # Remove task from queue
```

### Test Summary (Phase 11)
- Unit tests for Queue model serialization
- Integration tests for queue CRUD, ready ordering, auto-removal
- MCP integration tests for queue tools and resources

See `prds/PRD_QUEUE_NODES.md` for full specification.

---

## Future Considerations (v2+)

- Agent sessions (multi-agent coordination)
- WIP limits per priority
- Focus mode with auto-commit-linking
- Test result trends and flakiness detection
- Natural language query interface