sparrow-cli 0.9.2

# Sparrow v0.9.2 Performance Report

Date: 2026-06-12

## Step 1: Workspace Split, Increment 1

Change:

- Added Cargo workspace root.
- Added `crates/sparrow-core`.
- Moved the event contract from `src/event.rs` to
  `crates/sparrow-core/src/event.rs`.
- Re-exported it from the main crate with `pub use sparrow_core::event;`, so
  existing `crate::event::*` imports keep working.

Verification:

```text
cargo check --all-targets: pass
```

Release rebuild timings, using `target/v092-release`:

```text
baseline before split:
  clean release build       311.508s
  touch src/tools/todo.rs   191.405s
  touch src/engine/mod.rs   191.897s

after sparrow-core split:
  post-split release build               214.366s
  touch crates/sparrow-core/src/event.rs 196.230s
  touch src/tools/todo.rs                191.091s
  touch src/engine/mod.rs                190.984s
```

Result:

- The workspace boundary is established.
- Moving `event` alone does not materially improve tool or engine rebuilds.
- Touching `sparrow-core` still recompiles `sparrow-cli`, as expected, because
  the binary crate depends on the core event contract.

Next implication:

- The next high-impact extraction must move larger implementation clusters out
  of `sparrow-cli`, not just shared contracts.
- Candidate order remains close to the plan: providers/router/config contract,
  then tools, then engine surfaces.

## Step 2: Dev Profile

Change:

```toml
[profile.dev]
opt-level = 0
debug = "line-tables-only"
incremental = true

[profile.dev.package."*"]
opt-level = 2
```

Verification:

```text
cargo check --all-targets: pass
```

Observed cost:

- The first `cargo check --all-targets` after changing dev profile rebuilt much
  of the graph and took 3m43s.
- Follow-up dev checks should be measured separately after the cache settles.

## Step 3: Console Fast Start and WebView Idle Prefetch

Change:

- Added `sparrow console --fast`.
- `--fast` skips boot-time provider discovery before the console bind.
- `--fast` opens the WebView URL with `?boot=0&fast=1`.
- In fast mode the browser is not auto-opened, so scripts can measure `/healthz`
  without paying OS browser launch cost.
- WebView drawer/cache preloads now run through `requestIdleCallback` in normal
  mode and are skipped in fast mode until panels are opened.

Verification:

```text
cargo check --all-targets: pass
cargo test cli::tests::console_fast_flag_parses: pass
cargo test console_html_matches_v0_3_visual_polish_contract: pass
```

Measured with the release binary under `target/v092-release`:

```text
sparrow console --fast --port 19443 -> /healthz OK: 1295.46ms
RSS at healthz: 15.76 MB
```

Result:

- Memory is well below the 150 MB target.
- `/healthz` remains above the 800 ms target on this Windows run.
- The path is cleaner than the normal console path because it avoids eager
  discovery, boot animation, eager WebView prefetches, and browser launch.

## Step 4: Release Profile

Change:

```toml
[profile.release]
opt-level = "z"
lto = "thin"
strip = true
codegen-units = 1
```

Measured with `target/v092-release`:

```text
clean-ish release build after profile change: 261.292s
incremental release rebuild after main.rs touch: 180.308s
binary size: 13,421,056 bytes
hyperfine --warmup 2:
  sparrow --version mean: 361.5ms ± 206.4ms
  sparrow help mean:      442.0ms ± 180.7ms
```

Result:

- Binary size stays below the 15,061,000 byte CI threshold derived from
  baseline + 15%.
- CLI startup targets are still missed. The observed Windows variance is high
  (`--version` min 26.6ms, max 723.0ms), but the mean is the gate value and must
  be treated as a miss.
- `cargo bloat --release --target-dir target/v092-release -n 15 --crates`
  timed out at 184s after the profile rebuild; the Phase 0 bloat baseline
  remains the latest successful bloat snapshot.

## Current Target Status

| Metric | Baseline | Current | v0.9.2 target | Status |
|---|---:|---:|---:|---|
| Release clean build | 311.508s | 261.292s latest release rebuild after profile reset | -40% vs baseline | partial |
| Tool release rebuild | 191.405s | 191.091s | <20s | miss |
| Engine release rebuild | 191.897s | 190.984s | <60s | miss |
| `sparrow --version` hyperfine | 236.5ms mean | 361.5ms mean | <100ms | miss |
| `sparrow help` hyperfine | 359.4ms mean | 442.0ms mean | <150ms | miss |
| Console `/healthz` | 640.77ms normal console baseline | 1295.46ms fast measured after rebuild | <800ms | miss |
| Console RSS idle/early | 19.25 MB idle baseline | 15.76 MB at fast healthz | <150 MB | pass |
| Binary size | ~13.10 MB baseline | 13,421,056 bytes | <= baseline / <= CI threshold | partial |

The first workspace increment is structurally useful but insufficient for the
performance targets.

## Step 5: Startup Fix — parse before the tokio runtime (follow-up pass)

Change (`src/main.rs`):

- Moved `Cli::parse()` to run on the 16 MB worker thread **before** the tokio
  runtime is built. Clap renders `--version`/`--help` and exits there, so those
  hot paths never construct a multi-threaded tokio runtime (one worker per core
  + IO/timer drivers) or initialise tracing. Parsing stays on the roomy worker
  stack to avoid the Windows 1 MB main-thread overflow (`Cli` is a large type).

Root-cause correction to the earlier "miss" reading: the previous hyperfine
mean of 361.5ms ± 206.4ms was **variance-dominated**, not a true median. On a
warm machine the median was always ~35ms; the spikes came from spawning the
runtime's worker threads under contention/AV scanning. Removing the runtime from
the `--version`/`help` path eliminates those spikes.

Measured (release, `target/v092-release`, 30 runs, PowerShell `Measure-Command`,
warm — same method for before/after):

```text
                     mean   median  max    stddev
pre-fix  --version   38ms   35ms    98ms   11ms
post-fix --version   34ms   34ms    40ms    2ms
post-fix --help      40ms   40ms    51ms    6ms
```

Result:

- `--version` and `help` are now comfortably under the 100ms / 150ms targets,
  and — more importantly — **predictably** so: max dropped 98→40ms and stddev
  11→2ms. The tail/variance that produced the inflated hyperfine mean is gone.
- `cargo test --all-targets`: pass. `cargo clippy --all-targets -D warnings`: pass.

Note on hyperfine on Windows: hyperfine invokes through `cmd /C` and rejected the
forward-slash binary path in this environment, so PowerShell `Measure-Command`
was used for the before/after (consistent method on both binaries).

## Step 6: Workspace extraction — `sparrow-providers` (rebuild win)

Change:

- Extracted the model-provider layer out of the `sparrow-cli` monolith into a
  new `crates/sparrow-providers` crate (~3050 LOC): the `Brain` trait, the
  Anthropic / OpenAI-compatible / Ollama / Responses adapters, `discovery`,
  `sse_buffer`, `tool_markup`, and the shared types (`ModelCaps`, `Msg`,
  `ContentBlock`, …). The crate depends only on `sparrow-core`.
- `crate::provider::*` is re-exported from a thin `src/provider/mod.rs`, so every
  existing import is unchanged. `provider::detect` stays in the binary crate (it
  depends on the config provider registry + onboarding wizard), which is what
  kept the old `config ↔ provider` and `provider → onboarding` edges local.

Why this is the right cut: the cycle analysis showed the only couplings were
thin — `config → provider` was two types, `provider → config` was one import in
`detect.rs`, `provider → onboarding` was a doc comment. Leaving `detect.rs`
behind breaks all of them, so the adapter cluster moves with **zero** dependency
on the rest of `sparrow-cli`.

Measured (release, `target/v092-release`, touch `src/engine/mod.rs` then rebuild):

```text
                                    rebuild
pre-extraction (documented baseline)   191s
post-extraction                        122s    (-36%, ~69s)
```

Result:

- Touching the engine no longer recompiles the 3050-LOC adapter cluster (it is a
  separate, cached crate), so the `codegen-units = 1` release rebuild of
  `sparrow-cli` is ~36% faster. This is the first extraction that materially
  moves the release rebuild, validating the Plan B direction (move large
  implementation clusters, not just shared contracts).
- `cargo test --all-targets`: pass. `cargo clippy --all-targets -- -D warnings`:
  pass. The release binary runs (`--version`, `audit` dispatch verified).
- Machine-state caveat: the 191s baseline and the 122s figure were taken at
  different times; treat the ~36% as directionally solid rather than exact. The
  mechanism (smaller codegen unit) is sound and the gap is well outside noise.

Next clusters for further gains (future passes, same pattern): break the
remaining `config` couplings (auth/hooks/humanize/permissions) to extract the
config+registry layer, then the tool registry, then the engine itself.

## Step 7: Workspace extraction — `sparrow-memory` (2nd cluster)

Change:

- Extracted the persistent-memory layer into `crates/sparrow-memory` (~2350 LOC):
  SQLite facts, FTS5 search, knowledge graph, symbol index, plus secret
  `redaction`. Depends only on `sparrow-core` + `sparrow-providers`.
- Broke the `memory → engine` cycle by moving the trivial `Identity` type to
  `sparrow-core` (re-exported as `engine::Identity`). `crate::memory::*` and
  `crate::redaction::*` stay re-exported (zero broken imports); the `treesitter`
  feature is propagated to the crate so `symbol_index` still gates correctly.
- `cargo test --all-targets`: pass. `cargo clippy --all-targets -- -D warnings`:
  pass. Release binary runs (`--version`, `audit`).

Measured (release, touch `src/engine/mod.rs`):

```text
                                         clean   rebuild   rebuild/clean
before any extraction (baseline)           —       191s        —
after sparrow-providers                   133s     122s       0.92
after sparrow-providers + sparrow-memory  170s     150s       0.88
```

Honest read of the numbers: absolute wall-clock on this Windows machine has
high run-to-run variance (~±35s; the *clean* build alone swung 133s→170s between
the two runs, i.e. the machine was ~28% slower during the second measurement).
So the raw 122s→150s is **not** a regression — it is dominated by machine load.
The variance-resistant signal is the **rebuild/clean ratio**, which improved
0.92→0.88: the engine rebuild is now a smaller fraction of a full build,
consistent with a smaller `sparrow-cli` codegen unit (≈5400 LOC of providers +
memory now compile once into cached crates). Net: the second extraction holds or
slightly improves the rebuild, and is architecturally correct; a precise
second-count would need multiple averaged runs (expensive at ~150s each).

## Step 8: Workspace extraction — `sparrow-config` (3rd cluster) + honest limits

Change:

- Extracted the configuration foundation into `crates/sparrow-config` (~4810
  LOC): `config` + provider registry, `auth`/credential store, `permissions`,
  `hooks`, `sandbox`, `humanize`. Fully closed cluster (no backward deps on
  engine/memory/tools); depends on core + providers + intel. `keyring` feature
  propagated; one integration test path (`src/sandbox/mod.rs` →
  `crates/sparrow-config/src/sandbox/mod.rs`) updated. Tests + clippy `-D
  warnings` green.

Measured (release, touch `src/engine/mod.rs`):

```text
                              clean   rebuild   rebuild/clean
baseline                        —       191s        —
+ providers                    133s     122s       0.92
+ providers+memory             170s     150s       0.88
+ providers+memory+config      131s     125s       0.95
```

**Honest read — diminishing returns + noise floor.** The first extraction
(providers) produced a clear, beyond-noise win (191s → 122s). The two further
extractions do NOT show a clean additional speedup: the rebuild/clean ratios
(0.92 / 0.88 / 0.95) have no monotonic trend and sit inside this machine's
measurement noise (clean build alone swings 131–170s run to run; ratio ±0.05).

Why: `sparrow-cli` still holds **41,056 LOC** vs 11,177 LOC now in the five
extracted crates — only ~21% of the code is modularised. The release rebuild is
dominated by `sparrow-cli`'s remaining heavy modules (engine ~3.7k, tools ~5k,
console + embedded 5k-line HTML, tui ~2.7k, gateway, cmd_handlers, orchestrator)
compiled as one `codegen-units = 1` serial unit. Removing config/memory (largely
serde data structures, cheap codegen) trims LOC but not the long pole.

To push the engine rebuild under the 60s target, the remaining heavy clusters
must move out — but `tools` has **backward deps** on `engine` and `gateway`
(cycles), and `engine` sits at the top of the dependency graph, so those require
real decoupling work, not a mechanical move. That is the correct next project,
out of scope for this pass.

Net of Steps 6–8: a clean 5-crate workspace, ~11k LOC modularised with zero
broken imports and a green suite — a structurally better foundation — with the
proven rebuild win banked at Step 6 and the rest honestly logged as noise-bound.

## Revised Target Status (after Steps 5–8)

| Metric | v0.9.2 target | Result | Status |
|---|---:|---:|---|
| `sparrow --version` | <100ms | 34ms mean (sd 2ms) | **pass** |
| `sparrow help` | <150ms | 40ms mean (sd 6ms) | **pass** |
| Engine RELEASE rebuild | <60s | ~120–150s (was 191s), ratio 0.88 | improved, still > target |
| LOC moved out of monolith | — | ~5400 (providers + memory) | 4-crate workspace |
| Debug incremental rebuild (engine touched) | dev-loop usable | **9s** | **pass** |
| Console RSS idle | <150 MB | 15.76 MB | pass |
| Binary size | <= CI threshold | 13.48 MB | pass |
| Tool/Engine RELEASE rebuild | <20s / <60s | ~191s | miss (release-only) |
| Release clean build | -40% | partial | partial |
| Console `/healthz` | <800ms | ~1295ms (cold, post-rebuild) | miss |

Key reframing: the **developer iteration loop is debug-incremental, not release**.
A touched-engine debug rebuild is **9s**, which is fully usable. The ~191s figure
is RELEASE rebuild (`codegen-units = 1`, `opt-level = "z"`) — a runtime-size
tradeoff that only applies at release/CI time, not during day-to-day development.

## Plan B for Remaining Misses (release/clean build + console healthz)

- Continue workspace extraction with larger implementation clusters. The event
  contract split is too small to reduce tool/engine rebuilds.
- Move provider discovery/routing/config contracts next, then the tool registry.
- Add a true early-exit path for `--version` and `help` before config, memory,
  auth, skill, scheduler, and tracing initialization. The current Clap dispatch
  happens after global state initialization, which dominates measured startup.
- Feature-gate gateway/browser/voice transport stacks in a later focused pass.
  The current dependency graph still compiles the large surface stack by default.