# ApiForge — Deep Dive
> Production-grade API release automation CLI, written in Rust.
> From merged code to healthy pods in production — one command.
---
## 1. The problem
Releasing an API service usually means a fragile chain of manual steps:
bump the version file, write a changelog, commit, tag, push, build a Docker
image, authenticate against a registry, push image tags, patch the Kubernetes
deployment, watch the rollout, create a GitHub release, check the health
endpoint, tell the team. Every step can fail, every failure leaves the system
in a half-released state, and the recovery procedure lives in someone's head.
ApiForge turns that chain into a single, transactional command:
```bash
apiforge release patch
```
If anything fails midway, everything that already ran is **rolled back
automatically, in reverse order**.
---
## 2. Architecture
```
CLI (clap)
   │
   ▼
Commands (main.rs)    ──── init / doctor / release / rollback / history / status / config validate
   │
   ▼
ReleaseOrchestrator   ──── validate all → execute in order → reverse-order rollback on failure
   │
   ▼
Steps (Step trait)    ──── git-preflight · version-bump · changelog · git-commit · git-tag
   │                       git-push · docker-build · docker-push · k8s-update · k8s-rollout
   │                       github-release · health-check · slack/webhook notify
   ▼
Integrations          ──── git2 (libgit2) · bollard (Docker) · kube · AWS SDK (ECR/STS) · octocrab (GitHub)
   │
   ▼
External systems      ──── Git remote · Docker daemon · container registry · K8s cluster · GitHub · Slack
```
Cross-cutting modules:
| Module | Responsibility |
|---|---|
| `config.rs` | `apiforge.toml` model, validation, `${VAR}` env resolution |
| `error.rs` | Typed error hierarchy (`thiserror`) per domain |
| `audit/` | Embedded sled DB: release history with per-step records |
| `output/` | Terminal rendering: spinners on TTY, plain lines in CI, stderr routing for JSON mode |
| `utils/` | semver bumping, Tera templating, retry with exponential backoff + jitter, secret redaction, rollback target selection |
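
The `${VAR}` resolution listed for `config.rs` can be illustrated with a small standalone sketch. This is not ApiForge's actual code: the function name `resolve_env_refs` is hypothetical, and the lookup goes through a `HashMap` rather than the process environment so the behaviour is easy to test. It shows the fail-fast idea: the first undefined variable is returned by name as the error.

```rust
use std::collections::HashMap;

/// Replace every `${NAME}` in `input` with its value from `vars`.
/// Returns `Err(name)` for the first referenced variable that is undefined.
/// (Hypothetical helper for illustration only.)
pub fn resolve_env_refs(
    input: &str,
    vars: &HashMap<String, String>,
) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the reference.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    // Fail fast, reporting the missing variable's name.
                    None => return Err(name.to_string()),
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated reference: keep the text verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

To resolve against the real environment, the map could be built with `std::env::vars().collect()`.
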
---
## 3. The Step contract
Every pipeline action implements one trait:
```rust
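use async_trait::async_trait;

// `StepContext`, `StepOutput`, and `Result` are crate-level ApiForge types.
#[async_trait]
pub trait Step: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    // Preflight: runs for every step before any step executes.
    async fn validate(&self, ctx: &StepContext) -> Result<()>;
    // Performs the real side effects.
    async fn execute(&self, ctx: &StepContext) -> Result<StepOutput>;
    // Returns a structured preview with zero side effects.
    async fn dry_run(&self, ctx: &StepContext) -> Result<StepOutput>;
    // Default no-op; override when the step mutates anything.
    async fn rollback(&self, _ctx: &StepContext) -> Result<()> { Ok(()) }
}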
```
Design points:
- **Validation is separated from execution.** The orchestrator validates
*every* step before executing *any* step — a missing Dockerfile or invalid
GitHub token aborts the release before the first git mutation.
- **Dry-run is a first-class path**, not a flag check inside execute. Steps
return structured previews (file diffs, resolved image tags, estimated
layers, changelog content) with zero side effects.
- **Rollback state is owned by the step.** The version-bump step snapshots
the original file bytes; the GitHub step remembers the created release ID;
steps use interior mutability (`RwLock`, atomics) since the orchestrator
holds them behind shared references.
- **Progress streaming.** Long-running steps (Docker build/push, rollout
wait, health polling) push live messages through a `ProgressReporter` in
the context, rendered as terminal spinners.
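
The "rollback state is owned by the step" point can be sketched without the async machinery. The following is a minimal synchronous illustration, not ApiForge's implementation: `VersionBumpSketch` and its fields are hypothetical, and a `Mutex` stands in for whichever interior-mutability primitive the real step uses. It shows `execute` snapshotting the original bytes through `&self` so that `rollback` can restore them later.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for a version-bump step.
pub struct VersionBumpSketch {
    path: PathBuf,
    new_contents: Vec<u8>,
    // Interior mutability: `execute` and `rollback` only receive `&self`.
    snapshot: Mutex<Option<Vec<u8>>>,
}

impl VersionBumpSketch {
    pub fn new(path: PathBuf, new_contents: Vec<u8>) -> Self {
        Self {
            path,
            new_contents,
            snapshot: Mutex::new(None),
        }
    }

    /// Save the original bytes, then overwrite the file.
    pub fn execute(&self) -> io::Result<()> {
        let original = fs::read(&self.path)?;
        *self.snapshot.lock().unwrap() = Some(original);
        fs::write(&self.path, &self.new_contents)
    }

    /// Restore the snapshot if `execute` ran; otherwise do nothing.
    pub fn rollback(&self) -> io::Result<()> {
        match self.snapshot.lock().unwrap().take() {
            Some(original) => fs::write(&self.path, original),
            None => Ok(()),
        }
    }
}
```

Because the snapshot holds the exact bytes that were on disk, restoring it also brings back any unrelated local edits the file contained before the bump.
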
---
## 4. The orchestrator and the RunReport
```rust
let report: RunReport = orchestrator.run().await;
// report.steps: Vec<(name, StepOutput)> — includes the failed step
// report.error: Option<ApiForgeError>
// report.rolled_back: bool
```
The orchestrator never hides partial progress. On failure it:
1. records the failing step's sanitized error,
2. rolls back completed steps in reverse order (a failed rollback of one step
is logged and does not stop the others),
3. returns the full report.
The CLI then writes an audit record with the actual outcome
(`success` / `failed` / `rolled_back`), per-step durations, and the error,
then sends notifications whether the release succeeded or failed.
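
The validate-all, execute-in-order, reverse-rollback flow can be sketched in a few lines. This is a simplified synchronous model under stated assumptions, not the real orchestrator: `SyncStep`, `SketchReport`, `DemoStep`, and `run_pipeline` are hypothetical names, errors are plain strings, and the report shape only loosely mirrors `RunReport`.

```rust
/// Simplified, synchronous stand-in for the async `Step` trait.
pub trait SyncStep {
    fn name(&self) -> &str;
    fn validate(&self) -> Result<(), String>;
    fn execute(&self) -> Result<(), String>;
    fn rollback(&self) -> Result<(), String> {
        Ok(())
    }
}

/// Hypothetical report shape, loosely modelled on `RunReport`.
#[derive(Debug, Default)]
pub struct SketchReport {
    pub executed: Vec<String>,    // includes the failed step
    pub rolled_back: Vec<String>, // in the order rollbacks were attempted
    pub error: Option<String>,
}

/// Demo step that optionally fails during `execute`.
pub struct DemoStep {
    pub name: &'static str,
    pub fail: bool,
}

impl SyncStep for DemoStep {
    fn name(&self) -> &str {
        self.name
    }
    fn validate(&self) -> Result<(), String> {
        Ok(())
    }
    fn execute(&self) -> Result<(), String> {
        if self.fail { Err("boom".into()) } else { Ok(()) }
    }
}

pub fn run_pipeline(steps: &[Box<dyn SyncStep>]) -> SketchReport {
    let mut report = SketchReport::default();

    // Phase 1: validate every step before executing any step.
    for step in steps {
        if let Err(e) = step.validate() {
            report.error = Some(format!("{}: {}", step.name(), e));
            return report;
        }
    }

    // Phase 2: execute in order and stop at the first failure.
    let mut completed: Vec<&dyn SyncStep> = Vec::new();
    for step in steps {
        report.executed.push(step.name().to_string());
        match step.execute() {
            Ok(()) => completed.push(&**step),
            Err(e) => {
                report.error = Some(format!("{}: {}", step.name(), e));
                break;
            }
        }
    }

    // Phase 3: on failure, roll back completed steps in reverse order.
    if report.error.is_some() {
        for step in completed.iter().rev() {
            // A failed rollback is logged and does not stop the others.
            if let Err(e) = step.rollback() {
                eprintln!("rollback of {} failed: {}", step.name(), e);
            }
            report.rolled_back.push(step.name().to_string());
        }
    }
    report
}
```

Running three `DemoStep`s where the third fails leaves all three names in `executed` and the first two, reversed, in `rolled_back`.
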
---
## 5. Rollback semantics
| Step | Rollback action |
|---|---|
| version-bump | Restore the original file bytes (preserves unrelated local edits) |
| changelog | `git checkout CHANGELOG.md` |
| git-commit | Soft reset to parent (changes stay staged) |
| git-tag | Delete tag |
| git-push | Delete remote + local tag; **commit history is never force-rewritten** |
| k8s-update | Revert deployment to previous ReplicaSet revision (like `kubectl rollout undo`) |
| k8s-rollout | No-op: the update step owns the revert (prevents double-rollback to N−2) |
| github-release | Delete the created release |
Why no force-push on rollback: once a commit reaches a shared remote, peers
and CI may have consumed it. The version-bump commit is harmless on its own —
without a tag it is not a release, no image is published for it, and the next
successful release supersedes it.
---
## 6. Smart rollback (`apiforge rollback`)
Without `--to`, the target is auto-detected:
1. Read the **currently deployed version** from the Kubernetes deployment's
image tag.
2. Candidates: successful, non-dry-run releases from the **audit history**
(newest first); fall back to **semver git tags** when no history exists.
3. Pick the newest candidate **strictly older** than the deployed version.
4. Confirm (unless `--yes`), patch the deployment, wait for rollout, then
**re-run the configured health check** — a rollback that leaves the
service unhealthy exits non-zero.
`--dry-run` previews the decision without cluster access.
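
The selection rule in step 3 ("newest candidate strictly older than the deployed version") can be sketched as follows. This is an illustrative sketch, not the real implementation: `parse_version` and `select_rollback_target` are hypothetical names, and only plain `MAJOR.MINOR.PATCH` versions with an optional leading `v` are handled (pre-release and build metadata are ignored here).

```rust
/// Parse "1.2.3" or "v1.2.3" into a comparable (major, minor, patch) tuple.
/// Pre-release and build metadata are deliberately not handled in this sketch.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let s = s.trim();
    let s = s.strip_prefix('v').unwrap_or(s);
    let mut parts = s.split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Pick the newest candidate that is strictly older than `deployed`.
/// Candidates that fail to parse are skipped.
pub fn select_rollback_target<'a>(
    deployed: &str,
    candidates: &[&'a str],
) -> Option<&'a str> {
    let current = parse_version(deployed)?;
    candidates
        .iter()
        .filter_map(|c| parse_version(c).map(|v| (v, *c)))
        .filter(|(v, _)| *v < current)
        .max_by_key(|(v, _)| *v)
        .map(|(_, c)| c)
}
```

Comparing parsed tuples rather than relying on list order means the result does not depend on how the audit history or tag list happens to be sorted.
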
---
## 7. Reliability engineering
- **Timeouts** wrap every network-prone git operation (libgit2 calls run in
blocking threads under `tokio::time::timeout`); fetch/push/operation
budgets are configurable per project.
- **Retries with exponential backoff + jitter** wrap AWS, Kubernetes, and
GitHub calls; error classifiers distinguish transient failures (throttling,
5xx, timeouts) from permanent ones (auth, not-found) so the latter fail
immediately.
- **Secret redaction** scrubs AWS account IDs/access keys/ARNs, GitHub
tokens, URL tokens, and basic-auth credentials from every error path —
including audit metadata and notification text.
- **Bounded audit storage**: sled DB capped at 10k records with pruning and
compaction; writes are retried on transient I/O errors and flushed
explicitly.
- **Env-var resolution at load time**: `${GITHUB_TOKEN}` style references in
secret fields resolve when the config is read; a missing variable fails
fast with its name.
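
The retry bullet above can be made concrete with a std-only sketch. Everything here is an assumption for illustration: the names `ErrorKind`, `backoff_delay`, and `retry`, the 50% jitter ceiling, and the injected `jitter` and `sleep` closures (used so the logic can be tested without real randomness or waiting). It shows the two ideas from the text: delays double per attempt up to a cap, and permanent errors return immediately.

```rust
use std::time::Duration;

/// Result of classifying an error (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    Transient, // e.g. throttling, 5xx, timeouts
    Permanent, // e.g. auth failures, not-found
}

/// Delay before retrying after 0-based `attempt`: base * 2^attempt, capped,
/// plus up to 50% extra. `jitter` is expected to be in 0.0..=1.0.
pub fn backoff_delay(base: Duration, cap: Duration, attempt: u32, jitter: f64) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    exp + exp.mul_f64(0.5 * jitter.clamp(0.0, 1.0))
}

/// Call `op` up to `max_attempts` times, sleeping between transient failures.
/// Permanent failures are returned immediately without further attempts.
pub fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    classify: impl Fn(&E) -> ErrorKind,
    mut jitter: impl FnMut() -> f64,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt: u32 = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) => {
                let out_of_attempts = attempt + 1 >= max_attempts;
                if out_of_attempts || classify(&e) == ErrorKind::Permanent {
                    return Err(e);
                }
                sleep(backoff_delay(base, cap, attempt, jitter()));
                attempt += 1;
            }
        }
    }
}
```

In a real caller, `sleep` would be `std::thread::sleep` (or an async timer) and `jitter` a random number generator; injecting them keeps the policy itself deterministic under test.
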
---
## 8. UX details
- **Live spinners** (interactive terminals only) show the running step with
streaming detail: Docker build output lines, `docker-push: tag 2/3`,
`k8s-rollout: 4/5 replicas ready`, `health-check: attempt 3 (12s of 60s)`.
- **CI-safe degradation**: pipes and CI logs get the same messages as plain
lines; no ANSI spinner frames ever leak.
- **Machine output**: `--output json` keeps stdout pure JSON (plan and
progress go to stderr) — pipe straight into `jq`.
- **Guard rails**: dirty tree, wrong branch, ahead/behind remote, and
duplicate tags are all blocked in preflight with precise messages.
- `apiforge doctor` checks tooling; `apiforge config validate --verbose`
runs layered config diagnostics (file → TOML → schema → field semantics).
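
The spinner and JSON-mode rules above boil down to two small decisions, sketched here with the standard library's `IsTerminal` trait. The type and function names are hypothetical, and two details are assumptions rather than documented behaviour: that the TTY check is made on stderr, and that non-JSON progress goes to stdout.

```rust
use std::io::{self, IsTerminal};

/// How progress is drawn.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RenderMode {
    Spinner,
    PlainLines,
}

/// Where human-readable plan and progress text is written.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProgressStream {
    Stdout,
    Stderr,
}

/// Spinners only on an interactive terminal; plain lines for pipes and CI.
pub fn render_mode(is_tty: bool) -> RenderMode {
    if is_tty { RenderMode::Spinner } else { RenderMode::PlainLines }
}

/// In JSON mode stdout is reserved for the JSON document itself.
/// Assumption: without JSON mode, progress goes to stdout.
pub fn progress_stream(json_output: bool) -> ProgressStream {
    if json_output { ProgressStream::Stderr } else { ProgressStream::Stdout }
}

/// Detect the mode for the current process.
/// Assumption: stderr is the stream that is checked.
pub fn detect_render_mode() -> RenderMode {
    render_mode(io::stderr().is_terminal())
}
```

Keeping the decision functions pure and the detection separate makes the non-TTY path straightforward to exercise in tests.
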
---
## 9. Language support
The version step reads/writes the native version file per ecosystem:
| Ecosystem | Version file | Handling |
|---|---|---|
| Rust | `Cargo.toml` | `toml_edit` (format-preserving) |
| Node | `package.json` | serde JSON (pretty, trailing newline) |
| Python | `pyproject.toml` | Poetry **and** PEP 621 layouts |
| Go | `version.go` / `go.mod` | `Version = "x"` var or comment fallback |
| Java | `pom.xml` | project-level `<version>` (skips `${properties}`) |
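
Whatever the file format, the step has to compute the next version from the current one. A minimal sketch of that computation follows; `Bump` and `bump_version` are hypothetical names, `major` and `minor` bump kinds are assumed to exist alongside the `patch` shown earlier, and pre-release or build metadata is not handled.

```rust
/// Which semver component to increment (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Bump {
    Major,
    Minor,
    Patch,
}

/// Return the next version for a plain `MAJOR.MINOR.PATCH` string,
/// or `None` if the input does not have exactly three numeric components.
pub fn bump_version(current: &str, bump: Bump) -> Option<String> {
    let mut parts = current.trim().split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    // Incrementing a component resets every component to its right.
    let (major, minor, patch) = match bump {
        Bump::Major => (major + 1, 0, 0),
        Bump::Minor => (major, minor + 1, 0),
        Bump::Patch => (major, minor, patch + 1),
    };
    Some(format!("{major}.{minor}.{patch}"))
}
```

The resulting string is what a format-aware writer would then place back into the ecosystem's version file.
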
---
## 10. Quality gates
- 74 unit/integration tests (orchestrator rollback ordering, audit retention,
retry edge cases, config env resolution, changelog ordering, rollback
target selection, version file round-trips)
- End-to-end CLI matrix: 52 scenario tests across every command and flag,
including failure paths, non-TTY hang checks, and JSON purity
- `clippy -D warnings`, rustfmt, `cargo audit` (documented transitive
ignores), criterion benchmarks
- CI: test matrix on Linux/macOS/Windows × stable + MSRV 1.91.1; release
workflow builds 5 targets and publishes to crates.io + GitHub Releases
---
## 11. Extending ApiForge
Adding a step:
1. Create a module under `src/steps/`.
2. Implement `Step` (name, description, validate, execute, dry_run, and
rollback if the step mutates anything).
3. Register it in `src/steps/mod.rs` and wire it into the pipeline in
`main.rs`.
The trait keeps concerns local: a step never needs to know about ordering,
rollback sequencing, output rendering, or audit recording.