api-debug-lab 0.4.0

Reproducible API troubleshooting fixtures and a Rust diagnostic CLI.

API Support Debug Lab

A reproducible developer-support debugging lab. The repository contains intentionally failing API request fixtures (request, response, server log) and a Rust diagnostic CLI that classifies the failure, recomputes the relevant evidence, and emits next steps and a draft escalation note.

The artefact answers a single question: given the same fixtures a real support engineer might receive, how does the candidate reproduce, classify, and communicate?

At a glance

$ api-debug-lab list-cases | wc -l        # bundled positive fixtures
14
$ cargo test --tests 2>&1 | grep "test result: ok" | wc -l
9                                          # nine independent test groups, all green
$ cargo llvm-cov --summary-only | tail -1
TOTAL  ...  92.38 % regions covered
$ cargo mutants --in-place --file src/rules.rs --no-shuffle | tail -1
183 mutants tested in 3m: 15 missed, 154 caught, 14 unviable   # 91 % kill rate

  • Eight rules; fourteen bundled positive and eleven bundled negative fixtures.
  • Three real-API webhook envelopes (Stripe v1, Slack v0, GitHub HMAC).
  • A 36-case Brier-calibrated confidence corpus with a regression canary.
  • Oracle HMAC tests pinned to externally-computed reference vectors.
  • Single-digit-microsecond per-case latency; 93 % line coverage; 91 % mutation kill rate.
  • ~90 tests across nine groups; three ADRs documenting the design choices.

Money shot

$ api-debug-lab diagnose auth_missing
CASE: auth_missing
SEVERITY: medium
LIKELY CAUSE: Missing Authorization header
CONFIDENCE: 0.95
RULE: auth_missing

EVIDENCE:
- Authorization header absent in request
- Endpoint POST https://api.acme-co.example/v1/events flagged auth_required=true
- Response status 401 Unauthorized

REPRODUCTION:
curl -X POST https://api.acme-co.example/v1/events \
  -H "content-type: application/json" \
  -H "user-agent: acme-client/0.4.1" \
  --data-raw '{"event":"order.created","order_id":"ord_8KZ"}'

NEXT STEPS:
1. Add an Authorization: Bearer <token> header to the request.
2. Confirm the token has not expired.
3. Verify the token's scope covers the requested operation.

ESCALATION NOTE:
Customer request failed because the Authorization header was absent.
The API rejected the request before payload processing. Ask the customer
to retry with a valid bearer token and confirm the token's scope.

Arbitration in action

When more than one rule fires, the highest-confidence diagnosis is reported as primary and the rest as also_considered. Tie-breaks are alphabetical on rule_id so output is byte-stable.

$ api-debug-lab diagnose webhook_signature_invalid_stale
LIKELY CAUSE: Webhook signature does not match recomputed HMAC
CONFIDENCE: 0.92
RULE: webhook_signature_mismatch
...
ALSO CONSIDERED:
- webhook_timestamp_stale (confidence 0.90): Webhook timestamp outside tolerance window

HMAC mismatch is rated higher than timestamp drift because a wrong digest is dispositive (secret, body, or timestamp prefix differs) while clock skew has benign causes. The confidence rubric is documented in docs/confidence_model.md.

What this demonstrates

  • API troubleshooting against fixtures that look like real support tickets.
  • Log + header inspection with structural checks (HMAC recompute, JSON parse, rate-limit header math, hostname Hamming distance, idempotency body-hash comparison) — not just status-code grep.
  • Confidence-ranked arbitration when multiple rules fire, with a documented confidence model and a Brier-score calibration test.
  • Production realism: Stripe-style multi-version webhook envelopes, JSON-lines log auto-detection, RFC3339 timestamp derivation, partial-outage interleaved request streams.
  • A Rust CLI with clap derive, snapshot tests via insta, property-based tests via proptest, end-to-end CLI tests via assert_cmd, and criterion benchmarks.
  • Honest scope: rule-based, eight failure modes, no machine learning, no network calls, no telemetry.

Quick start

cargo install api-debug-lab
api-debug-lab list-cases
api-debug-lab diagnose auth_missing
api-debug-lab diagnose webhook_signature_invalid_stale --format json | jq

Requires a stable Rust toolchain (1.78+). No service to start, no Docker. The bundled fixtures are embedded in the installed binary; pass --fixtures <dir> to diagnose a local fixture directory instead.

From a source checkout:

cargo run -- list-cases
cargo run -- diagnose auth_missing
cargo run -- corpus fixtures/cases | tail -25
cargo test

Failure cases

Case — structural signal beyond status code
auth_missing — Authorization header literally absent on an auth_required route
bad_json_payload — serde_json parser error and byte offset on the request body
rate_limited — Retry-After, X-RateLimit-Remaining, X-RateLimit-Reset parsed
webhook_signature_invalid — HMAC-SHA256 recomputed over "{ts}.{body}" and compared
webhook_signature_invalid_stale — Ambiguous: signature mismatch and timestamp drift; both rules fire
webhook_stripe_v1 — Stripe envelope t=,v1=,v0= parsed; multi-version HMAC compare
webhook_slack_v0 — Slack envelope v0=; HMAC over "v0:{ts}:{body}"
webhook_github_hmac — GitHub sha256=; HMAC over the raw body (no timestamp)
timeout_retry — Log lines grouped by request id; elapsed derived from RFC3339 stamps
timeout_retry_jsonl — Same shape but server log is JSON-lines (per-line auto-detect)
timeout_retry_partial_outage — Two request_ids interleaved; rule isolates the worst offender
timeout_retry_midnight_rollover — Log spans midnight UTC; elapsed derived correctly across day boundary
config_dns_error — URL host parsed; near-miss (Hamming-distance) and TLD checks
idempotency_collision — Recomputed body SHA-256 compared against the stored hash

Negatives that look similar but should not trigger any rule live under fixtures/cases/_negatives/ — one per rule, including a webhook_clean with a valid HMAC, a webhook_stripe_v1_clean with a valid Stripe v1 envelope, a 401 from an unrelated upstream call, and an idempotency-clean retry with a byte-identical body.

CLI

api-debug-lab list-cases                          # bundled fixtures
api-debug-lab diagnose <name|path>                # human report
api-debug-lab diagnose <name> --format json       # machine-readable
api-debug-lab diagnose <name> --trace             # per-rule timing on stderr
api-debug-lab explain  <name>                     # rule + evidence pointers
api-debug-lab replay   <name>                     # curl repro + diagnosis
api-debug-lab report   <name>                     # alias for human diagnose
api-debug-lab corpus   <dir>                      # sweep an arbitrary dir
api-debug-lab corpus   <dir> --ndjson             # one JSON object per line

Exit codes: 0 diagnosed (confidence ≥ 0.60), 1 unclassified or low-confidence, 2 bad input.

Architecture

fixtures/cases/<name>/case.json       →  Case (serde, schema-validated)  ┐
fixtures/cases/<name>/server.log      →  &str (lazy load, JSONL or text) ├→ Rule[] → Diagnosis[] → Report
fixtures/cases/<name>/secret.txt      →  Vec<u8>                         ┘   (sorted by
                                                                              confidence desc)
  • src/cases.rs — Case, loader, schema-validated by fixtures/cases.schema.json (JSON Schema Draft 2020-12).
  • src/rules.rs — Rule trait, eight rule impls, diagnose, diagnose_traced, per-line LogLine JSONL/text autodetect, RFC3339 timestamp derivation.
  • src/evidence.rs — Evidence + Pointer (source + optional log line).
  • src/report.rs — human / JSON formatters; deterministic curl reproduction.
  • src/main.rs — clap subcommands, exit codes, --trace, corpus sweep.

All output is byte-stable across machines: header iteration uses BTreeMap, reproductions inline the body (no absolute paths), no system clock or RNG.

Tests and benchmarks

cargo test --all-targets               # ~90 tests
cargo bench --bench diagnose -- --quick # microsecond per-case latency
cargo deny check                        # supply-chain (CI)
cargo audit                             # advisories  (CI)

Test coverage:

  • Per-rule unit tests (tests/rules.rs) — positive + paired negative for each rule.
  • Snapshot tests (tests/snapshots.rs) — human and JSON renders of every fixture, pinned via insta.
  • Property-based tests (tests/properties.rs) — proptest invariants: no rule panics on any schema-valid case; diagnose is idempotent; confidence is finite and in [0, 1]; also_considered is sorted descending and below the primary; hand-written adversarial fixtures (1 MiB body, NUL byte, far-future timestamp, extreme URL).
  • Schema validation (tests/schema.rs) — every bundled case.json validates against fixtures/cases.schema.json.
  • CLI integration (tests/diagnose_cli.rs) — assert_cmd end-to-end for each subcommand, including a tempdir test for corpus over a copied fixture.
  • Confidence calibration (tests/calibration.rs) — five distinct properties over the labelled corpus (expected_rule_id labels embedded in each enrolled case.json, 36 cases × 8 rules = 288 (case, rule) pairs): aggregate Brier ≤ 0.05, per-rule Brier ≤ 0.08, ECE ≤ 0.05, 100 % primary-classification accuracy, unclassified cases below threshold. Rubric in docs/confidence_model.md.
  • Calibration regression canary (tests/calibration_regression.rs) — a feature-gated test (--features calibration_canary) that simulates a deliberately-miscalibrated rule and asserts the production Brier check would have caught it. Runs in a separate CI job. Proves the calibration framework is load-bearing rather than ceremonial.
  • Oracle / differential HMAC tests (tests/oracle.rs) — three reference signatures hand-computed via openssl and pinned in source. Catches future signing-input drift that self-consistent round-trip tests would miss.
  • Latency-budget regression test (tests/latency_budget.rs) — asserts per-rule median wall-clock evaluation is below 100 µs across 200 iterations per fixture.
  • Mutation testing report (docs/mutation_report.md) — 91 % kill rate over 169 viable mutants in src/rules.rs, surviving mutants classified (gap / benign / equivalent).
  • Code coverage snapshot (docs/coverage.md) — 92.4 % regions, 91.7 % functions, 93.0 % lines via cargo-llvm-cov, date- and rustc-version-stamped.

Snapshot updates after intentional output changes:

cargo install cargo-insta
cargo insta review

Benchmarks run in --quick mode under a second; full mode under ten seconds. Baseline numbers and methodology in docs/benchmarks.md.

Limitations and scope

This is a demo / portfolio artefact. Honest limits:

  • Rule-based, not learned. Eight hand-written rules; no ML, no model. The confidence rubric is documented in docs/confidence_model.md and held accountable by aggregate Brier, per-rule Brier, ECE, and a feature-gated regression canary in tests/calibration*.rs.
  • Eight failure modes. Real support traffic has many more; the rule set is illustrative.
  • Synthetic fixtures. Modelled on the shape of production logs and the documented envelope formats of Stripe v1, Slack v0, and GitHub HMAC, but invented; no real customer data.
  • No network. Reproductions print curl for the reviewer to run by hand; the binary itself never opens a socket.
  • No background service. The lab is the binary plus the fixtures.

License

Apache-2.0. See LICENSE.