atproto-devtool 0.1.1

A multitool for the atproto developer ecosystem
# test labeler

Last verified: 2026-04-21

## Purpose

Implements `atproto-devtool test labeler <target>`, a conformance suite that
validates an atproto labeler across five stages — identity, HTTP,
subscription, crypto, and report — and produces a structured report plus an
exit code (0 if all spec-required checks pass, 1 on spec violations, 2 on
network failures only). Each stage is built around an injected I/O seam so
integration tests can replay fixtures instead of talking to real servers.
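
The injected-seam idea can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the crate's code: the `LabelSource` trait, the `FixtureSource` fake, and `first_page_ok` are hypothetical names introduced here, and the real seams (for example `RawHttpTee`) may look quite different.

```rust
use std::collections::VecDeque;

/// Hypothetical seam: the only way stage logic may reach the network.
trait LabelSource {
    fn fetch_page(&mut self, cursor: Option<&str>) -> Result<String, String>;
}

/// Test double that replays canned response bodies in order.
struct FixtureSource {
    pages: VecDeque<Result<String, String>>,
    requests: usize,
}

impl FixtureSource {
    fn new(pages: Vec<Result<String, String>>) -> Self {
        Self { pages: pages.into(), requests: 0 }
    }
}

impl LabelSource for FixtureSource {
    fn fetch_page(&mut self, _cursor: Option<&str>) -> Result<String, String> {
        // Count every call so tests can assert on traffic volume.
        self.requests += 1;
        self.pages
            .pop_front()
            .unwrap_or_else(|| Err("fixture exhausted".to_string()))
    }
}

/// Stage logic depends only on the trait, so it cannot tell real from fake.
fn first_page_ok(source: &mut dyn LabelSource) -> bool {
    source.fetch_page(None).is_ok()
}
```

Because the fake counts requests, a test can also assert that a stage did not issue more calls than expected.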

## Contracts

- **Public entry points**:
  - `LabelerCmd::run(no_color) -> Result<ExitCode, miette::Report>` (in
    `labeler.rs`) — constructs the shared reqwest client and calls
    `pipeline::run_pipeline`.
  - `pipeline::parse_target(raw, explicit_did) -> LabelerTarget` — the
    accepted target grammar is handle, `did:*`, `https://` URL, or
    `http://` URL with a local hostname (loopback, RFC 1918, `.local`).
    Remote HTTP is rejected with a helpful error; raw endpoints with
    no DID simply skip identity/crypto.
  - `pipeline::run_pipeline(target, LabelerOptions) -> LabelerReport` — the
    one orchestrator that every test hits.
- **Per-stage entry points**: `identity::run`, `http::run`,
  `subscription::run`, `crypto::run`, `create_report::run`. Each returns a
  `*StageOutput` holding an `Option<*Facts>` (`Some` only when the stage got
  far enough for downstream stages to run; `None` when there are no
  meaningful facts to carry forward) plus a `Vec<CheckResult>`.
- **Report shape**: `report::{LabelerReport, CheckResult, CheckStatus,
  Stage, SummaryCounts, ReportHeader, RenderConfig}`. `Stage` has five
  variants — `Identity`, `Http`, `Subscription`, `Crypto`, `Report` — and
  derives `Ord` in that declaration order, which is the rendering order
  (`Report` is always last). Five-way `CheckStatus`: `Pass`,
  `SpecViolation`, `NetworkError`, `Advisory`, `Skipped`. Exit code
  semantics: `1` if any `SpecViolation` is recorded; else `2` if any
  `NetworkError` is recorded; else `0`. `SpecViolation` takes precedence
  over `NetworkError` so that a conformance bug is never masked by an
  unrelated reachability failure. `Advisory` and `Skipped` never influence
  the exit code.
- **Check IDs are stable strings** (e.g. `"identity::target_resolved"`,
  `"http::first_page_decodes"`, `"crypto::rollup"`,
  `"report::self_mint_accepted"`). They appear verbatim in insta snapshots
  under `tests/snapshots/`; renaming one is a breaking change to the CLI
  output contract.
- **Diagnostic codes are stable strings** (e.g.
  `"labeler::identity::labeler_endpoint_parseable"`,
  `"labeler::report::contract_missing"`). As with check IDs, snapshots pin
  them, so renaming one is a breaking change.
- **Report-stage surface**: `create_report::{Check, CheckFactsOutput,
  CreateReportFacts, CreateReportStageOutput, CreateReportStageError,
  CreateReportTee, RealCreateReportTee, RawCreateReportResponse,
  PdsXrpcClient, RealPdsXrpcClient, RawPdsXrpcResponse,
  PdsJwtFetcher, PdsProxiedPoster, CreateReportRunOptions,
  XrpcErrorEnvelope, RejectionShape, ResponseOrigin}` plus per-check
  diagnostic structs
  (`ContractMissing`, `UnauthenticatedAccepted`, `MalformedBearerAccepted`,
  `WrongAudAccepted`, `WrongLxmAccepted`, `ExpiredAccepted`, `ShapeNot400`,
  `SelfMintRejected`, `PdsServiceAuthRejected`, `PdsProxiedRejected`).
  `Check::ORDER` is the canonical 10-element iteration order: contract,
  unauth, malformed, wrong-aud, wrong-lxm, expired, rejected-shape,
  self-mint, pds-service-auth, pds-proxied. Report stage always emits
  exactly 10 rows regardless of gating — missing identity facts collapse
  to 10 `Skipped` rows so row count and order are invariant.
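
The fixed row-count rule for the report stage could be sketched like this. The variant names below are guesses derived from the order listed above, and `report_rows` with its closure parameter is a hypothetical helper, not the crate's `create_report::run`.

```rust
/// Hypothetical variant names, one per entry in the documented order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Check {
    Contract,
    Unauth,
    Malformed,
    WrongAud,
    WrongLxm,
    Expired,
    RejectedShape,
    SelfMint,
    PdsServiceAuth,
    PdsProxied,
}

impl Check {
    /// Canonical iteration order; the array length pins the row count.
    const ORDER: [Check; 10] = [
        Check::Contract,
        Check::Unauth,
        Check::Malformed,
        Check::WrongAud,
        Check::WrongLxm,
        Check::Expired,
        Check::RejectedShape,
        Check::SelfMint,
        Check::PdsServiceAuth,
        Check::PdsProxied,
    ];
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Status {
    Pass,
    Skipped(String),
}

/// Always yields one row per check: gating changes the status, never the shape.
fn report_rows(
    have_identity: bool,
    mut run_one: impl FnMut(Check) -> Status,
) -> Vec<(Check, Status)> {
    Check::ORDER
        .iter()
        .map(|&check| {
            let status = if have_identity {
                run_one(check)
            } else {
                Status::Skipped("no identity facts".to_string())
            };
            (check, status)
        })
        .collect()
}
```

Driving the output from a single ordered array means a gated run and a full run cannot differ in length or order.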

## Dependencies

- **Uses**: `crate::common::identity` for every network hop and DID
  primitive (including `AnySigningKey`, `encode_multikey`,
  `is_local_labeler_hostname`). `crate::common::jwt` for hand-rolled
  compact JWS encoding used by the report stage. `atrium-api` for labeler
  record + queryLabels types (we go through `serde_json` + atrium types,
  never through `atrium-xrpc-client`). `reqwest` and `tokio-tungstenite`
  only via the `RealHttpTee`, `RealWebSocketClient`, `RealCreateReportTee`,
  and `RealPdsXrpcClient` seams.
- **Used by**: `crate::cli` wires this into the clap command tree; nothing
  else depends on it.
- **Boundary**: Stage modules talk to each other only through `*Facts`
  structs passed by `pipeline::run_pipeline`. A stage must not import
  another stage's internals. The report stage reads
  `IdentityFacts::{reason_types, subject_types, subject_collections}` that
  identity is responsible for populating from the labeler record.
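
A minimal sketch of the facts-only boundary, under stated assumptions: the structs are trimmed-down stand-ins, rows are plain `(check id, status)` tuples instead of `CheckResult`, and the statuses chosen for the failing and `not_run` rows are illustrative guesses.

```rust
/// Hypothetical, trimmed-down stand-in for the real `IdentityFacts`.
#[derive(Debug, Clone, PartialEq)]
struct IdentityFacts {
    labeler_endpoint: String,
}

/// Simplified row: (check id, status). The real type is `CheckResult`.
type Row = (&'static str, &'static str);

/// Simplified stand-in for a `*StageOutput`.
struct IdentityOutput {
    facts: Option<IdentityFacts>,
    rows: Vec<Row>,
}

fn identity_run(resolved_endpoint: Option<&str>) -> IdentityOutput {
    match resolved_endpoint {
        Some(url) => IdentityOutput {
            facts: Some(IdentityFacts { labeler_endpoint: url.to_string() }),
            rows: vec![("identity::target_resolved", "Pass")],
        },
        // Status here is an assumption made for the sketch.
        None => IdentityOutput {
            facts: None,
            rows: vec![("identity::target_resolved", "SpecViolation")],
        },
    }
}

/// The HTTP stage sees only the facts struct, never identity's internals.
fn http_run(identity: &IdentityFacts) -> Vec<Row> {
    // A real stage would query this endpoint through its injected seam.
    let _endpoint = identity.labeler_endpoint.as_str();
    vec![("http::first_page_decodes", "Pass")]
}

/// The orchestrator is the only place where facts change hands.
fn run_pipeline(resolved_endpoint: Option<&str>) -> Vec<Row> {
    let IdentityOutput { facts, mut rows } = identity_run(resolved_endpoint);
    match facts.as_ref() {
        Some(facts) => rows.extend(http_run(facts)),
        // Blocked upstream: emit a single `<stage>::not_run` row.
        None => rows.push(("http::not_run", "Skipped")),
    }
    rows
}
```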

## Key decisions

- **Every I/O boundary is a trait**: `HttpClient` + `DnsResolver` from
  `common::identity`, plus stage-local `RawHttpTee` (HTTP stage),
  `WebSocketClient` / `FrameStream` (subscription stage), `CreateReportTee`
  (report stage, POST with optional bearer), and `PdsXrpcClient` (report
  stage, POST/GET against the user's PDS with optional bearer and
  `atproto-proxy` headers). All are injectable through `LabelerOptions`.
  The CLI passes real clients; tests pass fakes from `tests/common/mod.rs`.
- **Shared reqwest client**: `LabelerCmd::run` builds one reqwest client
  with rustls + 10s timeout + user-agent and threads it through every
  stage. Do not construct fresh clients inside stages.
- **Two-connection subscription strategy**: the subscription stage tries
  to observe backfill via an idle-gap heuristic, then on `ExceededBudget`
  or `StreamClosedDuringBackfill` reconnects live-tail to distinguish a
  healthy long backfill from a stuck stream. Outcome is captured in the
  `BackfillOutcome` / `LiveTailOutcome` enums.
- **Crypto stage falls back to PLC history**: if the current signing key
  fails to verify a label, `did:plc` targets retry against historic keys
  from the PLC audit log (`plc_history_for_fragment`). `did:web` has no
  history, so a failure there is a hard `SpecViolation`. Verification that
  only succeeds against a historic key still passes the stage but emits an
  `Advisory`.
- **Crypto stage skips local labelers with mismatched signing keys**: when
  the labeler endpoint is local (per `is_local_labeler_hostname`) and at
  least one label fails verification against the DID-document signing
  key, `crypto::rollup` is `Skipped` rather than `SpecViolation`. The
  rationale is that developers testing a local copy of a labeler will
  typically not have the production signing key present, so a mismatch is
  expected; PLC history fallback is also skipped in this case. If the
  local labeler's key happens to match the published key, verification
  proceeds normally (`Pass`).
- **Identity stage downgrades local endpoint mismatches to Advisory**:
  when the user supplies `--target http://<local>:<port> --did <prod-did>`
  and the DID document advertises a different (production) endpoint,
  `identity::resolved_did_matches_flag` emits `Advisory` rather than
  `SpecViolation`, and `IdentityFacts.labeler_endpoint` is overridden to
  the local URL so HTTP / subscription / report stages all target the
  local copy. Without this override the `block_facts = true` branch would
  skip the report stage entirely, which is the opposite of what the
  developer wanted. Remote-URL mismatches remain `SpecViolation`.
- **DRISL-CBOR canonicalization for label signing**: crypto stage
  implements the deterministic CBOR canonicalization in
  `canonicalize_label_for_signing` rather than pulling a library — label
  signing uses a specific sort order and tag encoding that no existing
  crate matches. See `crypto.rs` for the spec.
- **Every check always emits a result**: stages never short-circuit on the
  first failure. When a prerequisite fails, downstream checks in the same
  stage still emit `Skipped` rows with a reason. When a
  whole stage is blocked upstream, the pipeline emits one
  `<stage>::not_run` row per downstream stage with a reason string.
- **Identity facts gate downstream stages**: HTTP and crypto stages only
  run when identity populated `IdentityFacts`. Subscription can run from
  an explicit endpoint URL without identity (the
  `LabelerTarget::Endpoint { did: None }` path).
- **Crypto pulls labels from HTTP and/or subscription**: the crypto stage
  runs when identity succeeded and *either* the HTTP stage produced
  `HttpFacts` *or* the subscription stage collected at least one
  `sample_labels` entry. Labels from both sources are concatenated before
  verification so a JSON-decoded `queryLabels` page and a CBOR-decoded
  `subscribeLabels` frame are both exercised. Subscription samples are
  capped at `subscription::SAMPLE_LABEL_CAP` to bound memory on noisy
  streams.
- **Report stage runs last and always emits 10 rows**: the report stage
  is ordered after crypto because it exercises write-side conformance
  (authenticated `createReport`), not observational conformance. The
  stage's output row count is a hard invariant — missing identity facts,
  missing contract, or absent self-mint / PDS inputs all collapse to
  `Skipped` rows rather than fewer rows. `Check::ORDER` is the frozen
  iteration sequence.
- **`--commit-report` is the write-side opt-in**: without it the stage
  still emits all 10 rows (mostly `Skipped`), but it will not POST
  authenticated report bodies to the labeler. `ContractPublished` without
  `--commit-report` is a stage-skip; with `--commit-report`, missing
  `reasonTypes` / `subjectTypes` becomes a `SpecViolation` gating the
  rest of the stage.
- **Self-mint only runs for locally-reachable labelers by default**:
  `is_local_labeler_hostname` classifies the labeler endpoint; non-local
  hosts skip all self-mint checks because the tool's local did:web doc
  server can't be reached from a public labeler. `--force-self-mint`
  overrides the heuristic. The `SelfMintSigner` (owner of the ephemeral
  did:web HTTP server + signing key) is constructed in `LabelerCmd::run`
  only when the endpoint is local or `--force-self-mint` is set, so the
  stage can skip cheaply when the locality check fails.
- **PDS-mediated modes are credentials-gated**: the
  `pds_service_auth_accepted` and `pds_proxied_accepted` checks require
  `--handle` + `--app-password` (enforced as a symmetric clap `requires`).
  The pipeline constructs a `RealPdsXrpcClient` only when credentials are
  present; otherwise the checks emit `Skipped` with a reason.
- **PDS client targets the reporter's PDS, not the labeler's**:
  `createSession` / `getServiceAuth` must be dispatched against the PDS
  that actually hosts the reporter's account. The pipeline resolves
  `--handle` (via `resolve_handle` → `resolve_did` → `find_service` for
  `#atproto_pds`) and uses the resulting URL for `RealPdsXrpcClient`.
  Using `IdentityFacts::pds_endpoint` (the labeler's PDS) would surface
  as `InvalidToken: Token could not be verified` when reporter and
  labeler live on different PDS shards. Resolution failures flatten to a
  string and ride through `CreateReportRunOptions::pds_resolution_error`
  so both PDS-mediated rows surface a `NetworkError` with a specific
  message instead of a silent `Skipped`.
- **PDS-mediated failures split by origin, not by variant**: a single
  `PdsServiceAuthRejected` / `PdsProxiedRejected` diagnostic carries a
  `ResponseOrigin` field so labeler-side rejections classify as
  `SpecViolation` while PDS-side rejections (`getServiceAuth` refused;
  proxy rejected before forwarding) classify as `NetworkError`. Keeping
  one variant per check preserves the one-diagnostic-per-check shape the
  report stage documents, and the Mode-3 upstream-envelope heuristic
  (`UpstreamError` / `UpstreamFailure` / 502 / 504) is what
  discriminates the two origins on the PDS-proxied path.
- **Sentinel reason string and run-id**: every committed report body
  carries a sentinel reason built by `create_report::sentinel::build`
  that encodes the run-id (16 hex chars from `getrandom`) and an RFC 3339
  UTC timestamp formatted by hand. This makes accidental reports easy to
  filter out of moderation queues. The run-id is generated once in
  `LabelerCmd::run` and threaded through `CreateReportRunOptions::run_id`.
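
As an illustration of the sentinel bullet above, a hand-rolled RFC 3339 UTC formatter and run-id encoder might look like the sketch below. The function names and the sentinel wording are assumptions, not the contents of `create_report::sentinel`; the date conversion uses the well-known days-to-civil algorithm, and the random bytes are passed in as a parameter because `getrandom` is an external crate.

```rust
/// Convert days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar (days-to-civil algorithm).
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format Unix seconds as an RFC 3339 UTC timestamp with second precision.
fn format_rfc3339_utc(unix_secs: u64) -> String {
    let days = (unix_secs / 86_400) as i64;
    let rem = unix_secs % 86_400;
    let (year, month, day) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year,
        month,
        day,
        rem / 3_600,
        (rem % 3_600) / 60,
        rem % 60
    )
}

/// Hex-encode 8 random bytes into a 16-character run-id.
fn hex_run_id(bytes: [u8; 8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical sentinel layout; the real wording is not shown in this document.
fn build_sentinel(run_id: &str, unix_secs: u64) -> String {
    format!(
        "atproto-devtool conformance test run={} at={}",
        run_id,
        format_rfc3339_utc(unix_secs)
    )
}
```

Embedding both values in one string lets a moderator filter test reports by prefix and tie each report back to a specific run.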

## Invariants

- Every `CheckResult` with `status == SpecViolation` carries a `diagnostic`
  with a non-empty `#[source_code]` — the report renderer uses miette's
  `GraphicalReportHandler` and a missing source span degrades output.
- `LabelerReport::exit_code` returns `1` if any `SpecViolation` is
  recorded, `2` if not but at least one `NetworkError` is recorded,
  and `0` otherwise. Advisories and skipped checks never fail the run.
- The report stage always records exactly 10 `report::*` rows in
  `Check::ORDER` order. Tests (`labeler_report::ac7_1_row_count`,
  `labeler_report::ac7_2_canonical_order`) pin this. Any future check
  addition/removal is a wire contract change.
- Snapshot tests under `tests/snapshots/` are part of the contract. Any
  check ID, diagnostic code, or rendered line change must be accompanied
  by a reviewed `cargo insta review`. Rendered `elapsed: Xms` lines are
  normalized by `tests::common::normalize_timing` so per-run timing does
  not churn snapshots.
- The pipeline never calls `reqwest::Client::new()` or constructs a
  tokio-tungstenite connection outside of `Real*` seam structs.
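
The exit-code precedence stated above can be sketched as a small pure function. The enum mirrors the five documented statuses; taking a slice of statuses is a simplification, since the real method lives on `LabelerReport`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckStatus {
    Pass,
    SpecViolation,
    NetworkError,
    Advisory,
    Skipped,
}

/// 1 if any spec violation, else 2 if any network error, else 0.
fn exit_code(statuses: &[CheckStatus]) -> u8 {
    if statuses.contains(&CheckStatus::SpecViolation) {
        1
    } else if statuses.contains(&CheckStatus::NetworkError) {
        2
    } else {
        // Pass, Advisory, and Skipped never fail the run.
        0
    }
}
```

Checking `SpecViolation` first is what keeps a conformance bug from being masked by an unrelated reachability failure.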

## Key files

- `labeler.rs` — clap args, `LabelerCmd::run`, CLI bootstrap. New flags:
  `--commit-report`, `--force-self-mint`, `--self-mint-curve`,
  `--report-subject-did`, `--handle`, `--app-password` (the last two are
  symmetrically `requires`-bound).
- `pipeline.rs` — `LabelerTarget`, `LabelerOptions` (now carries
  `create_report_tee`, `commit_report`, `force_self_mint`,
  `self_mint_curve`, `report_subject_override`, `self_mint_signer`,
  `pds_credentials`, `pds_xrpc_client` / `pds_xrpc_client_override`,
  `run_id`), `CreateReportTeeKind`, `PdsCredentials`, `parse_target`,
  `run_pipeline` orchestration.
- `report.rs` — `CheckStatus`, `CheckResult`, `LabelerReport`,
  `RenderConfig`, `Stage` (now 5 variants ending in `Report`), rendering
  via `miette::GraphicalReportHandler`.
- `identity.rs` — identity stage: DID resolution, labeler record fetch
  (through `atrium-api` types over the `HttpClient` seam), policy
  validation. `IdentityFacts` now also carries
  `reason_types` / `subject_types` / `subject_collections` extracted
  from the labeler record so the report stage can check contract shape
  without re-parsing.
- `http.rs` — HTTP stage: `RawHttpTee` trait, `RealHttpTee` reqwest
  implementation, first-page / pagination / cursor checks against
  `com.atproto.label.queryLabels`.
- `subscription.rs` — WebSocket stage: `WebSocketClient` / `FrameStream`
  traits, CBOR frame decoder, two-connection backfill / live-tail logic.
- `crypto.rs` — label canonicalization, signature verification, PLC key
  history fallback.
- `create_report.rs` — report stage entry point `create_report::run`,
  `CreateReportTee` + `RealCreateReportTee` seam, `PdsXrpcClient` +
  `RealPdsXrpcClient` seam, `PdsJwtFetcher` and `PdsProxiedPoster`
  helpers, `Check` enum (10 variants with stable IDs and
  `Check::ORDER`), diagnostic structs (`ContractMissing` + 9 per-check
  accepted/rejected variants), `XrpcErrorEnvelope` + `RejectionShape`
  classifier for 401 envelope checks, `build_minimal_report_body`.
- `create_report/sentinel.rs` — sentinel reason-string builder, RFC 3339
  UTC formatter, `new_run_id` (16-hex-char run identifier via
  `getrandom`).
- `create_report/did_doc_server.rs` — `DidDocServer`: an RAII
  127.0.0.1:0-bound HTTP/1.1 server that serves a one-shot did:web
  document so the labeler can resolve our self-mint identity.
- `create_report/self_mint.rs` — `SelfMintSigner` (owns signing key +
  `DidDocServer`) and `SelfMintCurve` clap `ValueEnum` (`es256`,
  `es256k`).
- `create_report/pollution.rs` — pollution-avoidance helpers
  `choose_reason_type` and `choose_subject` so committing checks never
  submit plausible moderation content.
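
The `127.0.0.1:0` binding used by `DidDocServer` can be illustrated with the standard library alone. This sketch covers only port allocation and identifier derivation, not the HTTP serving; `bind_ephemeral` and `did_web_for_port` are hypothetical helpers, and the assumption that the self-mint DID is derived from the bound port this way is not confirmed by this document.

```rust
use std::io;
use std::net::TcpListener;

/// Bind to loopback on port 0 so the OS picks a free ephemeral port.
/// Dropping the returned listener releases the port (RAII).
fn bind_ephemeral() -> io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}

/// did:web identifiers percent-encode the port separator as `%3A`.
fn did_web_for_port(port: u16) -> String {
    format!("did:web:127.0.0.1%3A{}", port)
}
```

Asking the OS for the port avoids collisions between concurrent test runs, at the cost of the DID only being known after the bind succeeds.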

## Gotchas

- `LabelerTarget::Endpoint { did: None }` runs HTTP and subscription but
  skips identity, crypto, and report. Emitting those as "blocked" rather
  than "skipped — no DID supplied" is a regression.
- `RawHttpTee::query_labels(cursor)` must NOT duplicate the first-page
  request for reachability — the stage previously pinged before the real
  request, doubling traffic against real servers.
- `CheckResult::diagnostic` is `Option<Box<dyn Diagnostic + Send + Sync>>`
  — when you add a new failure case, wire the diagnostic through the
  whole way. Snapshots will expose "diagnostic: None" as a rendered
  blank block if you forget.
- The CLI pipes through `tracing` with `EnvFilter`; `--verbose` toggles
  `DEBUG`. Per-stage instrumentation is load-bearing for the
  `verbose_flag_accepted` CLI test.
- Fixture layout under `tests/fixtures/labeler/<stage>/<case>/` is
  referenced by test helper `gen_fixtures` anchored to
  `CARGO_MANIFEST_DIR`. Empty case directories need a `.gitkeep`.
- The report stage's `SelfMintSigner::spawn` binds a TCP port and starts
  a tokio task; it's constructed only when needed (local endpoint or
  `--force-self-mint`) to avoid orphaning a server on every run. Do not
  move the allocation earlier in `LabelerCmd::run`.
- PDS-mediated modes call `createSession` exactly once per run; the
  resulting access JWT is reused across `getServiceAuth` and the
  proxied POST. An `I2`-style regression (double `createSession`) is
  visible through `FakePdsXrpcClient::request_count` in tests.
- `normalize_timing` in `tests/common/mod.rs` rewrites the `elapsed: Xms`
  footer to a fixed token before snapshot comparison. Report-stage
  integration tests depend on this — do not write report-stage snapshots
  without going through it.
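
For orientation, a timing normalizer along these lines could be written with the standard library only. This is a sketch, not the helper in `tests/common/mod.rs`: the replacement token `XX` and the exact matching rules are assumptions.

```rust
/// Replace the digits in every `elapsed: <digits>ms` occurrence with a
/// fixed token so snapshots do not change from run to run.
fn normalize_timing(rendered: &str) -> String {
    const PREFIX: &str = "elapsed: ";
    let mut out = String::with_capacity(rendered.len());
    let mut rest = rendered;
    while let Some(pos) = rest.find(PREFIX) {
        let after = &rest[pos + PREFIX.len()..];
        // Count leading ASCII digits; byte offsets stay on char boundaries.
        let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        out.push_str(&rest[..pos + PREFIX.len()]);
        if digits > 0 && after[digits..].starts_with("ms") {
            out.push_str("XXms"); // hypothetical fixed token
            rest = &after[digits + 2..];
        } else {
            // Not a timing value; leave the text untouched.
            rest = after;
        }
    }
    out.push_str(rest);
    out
}
```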