# test labeler
Last verified: 2026-04-21
## Purpose
Implements `atproto-devtool test labeler <target>`, a conformance suite that
validates an atproto labeler across five stages — identity, HTTP,
subscription, crypto, and report — and produces a structured report plus an
exit code (0 if all spec-required checks pass, 1 on spec violations, 2 on
network failures only). Each stage is built around an injected I/O seam so
integration tests can replay fixtures instead of talking to real servers.
## Contracts
- **Public entry points**:
  - `LabelerCmd::run(no_color) -> Result<ExitCode, miette::Report>` (in
    `labeler.rs`): constructs the shared reqwest client and calls
    `pipeline::run_pipeline`.
  - `pipeline::parse_target(raw, explicit_did) -> LabelerTarget`: the
    accepted target grammar is handle, `did:*`, `https://` URL, or
    `http://` URL with a local hostname (loopback, RFC 1918, `.local`).
    Remote HTTP is rejected with a helpful error; raw endpoints with
    no DID simply skip identity/crypto.
  - `pipeline::run_pipeline(target, LabelerOptions) -> LabelerReport`: the
    one orchestrator that every test hits.
- **Per-stage entry points**: `identity::run`, `http::run`,
`subscription::run`, `crypto::run`, `create_report::run`. Each returns a
`*StageOutput` holding an `Option<*Facts>` (`Some` only when the stage got
far enough for downstream stages to run; `None` when there are no
meaningful facts to carry forward) plus a `Vec<CheckResult>`.
- **Report shape**: `report::{LabelerReport, CheckResult, CheckStatus,
Stage, SummaryCounts, ReportHeader, RenderConfig}`. `Stage` has five
variants — `Identity`, `Http`, `Subscription`, `Crypto`, `Report` — and
derives `Ord` in that declaration order, which is the rendering order
(`Report` is always last). Five-way `CheckStatus`: `Pass`,
`SpecViolation`, `NetworkError`, `Advisory`, `Skipped`. Exit code
semantics: `1` if any `SpecViolation` is recorded; else `2` if any
`NetworkError` is recorded; else `0`. `SpecViolation` takes precedence
over `NetworkError` so that a conformance bug is never masked by an
unrelated reachability failure. `Advisory` and `Skipped` never influence
the exit code.
- **Check IDs are stable strings** (e.g. `"identity::target_resolved"`,
`"http::first_page_decodes"`, `"crypto::rollup"`,
`"report::self_mint_accepted"`). They appear verbatim in insta snapshots
under `tests/snapshots/`; renaming one is a breaking change to the CLI
output contract.
- **Diagnostic codes are stable strings** (e.g.
`"labeler::identity::labeler_endpoint_parseable"`,
`"labeler::report::contract_missing"`). The same snapshots pin them, so
renaming one is equally a breaking change.
- **Report-stage surface**: `create_report::{Check, CheckFactsOutput,
CreateReportFacts, CreateReportStageOutput, CreateReportStageError,
CreateReportTee, RealCreateReportTee, RawCreateReportResponse,
PdsXrpcClient, RealPdsXrpcClient, RawPdsXrpcResponse,
PdsJwtFetcher, PdsProxiedPoster, CreateReportRunOptions,
XrpcErrorEnvelope, RejectionShape, ResponseOrigin}` plus per-check diagnostic structs
(`ContractMissing`, `UnauthenticatedAccepted`, `MalformedBearerAccepted`,
`WrongAudAccepted`, `WrongLxmAccepted`, `ExpiredAccepted`, `ShapeNot400`,
`SelfMintRejected`, `PdsServiceAuthRejected`, `PdsProxiedRejected`).
`Check::ORDER` is the canonical 10-element iteration order: contract,
unauth, malformed, wrong-aud, wrong-lxm, expired, rejected-shape,
self-mint, pds-service-auth, pds-proxied. Report stage always emits
exactly 10 rows regardless of gating — missing identity facts collapse
to 10 `Skipped` rows so row count and order are invariant.
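
The fixed row count of the report stage can be illustrated with a small sketch. This is not the crate's code: the variant names, the `Row` struct, and the `skipped_rows` helper are assumptions invented for the example. Only `Check::ORDER`, the ten-check sequence, and the collapse to `Skipped` come from the description above.

```rust
/// The ten report-stage checks in the order listed above.
/// Variant names are illustrative, not the crate's identifiers.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Check {
    Contract,
    Unauth,
    Malformed,
    WrongAud,
    WrongLxm,
    Expired,
    RejectedShape,
    SelfMint,
    PdsServiceAuth,
    PdsProxied,
}

impl Check {
    /// Canonical iteration order; rows are always emitted in this sequence.
    const ORDER: [Check; 10] = [
        Check::Contract,
        Check::Unauth,
        Check::Malformed,
        Check::WrongAud,
        Check::WrongLxm,
        Check::Expired,
        Check::RejectedShape,
        Check::SelfMint,
        Check::PdsServiceAuth,
        Check::PdsProxied,
    ];
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum CheckStatus {
    Pass,
    SpecViolation,
    NetworkError,
    Advisory,
    Skipped,
}

/// Hypothetical row type standing in for `CheckResult`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Row {
    check: Check,
    status: CheckStatus,
    reason: Option<String>,
}

/// When the stage is gated (for example, no identity facts), emit one
/// `Skipped` row per check instead of returning fewer rows.
fn skipped_rows(reason: &str) -> Vec<Row> {
    Check::ORDER
        .iter()
        .map(|&check| Row {
            check,
            status: CheckStatus::Skipped,
            reason: Some(reason.to_string()),
        })
        .collect()
}
```

Driving the gated path from the same array that drives the normal path is one way to make it hard for row count and order to drift apart.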
## Dependencies
- **Uses**: `crate::common::identity` for every network hop and DID
primitive (including `AnySigningKey`, `encode_multikey`,
`is_local_labeler_hostname`). `crate::common::jwt` for hand-rolled
compact JWS encoding used by the report stage. `atrium-api` for labeler
record + queryLabels types (we go through `serde_json` + atrium types,
never through `atrium-xrpc-client`). `reqwest` and `tokio-tungstenite`
only via the `RealHttpTee`, `RealWebSocketClient`, `RealCreateReportTee`,
and `RealPdsXrpcClient` seams.
- **Used by**: `crate::cli` wires this into the clap command tree; nothing
else depends on it.
- **Boundary**: Stage modules talk to each other only through `*Facts`
structs passed by `pipeline::run_pipeline`. A stage must not import
another stage's internals. The report stage reads
`IdentityFacts::{reason_types, subject_types, subject_collections}` that
identity is responsible for populating from the labeler record.
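
The boundary rule means the pipeline is the only place where facts from several stages meet. A minimal sketch of such a handoff, using the crypto gating rule described under Key decisions (identity must have succeeded, and either HTTP produced facts or subscription collected at least one sample label). The struct fields and the `crypto_inputs` helper are assumptions for illustration; the real `*Facts` structs carry more data and may be shaped differently.

```rust
/// Simplified stand-ins for the real facts structs (fields are assumed).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct IdentityFacts {
    did: String,
}

#[derive(Debug, Clone, PartialEq)]
struct HttpFacts {
    labels: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct SubscriptionFacts {
    sample_labels: Vec<String>,
}

/// Hypothetical pipeline helper: returns the labels the crypto stage would
/// verify, or `None` when the stage should not run.
fn crypto_inputs(
    identity: Option<&IdentityFacts>,
    http: Option<&HttpFacts>,
    subscription: Option<&SubscriptionFacts>,
) -> Option<Vec<String>> {
    // Crypto needs identity facts (signing keys come from the DID document).
    identity?;

    let samples: &[String] = subscription
        .map(|s| s.sample_labels.as_slice())
        .unwrap_or(&[]);

    // Run only if HTTP produced facts or subscription sampled a label.
    if http.is_none() && samples.is_empty() {
        return None;
    }

    // Concatenate both sources so JSON-decoded and CBOR-decoded labels
    // are verified together.
    let mut labels = Vec::new();
    if let Some(h) = http {
        labels.extend(h.labels.iter().cloned());
    }
    labels.extend(samples.iter().cloned());
    Some(labels)
}
```

Because the helper only sees `Option<&...Facts>` values, no stage needs to import another stage's internals to make this decision.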
## Key decisions
- **Every I/O boundary is a trait**: `HttpClient` + `DnsResolver` from
`common::identity`, plus stage-local `RawHttpTee` (HTTP stage),
`WebSocketClient` / `FrameStream` (subscription stage), `CreateReportTee`
(report stage, POST with optional bearer), and `PdsXrpcClient` (report
stage, POST/GET against the user's PDS with optional bearer and
`atproto-proxy` headers). All are injectable through `LabelerOptions`.
The CLI passes real clients; tests pass fakes from `tests/common/mod.rs`.
- **Shared reqwest client**: `LabelerCmd::run` builds one reqwest client
with rustls + 10s timeout + user-agent and threads it through every
stage. Do not construct fresh clients inside stages.
- **Two-connection subscription strategy**: the subscription stage tries
to observe backfill via an idle-gap heuristic, then on `ExceededBudget`
or `StreamClosedDuringBackfill` reconnects live-tail to distinguish a
healthy long backfill from a stuck stream. Outcome is captured in the
`BackfillOutcome` / `LiveTailOutcome` enums.
- **Crypto stage falls back to PLC history**: if the current signing key
fails to verify a label, `did:plc` targets retry against historic keys
from the PLC audit log (`plc_history_for_fragment`). `did:web` has no
history, so a failure there is a hard `SpecViolation`. Verification that
only succeeds against a historic key still passes the stage but emits an
`Advisory`.
- **Crypto stage skips local labelers with mismatched signing keys**: when
the labeler endpoint is local (per `is_local_labeler_hostname`) and at
least one label fails verification against the DID-document signing
key, `crypto::rollup` is `Skipped` rather than `SpecViolation`. The
rationale is that developers testing a local copy of a labeler will
typically not have the production signing key present, so a mismatch is
expected; PLC history fallback is also skipped in this case. If the
local labeler's key happens to match the published key, verification
proceeds normally (`Pass`).
- **Identity stage downgrades local endpoint mismatches to Advisory**:
when the user supplies `--target http://<local>:<port> --did <prod-did>`
and the DID document advertises a different (production) endpoint,
`identity::resolved_did_matches_flag` emits `Advisory` rather than
`SpecViolation`, and `IdentityFacts.labeler_endpoint` is overridden to
the local URL so HTTP / subscription / report stages all target the
local copy. Without this override the `block_facts = true` branch would
skip the report stage entirely, which is the opposite of what the
developer wanted. Remote-URL mismatches remain `SpecViolation`.
- **DRISL-CBOR canonicalization for label signing**: crypto stage
implements the deterministic CBOR canonicalization in
`canonicalize_label_for_signing` rather than pulling a library — label
signing uses a specific sort order and tag encoding that no existing
crate matches. See `crypto.rs` for the spec.
- **Every check always emits a result**: stages never short-circuit on the
first failure. When a prerequisite fails, downstream checks in the same
stage still emit `Skipped` rows with a reason. When a
whole stage is blocked upstream, the pipeline emits one
`<stage>::not_run` row per downstream stage with a reason string.
- **Identity facts gate downstream stages**: the crypto stage only runs,
and the report stage only produces non-`Skipped` rows, when identity
populated `IdentityFacts`. HTTP and subscription can also run from an
explicit endpoint URL without identity (the
`LabelerTarget::Endpoint { did: None }` path).
- **Crypto pulls labels from HTTP and/or subscription**: the crypto stage
runs when identity succeeded and *either* the HTTP stage produced
`HttpFacts` *or* the subscription stage collected at least one
`sample_labels` entry. Labels from both sources are concatenated before
verification so a JSON-decoded `queryLabels` page and a CBOR-decoded
`subscribeLabels` frame are both exercised. Subscription samples are
capped at `subscription::SAMPLE_LABEL_CAP` to bound memory on noisy
streams.
- **Report stage runs last and always emits 10 rows**: the report stage
is ordered after crypto because it exercises write-side conformance
(authenticated `createReport`), not observational conformance. The
stage's output row count is a hard invariant — missing identity facts,
missing contract, or absent self-mint / PDS inputs all collapse to
`Skipped` rows rather than fewer rows. `Check::ORDER` is the frozen
iteration sequence.
- **`--commit-report` is the write-side opt-in**: without it the stage
still emits all 10 rows (mostly `Skipped`), but it will not POST
authenticated report bodies to the labeler. `ContractPublished` without
`--commit-report` is a stage-skip; with `--commit-report`, missing
`reasonTypes` / `subjectTypes` becomes a `SpecViolation` gating the
rest of the stage.
- **Self-mint only runs for locally-reachable labelers by default**:
`is_local_labeler_hostname` classifies the labeler endpoint; non-local
hosts skip all self-mint checks because the tool's local did:web doc
server can't be reached from a public labeler. `--force-self-mint`
overrides the heuristic. The `SelfMintSigner` (owner of the ephemeral
did:web HTTP server + signing key) is constructed lazily in
`LabelerCmd::run`, only when the endpoint is local or `--force-self-mint`
is set, so the stage can skip cheaply when locality fails.
- **PDS-mediated modes are credentials-gated**: the
`pds_service_auth_accepted` and `pds_proxied_accepted` checks require
`--handle` + `--app-password` (enforced as a symmetric clap `requires`).
The pipeline constructs a `RealPdsXrpcClient` only when credentials are
present; otherwise the checks emit `Skipped` with a reason.
- **PDS client targets the reporter's PDS, not the labeler's**:
`createSession` / `getServiceAuth` must be dispatched against the PDS
that actually hosts the reporter's account. The pipeline resolves
`--handle` (via `resolve_handle` → `resolve_did` → `find_service` for
`#atproto_pds`) and uses the resulting URL for `RealPdsXrpcClient`.
Using `IdentityFacts::pds_endpoint` (the labeler's PDS) would surface
as `InvalidToken: Token could not be verified` when reporter and
labeler live on different PDS shards. Resolution failures flatten to a
string and ride through `CreateReportRunOptions::pds_resolution_error`
so both PDS-mediated rows surface a `NetworkError` with a specific
message instead of a silent `Skipped`.
- **PDS-mediated failures split by origin, not by variant**: a single
`PdsServiceAuthRejected` / `PdsProxiedRejected` diagnostic carries a
`ResponseOrigin` field so labeler-side rejections classify as
`SpecViolation` while PDS-side rejections (`getServiceAuth` refused;
proxy rejected before forwarding) classify as `NetworkError`. Keeping
one variant per check preserves the one-diagnostic-per-check shape the
report stage documents. On the PDS-proxied path, the Mode-3
upstream-envelope heuristic (`UpstreamError` / `UpstreamFailure` / 502 /
504) discriminates the two origins.
- **Sentinel reason string and run-id**: every committed report body
carries a sentinel reason built by `create_report::sentinel::build`
that encodes the run-id (16 hex chars from `getrandom`) and an RFC 3339
UTC timestamp formatted by hand. This makes accidental reports easy to
filter out of moderation queues. The run-id is generated once in
`LabelerCmd::run` and threaded through `CreateReportRunOptions::run_id`.
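
Formatting an RFC 3339 UTC timestamp by hand and rendering a 16-hex-char run-id can be sketched as below. This is an illustration under assumptions, not the contents of `create_report::sentinel`: the function names, the sentinel wording, and the choice of the well-known days-to-civil-date algorithm are all made up for the example, and the random bytes are passed in by the caller rather than drawn from `getrandom`.

```rust
/// Convert days since 1970-01-01 into (year, month, day) in the proleptic
/// Gregorian calendar (the widely used "civil from days" algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // 0..=365, March-based
    let mp = (5 * doy + 2) / 153; // month index starting at March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format whole seconds since the Unix epoch as an RFC 3339 UTC timestamp.
fn format_rfc3339_utc(unix_secs: u64) -> String {
    let (y, m, d) = civil_from_days((unix_secs / 86_400) as i64);
    let rem = unix_secs % 86_400;
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        y,
        m,
        d,
        rem / 3_600,
        (rem % 3_600) / 60,
        rem % 60
    )
}

/// Render 8 random bytes as a 16-character lowercase hex run-id.
fn hex_run_id(bytes: [u8; 8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical sentinel layout; the real reason string may differ.
fn build_sentinel(run_id: &str, unix_secs: u64) -> String {
    format!(
        "conformance test report, safe to ignore (run-id {} at {})",
        run_id,
        format_rfc3339_utc(unix_secs)
    )
}
```

Keeping the run-id and timestamp in a fixed position inside the reason string is what lets an operator filter test reports with a simple substring or prefix match.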
## Invariants
- Every `CheckResult` with `status == SpecViolation` carries a `diagnostic`
with a non-empty `#[source_code]` — the report renderer uses miette's
`GraphicalReportHandler` and a missing source span degrades output.
- `LabelerReport::exit_code` returns `1` if any `SpecViolation` is
recorded, `2` if not but at least one `NetworkError` is recorded,
and `0` otherwise. Advisories and skipped checks never fail the run.
- The report stage always records exactly 10 `report::*` rows in
`Check::ORDER` order. Tests (`labeler_report::ac7_1_row_count`,
`labeler_report::ac7_2_canonical_order`) pin this. Any future check
addition/removal is a wire contract change.
- Snapshot tests under `tests/snapshots/` are part of the contract. Any
check ID, diagnostic code, or rendered line change must be accompanied
by a reviewed `cargo insta review`. Rendered `elapsed: Xms` lines are
normalized by `tests::common::normalize_timing` so per-run timing does
not churn snapshots.
- The pipeline never calls `reqwest::Client::new()` or constructs a
tokio-tungstenite connection outside of `Real*` seam structs.
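
The exit-code precedence stated in the invariants can be written as a small pure function. This is a sketch, assuming a free function over a slice of statuses; the real `LabelerReport::exit_code` is a method and may return a different type.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckStatus {
    Pass,
    SpecViolation,
    NetworkError,
    Advisory,
    Skipped,
}

/// 1 if any spec violation was recorded, else 2 if any network error was
/// recorded, else 0. Advisory and Skipped never affect the result.
fn exit_code(statuses: &[CheckStatus]) -> u8 {
    if statuses.contains(&CheckStatus::SpecViolation) {
        1
    } else if statuses.contains(&CheckStatus::NetworkError) {
        2
    } else {
        0
    }
}
```

Checking `SpecViolation` first is what keeps a conformance bug from being masked by an unrelated reachability failure in the same run.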
## Key files
- `labeler.rs` — clap args, `LabelerCmd::run`, CLI bootstrap. New flags:
`--commit-report`, `--force-self-mint`, `--self-mint-curve`,
`--report-subject-did`, `--handle`, `--app-password` (the last two are
symmetrically `requires`-bound).
- `pipeline.rs` — `LabelerTarget`, `LabelerOptions` (now carries
`create_report_tee`, `commit_report`, `force_self_mint`,
`self_mint_curve`, `report_subject_override`, `self_mint_signer`,
`pds_credentials`, `pds_xrpc_client` / `pds_xrpc_client_override`,
`run_id`), `CreateReportTeeKind`, `PdsCredentials`, `parse_target`,
`run_pipeline` orchestration.
- `report.rs` — `CheckStatus`, `CheckResult`, `LabelerReport`,
`RenderConfig`, `Stage` (now 5 variants ending in `Report`), rendering
via `miette::GraphicalReportHandler`.
- `identity.rs` — identity stage: DID resolution, labeler record fetch
(through `atrium-api` types over the `HttpClient` seam), policy
validation. `IdentityFacts` now also carries
`reason_types` / `subject_types` / `subject_collections` extracted
from the labeler record so the report stage can check contract shape
without re-parsing.
- `http.rs` — HTTP stage: `RawHttpTee` trait, `RealHttpTee` reqwest
implementation, first-page / pagination / cursor checks against
`com.atproto.label.queryLabels`.
- `subscription.rs` — WebSocket stage: `WebSocketClient` / `FrameStream`
traits, CBOR frame decoder, two-connection backfill / live-tail logic.
- `crypto.rs` — label canonicalization, signature verification, PLC key
history fallback.
- `create_report.rs` — report stage entry point `create_report::run`,
`CreateReportTee` + `RealCreateReportTee` seam, `PdsXrpcClient` +
`RealPdsXrpcClient` seam, `PdsJwtFetcher` and `PdsProxiedPoster`
helpers, `Check` enum (10 variants with stable IDs and
`Check::ORDER`), diagnostic structs (`ContractMissing` + 9 per-check
accepted/rejected variants), `XrpcErrorEnvelope` + `RejectionShape`
classifier for 401 envelope checks, `build_minimal_report_body`.
- `create_report/sentinel.rs` — sentinel reason-string builder, RFC 3339
UTC formatter, `new_run_id` (16-hex-char run identifier via
`getrandom`).
- `create_report/did_doc_server.rs` — `DidDocServer`: an RAII
127.0.0.1:0-bound HTTP/1.1 server that serves a one-shot did:web
document so the labeler can resolve our self-mint identity.
- `create_report/self_mint.rs` — `SelfMintSigner` (owns signing key +
`DidDocServer`) and `SelfMintCurve` clap `ValueEnum` (`es256`,
`es256k`).
- `create_report/pollution.rs` — pollution-avoidance helpers
`choose_reason_type` and `choose_subject` so committing checks never
submit plausible moderation content.
## Gotchas
- `LabelerTarget::Endpoint { did: None }` runs HTTP and subscription but
skips identity, crypto, and report. Emitting those as "blocked" rather
than "skipped — no DID supplied" is a regression.
- `RawHttpTee::query_labels(cursor)` must NOT duplicate the first-page
request for reachability — the stage previously pinged before the real
request, doubling traffic against real servers.
- `CheckResult::diagnostic` is `Option<Box<dyn Diagnostic + Send + Sync>>`
— when you add a new failure case, wire the diagnostic through the
whole way. Snapshots will expose "diagnostic: None" as a rendered
blank block if you forget.
- The CLI pipes through `tracing` with `EnvFilter` — `--verbose` toggles
`DEBUG`. Per-stage instrumentation is load-bearing for the
`verbose_flag_accepted` CLI test.
- Fixture layout under `tests/fixtures/labeler/<stage>/<case>/` is
referenced by test helper `gen_fixtures` anchored to
`CARGO_MANIFEST_DIR`. Empty case directories need a `.gitkeep`.
- The report stage's `SelfMintSigner::spawn` binds a TCP port and starts
a tokio task; it's constructed only when needed (local endpoint or
`--force-self-mint`) to avoid orphaning a server on every run. Do not
move the allocation earlier in `LabelerCmd::run`.
- PDS-mediated modes call `createSession` exactly once per run; the
resulting access JWT is reused across `getServiceAuth` and the
proxied POST. An `I2`-style regression (double `createSession`) is
visible through `FakePdsXrpcClient::request_count` in tests.
- `normalize_timing` in `tests/common/mod.rs` rewrites the `elapsed: Xms`
footer to a fixed token before snapshot comparison. Report-stage
integration tests depend on this — do not write report-stage snapshots
without going through it.
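
The kind of rewrite `normalize_timing` performs can be sketched with the standard library alone. The function name, the `[TIME]` token, and the scanning approach are assumptions for illustration; the helper in `tests/common/mod.rs` may use a different token or a regex.

```rust
/// Replace every `elapsed: <digits>ms` occurrence with a fixed token so that
/// per-run timing does not change snapshot output.
fn normalize_timing_sketch(rendered: &str) -> String {
    const PREFIX: &str = "elapsed: ";
    const TOKEN: &str = "elapsed: [TIME]"; // hypothetical replacement token

    let mut out = String::with_capacity(rendered.len());
    let mut rest = rendered;

    while let Some(pos) = rest.find(PREFIX) {
        let after = &rest[pos + PREFIX.len()..];
        // Count the ASCII digits that follow the prefix.
        let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();

        if digits > 0 && after[digits..].starts_with("ms") {
            // Matched `elapsed: 123ms`: emit the fixed token instead.
            out.push_str(&rest[..pos]);
            out.push_str(TOKEN);
            rest = &after[digits + 2..];
        } else {
            // Not a timing footer: copy the prefix through unchanged.
            out.push_str(&rest[..pos + PREFIX.len()]);
            rest = after;
        }
    }

    out.push_str(rest);
    out
}
```

Applying the normalizer to the rendered report before handing it to the snapshot assertion keeps the stored snapshot identical across fast and slow runs.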