Skip to main content

Module retry

Module retry 

Source
Expand description

Connection retry, backoff, and timeouts (PRD §5.8.7, M18).

Three pieces:

  1. RetryPolicy — caller-tunable knobs: attempt count, base / factor / cap on exponential backoff, max wall-clock window, per-attempt connect timeout. Mirrors OpenSSH’s ConnectionAttempts + ConnectTimeout semantics with a Gitway-specific cap on total elapsed time.
  2. classify / Disposition — FR-82’s transient-vs-fatal error classifier. Network noise (ECONNREFUSED, ETIMEDOUT, EHOSTUNREACH, DNS NXDOMAIN) is Retry; everything else (auth failure, host-key mismatch, protocol error, signing error) is Fatal.
  3. run — the loop driver. Calls the supplied async op, sleeps with jittered exponential backoff between attempts, captures a RetryAttempt history for FR-83’s --test --json envelope, and emits a tracing::warn! event at crate::log::CAT_RETRY per failed attempt.

§Trust model

run is timeout-agnostic — its job is the loop + classifier + jitter + history. The per-attempt tokio::time::timeout wrap lives at the call site (currently session.rs::connect) so the same loop driver can be reused for non-network operations (agent reconnects, key-load retries) without forcing every caller to think about timeouts.

§Why russh-handshake failures are NOT retried

Once the TCP socket is up, any failure is either a fatal user-input error (auth rejected, host-key mismatch) or an in-flight protocol error mid-handshake. Re-driving an in-flight handshake is unsafe (the server may have already consumed our key-exchange contribution) and the failure modes are server-side — surfacing them clearly is more useful than silently retrying. classify returns Fatal for every russh::Error variant for this reason.

Structs§

RetryAttempt
One failed attempt’s record, captured during run for surfacing via crate::session::AnvilSession::retry_history and gitway --test --json’s data.retry_attempts envelope.
RetryPolicy
Caller-tunable retry knobs (PRD §5.8.7 FR-80, FR-81).

Enums§

Disposition
What run should do with an AnvilError from a single attempt.

Functions§

classify
Classifies an AnvilError as transient or fatal per FR-82.
run
Drives the retry loop for the supplied async operation.