
Crate keen_retry


§keen-retry

§Introduction

keen-retry is a retry framework for cases where retries are part of your architecture, not just a helper around one call. It models retryability as a value that can be composed across library/application boundaries, and produces a diagnosable final outcome suitable for metrics, logging, and testing. It is designed for ownership-heavy, stream-oriented async code, allowing the preservation of input/state when resuming partially completed work matters.

§Features

  • Zero-Cost Abstraction: Leverages Rust’s powerful type system and compile-time optimizations to offer retry capabilities with no runtime overhead (see benches/zero_cost_abstractions.rs).
  • Clear Error Discrimination: Retrying operations that fail with non-transient errors is futile: it wastes resources and can degrade application performance. keen-retry models retryability as RetryResult::{Ok,Transient,Fatal} so callers don’t accidentally retry permanent failures. That discrimination is also the foundation for resumability. Keep reading.
  • Instrumentation and Logging: Structured outputs + helpers to summarize retry errors + hook points to attach your logging/metrics, so you can fully inspect where your latency and resource utilization is going.
  • Composable across lib/app boundaries: library functions return retry-aware results; apps decide policy and can nest retries without losing context or triggering unwanted “retry storms”.
  • Async/Await Support: First-class support for asynchronous programming, with async-friendly instrumentation hooks, backoff, and operation executors.
  • Flexible Backoff Strategies: In addition to the recommended “exponential backoff with jitter”, allows custom strategies to suit different usage scenarios.
  • Resume where you left off — without enforcing policies at the API: keen‑retry encourages your API to return tri‑state results (Ok | Transient | Fatal). In the Transient case, you return the resume state needed to continue; the application then decides if/when/how to retry via a fluent chain like .retry_with(...).<backoff>(). No sleeps or loops inside your library; policy stays in the app, while your types model retryability. See the Partial Completion via Continuation Closure pattern below.

§When to use this crate

Use keen-retry when retries are part of your architecture, not just a helper around a single call:

  • You want retryability to be explicit and composable: libraries return RetryResult (classification), applications attach backoff/limits/telemetry (policy), avoiding hidden “retry inside retry” behavior.
  • You are writing a library that exposes one high-level operation but internally performs many retryable steps (fan-out, pipelines, “sync” workflows). keen-retry lets the library classify failures and carry resumable state, while the application owns retry policy (budgets/backoff/telemetry) at the boundary, avoiding “hidden retries” and combinatorial retry growth.
    • This is also where the Partial Completion via Continuation Closure pattern fits: retries can resume from the remaining work instead of restarting from scratch. More details below.
  • You want the retry process as an output artifact: structured final outcomes + retry history make logs/metrics/testing straightforward.
  • You are writing stream-heavy async code where preserving owned inputs and composing retry logic in a pipeline matters.
  • You care about long-term maintainability of resiliency: keen-retry turns retryability and retry outcomes into explicit types and artifacts. This provides guardrails for new contributors (less chance of accidentally retrying fatal errors or dropping retry instrumentation) and makes resiliency behavior easier to review and test as the codebase evolves.

Prefer backoff / backon when you just need “retry this closure with exponential backoff” and don’t need retry history, payload propagation, or cross-layer composition.

§Tradeoffs

  • keen-retry exposes more states and patterns than most workloads need. For simple use cases it can feel like overkill, and the zero-cost approach relies on monomorphization/inlining, which may increase compile time and binary size.
  • The model is richer than “retry a closure”, so new contributors need to learn RetryResult/ResolvedResult and the producer/consumer patterns. For small workloads, that added surface area may not pay off.

→ If you’ve ever had to add a second notify callback or a shared state struct just to get retry metrics, you’re in keen-retry territory.

§Quick Start

§Integrate your Library / API

The first step is to have every retryable operation in your library or API return the enriched RetryResult type, which clearly discriminates between the Ok, Fatal and Transient variants:

/// Wrapper around [Self::connect_to_server_raw()], enabling `keen-retry` on it
pub async fn connect_to_server(&self) -> RetryProcedureResult<ConnectionErrors> {
  self.connect_to_server_raw().await
    .map_or_else(|error| match error.is_fatal() {
                   true  => RetryResult::Fatal     { input: (), error },
                   false => RetryResult::Transient { input: (), error },
                 },
                 |_| RetryResult::Ok { reported_input: (), output: () })
}
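The snippet above assumes an error type that can classify itself via is_fatal(). A minimal sketch of what such a ConnectionErrors type could look like — the variants here are hypothetical, not from the crate:

```rust
/// Hypothetical error type for the Quick Start example. What matters is
/// that the type itself knows which failures are permanent, so every
/// caller agrees on what is retryable.
#[derive(Debug, PartialEq)]
pub enum ConnectionErrors {
    /// Transient: the server may come back; retrying can help.
    ServerUnreachable,
    /// Fatal: wrong credentials won't fix themselves between attempts.
    AuthenticationFailed,
}

impl ConnectionErrors {
    /// Classification lives next to the error type, not at call sites.
    pub fn is_fatal(&self) -> bool {
        matches!(self, ConnectionErrors::AuthenticationFailed)
    }
}
```

Putting the classification on the error type keeps the wrapper function above trivial and prevents different call sites from disagreeing on retryability.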

§Usage

Now, in the application, you may use it via the zero-cost functional API:

let resolved = connect_to_server()
    .retry_with(|_| connect_to_server())
    .<one-of-the-backoff-strategies>(...)
    .<instrumentation-facilities>(...)
    .<mapping-of-outputs-and-errors>(...);

§The keen-retry Diagram

keen-retry-diagram.png

For more details, please refer to tests/use_cases.rs, which contains advanced demonstrations such as how to add fully fledged instrumentation (as seen in production applications), how to compose nested retry logic, and how to implement the versatile “Partial Completion with Continuation Closure” design pattern.

§How keen-retry differs from mainstream retry crates

keen-retry targets a deeper problem than general-purpose retry crates like retry, backoff, or backon: it treats retries as values you can compose and observe, especially across library/application boundaries and in stream-heavy async code.

If all you need is “retry this closure with backoff”, those crates are often simpler. This section explains the design choices that make keen-retry different.

§Two-level result model (RetryResult → ResolvedResult)

keen-retry models retryability explicitly in the operation’s return value. Instead of returning only Result<T, E>, operations return a 3-way RetryResult: Ok, Transient, or Fatal.

This separates classification (“should we retry?”) from policy (“how and how long do we retry?”), which is especially useful when retryable operations are exposed by libraries and consumed by applications.

When the retry procedure runs, the result is upgraded into a ResolvedResult, which represents the final outcome and can carry retry history: Ok (succeeded immediately), Recovered (succeeded after transient failures), GivenUp (only transients until retry limit), Unrecoverable (transient failures followed by a fatal), or Fatal (fatal on first attempt).

Note: other crates also support retryable vs non-retryable outcomes (e.g. retry::OperationResult), but keen-retry bakes in payload propagation and a richer final “retry artifact” that is designed to be composed and inspected.
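To make the taxonomy concrete, here is a simplified model of those five final outcomes — NOT the crate’s actual ResolvedResult, which additionally carries inputs, outputs and the collected retry errors:

```rust
/// Simplified model of the five final outcomes described above.
/// The real `ResolvedResult` also carries payloads and retry history.
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,            // succeeded on the first attempt
    Recovered,     // succeeded after one or more transient failures
    GivenUp,       // only transient failures, until the retry limit
    Unrecoverable, // transient failures followed by a fatal one
    Fatal,         // fatal failure on the first attempt
}

impl Outcome {
    /// Collapsing the taxonomy back to plain success/failure, as the
    /// `Result` conversion on the resolved outcome does.
    fn is_success(&self) -> bool {
        matches!(self, Outcome::Ok | Outcome::Recovered)
    }
}
```

The point of the richer taxonomy is that “succeeded” and “succeeded after 3 transient failures” are distinguishable outcomes you can log and test against, instead of both collapsing to Ok.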

§Zero-cost when retries don’t happen (benchmarked)

keen-retry is designed so that adopting the “retry as a value” model does not impose overhead when retries are not exercised. In other words: when an operation succeeds immediately or fails fatally on the first attempt (i.e., no Transient path), the retry wrapper gets optimized down to the equivalent of a hand-written Result<T, E> flow.

The repository includes a Criterion benchmark (benches/zero_cost_abstractions.rs) that measures the related use cases – more details below.

§Ownership-friendly retries: zero-copy input propagation

Many retry helper crates treat the operation as a closure returning Result<T, E>. That model works well when inputs are cheap to clone or can be recreated on demand, but it gets awkward in stream pipelines and consumer-style operations, where the input is owned and may be partially consumed before an error occurs.

keen-retry makes the ownership model explicit in RetryResult:

  • On success, you get Ok { reported_input, output } – where reported_input, if present, may be derived from the original input for instrumentation.
  • On failure, you get the original input back as Transient { input, error } or Fatal { input, error }.

This enables retry loops that can re-attempt work without requiring Clone on the input or side channels to stash state.
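A minimal sketch of that ownership contract — illustrative types, not the crate’s API, with the Fatal variant omitted for brevity:

```rust
/// Illustrative model of keen-retry's ownership contract: on failure,
/// the *owned* input comes back to the caller, so the next attempt can
/// reuse it without `Clone` or side channels.
enum Attempt<In, Out, Err> {
    Ok { output: Out },
    Transient { input: In, error: Err },
}

/// A consumer-style operation: it takes ownership of `payload` and only
/// hands it back when the send did not go through.
fn send(payload: Vec<u8>, healthy: bool) -> Attempt<Vec<u8>, usize, &'static str> {
    if healthy {
        Attempt::Ok { output: payload.len() }
    } else {
        Attempt::Transient { input: payload, error: "connection reset" }
    }
}

/// A hand-rolled retry loop over that contract: the input travels through
/// the loop by ownership. (For the sketch, the link "recovers" on the last
/// allowed attempt.)
fn send_with_retries(mut payload: Vec<u8>, mut attempts_left: u8) -> Result<usize, &'static str> {
    loop {
        match send(payload, attempts_left == 1) {
            Attempt::Ok { output } => return Ok(output),
            Attempt::Transient { input, error } => {
                if attempts_left <= 1 {
                    return Err(error);
                }
                attempts_left -= 1;
                payload = input; // the original input, handed back for reuse
            }
        }
    }
}
```

Notice there is no `.clone()` anywhere: the failed attempt returns the same owned buffer it consumed.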

§Different observability story: Built-in diagnostics and observability hooks – diagnostics are part of the value

keen-retry treats observability as a first-class feature, not an afterthought. The ResolvedResult type includes facilities for generating succinct diagnostic reports — e.g., aggregating a list of transient errors encountered across all retry attempts into a compact occurrence-count map (via the built-in error mapping closure errors_to_occurrences_count or your custom one). This is designed for production use where you want rich logs (or rich error messages) without writing boilerplate instrumentation code every time.

For production logging and metrics, ResolvedResult also provides hook points like .inspect_recovered(), .inspect_given_up(), and .inspect_unrecoverable(), so instrumentation can be attached directly to the resolved outcome without external state.

To keep logs concise, the crate includes helpers:

  • loggable_retry_errors(&Vec<E>) -> String to serialize retry error occurrences compactly,
  • errors_to_occurrences_count(Vec<E>, E) -> HashMap<E, u16> to build an occurrence-count map when you own the error list.

→ You can also enrich the retry payload with context (e.g., a start timestamp) via .map_input(...). Please refer to tests/use_cases.rs.
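The idea behind the occurrence-count helper can be sketched independently — a simplified signature; the real errors_to_occurrences_count also folds in the fatal error:

```rust
use std::collections::HashMap;

/// Sketch of the occurrence-count idea: collapse a possibly long list of
/// retry errors into a compact `error -> count` map, suitable for a
/// one-line log message instead of a wall of repeated errors.
fn occurrences<E: std::hash::Hash + Eq>(errors: Vec<E>) -> HashMap<E, u16> {
    let mut counts = HashMap::new();
    for error in errors {
        *counts.entry(error).or_insert(0u16) += 1;
    }
    counts
}
```

A retry history of ["timeout", "timeout", "reset"] thus summarizes to {"timeout": 2, "reset": 1}, which is what you want in a log line after dozens of attempts.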

§Composable, fluent pipeline (retry as a value)

keen-retry is designed so that library code can return a retry-aware value (RetryResult), and application code can later attach retry policy, backoff, and instrumentation in a fluent, composable way. This makes nested retries (library retries inside application retries) easier to express without losing context or inventing ad-hoc state passing.

In pseudo-code:

// One attempt produces a retry-aware value.
let attempt = connect_to_server().await; // -> RetryResult<...>

// The application upgrades it into a retry procedure with policy + observability.
let resolved = attempt
  .retry_with_async(|_| connect_to_server())
  .exponential_jitter_backoff(...)
  .inspect_recovered(...)
  .map_ok(...)
  .into_result_mapping_errors(your_metrics_aggregation);

Unlike “retry a closure until it returns Ok”, this style separates classification (in the library) from policy (in the app), and keeps the retry process inspectable.
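The classification/policy split can also be sketched in plain Rust without the crate — the names Classified, fetch and run are hypothetical, and sleeping on backoff is elided to keep the sketch pure:

```rust
use std::time::Duration;

/// Illustrative tri-state classification (modeled after `RetryResult`,
/// simplified: payload propagation omitted).
enum Classified<T, E> {
    Ok(T),
    Transient(E),
    Fatal(E),
}

/// Library side: classify the failure, but impose no policy -- no sleeps,
/// no loops. (For the sketch, the first two attempts fail transiently.)
fn fetch(attempt: u32) -> Classified<&'static str, &'static str> {
    match attempt {
        0 | 1 => Classified::Transient("503 Service Unavailable"),
        _ => Classified::Ok("payload"),
    }
}

/// Application side: policy lives here, expressed as a backoff schedule.
fn run(backoffs: &[Duration]) -> Result<&'static str, &'static str> {
    let mut attempt = 0;
    for _delay in backoffs {
        match fetch(attempt) {
            Classified::Ok(v) => return Ok(v),
            Classified::Fatal(e) => return Err(e),     // never retried
            Classified::Transient(_) => attempt += 1,  // a real app would sleep `_delay` here
        }
    }
    // One last attempt after the backoff budget is spent.
    match fetch(attempt) {
        Classified::Ok(v) => Ok(v),
        Classified::Transient(e) | Classified::Fatal(e) => Err(e),
    }
}
```

The library never decides how long to wait or how many times to try; the application never decides which errors are permanent. That is the separation the fluent chain above encodes.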

§Advanced pattern: partial completion via continuation closures

Some retryable operations are not “all-or-nothing”. They may partially complete work and you may want subsequent attempts to resume from the remaining work rather than restart from scratch (e.g., broadcasting to multiple targets, where some succeeded and only a subset failed transiently). The tests/use_cases.rs file includes a worked example of this pattern under the name “Partial Completion with Continuation Closure”.

The key idea is that the “retry payload” can itself be a callable continuation: instead of re-applying the same input on every attempt, the continuation captures progress (such as the remaining targets) and, on each attempt, performs only the unfinished work before returning a new retry outcome. In the example, this is implemented by building a continuation closure that retains the list of pending sockets and re-runs only the transient failures on the next attempt.

How this compares to mainstream retry crates: you can implement “partial completion” with crates like retry, backoff, or backon, but it typically lives as an application-specific state machine hidden inside the retried closure. Those crates mostly model retries as “re-run the same closure/future until it returns Ok”, so keeping progress means maintaining mutable state across attempts (often via captured variables or shared state). For example, retry exposes OperationResult::{Ok, Retry, Err} and will keep calling the closure until it returns Ok or Err, which makes “resume where you left off” possible but entirely manual.

With backon, you can carry owned state explicitly via RetryableWithContext (useful to avoid async capture/lifetime issues), but you still implement the “remaining work” bookkeeping yourself inside that context, and the end result is still a plain Result rather than a first-class “retry artifact”.

In contrast, keen-retry makes the “continuation” itself a natural retry payload: each transient failure can return the updated continuation (representing remaining work), so the retry loop composes it without requiring ad-hoc external state plumbing. This tends to keep complex retry workflows (like partial completion) more explicit and more composable when you have nested retries across library/application layers.
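A self-contained sketch of the pattern’s core idea, with the resume state modeled as plain data rather than a closure — in keen-retry the retry payload can equally be a continuation closure capturing this state; all names here are hypothetical:

```rust
/// One broadcast attempt over the targets that are *still pending*.
/// On failure it returns only the remaining work, never the full set.
fn broadcast(pending: Vec<u32>, reachable: impl Fn(u32) -> bool) -> Result<(), Vec<u32>> {
    let still_pending: Vec<u32> =
        pending.into_iter().filter(|t| !reachable(*t)).collect();
    if still_pending.is_empty() { Ok(()) } else { Err(still_pending) }
}

/// Retry loop that resumes from the remaining work on each attempt,
/// instead of restarting the whole broadcast from scratch.
fn broadcast_with_retries(targets: Vec<u32>, max_attempts: u8) -> Result<(), Vec<u32>> {
    let mut pending = targets;
    for attempt in 0..max_attempts {
        // For the sketch: target `t` becomes reachable from attempt `t` on.
        match broadcast(pending, |t| t <= attempt as u32) {
            Ok(()) => return Ok(()),
            Err(remaining) => pending = remaining, // resume point: only unfinished work
        }
    }
    Err(pending)
}
```

Each attempt shrinks the pending set; targets that already succeeded are never re-broadcast. In keen-retry, `pending` (or a closure capturing it) rides along as the Transient payload, so no mutable state outside the retry chain is needed.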

§Async integration knobs (Tokio/futures/no-async)

Like many crates mentioned here, keen-retry supports both synchronous and async retry execution via dedicated executors (KeenRetryExecutor and KeenRetryAsyncExecutor).

Async support is also feature-gated: by default the crate enables async with the additional dependencies futures and tokio, and it also provides a no-async option for builds that want to avoid async runtime integration.

§How it compares, at a glance

Legend:

  • “Built-in” means modeled directly by the crate’s primary types/APIs.
  • “Manual” means possible, but you need to implement (and re-implement) it via captured state/context + notify/callbacks at each call site.
| Feature | retry | backoff | backon | keen-retry |
|---|---|---|---|---|
| Default meaning of “retryable” | Retry means retry; Err means stop | backoff::Error::{Transient, Permanent} | Retries when .when(err) is true | RetryResult::{Ok / Transient / Fatal} |
| 3-way retry outcome type | Yes (Ok / Retry / Err) | Yes (Result<Ok, backoff::Error<E>>) | No (still Result<Ok, Err>) | Yes (Ok / Transient / Fatal) |
| Where the retryability lives | At the type level (OperationResult) | At the error-variant level (backoff::Error<E>) | Call-site policy predicate (.when(...)) | At the type level (RetryResult) |
| Final outcome taxonomy + retry history value | Manual | Manual | Manual | Built-in via ResolvedResult (e.g. Recovered/GivenUp/...) |
| Ownership-friendly payload propagation | Manual via captured state | Manual via captured state | Manual via .context(...) | Built-in: original input carried in Transient/Fatal |
| Observability hooks + summaries | Manual | Per-attempt notify (manual aggregation) | Per-attempt notify (manual aggregation) | Built-in “retry artifact” + hooks + summary helpers |
| Full async support | No | Partial: per-attempt notify is still sync | Partial: per-attempt notify is still sync | Yes; the final “retry artifact” doesn’t rely on notify for metrics |
| Resumability / continuation workflows | Manual state machine in closure | Manual state machine in closure | Manual state machine via context | Supported naturally as retry payload/continuation |
| Zero-cost non-retry path (benchmarked) | Not claimed | Not claimed | Not claimed | Yes (see benches/zero_cost_abstractions.rs) |
| Companion guide | Docs/examples | Docs/examples | Docs/examples | Yes (book + patterns) |

Compared versions (verified 2026-03-02): retry 2.2, backoff 0.4, backon 1.6, keen-retry 0.5.

§Performance Analysis

keen-retry has been rigorously benchmarked to ensure it adheres to the zero-cost abstraction principle, crucial in systems programming. Our benchmarks, available at benches/zero_cost_abstractions.rs, demonstrate the efficiency of the crate.

keen-retry-zero-cost-abstractions.png

§The Book

For a deep dive into the applicable design patterns, principles, strategies, and best practices for using keen-retry effectively, be sure to explore the companion keen-retry Book: a definitive guide with insights and practical examples for harnessing the full potential of keen-retry in various software development scenarios.

The keen-retry Book

§Maintenance Disclaimer

  • This crate has reached a stable API that enables several patterns (as demonstrated in the companion book). The concepts, models and core ideas are solid, and the API is unlikely to change significantly.
  • Extensive tests verify both that the implementation is correct and that the intended use cases are addressed – so no bug-fix releases are expected.

Modules§

keen_retry_async_executor
Resting place for KeenRetryAsyncExecutor.
Keep this in sync with ../keen_retry_executor.rs
keen_retry_executor
Resting place for KeenRetryExecutor.

Enums§

ExponentialJitter
Configuration options for the “Exponential with Random Jitter” backoff strategy
ResolvedResult
Contains all possibilities for finished retryable operations – convertible to Result<> – and some nice facilities for instrumentation (like building a succinct report of the retry errors).
This “Final Result” is a “Second Level” of result for an operation: it represents operations that were run through the keen-retry retrying logic.
See also crate::RetryResult, for the “First Level” of results.
RetryResult
An extension over the original std Result<Ok, Err>, introducing a third kind: Transient failures – which are eligible for retry attempts: this may be considered the “First Level” of results, mapping directly from raw operation results.
Considering zero-copy, both Transient & Fatal variants will contain the original input payload, which is consumed by an Ok operation; the Ok operation, on the other hand, has the outcome result and may have an excerpt of the input, for instrumentation purposes.
See also crate::ResolvedResult, for the “Second Level” of results – after passing through some possible retry re-attempts.

Functions§

errors_to_occurrences_count
Consumes both retry_errors and fatal_error (from a failed ResolvedResult) and returns a hashmap of error occurrence counts.
exponential_jitter_from_exponent
Generates an iterator suitable for usage in backoff strategies for operations that recruit external / shared resources – such as network services. Its elements progress exponentially from the given initial_backoff_millis with the exponent ratio applied to each progression, up to re_attempts steps – each of which may be added / subtracted by jitter_ratio * backoff_millis.
As a special case, if initial_backoff_millis starts with 0, the first element in the geometric progression will be 0 and the rest of the progression will continue as if it had started with 1 – allowing for zero backoff on the first attempt, which might make sense in highly distributed systems with really low fault rates.
See also exponential_jitter_from_range()
exponential_jitter_from_range
Generates an iterator suitable for usage in backoff strategies for operations that recruit external / shared resources – such as network services. Its elements progress exponentially from the given range_millis start range, going from the first to the last element in re_attempts steps – each of which may be added / subtracted by jitter_ratio * backoff_millis.
Notice that this method calculates the exponent from the given parameters.
As a special case, if the range – which is expressed in milliseconds – starts with 0, the first element in the geometric progression will be 0 and the rest of the progression will continue as if it had started with 1 – allowing for zero backoff on the first attempt, which might make sense in highly distributed systems with really low fault rates.
See also exponential_jitter_from_exponent()
loggable_retry_errors
Builds an as-short-as-possible list of retry_errors occurrences (out of order), provided ErrorType implements the Debug trait.

Type Aliases§

RetryConsumerResult
Sugar type for when an operation doesn’t produce outputs
RetryProcedureResult
Sugar type for when an operation doesn’t consume its inputs nor produce outputs
RetryProducerResult
Sugar type for when an operation doesn’t consume its inputs