durable-lambda-core 1.2.0

Core replay engine, types, and operation logic for AWS Lambda durable execution in Rust

Overview

durable-lambda-core is the foundational crate of the durable-rust SDK. It contains the replay engine, all 8 durable operations, type definitions, error types, and the backend abstraction layer.

Most users should not depend on this crate directly. Instead, choose one of the four ergonomic wrapper crates that provide a higher-level API:

| Crate | Style | Best for |
| --- | --- | --- |
| durable-lambda-closure | Closure-native (recommended) | Simplest syntax, no traits or macros |
| durable-lambda-macro | Proc-macro | Zero boilerplate with #[durable_execution] |
| durable-lambda-trait | Trait-based | OOP pattern, shared state via struct fields |
| durable-lambda-builder | Builder-pattern | Most configurable, tracing/error hooks |

Use durable-lambda-core directly when you need access to internal types like DurableContext, DurableBackend, or the replay engine for custom integrations.

Features

  • Replay engine with deterministic state machine (Replaying -> Executing) that replays completed operations from cache and executes new ones
  • 8 core durable operations: Step, Wait, Callback, Invoke, Parallel, Map, Child Context, and Logging
  • Step options: configurable retries, exponential backoff, per-step timeouts, and conditional retry predicates
  • Batch checkpoint mode to reduce checkpoint API calls by up to 90% for sequential step workflows
  • Saga / compensation support with step_with_compensation() for durable rollback
  • Deterministic operation IDs using blake2b hashing, byte-for-byte compatible with the Python SDK
  • DurableBackend trait abstracting all AWS API calls behind a single boundary (RealBackend for production, MockBackend for testing)
  • Full Python SDK compatibility -- identical checkpoint protocol, operation IDs, and replay semantics
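
To make the batch checkpoint claim concrete, here is a minimal std-only sketch of the idea. It does not use the crate's API: `CountingSink`, `BatchCheckpointer`, and `checkpoint_calls` are hypothetical names invented for illustration, and the real batching logic may differ.

```rust
/// Hypothetical stand-in for the checkpoint API; it only counts calls.
#[derive(Default)]
struct CountingSink {
    api_calls: usize,
    records_stored: usize,
}

impl CountingSink {
    fn send(&mut self, records: &[String]) {
        self.api_calls += 1;
        self.records_stored += records.len();
    }
}

/// Buffers checkpoint records and sends them in groups of `batch_size`.
struct BatchCheckpointer {
    batch_size: usize,
    pending: Vec<String>,
    sink: CountingSink,
}

impl BatchCheckpointer {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be at least 1");
        Self { batch_size, pending: Vec::new(), sink: CountingSink::default() }
    }

    fn record(&mut self, step_name: &str) {
        self.pending.push(step_name.to_string());
        if self.pending.len() >= self.batch_size {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if !self.pending.is_empty() {
            self.sink.send(&self.pending);
            self.pending.clear();
        }
    }
}

/// Runs `steps` sequential steps and returns (API calls made, records stored).
fn checkpoint_calls(steps: usize, batch_size: usize) -> (usize, usize) {
    let mut checkpointer = BatchCheckpointer::new(batch_size);
    for i in 0..steps {
        checkpointer.record(&format!("step-{i}"));
    }
    checkpointer.flush(); // send the partial final batch, if any
    (checkpointer.sink.api_calls, checkpointer.sink.records_stored)
}
```

In this sketch, 20 sequential steps cost 20 calls with a batch size of 1 and 2 calls with a batch size of 10, which is a 90% reduction. The saving depends on the batch size and on how many steps run back to back.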

Getting Started

Add to your Cargo.toml:

[dependencies]
durable-lambda-core = "1.2"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
lambda_runtime = "1.1"

Direct Usage with DurableContext

use durable_lambda_core::context::DurableContext;
use durable_lambda_core::error::DurableError;

async fn handler(
    event: serde_json::Value,
    mut ctx: DurableContext,
) -> Result<serde_json::Value, DurableError> {
    // Step: checkpointed work unit
    let order: Result<serde_json::Value, String> = ctx.step("validate", || async {
        Ok(serde_json::json!({"order_id": 42, "valid": true}))
    }).await?;

    // Wait: time-based suspension
    ctx.wait("cooldown", 10).await?;

    // Step with retries and backoff
    let payment: Result<String, String> = ctx.step_with_options(
        "charge",
        durable_lambda_core::types::StepOptions::new()
            .retries(3)
            .backoff_seconds(5),
        || async { Ok("tx-abc-123".to_string()) },
    ).await?;

    Ok(serde_json::json!({
        "order": order.unwrap(),
        "transaction": payment.unwrap(),
    }))
}

Operations

Step (checkpointed work)

The fundamental operation. Wraps a closure in a checkpoint -- on first execution the closure runs and the result is persisted; on replay the cached result is returned without executing the closure.

let result: Result<String, String> = ctx.step("validate", || async {
    Ok("valid".to_string())
}).await?;

Step with Options

Configure retries, backoff, timeouts, and conditional retry predicates:

use durable_lambda_core::types::StepOptions;

let result: Result<i32, String> = ctx.step_with_options(
    "charge",
    StepOptions::new()
        .retries(3)
        .backoff_seconds(5)
        .timeout_seconds(30),
    || async { Ok(100) },
).await?;
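
The retry behaviour these options configure can be sketched without the SDK. The code below is an illustration only: it assumes the delay doubles on each attempt starting from the configured backoff, which this README does not confirm. `backoff_delay`, `retry_schedule`, and `run_with_retries` are hypothetical helpers, not part of the crate.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// The doubling factor is an assumption made for this sketch.
fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// The full list of delays for a given number of retries.
fn retry_schedule(base: Duration, retries: u32, max: Duration) -> Vec<Duration> {
    (0..retries).map(|attempt| backoff_delay(base, attempt, max)).collect()
}

/// Calls `op` until it succeeds, the retries are used up, or `retry_if`
/// rejects the error. Sleeping between attempts is left out for brevity.
fn run_with_retries<T, E>(
    retries: u32,
    retry_if: impl Fn(&E) -> bool,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(error) if attempt < retries && retry_if(&error) => attempt += 1,
            Err(error) => return Err(error),
        }
    }
}
```

Under these assumptions, a base of 5 seconds and 3 retries gives delays of 5, 10, and 20 seconds. The `retry_if` predicate lets a step fail immediately on errors that retrying cannot fix, such as a validation failure.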

Wait (time-based suspension)

Suspends execution for a specified number of seconds. The wait is checkpointed -- on replay it completes immediately.

ctx.wait("cooldown", 30).await?;

Callback (external signal coordination)

Creates a callback handle and suspends until an external system signals completion.

Invoke (Lambda-to-Lambda)

Durably invokes another Lambda function with automatic checkpointing of the result.

Parallel (concurrent fan-out)

Executes multiple branches concurrently, each with its own child DurableContext.

Map (parallel collection processing)

Processes a collection in parallel with configurable batch sizes.
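
This README does not show the Map call signature, so the following is a std-only sketch of one plausible reading of "batch size": items within a batch run concurrently and batches run one after another. `map_in_batches` is a hypothetical helper, and it omits the checkpointing that the real operation performs.

```rust
use std::thread;

/// Applies `f` to every item, running at most `batch_size` items at a time.
/// Results are returned in input order.
fn map_in_batches<T, R, F>(items: &[T], batch_size: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    assert!(batch_size > 0, "batch_size must be at least 1");
    let f = &f; // share one reference to the closure across worker threads
    let mut results = Vec::with_capacity(items.len());
    for batch in items.chunks(batch_size) {
        // Scoped threads may borrow `batch` and `f` because they are joined
        // before `thread::scope` returns.
        let batch_results: Vec<R> = thread::scope(|scope| {
            let handles: Vec<_> = batch
                .iter()
                .map(|item| scope.spawn(move || f(item)))
                .collect();
            handles
                .into_iter()
                .map(|handle| handle.join().expect("worker panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

For example, `map_in_batches(&[1, 2, 3, 4, 5], 2, |x| x * 10)` processes the items in three batches (two, two, then one). A smaller batch size limits concurrency at the cost of more sequential rounds.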

Child Context (isolated subflow)

Runs an isolated subflow with its own checkpoint namespace.

Replay-Safe Logging

All log methods are no-ops during replay, preventing duplicate log entries:

ctx.log("processing order");
ctx.log_with_data("order details", &serde_json::json!({"id": 42}));
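
The no-op behaviour can be pictured with a small std-only sketch. `ReplaySafeLogger` is a hypothetical type, the local `ExecutionMode` enum only mirrors the two states named in this README, and the `lines` vector stands in for a real log sink.

```rust
/// Local stand-in for the two execution states described in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionMode {
    Replaying,
    Executing,
}

/// Hypothetical logger that drops messages while the engine is replaying.
struct ReplaySafeLogger {
    mode: ExecutionMode,
    lines: Vec<String>, // stands in for a real log sink
}

impl ReplaySafeLogger {
    fn new(mode: ExecutionMode) -> Self {
        Self { mode, lines: Vec::new() }
    }

    fn log(&mut self, message: &str) {
        // During replay the message was already emitted by an earlier
        // invocation, so writing it again would duplicate the entry.
        if self.mode == ExecutionMode::Executing {
            self.lines.push(message.to_string());
        }
    }
}
```

A logger created in `Replaying` mode records nothing until its mode switches to `Executing`, so each message appears once across all invocations of the same execution.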

Replay Engine

The replay engine is the heart of durable execution. It maintains a HashMap<String, Operation> keyed by deterministic operation IDs (64 hex-character blake2b hashes).

State machine:

  1. Replaying -- When the context is created with completed operations from a previous invocation, the engine starts in Replaying mode. Each step() call looks up its operation ID in the map and returns the cached result.
  2. Executing -- Once all completed operations have been visited, the engine transitions to Executing mode. Subsequent step() calls execute the closure and checkpoint the result.
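
The two states above can be sketched with a small std-only model. It is an illustration rather than the crate's implementation: `ReplayEngine` and `Mode` are hypothetical names, results are plain strings, and the operation ID is just the counter instead of a blake2b hash.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Replaying,
    Executing,
}

/// Toy replay engine: operation ID -> cached result.
struct ReplayEngine {
    history: HashMap<String, String>,
    unvisited: usize, // completed operations not yet replayed
    counter: u64,
}

impl ReplayEngine {
    fn new(history: HashMap<String, String>) -> Self {
        let unvisited = history.len();
        Self { history, unvisited, counter: 0 }
    }

    fn mode(&self) -> Mode {
        if self.unvisited > 0 { Mode::Replaying } else { Mode::Executing }
    }

    fn step(&mut self, work: impl FnOnce() -> String) -> String {
        self.counter += 1;
        let id = self.counter.to_string(); // simplified; real IDs are hashed
        if self.mode() == Mode::Replaying {
            if let Some(cached) = self.history.get(&id) {
                let cached = cached.clone();
                self.unvisited -= 1;
                return cached; // replay: the closure is never called
            }
        }
        let result = work();
        self.history.insert(id, result.clone()); // "checkpoint" the result
        result
    }

    fn into_history(self) -> HashMap<String, String> {
        self.history
    }
}
```

Feeding the history from one run into a second `ReplayEngine` makes the first `step()` return the stored value without running its closure, after which `mode()` reports `Executing`. The sketch only works if steps are issued in the same order on every invocation, which is why the IDs must be deterministic.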

Operation ID generation:

  • Root operations: blake2b("{counter}")
  • Child operations: blake2b("{parent_id}-{counter}")

These IDs are byte-for-byte compatible with the Python Durable Lambda SDK, ensuring cross-language replay compatibility.
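
The two ID formulas can be sketched as follows. Because blake2b is not in the Rust standard library, the hash is passed in as a function; in practice it would be a blake2b implementation from an external crate. `operation_id_input` and `operation_id` are hypothetical helpers, and the counter's starting value is not specified in this README.

```rust
/// Builds the string that gets hashed, following the two formulas above.
fn operation_id_input(parent_id: Option<&str>, counter: u64) -> String {
    match parent_id {
        Some(parent) => format!("{parent}-{counter}"), // child operation
        None => counter.to_string(),                   // root operation
    }
}

/// Derives an operation ID by hashing the input string.
/// `hash` stands in for blake2b, which would come from an external crate.
fn operation_id(
    parent_id: Option<&str>,
    counter: u64,
    hash: impl Fn(&str) -> String,
) -> String {
    hash(&operation_id_input(parent_id, counter))
}
```

The ID depends only on the parent ID and the position counter, so the same sequence of operations produces the same IDs on every invocation. Reordering operations between deployments would change the IDs and break replay.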

DurableBackend Trait

The DurableBackend trait is the sole I/O boundary in the SDK. All AWS API calls flow through this trait:

  • RealBackend -- Production implementation that calls the AWS Lambda durable execution APIs
  • MockBackend -- Testing implementation that records calls without making network requests

To target a different execution service (e.g., another cloud provider), supply your own type that implements the DurableBackend trait.
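
The value of a single I/O boundary is easiest to see in a reduced model. The sketch below does not reproduce the DurableBackend trait, whose methods are not listed in this README. `CheckpointBackend`, `InMemoryBackend`, and `run_step` are hypothetical and synchronous, chosen to keep the example std-only.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified backend boundary: all persistence goes through it.
trait CheckpointBackend {
    fn load(&self, operation_id: &str) -> Option<String>;
    fn save(&mut self, operation_id: &str, result: &str) -> Result<(), String>;
}

/// In-memory implementation that records calls, in the spirit of a mock backend.
#[derive(Default)]
struct InMemoryBackend {
    stored: HashMap<String, String>,
    calls: Vec<String>,
}

impl CheckpointBackend for InMemoryBackend {
    fn load(&self, operation_id: &str) -> Option<String> {
        self.stored.get(operation_id).cloned()
    }

    fn save(&mut self, operation_id: &str, result: &str) -> Result<(), String> {
        self.calls.push(format!("save:{operation_id}"));
        self.stored.insert(operation_id.to_string(), result.to_string());
        Ok(())
    }
}

/// Step logic written against the trait only, so any backend can be swapped in.
fn run_step<B: CheckpointBackend>(
    backend: &mut B,
    operation_id: &str,
    work: impl FnOnce() -> String,
) -> Result<String, String> {
    if let Some(cached) = backend.load(operation_id) {
        return Ok(cached); // already checkpointed: skip the work
    }
    let result = work();
    backend.save(operation_id, &result)?;
    Ok(result)
}
```

Because `run_step` only sees the trait, a test can assert on `InMemoryBackend::calls` without any network access, while production code would pass a type that talks to the real service.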

API Reference

Key types and modules:

| Type | Description |
| --- | --- |
| DurableContext | Main context type with all 8 operations |
| DurableError | SDK infrastructure error type |
| StepOptions | Step configuration (retries, backoff, timeout, retry_if) |
| ParallelOptions | Parallel execution configuration |
| MapOptions | Map operation configuration (batch_size) |
| CallbackOptions | Callback configuration (timeout, heartbeat) |
| DurableBackend | Trait abstracting AWS API calls |
| ExecutionMode | Replaying or Executing state |

Full API documentation: docs.rs/durable-lambda-core

License

Licensed under either of MIT or Apache-2.0 at your option.

Repository

https://github.com/pgdad/durable-rust