durable_execution_sdk/lib.rs
1//! # AWS Durable Execution SDK for Lambda Rust Runtime
2//!
3//! This SDK enables Rust developers to build reliable, long-running workflows
4//! in AWS Lambda with automatic checkpointing, replay, and state management.
5//!
6//! ## Overview
7//!
8//! The AWS Durable Execution SDK provides a framework for building workflows that can
9//! survive Lambda function restarts, timeouts, and failures. It automatically checkpoints
10//! the state of your workflow, allowing it to resume exactly where it left off after
11//! any interruption.
12//!
13//! ### Key Features
14//!
15//! - **Automatic Checkpointing**: Every operation is automatically checkpointed, ensuring
16//! your workflow can resume from the last completed step.
17//! - **Replay Mechanism**: When a function resumes, completed operations return their
18//! checkpointed results instantly without re-execution.
19//! - **Concurrent Operations**: Process collections in parallel with configurable
20//! concurrency limits and failure tolerance.
21//! - **External Integration**: Wait for callbacks from external systems with configurable
22//! timeouts.
23//! - **Type Safety**: Full Rust type safety with generics, trait-based abstractions,
24//! and newtype wrappers for domain identifiers.
25//! - **Promise Combinators**: Coordinate multiple durable promises with `all`, `any`, `race`, and `all_settled`.
26//! - **Replay-Safe Helpers**: Generate deterministic UUIDs and timestamps that are safe for replay.
27//! - **Configurable Checkpointing**: Choose between eager, batched, or optimistic checkpointing modes.
28//! - **Trait Aliases**: Cleaner function signatures with [`DurableValue`] and [`StepFn`] trait aliases.
29//! - **Sealed Traits**: Internal traits are sealed to allow API evolution without breaking changes.
30//!
31//! ## Important Documentation
32//!
33//! Before writing durable workflows, please read:
34//!
35//! - [`docs::determinism`]: **Critical** - Understanding determinism requirements for replay-safe workflows
36//! - [`docs::limits`]: Execution limits and constraints you need to know
37//!
38//! ## Getting Started
39//!
40//! Add the SDK to your `Cargo.toml`:
41//!
42//! ```toml
43//! [dependencies]
44//! durable-execution-sdk = "0.1"
45//! tokio = { version = "1.0", features = ["full"] }
46//! serde = { version = "1.0", features = ["derive"] }
47//! ```
48//!
49//! ### Basic Workflow Example
50//!
51//! Here's a simple workflow that processes an order:
52//!
53//! ```rust,ignore
54//! use durable_execution_sdk::{durable_execution, DurableContext, DurableError, Duration};
55//! use serde::{Deserialize, Serialize};
56//!
57//! #[derive(Deserialize)]
58//! struct OrderEvent {
59//! order_id: String,
60//! amount: f64,
61//! }
62//!
63//! #[derive(Serialize)]
64//! struct OrderResult {
65//! status: String,
66//! order_id: String,
67//! }
68//!
69//! #[durable_execution]
70//! async fn process_order(event: OrderEvent, ctx: DurableContext) -> Result<OrderResult, DurableError> {
71//! // Step 1: Validate the order (checkpointed automatically)
72//! let is_valid: bool = ctx.step(|_step_ctx| {
73//! // Validation logic here
74//! Ok(true)
75//! }, None).await?;
76//!
77//! if !is_valid {
78//! return Err(DurableError::execution("Invalid order"));
79//! }
80//!
81//! // Step 2: Process payment (checkpointed automatically)
82//! let payment_id: String = ctx.step(|_step_ctx| {
83//! // Payment processing logic here
84//! Ok("pay_123".to_string())
85//! }, None).await?;
86//!
87//! // Step 3: Wait for payment confirmation (suspends Lambda, resumes later)
88//! ctx.wait(Duration::from_seconds(5), Some("payment_confirmation")).await?;
89//!
90//! // Step 4: Complete the order
91//! Ok(OrderResult {
92//! status: "completed".to_string(),
93//! order_id: event.order_id,
94//! })
95//! }
96//! ```
97//!
98//! ## Core Concepts
99//!
100//! ### DurableContext
101//!
102//! The [`DurableContext`] is the main interface for durable operations. It provides:
103//!
104//! - [`step`](DurableContext::step): Execute and checkpoint a unit of work
105//! - [`wait`](DurableContext::wait): Pause execution for a specified duration
106//! - [`create_callback`](DurableContext::create_callback): Wait for external systems to signal completion
107//! - [`invoke`](DurableContext::invoke): Call other durable Lambda functions
108//! - [`map`](DurableContext::map): Process collections in parallel
109//! - [`parallel`](DurableContext::parallel): Execute multiple operations concurrently
110//! - [`run_in_child_context`](DurableContext::run_in_child_context): Create isolated nested workflows
111//!
112//! ### Steps
113//!
114//! Steps are the fundamental unit of work in durable executions. Each step is
115//! automatically checkpointed, allowing the workflow to resume from the last
116//! completed step after interruptions.
117//!
118//! ```rust,ignore
119//! // Simple step
120//! let result: i32 = ctx.step(|_| Ok(42), None).await?;
121//!
122//! // Named step for better debugging
123//! let result: String = ctx.step_named("fetch_data", |_| {
124//! Ok("data".to_string())
125//! }, None).await?;
126//!
127//! // Step with custom configuration
128//! use durable_execution_sdk::{StepConfig, StepSemantics};
129//!
130//! let config = StepConfig {
131//! step_semantics: StepSemantics::AtMostOncePerRetry,
132//! ..Default::default()
133//! };
134//! let result: i32 = ctx.step(|_| Ok(42), Some(config)).await?;
135//! ```
136//!
137//! ### Step Semantics
138//!
139//! The SDK supports two execution semantics for steps:
140//!
141//! - **AtLeastOncePerRetry** (default): Checkpoint after execution. The step may
142//! execute multiple times if interrupted, but the result is always checkpointed.
143//! - **AtMostOncePerRetry**: Checkpoint before execution. Guarantees the step
144//! executes at most once per retry, useful for non-idempotent operations.
145//!
146//! ### Wait Operations
147//!
148//! Wait operations suspend the Lambda execution and resume after the specified
149//! duration. This is efficient because it doesn't block Lambda resources.
150//!
151//! ```rust,ignore
152//! use durable_execution_sdk::Duration;
153//!
154//! // Wait for 5 seconds
155//! ctx.wait(Duration::from_seconds(5), None).await?;
156//!
157//! // Wait for 1 hour with a name
158//! ctx.wait(Duration::from_hours(1), Some("wait_for_approval")).await?;
159//! ```
160//!
161//! ### Callbacks
162//!
163//! Callbacks allow external systems to signal your workflow. Create a callback,
164//! share the callback ID with an external system, and wait for the result.
165//!
166//! ```rust,ignore
167//! use durable_execution_sdk::CallbackConfig;
168//!
169//! // Create a callback with 24-hour timeout
170//! let callback = ctx.create_callback::<ApprovalResponse>(Some(CallbackConfig {
171//! timeout: Duration::from_hours(24),
172//! ..Default::default()
173//! })).await?;
174//!
175//! // Share callback.callback_id with external system
176//! notify_approver(&callback.callback_id).await?;
177//!
178//! // Wait for the callback result (suspends until callback is received)
179//! let approval = callback.result().await?;
180//! ```
181//!
182//! ### Parallel Processing
183//!
184//! Process collections in parallel with configurable concurrency and failure tolerance:
185//!
186//! ```rust,ignore
187//! use durable_execution_sdk::{MapConfig, CompletionConfig};
188//!
189//! // Process items with max 5 concurrent executions
190//! let results = ctx.map(
191//! vec![1, 2, 3, 4, 5],
192//! |child_ctx, item, index| async move {
193//! child_ctx.step(|_| Ok(item * 2), None).await
194//! },
195//! Some(MapConfig {
196//! max_concurrency: Some(5),
197//! completion_config: CompletionConfig::all_successful(),
198//! ..Default::default()
199//! }),
200//! ).await?;
201//!
202//! // Get all successful results
203//! let values = results.get_results()?;
204//! ```
205//!
206//! ### Parallel Branches
207//!
208//! Execute multiple independent operations concurrently:
209//!
210//! ```rust,ignore
211//! use durable_execution_sdk::ParallelConfig;
212//!
213//! let results = ctx.parallel(
214//! vec![
215//! |ctx| Box::pin(async move { ctx.step(|_| Ok("a"), None).await }),
216//! |ctx| Box::pin(async move { ctx.step(|_| Ok("b"), None).await }),
217//! |ctx| Box::pin(async move { ctx.step(|_| Ok("c"), None).await }),
218//! ],
219//! None,
220//! ).await?;
221//! ```
222//!
223//! ### Promise Combinators
224//!
225//! The SDK provides promise combinators for coordinating multiple durable operations:
226//!
227//! ```rust,ignore
228//! use durable_execution_sdk::DurableContext;
229//!
230//! async fn coordinate_operations(ctx: &DurableContext) -> Result<(), DurableError> {
231//! // Wait for ALL operations to complete successfully
232//! let results = ctx.all(vec![
233//! ctx.step(|_| Ok(1), None),
234//! ctx.step(|_| Ok(2), None),
235//! ctx.step(|_| Ok(3), None),
236//! ]).await?;
237//! // results = [1, 2, 3]
238//!
239//! // Wait for ALL operations to settle (success or failure)
240//! let batch_result = ctx.all_settled(vec![
241//! ctx.step(|_| Ok("success"), None),
242//! ctx.step(|_| Err("failure".into()), None),
243//! ]).await;
244//! // batch_result contains both success and failure outcomes
245//!
246//! // Return the FIRST operation to settle (success or failure)
247//! let first = ctx.race(vec![
248//! ctx.step(|_| Ok("fast"), None),
249//! ctx.step(|_| Ok("slow"), None),
250//! ]).await?;
251//!
252//! // Return the FIRST operation to succeed
253//! let first_success = ctx.any(vec![
254//! ctx.step(|_| Err("fail".into()), None),
255//! ctx.step(|_| Ok("success"), None),
256//! ]).await?;
257//!
258//! Ok(())
259//! }
260//! ```
261//!
262//! ### Accessing Original Input
263//!
264//! Access the original input that started the execution:
265//!
266//! ```rust,ignore
267//! use serde::Deserialize;
268//!
269//! #[derive(Deserialize)]
270//! struct OrderEvent {
271//! order_id: String,
272//! amount: f64,
273//! }
274//!
275//! async fn my_workflow(ctx: DurableContext) -> Result<(), DurableError> {
276//! // Get the original input that started this execution
277//! let event: OrderEvent = ctx.get_original_input()?;
278//! println!("Processing order: {}", event.order_id);
279//!
280//! // Or get the raw JSON string
281//! if let Some(raw_input) = ctx.get_original_input_raw() {
282//! println!("Raw input: {}", raw_input);
283//! }
284//!
285//! Ok(())
286//! }
287//! ```
288//!
289//! ### Replay-Safe Helpers
290//!
291//! Generate deterministic values that are safe for replay:
292//!
293//! ```rust
294//! use durable_execution_sdk::replay_safe::{
295//! uuid_from_operation, uuid_to_string, uuid_string_from_operation,
296//! };
297//!
298//! // Generate a deterministic UUID from an operation ID
299//! let operation_id = "my-operation-123";
300//! let uuid_bytes = uuid_from_operation(operation_id, 0);
301//! let uuid_string = uuid_to_string(&uuid_bytes);
302//!
303//! // Or use the convenience function
304//! let uuid = uuid_string_from_operation(operation_id, 0);
305//!
306//! // Same inputs always produce the same UUID
307//! let uuid2 = uuid_string_from_operation(operation_id, 0);
308//! assert_eq!(uuid, uuid2);
309//!
310//! // Different seeds produce different UUIDs
311//! let uuid3 = uuid_string_from_operation(operation_id, 1);
312//! assert_ne!(uuid, uuid3);
313//! ```
314//!
315//! For timestamps, use the execution start time instead of current time:
316//!
317//! ```rust,ignore
318//! use durable_execution_sdk::replay_safe::{
319//! timestamp_from_execution, timestamp_seconds_from_execution,
320//! };
321//!
322//! async fn my_workflow(ctx: DurableContext) -> Result<(), DurableError> {
323//! // Get replay-safe timestamp (milliseconds since epoch)
324//! if let Some(timestamp_ms) = timestamp_from_execution(ctx.state()) {
325//! println!("Execution started at: {} ms", timestamp_ms);
326//! }
327//!
328//! // Or get seconds since epoch
329//! if let Some(timestamp_secs) = timestamp_seconds_from_execution(ctx.state()) {
330//! println!("Execution started at: {} seconds", timestamp_secs);
331//! }
332//!
333//! Ok(())
334//! }
335//! ```
336//!
337//! **Important**: See [`docs::determinism`] for detailed guidance on writing replay-safe code.
338//!
339//! ### Wait Cancellation
340//!
341//! Cancel an active wait operation:
342//!
343//! ```rust,ignore
344//! async fn cancellable_workflow(ctx: DurableContext) -> Result<(), DurableError> {
345//! // Start a long wait in a child context
346//! let wait_op_id = ctx.next_operation_id();
347//!
348//! // In another branch, you can cancel the wait
349//! ctx.cancel_wait(&wait_op_id).await?;
350//!
351//! Ok(())
352//! }
353//! ```
354//!
355//! ### Extended Duration Support
356//!
357//! The Duration type supports extended time periods:
358//!
359//! ```rust
360//! use durable_execution_sdk::Duration;
361//!
362//! // Standard durations
363//! let seconds = Duration::from_seconds(30);
364//! let minutes = Duration::from_minutes(5);
365//! let hours = Duration::from_hours(2);
366//! let days = Duration::from_days(7);
367//!
368//! // Extended durations
369//! let weeks = Duration::from_weeks(2); // 14 days
370//! let months = Duration::from_months(3); // 90 days (30 days per month)
371//! let years = Duration::from_years(1); // 365 days
372//!
373//! assert_eq!(weeks.to_seconds(), 14 * 24 * 60 * 60);
374//! assert_eq!(months.to_seconds(), 90 * 24 * 60 * 60);
375//! assert_eq!(years.to_seconds(), 365 * 24 * 60 * 60);
376//! ```
377//!
378//! ## Type-Safe Identifiers (Newtypes)
379//!
380//! The SDK provides newtype wrappers for domain identifiers to prevent accidental
381//! mixing of different ID types at compile time. These types are available in the
382//! [`types`] module and re-exported at the crate root.
383//!
384//! ### Available Newtypes
385//!
386//! - [`OperationId`]: Unique identifier for an operation within a durable execution
387//! - [`ExecutionArn`]: Amazon Resource Name identifying a durable execution
388//! - [`CallbackId`]: Unique identifier for a callback operation
389//!
390//! ### Creating Newtypes
391//!
392//! ```rust
393//! use durable_execution_sdk::{OperationId, ExecutionArn, CallbackId};
394//!
395//! // From String or &str (no validation, for backward compatibility)
396//! let op_id = OperationId::from("op-123");
397//! let op_id2: OperationId = "op-456".into();
398//!
399//! // With validation (rejects empty strings)
400//! let op_id3 = OperationId::new("op-789").unwrap();
401//! assert!(OperationId::new("").is_err());
402//!
403//! // ExecutionArn validates ARN format
404//! let arn = ExecutionArn::new("arn:aws:lambda:us-east-1:123456789012:function:my-func:durable:abc123");
405//! assert!(arn.is_ok());
406//!
407//! // CallbackId for external system integration
408//! let callback_id = CallbackId::from("callback-xyz");
409//! ```
410//!
411//! ### Using Newtypes as Strings
412//!
413//! All newtypes implement `Deref<Target=str>` and `AsRef<str>` for convenient string access:
414//!
415//! ```rust
416//! use durable_execution_sdk::OperationId;
417//!
418//! let op_id = OperationId::from("op-123");
419//!
420//! // Use string methods directly via Deref
421//! assert!(op_id.starts_with("op-"));
422//! assert_eq!(op_id.len(), 6);
423//!
424//! // Use as &str via AsRef
425//! let s: &str = op_id.as_ref();
426//! assert_eq!(s, "op-123");
427//! ```
428//!
429//! ### Newtypes in Collections
430//!
431//! All newtypes implement `Hash` and `Eq`, making them suitable for use in `HashMap` and `HashSet`:
432//!
433//! ```rust
434//! use durable_execution_sdk::OperationId;
435//! use std::collections::HashMap;
436//!
437//! let mut results: HashMap<OperationId, String> = HashMap::new();
438//! results.insert(OperationId::from("op-1"), "success".to_string());
439//! results.insert(OperationId::from("op-2"), "pending".to_string());
440//!
441//! assert_eq!(results.get(&OperationId::from("op-1")), Some(&"success".to_string()));
442//! ```
443//!
444//! ### Serialization
445//!
446//! All newtypes use `#[serde(transparent)]` for seamless JSON serialization:
447//!
448//! ```rust
449//! use durable_execution_sdk::OperationId;
450//!
451//! let op_id = OperationId::from("op-123");
452//! let json = serde_json::to_string(&op_id).unwrap();
453//! assert_eq!(json, "\"op-123\""); // Serializes as plain string
454//!
455//! let restored: OperationId = serde_json::from_str(&json).unwrap();
456//! assert_eq!(restored, op_id);
457//! ```
458//!
459//! ## Trait Aliases
460//!
461//! The SDK provides trait aliases to simplify common trait bound combinations.
462//! These are available in the [`traits`] module and re-exported at the crate root.
463//!
464//! ### DurableValue
465//!
466//! [`DurableValue`] is a trait alias for types that can be durably stored and retrieved:
467//!
468//! ```rust
469//! use durable_execution_sdk::DurableValue;
470//! use serde::{Deserialize, Serialize};
471//!
472//! // DurableValue is equivalent to: Serialize + DeserializeOwned + Send
473//!
474//! // Any type implementing these traits automatically implements DurableValue
475//! #[derive(Debug, Clone, Serialize, Deserialize)]
476//! struct OrderResult {
477//! order_id: String,
478//! status: String,
479//! }
480//!
481//! // Use in generic functions for cleaner signatures
482//! fn process_result<T: DurableValue>(result: T) -> String {
483//! serde_json::to_string(&result).unwrap_or_default()
484//! }
485//! ```
486//!
487//! ### StepFn
488//!
489//! [`StepFn`] is a trait alias for step function closures:
490//!
491//! ```rust
492//! use durable_execution_sdk::{StepFn, DurableValue};
493//! use durable_execution_sdk::handlers::StepContext;
494//!
495//! // StepFn<T> is equivalent to:
496//! // FnOnce(StepContext) -> Result<T, Box<dyn Error + Send + Sync>> + Send
497//!
498//! // Use in generic functions
499//! fn execute_step<T: DurableValue, F: StepFn<T>>(func: F) {
500//! // func can be called with a StepContext
501//! }
502//!
503//! // Works with closures
504//! execute_step(|_ctx| Ok(42i32));
505//!
506//! // Works with named functions
507//! fn my_step(ctx: StepContext) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
508//! Ok(format!("Processed by {}", ctx.operation_id))
509//! }
510//! execute_step(my_step);
511//! ```
512//!
513//! ## Sealed Traits and Factory Functions
514//!
515//! Some SDK traits are "sealed" - they cannot be implemented outside this crate.
516//! This allows the SDK to evolve without breaking external code. Sealed traits include:
517//!
518//! - [`Logger`]: For structured logging in durable executions
519//! - [`SerDes`]: For custom serialization/deserialization
520//!
521//! ### Custom Loggers
522//!
523//! Instead of implementing `Logger` directly, use the factory functions:
524//!
525//! ```rust
526//! use durable_execution_sdk::{custom_logger, simple_custom_logger, LogInfo};
527//!
528//! // Simple custom logger with a single function for all levels
529//! let logger = simple_custom_logger(|level, msg, info| {
530//! println!("[{}] {}: {:?}", level, msg, info);
531//! });
532//!
533//! // Full custom logger with separate functions for each level
534//! let logger = custom_logger(
535//! |msg, info| println!("[DEBUG] {}", msg), // debug
536//! |msg, info| println!("[INFO] {}", msg), // info
537//! |msg, info| println!("[WARN] {}", msg), // warn
538//! |msg, info| println!("[ERROR] {}", msg), // error
539//! );
540//! ```
541//!
542//! ### Custom Serializers
543//!
544//! Instead of implementing `SerDes` directly, use the factory function:
545//!
546//! ```rust
547//! use durable_execution_sdk::serdes::{custom_serdes, SerDesContext, SerDesError};
548//!
549//! // Create a custom serializer for a specific type
550//! let serdes = custom_serdes::<String, _, _>(
551//! |value, _ctx| Ok(format!("custom:{}", value)), // serialize
552//! |data, _ctx| { // deserialize
553//! data.strip_prefix("custom:")
554//! .map(|s| s.to_string())
555//! .ok_or_else(|| SerDesError::deserialization("Invalid format"))
556//! },
557//! );
558//! ```
559//!
560//! ## Configuration Types
561//!
562//! The SDK provides type-safe configuration for all operations:
563//!
564//! - [`StepConfig`]: Configure retry strategy, execution semantics, and serialization
565//! - [`CallbackConfig`]: Configure timeout and heartbeat for callbacks
566//! - [`InvokeConfig`]: Configure timeout and serialization for function invocations
567//! - [`MapConfig`]: Configure concurrency, batching, and completion criteria for map operations
568//! - [`ParallelConfig`]: Configure concurrency and completion criteria for parallel operations
569//! - [`CompletionConfig`]: Define success/failure criteria for concurrent operations
570//!
571//! ### Completion Configuration
572//!
573//! Control when concurrent operations complete:
574//!
575//! ```rust
576//! use durable_execution_sdk::CompletionConfig;
577//!
578//! // Complete when first task succeeds
579//! let first = CompletionConfig::first_successful();
580//!
581//! // Wait for all tasks to complete (regardless of success/failure)
582//! let all = CompletionConfig::all_completed();
583//!
584//! // Require all tasks to succeed (zero failure tolerance)
585//! let strict = CompletionConfig::all_successful();
586//!
587//! // Custom: require at least 3 successes
588//! let custom = CompletionConfig::with_min_successful(3);
589//! ```
590//!
591//! ## Error Handling
592//!
593//! The SDK provides a comprehensive error hierarchy through [`DurableError`]:
594//!
595//! - **Execution**: Errors that return FAILED status without Lambda retry
596//! - **Invocation**: Errors that trigger Lambda retry
597//! - **Checkpoint**: Checkpoint failures (retriable or non-retriable)
598//! - **Callback**: Callback-specific failures
599//! - **NonDeterministic**: Replay mismatches (operation type changed between runs)
600//! - **Validation**: Invalid configuration or arguments
601//! - **SerDes**: Serialization/deserialization failures
602//! - **Suspend**: Signal to pause execution and return control to Lambda
603//!
604//! ```rust,ignore
605//! use durable_execution_sdk::DurableError;
606//!
607//! // Create specific error types
608//! let exec_error = DurableError::execution("Something went wrong");
609//! let validation_error = DurableError::validation("Invalid input");
610//!
611//! // Check error properties
612//! if error.is_retriable() {
613//! // Handle retriable error
614//! }
615//! ```
616//!
617//! ## Custom Serialization
618//!
619//! The SDK uses JSON serialization by default, but you can provide custom
620//! serializers by implementing the [`SerDes`] trait:
621//!
622//! ```rust,ignore
623//! use durable_execution_sdk::serdes::{SerDes, SerDesContext, SerDesError};
624//!
625//! struct MyCustomSerDes;
626//!
627//! impl SerDes<MyType> for MyCustomSerDes {
628//! fn serialize(&self, value: &MyType, context: &SerDesContext) -> Result<String, SerDesError> {
629//! // Custom serialization logic
630//! Ok(format!("{:?}", value))
631//! }
632//!
633//! fn deserialize(&self, data: &str, context: &SerDesContext) -> Result<MyType, SerDesError> {
634//! // Custom deserialization logic
635//! todo!()
636//! }
637//! }
638//! ```
639//!
640//! ## Logging and Tracing
641//!
642//! The SDK integrates with the `tracing` crate for structured logging. All operations
643//! automatically include execution context (ARN, operation ID, parent ID) in log messages.
644//!
645//! For detailed guidance on configuring tracing for Lambda, log correlation, and best practices,
646//! see the [TRACING.md](../docs/TRACING.md) documentation.
647//!
648//! ### Simplified Logging API
649//!
650//! The [`DurableContext`] provides convenience methods for logging with automatic context:
651//!
652//! ```rust,ignore
653//! use durable_execution_sdk::{DurableContext, DurableError};
654//!
655//! async fn my_workflow(ctx: DurableContext) -> Result<(), DurableError> {
656//! // Basic logging - context is automatically included
657//! ctx.log_info("Starting order processing");
658//! ctx.log_debug("Validating input parameters");
659//! ctx.log_warn("Retry attempt 2 of 5");
660//! ctx.log_error("Failed to process payment");
661//!
662//! // Logging with extra fields for filtering
663//! ctx.log_info_with("Processing order", &[
664//! ("order_id", "ORD-12345"),
665//! ("customer_id", "CUST-789"),
666//! ]);
667//!
668//! ctx.log_error_with("Payment failed", &[
669//! ("error_code", "INSUFFICIENT_FUNDS"),
670//! ("amount", "150.00"),
671//! ]);
672//!
673//! Ok(())
674//! }
675//! ```
676//!
677//! All logging methods automatically include:
678//! - `durable_execution_arn`: The execution ARN for correlation
679//! - `parent_id`: The parent operation ID (for nested operations)
680//! - `is_replay`: Whether the operation is being replayed
681//!
682//! ### Extra Fields in Log Output
683//!
684//! Extra fields passed to `log_*_with` methods are included in the tracing output
685//! as key-value pairs, making them queryable in log aggregation systems like CloudWatch:
686//!
687//! ```rust,ignore
688//! // This log message...
689//! ctx.log_info_with("Order event", &[
690//! ("event_type", "ORDER_CREATED"),
691//! ("order_id", "ORD-123"),
692//! ]);
693//!
694//! // ...produces JSON output like:
695//! // {
696//! // "message": "Order event",
697//! // "durable_execution_arn": "arn:aws:...",
698//! // "extra": "event_type=ORDER_CREATED, order_id=ORD-123",
699//! // ...
700//! // }
701//! ```
702//!
703//! ### Replay-Aware Logging
704//!
705//! The SDK supports replay-aware logging that can suppress or filter logs during replay.
706//! This is useful to reduce noise when replaying previously executed operations.
707//!
708//! ```rust
709//! use durable_execution_sdk::{TracingLogger, ReplayAwareLogger, ReplayLoggingConfig};
710//! use std::sync::Arc;
711//!
712//! // Suppress all logs during replay (default)
713//! let logger = ReplayAwareLogger::suppress_replay(Arc::new(TracingLogger));
714//!
715//! // Allow only errors during replay
716//! let logger_errors = ReplayAwareLogger::new(
717//! Arc::new(TracingLogger),
718//! ReplayLoggingConfig::ErrorsOnly,
719//! );
720//!
721//! // Allow all logs during replay
722//! let logger_all = ReplayAwareLogger::allow_all(Arc::new(TracingLogger));
723//! ```
724//!
725//! ### Custom Logger
726//!
727//! You can also provide a custom logger using the factory functions:
728//!
729//! ```rust
730//! use durable_execution_sdk::{custom_logger, simple_custom_logger, LogInfo};
731//!
732//! // Simple custom logger with a single function for all levels
733//! let logger = simple_custom_logger(|level, msg, info| {
734//! println!("[{}] {}: {:?}", level, msg, info);
735//! });
736//!
737//! // Full custom logger with separate functions for each level
738//! let logger = custom_logger(
739//! |msg, info| println!("[DEBUG] {}", msg), // debug
740//! |msg, info| println!("[INFO] {}", msg), // info
741//! |msg, info| println!("[WARN] {}", msg), // warn
742//! |msg, info| println!("[ERROR] {}", msg), // error
743//! );
744//! ```
745//!
746//! ## Duration Type
747//!
748//! The SDK provides a [`Duration`] type with convenient constructors:
749//!
750//! ```rust
751//! use durable_execution_sdk::Duration;
752//!
753//! let five_seconds = Duration::from_seconds(5);
754//! let two_minutes = Duration::from_minutes(2);
755//! let one_hour = Duration::from_hours(1);
756//! let one_day = Duration::from_days(1);
757//!
758//! assert_eq!(five_seconds.to_seconds(), 5);
759//! assert_eq!(two_minutes.to_seconds(), 120);
760//! assert_eq!(one_hour.to_seconds(), 3600);
761//! assert_eq!(one_day.to_seconds(), 86400);
762//! ```
763//!
764//! ## Thread Safety
765//!
766//! The SDK is designed for use in async Rust with Tokio. All core types are
767//! `Send + Sync` and can be safely shared across async tasks:
768//!
769//! - [`DurableContext`] uses `Arc` for shared state
770//! - [`ExecutionState`] uses `RwLock` and atomic operations for thread-safe access
771//! - Operation ID generation uses atomic counters
772//!
773//! ## Best Practices
774//!
775//! 1. **Keep steps small and focused**: Each step should do one thing well.
776//! This makes debugging easier and reduces the impact of failures.
777//!
778//! 2. **Use named operations**: Named steps and waits make logs and debugging
779//! much easier to understand.
780//!
781//! 3. **Handle errors appropriately**: Use `DurableError::execution` for errors
782//! that should fail the workflow, and `DurableError::invocation` for errors
783//! that should trigger a retry.
784//!
785//! 4. **Consider idempotency**: For operations that may be retried, ensure they
786//! are idempotent or use `AtMostOncePerRetry` semantics.
787//!
788//! 5. **Use appropriate concurrency limits**: When using `map` or `parallel`,
789//! set `max_concurrency` to avoid overwhelming downstream services.
790//!
791//! 6. **Set reasonable timeouts**: Always configure timeouts for callbacks and
792//! invocations to prevent workflows from hanging indefinitely.
793//!
794//! 7. **Ensure determinism**: Your workflow must execute the same sequence of
795//! operations on every run. Avoid using `HashMap` iteration, random numbers,
796//! or current time outside of steps. See [`docs::determinism`] for details.
797//!
798//! 8. **Use replay-safe helpers**: When you need UUIDs or timestamps, use the
799//! helpers in [`replay_safe`] to ensure consistent values across replays.
800//!
801//! 9. **Use type-safe identifiers**: Prefer [`OperationId`], [`ExecutionArn`], and
802//! [`CallbackId`] over raw strings to catch type mismatches at compile time.
803//!
804//! 10. **Use trait aliases**: Use [`DurableValue`] and [`StepFn`] in your generic
805//! functions for cleaner, more maintainable signatures.
806//!
807//! ## Result Type Aliases
808//!
809//! The SDK provides semantic result type aliases for cleaner function signatures:
810//!
811//! - [`DurableResult<T>`](DurableResult): Alias for `Result<T, DurableError>` - general durable operations
812//! - [`StepResult<T>`](StepResult): Alias for `Result<T, DurableError>` - step operation results
813//! - [`CheckpointResult<T>`](CheckpointResult): Alias for `Result<T, DurableError>` - checkpoint operation results
814//!
815//! ```rust,ignore
816//! use durable_execution_sdk::{DurableResult, StepResult, DurableError};
817//!
818//! // Use in function signatures for clarity
819//! fn process_order(order_id: &str) -> DurableResult<String> {
820//! Ok(format!("Processed: {}", order_id))
821//! }
822//!
823//! fn execute_step() -> StepResult<i32> {
824//! Ok(42)
825//! }
826//! ```
827//!
828//! ## Module Organization
829//!
830//! - [`client`]: Lambda service client for checkpoint operations
831//! - [`concurrency`]: Concurrent execution types (BatchResult, ConcurrentExecutor)
832//! - [`config`]: Configuration types for all operations
833//! - [`context`]: DurableContext, operation identifiers, and logging (includes factory functions
834//! [`custom_logger`] and [`simple_custom_logger`] for creating custom loggers)
835//! - [`docs`]: **Documentation modules** - determinism requirements and execution limits
836//! - [`docs::determinism`]: Understanding determinism for replay-safe workflows
837//! - [`docs::limits`]: Execution limits and constraints
838//! - [`duration`]: Duration type with convenient constructors
839//! - [`error`]: Error types, error handling, and result type aliases ([`DurableResult`], [`StepResult`], [`CheckpointResult`])
840//! - [`handlers`]: Operation handlers (step, wait, callback, etc.)
841//! - [`lambda`]: Lambda integration types (input/output)
842//! - [`operation`]: Operation types and status enums (optimized with `#[repr(u8)]` for compact memory layout)
843//! - [`replay_safe`]: Replay-safe helpers for deterministic UUIDs and timestamps
844//! - [`serdes`]: Serialization/deserialization system (includes [`custom_serdes`] factory function)
845//! - [`state`]: Execution state and checkpointing system
846//! - [`traits`]: Trait aliases for common bounds ([`DurableValue`], [`StepFn`])
847//! - [`types`]: Type-safe newtype wrappers for domain identifiers ([`OperationId`], [`ExecutionArn`], [`CallbackId`])
848
849pub mod client;
850pub mod concurrency;
851pub mod config;
852pub mod context;
853pub mod docs;
854pub mod duration;
855pub mod error;
856pub mod handlers;
857pub mod lambda;
858#[macro_use]
859pub mod macros;
860pub mod operation;
861pub mod replay_safe;
862pub mod retry_presets;
863pub mod runtime;
864pub mod serdes;
865pub mod state;
866pub mod structured_json_logger;
867pub mod summary_generators;
868pub mod termination;
869pub mod traits;
870pub mod types;
871
872// Private module for sealed trait pattern
873mod sealed;
874
875// Re-export main types at crate root
876pub use client::{
877 CheckpointResponse, DurableServiceClient, GetOperationsResponse, LambdaClientConfig,
878 LambdaDurableServiceClient, SharedDurableServiceClient,
879};
880pub use config::*;
881pub use context::{
882 custom_logger, generate_operation_id, simple_custom_logger, CustomLogger, DurableContext,
883 LogInfo, Logger, OperationIdGenerator, OperationIdentifier, ReplayAwareLogger,
884 ReplayLoggingConfig, TracingLogger, WaitForConditionConfig, WaitForConditionContext,
885};
886pub use duration::Duration;
887pub use error::{
888 AwsError, CheckpointResult, DurableError, DurableResult, ErrorObject, StepResult,
889 TerminationReason,
890};
891pub use lambda::{
892 DurableExecutionInvocationInput, DurableExecutionInvocationOutput, InitialExecutionState,
893 InvocationStatus,
894};
895pub use operation::{
896 CallbackDetails, CallbackOptions, ChainedInvokeDetails, ChainedInvokeOptions, ContextDetails,
897 ContextOptions, ExecutionDetails, Operation, OperationAction, OperationStatus, OperationType,
898 OperationUpdate, StepDetails, StepOptions, WaitDetails, WaitOptions,
899};
900pub use serdes::{custom_serdes, CustomSerDes, JsonSerDes, SerDes, SerDesContext, SerDesError};
901pub use state::{
902 create_checkpoint_queue, CheckpointBatcher, CheckpointBatcherConfig, CheckpointRequest,
903 CheckpointSender, CheckpointedResult, ExecutionState, ReplayStatus,
904};
905
906// Re-export newtype wrappers for domain identifiers
907pub use types::{CallbackId, ExecutionArn, OperationId, ValidationError};
908
909// Re-export trait aliases for common bounds
910pub use traits::{DurableValue, StepFn};
911
912// Re-export concurrency types
913pub use concurrency::{
914 BatchItem, BatchItemStatus, BatchResult, CompletionReason, ConcurrentExecutor,
915 ExecutionCounters,
916};
917
918// Re-export handlers
919pub use handlers::{
920 all_handler, all_settled_handler, any_handler, callback_handler, child_handler, invoke_handler,
921 map_handler, parallel_handler, race_handler, step_handler, wait_cancel_handler, wait_handler,
922 Callback, StepContext,
923};
924
925// Re-export replay-safe helpers
926pub use replay_safe::{
927 timestamp_from_execution, timestamp_seconds_from_execution, uuid_from_operation,
928 uuid_string_from_operation, uuid_to_string,
929};
930
931// Re-export runtime support
932pub use runtime::run_durable_handler;
933
934// Re-export termination manager
935pub use termination::TerminationManager;
936
937// Re-export structured JSON logger
938pub use structured_json_logger::{JsonLogContext, LogLevel, StructuredJsonLogger};
939
940// Re-export macro if enabled
941#[cfg(feature = "macros")]
942pub use durable_execution_sdk_macros::durable_execution;