//! Retry-backoff math.
//!
//! Implements a stateless approximation of the AWS Architecture Blog's
//! "decorrelated jitter" pattern. The canonical sequence is:
//!
//! ```text
//! sleep_n = min(cap, random_between(base, sleep_{n-1} * 3))
//! ```
//!
//! which requires persisting the previous sleep. We don't persist that:
//! the queue table has no `last_backoff_ms` column, and adding one would
//! couple the backoff policy to the schema. Instead we bucket by attempt
//! count:
//!
//! ```text
//! sleep_n = random_between(base, min(cap, base * 3^n))
//! ```
//!
//! The distribution shape is similar (uniform over a window that grows
//! geometrically until it hits the cap), which is what decorrelated
//! jitter is for: spreading retries across workers so they don't
//! synchronise into a thundering-herd retry storm.
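//!
//! For contrast, here is a std-only sketch of the canonical recurrence,
//! which this module does *not* implement. `canonical_next_ms` and
//! `canonical_sequence_ms` are hypothetical names. `unit` is an assumed
//! stand-in for a uniform random draw in `[0.0, 1.0]` that keeps the
//! arithmetic deterministic. The `prev_ms` parameter is the important
//! part: each retry needs the previous sleep, which is the state a
//! `last_backoff_ms` column would have to hold.
//!
//! ```rust
//! // Mirrors this module's constants so the sketch stands alone.
//! const BASE_MS: u64 = 1_000;
//! const CAP_MS: u64 = 60_000;
//!
//! // Canonical step: min(cap, uniform(base, prev * 3)).
//! // `unit` (assumed) maps linearly onto [base, prev * 3].
//! fn canonical_next_ms(prev_ms: u64, unit: f64) -> u64 {
//!     // The canonical sequence never sleeps less than `base`, so a
//!     // smaller `prev_ms` (e.g. 0 before the first retry) acts as `base`.
//!     let prev = prev_ms.max(BASE_MS);
//!     let hi = prev.saturating_mul(3);
//!     let offset = ((hi - BASE_MS) as f64 * unit.clamp(0.0, 1.0)) as u64;
//!     BASE_MS.saturating_add(offset).min(CAP_MS)
//! }
//!
//! // Replays a run of draws, threading the previous sleep through each
//! // step the way a persisted `last_backoff_ms` would.
//! fn canonical_sequence_ms(draws: &[f64]) -> Vec<u64> {
//!     let mut prev = BASE_MS;
//!     draws
//!         .iter()
//!         .map(|&u| {
//!             prev = canonical_next_ms(prev, u);
//!             prev
//!         })
//!         .collect()
//! }
//! ```
//!
//! If every draw lands at the top of its window, `canonical_sequence_ms`
//! yields 3 s, 9 s, 27 s and then the 60 s cap. A low draw pulls the next
//! window back toward `base`. For that reason, the sequence cannot be
//! recomputed from the attempt count alone.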
use std::time::Duration;
use rand::Rng;

/// Minimum backoff. No attempt waits less than 1 second.
///
/// One second is the "you have time to notice and observe" floor.
/// Smaller values would let a tight-fail-loop job hammer the queue;
/// larger values would over-delay the first retry of a genuinely
/// transient failure.
pub const BASE_MS: u64 = 1_000;

/// Maximum backoff. No attempt waits more than 60 seconds.
///
/// 60 s is the "operator notices within a minute" ceiling. With
/// `max_attempts = 3`, a job that exhausts its attempts spends at most
/// ~3 minutes in backoff before it lands in `failed_permanent`. This is a
/// loose bound of one capped wait per attempt. Because of the geometric
/// growth, the real total is well below it.
pub const CAP_MS: u64 = 60_000;

/// Compute the next wait before retrying a job that just failed.
///
/// `attempt` is the just-completed attempt number (1-indexed). The
/// returned `Duration` is the time to wait before the next attempt is
/// eligible for dequeue (i.e., what we set as `run_at - now()` on the
/// row).
///
/// The return is drawn uniformly from
/// `[BASE_MS, min(CAP_MS, BASE_MS · 3^attempt)]`. For `attempt = 1` the
/// window is `[1s, 3s]`; for `attempt = 2` it's `[1s, 9s]`; for
/// `attempt = 3` it's `[1s, 27s]`. From `attempt = 4` on, the window is
/// saturated at `[1s, 60s]`, because 81 s exceeds the cap.
///
/// `attempt = 0` is treated as `attempt = 1` for safety; the queue's
/// guard guarantees we never call this with zero attempts in practice.
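///
/// Below is a std-only sketch of the window arithmetic above, assuming
/// this module's constants. `window_ms` and `backoff_sketch` are
/// hypothetical helpers and do not reproduce this function's body. `unit`
/// is an assumed stand-in for the uniform draw. With `rand::Rng` you would
/// instead sample the inclusive range, e.g. `rng.gen_range(lo..=hi)` in
/// rand 0.8.
///
/// ```rust
/// use std::time::Duration;
///
/// // Mirrors this module's constants so the sketch stands alone.
/// const BASE_MS: u64 = 1_000;
/// const CAP_MS: u64 = 60_000;
///
/// // Inclusive window [lo, hi] in ms for the just-completed `attempt`.
/// // Attempt 0 is clamped to 1, matching the documented safety rule.
/// fn window_ms(attempt: u32) -> (u64, u64) {
///     let n = attempt.max(1);
///     // Saturating ops clamp huge attempt counts instead of overflowing.
///     let grown = BASE_MS.saturating_mul(3u64.saturating_pow(n));
///     (BASE_MS, grown.min(CAP_MS))
/// }
///
/// // Deterministic stand-in for the uniform draw: `unit` in [0.0, 1.0]
/// // maps linearly onto the window.
/// fn backoff_sketch(attempt: u32, unit: f64) -> Duration {
///     let (lo, hi) = window_ms(attempt);
///     let offset = ((hi - lo) as f64 * unit.clamp(0.0, 1.0)) as u64;
///     Duration::from_millis(lo + offset)
/// }
/// ```
///
/// Calling `window_ms` for attempts 1 through 4 reproduces the windows
/// listed above (upper bounds of 3 s, 9 s, 27 s, then 60 s). `window_ms(0)`
/// equals `window_ms(1)`.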