//! `get_or_load_index` — hot-path resolver with lazy cold-index loading (#993).
//!
//! Why: all per-index HTTP handlers need to resolve an `IndexHandle`. With
//! selective warm-boot, the handle may not be in the hot `IndexRegistry` yet.
//! This module implements the full load-on-demand flow without exposing the
//! double-checked-lock details to callers.
//! What: one async function (`get_or_load_index`) and one error type
//! (`LazyLoadError`). Generic over the restore function so tests inject fakes.
//!
//! PR #1103 TOCTOU fix: between the `entries.get(id)` cold-check (step 2) and
//! `loading_gate(id)` (step 3), a concurrent `mark_loaded` can remove the entry
//! from the cold store so `loading_gate` returns `None`. The previous code
//! returned `LazyLoadError::NotFound` in that case — a spurious 404 for an
//! index that just became hot. The fix: when `loading_gate` returns `None`,
//! re-check the hot registry; if the index is now there, return it (the
//! concurrent load raced us and won).
//!
//! Test: `get_or_load_index_*` in the parent module's `tests` block;
//! `get_or_load_index_gate_none_but_index_just_became_hot` for the race path.
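/*!
Sketch (illustrative only, not this module's implementation): the flow and
the gate-`None` re-check described above, reduced to a synchronous,
std-only model. Everything here is an assumption made for the example:
`Handle`, `Gate`, `Registry`, `ColdStore`, `get_or_load_sketch`, and the
`after_cold_check` hook are hypothetical stand-ins, a `std::sync::Mutex`
replaces the async loading gate, and the timeout (`Loading`) path is left out.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the crate's real handle and gate types.
type Handle = Arc<String>;
type Gate = Arc<Mutex<()>>;

#[derive(Debug, PartialEq)]
enum LazyLoadError { NotFound, RestoreFailed }

#[derive(Default)]
struct Registry { hot: Mutex<HashMap<String, Handle>> }

impl Registry {
    fn get(&self, id: &str) -> Option<Handle> {
        self.hot.lock().unwrap().get(id).cloned()
    }
    fn insert(&self, id: &str, handle: Handle) {
        self.hot.lock().unwrap().insert(id.to_string(), handle);
    }
}

#[derive(Default)]
struct ColdStore {
    entries: Mutex<HashMap<String, Gate>>,
    failed: Mutex<HashSet<String>>,
}

impl ColdStore {
    fn add(&self, id: &str) {
        self.entries.lock().unwrap().insert(id.to_string(), Gate::default());
    }
    fn contains(&self, id: &str) -> bool {
        self.entries.lock().unwrap().contains_key(id)
    }
    fn loading_gate(&self, id: &str) -> Option<Gate> {
        self.entries.lock().unwrap().get(id).cloned()
    }
    fn mark_loaded(&self, id: &str) {
        self.entries.lock().unwrap().remove(id);
    }
    fn mark_failed(&self, id: &str) {
        self.entries.lock().unwrap().remove(id);
        self.failed.lock().unwrap().insert(id.to_string());
    }
    fn is_failed(&self, id: &str) -> bool {
        self.failed.lock().unwrap().contains(id)
    }
}

fn get_or_load_sketch(
    registry: &Registry,
    cold: &ColdStore,
    id: &str,
    restore_fn: impl FnOnce(&Registry, &str) -> bool,
    after_cold_check: impl FnOnce(), // hypothetical hook that forces the race
) -> Result<Handle, LazyLoadError> {
    // Step 1: hot fast-path.
    if let Some(handle) = registry.get(id) {
        return Ok(handle);
    }
    // Step 2: cold check.
    if !cold.contains(id) {
        return Err(LazyLoadError::NotFound);
    }
    after_cold_check();
    // Step 3: a `None` gate means a concurrent `mark_loaded` won the race,
    // so re-check the hot registry before giving up with `NotFound`.
    let gate = match cold.loading_gate(id) {
        Some(gate) => gate,
        None => return registry.get(id).ok_or(LazyLoadError::NotFound),
    };
    let _guard = gate.lock().unwrap();
    // Steps 4a and 4b: re-check both outcomes now that the gate is held.
    if let Some(handle) = registry.get(id) {
        return Ok(handle);
    }
    if cold.is_failed(id) {
        return Err(LazyLoadError::RestoreFailed);
    }
    // Step 5: restore; on `false`, evict the entry and report the failure.
    if !restore_fn(registry, id) {
        cold.mark_failed(id);
        return Err(LazyLoadError::RestoreFailed);
    }
    // Step 6: the index is hot now; drop it from the cold store.
    cold.mark_loaded(id);
    registry.get(id).ok_or(LazyLoadError::NotFound)
}

// Drives the race: a "concurrent" loader finishes between steps 2 and 3.
fn demo_gate_none_race() -> Result<Handle, LazyLoadError> {
    let registry = Registry::default();
    let cold = ColdStore::default();
    cold.add("docs");
    let result = get_or_load_sketch(
        &registry, &cold, "docs",
        |_, _| unreachable!("restore must not run once the race is lost"),
        || {
            registry.insert("docs", Arc::new("docs-handle".to_string()));
            cold.mark_loaded("docs");
        },
    );
    result
}
```

In this model `demo_gate_none_race()` returns the handle that the simulated
concurrent loader inserted; without the step 3 re-check it would return
`NotFound`, which is the spurious 404 described above.
*/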
use std::sync::Arc;
use std::time::Duration;
use crate;
use cratePersistedIndex;
use ColdIndexStore;
/// Look up an index from the hot registry, loading it lazily if it is cold.
///
/// Why (issue #993): all per-index HTTP handlers need to resolve a handle.
/// With lazy warm-boot, the handle may not be in the hot registry yet. This
/// helper implements the full load-on-demand flow:
///
/// 1. Hot fast-path via `registry.get(id)`.
/// 2. Cold check: `NotFound` if the id is absent from both stores.
/// 3. Acquire the per-index loading gate. If the gate returns `None` (a
///    concurrent `mark_loaded` raced us), re-check the hot registry and return
///    the handle, or `NotFound`.
/// 4. After the gate is acquired:
///    - (4a) re-check the hot registry;
///    - (4b) re-check `is_failed`. If a concurrent thread just called
///      `mark_failed(id)`, short-circuit with `RestoreFailed` instead of
///      calling `restore_fn` a second time for the same first-failure event
///      (TOCTOU fix, issue #1125).
/// 5. Load via `restore_fn(entry)` inside `tokio::time::timeout`.
/// 6. `mark_loaded(id)`.
/// 7. On timeout, return `Err(LazyLoadError::Loading)` for `503 index_loading`.
///
/// Issue #1106: when `restore_fn` returns `false` (blocked volume, missing
/// `root_path`), this function calls `cold_store.mark_failed(id)` to evict the
/// entry from `entries` (so `indexes_lazy` decreases and `contains()` returns
/// `false`) and returns `LazyLoadError::RestoreFailed` instead of returning
/// `Loading` again. Subsequent calls for the same id hit the
/// `cold_store.contains()` guard in the search handler. That guard returns
/// `false` for failed entries, so the handler returns 404. This is acceptable:
/// the index exists in the registry sense but cannot be served. Callers that
/// need to distinguish "truly unknown" from "restore failed" should
/// additionally check `cold_store.is_failed(id)`.
///
/// What: generic over the restore function so tests can inject a fake restore.
///
/// Test: `get_or_load_index_hot_path`, `get_or_load_index_loads_cold_index`,
/// `get_or_load_index_returns_loading_on_timeout`,
/// `get_or_load_index_gate_none_but_index_just_became_hot`,
/// `get_or_load_index_restore_false_marks_failed`,
/// `get_or_load_index_gate_recheck_is_failed_short_circuits`.
pub async
/// Error returned by [`get_or_load_index`].
///
/// Why: callers need to distinguish a genuine 404 (unknown id) from a
/// transient 503 (cold index still loading / timed out) and a permanent 503
/// (cold index restore failed — issue #1106).
/// What: three variants — `NotFound` (emit 404), `Loading` (emit 503 with
/// `retry_after_secs` — transient), and `RestoreFailed` (emit 503 — permanent,
/// `restore_fn` returned `false`; operator must restart or re-register).
/// Test: variant-level assertions in `get_or_load_index_*` tests.