//! Shared helpers: root-path validation, chunk containment check, and
//! embedder status response builders.
//!
//! Why: These small helpers are called from multiple handler modules
//! (`indexes.rs`, `search.rs`, `reindex_handlers.rs`); centralising them
//! avoids duplication and keeps each handler file under the 500-line cap.
//! What: `validate_root_path`, `file_is_within_root`,
//! `embedder_initializing_response`, `embedder_error_response`.
//! Test: `file_is_within_root_*`, `create_index_canonicalizes_*`, and
//! `validate_root_path_denylist_*` tests.
use ;
/// Validate `path` as a safe, canonical root for indexing.
///
/// Why: Defense-in-depth for the daemon — even when the CLI-side check
/// (`commands/index.rs`) is bypassed (direct HTTP calls, MCP tools, scripts),
/// the daemon must refuse sensitive roots. The hard denylist in
/// `crate::allowlist::is_denied` is the authoritative gate; this function
/// applies it **after** canonicalization so symlink tricks or `..` traversals
/// cannot bypass the check.
///
/// Issue #829 (blocking canonicalize): the previous sync version called
/// `std::fs::canonicalize` and `path.is_dir()` directly on a tokio worker
/// thread. Both are blocking syscalls that stall the executor thread for the
/// duration of the kernel operation. Under load (many concurrent `POST /indexes`
/// requests, or a network-backed filesystem that is slow to respond) this starves
/// the runtime. The fix: this function is now `async` and uses
/// `tokio::fs::canonicalize` (which offloads the syscall to tokio's blocking
/// pool, so the async worker thread is not blocked) and `tokio::fs::metadata`
/// for the directory check.
///
/// What: in order — (1) rejects empty/non-absolute paths (no I/O); (2)
/// checks `is_dir` via `tokio::fs::metadata`; (3) canonicalizes via
/// `tokio::fs::canonicalize`; (4) calls `crate::allowlist::is_denied` on the
/// canonical path and returns 400 with the denial reason when matched.
/// Test: `validate_root_path_denylist_rejects_ssh`, `_rejects_home`,
/// `_rejects_tmp`, `_accepts_project_dir` in `tests_denylist.rs`.
pub async
/// Determine whether a chunk's stored `file` field falls within an index's
/// registered root.
///
/// Why: issue #64 — even with `validate_root_path` (#63) preventing future
/// misregistrations, a daemon that previously indexed under the wrong root
/// can have persisted chunks whose `file` paths point at a different
/// project. The search handler post-filters with this predicate so cross-
/// index bleed cannot leak through to clients.
/// Why (issue #541 update): the warm-boot canonicalization in `restore_one_index`
/// prevents the stale-root problem going forward; this predicate adds a
/// canonicalize fallback for absolute paths so that any residual mismatch
/// (e.g. chunks indexed before the fix, volume mount alias, macOS /private/var
/// ↔ /var) also never causes a valid result to be dropped.
/// What: returns `true` when `file` is either (a) a clean relative path
/// (no leading `/`, no `..` segments) — the normal case, since the reindex
/// walker stores chunk paths relative to the index root — or (b) an
/// absolute path that starts with `root` (cheap lexical check). If (b) fails
/// and the file path exists on disk, falls back to a canonicalized comparison
/// so symlink aliases never cause a false drop (approach (b) from issue #541
/// — only results that fail the cheap check pay the `canonicalize` syscall
/// cost). Everything else (relative path with `..`, absolute path pointing
/// genuinely elsewhere) returns `false`.
/// Test: `file_is_within_root_*` unit tests below; `file_is_within_root_symlinked_root`
/// covers the symlink-alias case added for #541.
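///
/// Sketch: a minimal illustration of the decision order described above, not
/// the actual implementation. The name `within_root_sketch` and its signature
/// are hypothetical, and sending absolute paths that contain `..` straight to
/// the canonical comparison is an assumption of this sketch.
///
/// ```rust
/// use std::path::{Component, Path};
///
/// // Hypothetical stand-in for the real predicate (name and signature assumed).
/// fn within_root_sketch(file: &str, root: &Path) -> bool {
///     let path = Path::new(file);
///     let has_parent_dir = path
///         .components()
///         .any(|c| matches!(c, Component::ParentDir));
///     if path.is_relative() {
///         // (a) Relative paths pass only when they have no `..` segment.
///         return !has_parent_dir;
///     }
///     // (b) Cheap lexical prefix check, no syscalls.
///     if !has_parent_dir && path.starts_with(root) {
///         return true;
///     }
///     // Fallback: compare canonical forms so symlink aliases still match.
///     // A path that does not exist on disk fails to canonicalize and is rejected.
///     match (path.canonicalize(), root.canonicalize()) {
///         (Ok(f), Ok(r)) => f.starts_with(&r),
///         _ => false,
///     }
/// }
/// ```
///
/// Only paths that fail the lexical check reach `canonicalize`, so the common
/// relative-path case stays free of filesystem calls.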
pub
/// Build a `503 Service Unavailable` response for handlers that require the
/// embedder before the background init task has finished.
///
/// Why: callers (CLI, MCP, integrators) need to distinguish "transient — try
/// again in a few seconds" from real failures. A standard 503 with a typed
/// JSON body lets `trusty-search index` retry, while exposing a clear
/// `embedder initializing` reason for human operators reading logs.
/// What: returns `(503, {"error": "embedder initializing, retry in a few seconds"})`.
/// Test: hit `POST /indexes` immediately after daemon boot; assert 503 and
/// JSON body shape.
pub
/// Build a `503 Service Unavailable` response when the embedder background
/// init task has recorded a permanent failure (issue #121).
///
/// Why: previously a hung/failed init left the daemon stuck in
/// `"initializing"` forever, so retry loops in `trusty-search index` and
/// downstream clients spun indefinitely. Returning a typed error body with
/// the recorded message lets callers fail fast and surfaces the root cause
/// (e.g. "init timed out after 60s") in logs and CLI output.
/// What: returns `(503, {"error": "embedder init failed: <message>"})`.
/// Test: `create_index_returns_503_with_error_when_embedder_failed`.
pub