1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
//! Query-embedding alignment for hybrid rule retrieval: decides HOW to embed
//! the recall query so the resulting vector can match the persisted index,
//! including the cold-start retry and the sustained-outage fast path.
use crate;
use crateindex_db;
use SqlitePool;
use Duration;
/// Embed the recall query in a profile the persisted index can actually match.
///
/// When the index's vector lane is dead relative to the active embedder
/// (`vector_lane_available == false` — e.g. a SHA1 index while the embedder is
/// cloud/BYOK), a remote query embed is futile: the vector lives in a different
/// space than every indexed chunk, so the cosine pass drops them all and recall
/// degrades to FTS-only — worse than the SHA1 hash + FTS hybrid — while paying a
/// doomed round-trip. So embed locally with SHA1 instead, which matches the SHA1
/// index dimension, makes no network call, and stays quiet. The same applies
/// when there is no usable lane at all (a not-yet-built or empty index).
///
/// This keys off the STATIC lane check only, deliberately. A cloud-profiled
/// index whose provider is briefly down still issues a remote embed that times
/// out and falls back to SHA1 per-call: a bounded per-query cost, with immediate
/// recovery to semantic ranking the moment the provider returns.
///
/// `cold_start_retry`: the first query after an idle period hits a cold cloud
/// connection that routinely exceeds the ~2.5s base budget, so recall falls back
/// to SHA1 even on a healthy lane. The interactive `recall`/`search` path (gated
/// on a healthy lane) raises the first attempt's ceiling to
/// [`COLD_RETRY_EMBEDDING_TIMEOUT`] so a cold round-trip completes in one shot
/// (a warm query is unaffected — the budget is a ceiling), then retries once to
/// ride out a transient transport flap before settling for SHA1.
///
/// `cold_start_retry` is only for human-waiting CLI paths. Latency-critical
/// MCP/hook callers bypass this function with local query embeddings, so a cold
/// provider never sits in the agent's tool-call path.
/// If a caller reaches this function while the current project index is already
/// in that local-agent profile, keep the query local as well.
pub async
/// Per-query budget for the single cold-start retry in
/// [`embed_query_aligned_to_index`]. Sized to clear a genuinely cold single-text
/// OpenAI embed in one shot while staying under the 45s embedding HTTP-client
/// cap. The outage gate (`cloud_embed_outage_active`) trips after a short run of
/// failures and routes subsequent queries to SHA1, so this larger budget is only
/// ever paid on the first cold call.
pub const COLD_RETRY_EMBEDDING_TIMEOUT: Duration = from_secs;