1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
//! Shared upsert-mapping logic for HNSW index variants.
//!
//! Both `HnswIndex` and `NativeHnswIndex` use identical mapping upsert
//! semantics. This module provides a single implementation to avoid
//! duplication.
use ShardedMappings;
use ShardedVectors;
use cratevalidate_dimension_match;
/// Result of an upsert mapping operation, carrying rollback information.
///
/// On success the caller uses `idx` as the internal HNSW index for the new
/// graph node. On graph-insert failure, the caller passes this struct to
/// [`rollback_upsert`] to restore the previous state.
pub
/// Registers an ID with upsert semantics and cleans up stale vector data.
///
/// If the ID already existed, the old mapping is replaced and the stale
/// sidecar vector (if stored) is removed from `ShardedVectors`.
///
/// Returns an [`UpsertResult`] containing the new index and optional old
/// index for rollback purposes.
pub
/// Batch version of `upsert_mapping` with fast-path for new IDs.
///
/// Uses `register_or_replace_batch` which skips the expensive `entry()`
/// path for IDs that don't exist yet (common in pure-insert workloads).
///
/// # Phase Ordering
///
/// Callers must validate vector dimensions **before** calling this function.
/// Once mapping registration begins, the mutations cannot be cheaply undone
/// without explicit rollback. See `prepare_batch_insert()` in `batch.rs`
/// for the canonical call sequence.
pub
/// Validates dimensions for every vector in `items`, then registers the IDs
/// via [`upsert_mapping_batch`].
///
/// # Phase Ordering Invariant
///
/// Dimension validation runs to completion **before** any call to
/// `upsert_mapping_batch` — a mismatch discovered after partial upsert
/// would leave orphaned mappings. See `docs/SOUNDNESS.md` "HNSW Batch
/// Insertion Ordering".
///
/// Shared by `HnswIndex::prepare_batch_insert` and
/// `NativeHnswIndex::insert_batch` (#448 Group D). Generic over the vector
/// slice lifetime so callers can pass either `&Vec<f32>` or `&[f32]`.
///
/// # Errors
///
/// Returns [`crate::error::Error::DimensionMismatch`] on the first vector
/// whose dimension differs from `expected_dimension`.
pub
/// Soft-deletes a single ID: removes it from mappings and, when vector
/// storage is enabled, removes the corresponding sidecar slot.
///
/// Returns `true` if the ID existed and was removed, `false` if it was
/// already absent. The HNSW graph node itself is left in place — it becomes
/// a tombstone that is filtered out during search via the reverse mapping.
///
/// Shared by `HnswIndex::remove` and `NativeHnswIndex::remove` (identical
/// bodies, #448 Group F consolidation).
pub
/// Reconciles pre-registered mapping indices with graph-assigned node IDs.
///
/// `upsert_mapping_batch` allocates internal indices optimistically (one per
/// item) but `parallel_insert` may assign different node IDs when the HNSW
/// graph recycles slots or the rayon ordering diverges. This helper brings
/// the mappings back in sync with whatever the graph decided:
///
/// * If the graph-assigned ID equals the pre-registered index, nothing to do.
/// * Otherwise, remove the stale reverse mapping (`result.idx -> ext_id`) and
/// restore the authoritative one (`ext_id <-> assigned_id`).
///
/// Returns the list of authoritative storage indices, in input order, so the
/// caller can store sidecar vectors at the correct slot.
///
/// Both `HnswIndex::insert_batch_parallel` and `NativeHnswIndex::insert_batch`
/// used to duplicate this logic — consolidated here for #448 Group D.
pub
/// Rolls back every upsert in `rollback_info`, in reverse order, after a
/// batched graph insertion fails.
///
/// Reverse order is mandatory for duplicate-ID chains (a later upsert inside
/// the same batch overwrites an earlier one; restoring forward would undo the
/// later state before the earlier rollback depends on it).
///
/// Both `HnswIndex::insert_batch_parallel` and `NativeHnswIndex::insert_batch`
/// used to duplicate this loop — consolidated here for #448 Group D.
pub
/// Rolls back mapping state after a failed graph insertion.
///
/// Removes the newly-allocated mapping and, if this was an update,
/// restores the previous mapping so the point remains searchable
/// through its old graph node.
///
/// **Transient gap**: Between `remove` and `restore`, the ID has no
/// mapping for a brief window (nanoseconds). A concurrent search during
/// this window will not find the point. This only occurs on graph-insert
/// failure, which is rare (allocation error).
///
/// **Sidecar loss**: The old sidecar vector (in `ShardedVectors`) was
/// already removed by [`upsert_mapping`] and cannot be cheaply restored.
/// The HNSW graph still holds the vector data in `ContiguousVectors` for
/// traversal, so the point remains searchable -- only sidecar reranking
/// precision is lost for the affected point until the next successful
/// upsert.
pub