//! Agent-facing retrieval: compose structured filters, dense vector
//! similarity, and learned-sparse retrieval under a token budget.
//!
//! The indexes in [`crate::index`] each answer one question in
//! isolation: "which nodes carry label X", "which are semantically
//! close to this embedding", "which fire on this sparse query". Real
//! agents need all three at once and cannot afford to overflow their
//! LLM context window.
//!
//! [`Retriever`] is the composition layer. It:
//!
//! 1. Collects candidate node IDs from each ranker (vector, sparse).
//! 2. Fuses ranked lists with Reciprocal Rank Fusion (RRF, k=60).
//! 3. Gates fused candidates through label / property filters.
//! 4. Renders each surviving node to a compact text form.
//! 5. Greedily packs results in RRF-rank order until the caller's
//! token budget is exhausted (rank-order skip: if a node does not
//! fit, move on; never reorder to exploit slack).
//!
//! The return value ([`RetrievalResult`]) carries both the packed
//! items and cost metadata (`tokens_used`, `dropped`, `candidates_seen`)
//! so callers can detect "the budget was tight and we left good stuff
//! out" without a second round-trip.
//!
//! # Determinism
//!
//! All upstream rankers return hits in `(score desc, node_id asc)`
//! order, RRF is a pure function of ranks, and rendering is a pure
//! function of the node. Two independent processes with the same repo
//! head and the same [`Retriever`] configuration produce byte-
//! identical [`RetrievalResult`] instances. This is the property that
//! lets agent replay and regression tests work.
//!
//! # Example
//!
//! ```no_run
//! # use mnem_core::repo::ReadonlyRepo;
//! # fn demo(repo: &ReadonlyRepo, embedding: Vec<f32>) -> Result<(), Box<dyn std::error::Error>> {
//! let result = repo
//! .retrieve()
//! .label("Document")
//! .vector("openai:text-embedding-3-small", embedding)
//! .token_budget(2000)
//! .execute()?;
//!
//! println!(
//! "packed {} nodes in {}/{} tokens, {} dropped",
//! result.items.len(),
//! result.tokens_used,
//! result.tokens_budget,
//! result.dropped,
//! );
//! for item in &result.items {
//! println!("{}", item.rendered);
//! }
//! # Ok(()) }
//! ```
pub use Retriever;

use std::fmt::Write as _;
use Ipld;
use crate::Node;
// ============================================================
// Token estimation
// ============================================================
/// A byte/char counter that approximates an LLM tokenizer.
///
/// Implementations must be deterministic pure functions of their input.
/// The default [`HeuristicEstimator`] covers the common case without
/// pulling in tokenizer dependencies; agents that need exact OpenAI /
/// Anthropic / Llama counts plug in their own impl.
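// A minimal sketch of the trait shape the doc above describes; the trait
// and method names here are assumptions, not confirmed API of this crate.
pub trait TokenEstimator {
    /// Deterministic token estimate for `text`.
    fn estimate_tokens(&self, text: &str) -> usize;
}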
/// Byte / character heuristic tuned for modern LLM tokenizers.
///
/// - ASCII bytes are counted as `ceil(bytes / 4)` - the OpenAI rule of
/// thumb, accurate within ~20% for English prose under `cl100k_base`
/// and `o200k_base` tokenizers.
/// - Non-ASCII characters (Unicode scalars outside `[0x00, 0x7F]`) are
/// counted as `ceil(chars / 1.5)` - roughly one token per CJK glyph
/// and two per emoji or Arabic/Cyrillic run, again within ~25% of
/// actual tokenizer output.
///
/// The two contributions are summed. Good enough for budget packing;
/// swap in a real tokenizer for exact accounting.
pub struct HeuristicEstimator;
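// A sketch of the counting rule documented above, written against the
// `TokenEstimator` shape sketched earlier. The constants (4 ASCII bytes
// per token, 1.5 non-ASCII chars per token) come from the doc comment;
// the implementation details are otherwise illustrative.
impl TokenEstimator for HeuristicEstimator {
    fn estimate_tokens(&self, text: &str) -> usize {
        let ascii_bytes = text.bytes().filter(u8::is_ascii).count();
        let non_ascii_chars = text.chars().filter(|c| !c.is_ascii()).count();
        // ceil(ascii_bytes / 4) + ceil(non_ascii_chars / 1.5)
        ascii_bytes.div_ceil(4) + (non_ascii_chars * 2).div_ceil(3)
    }
}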
// ============================================================
// Node rendering
// ============================================================
/// Character cap applied to `summary` + `context_sentence` in
/// `render_node`. An unbounded summary silently consumed the entire
/// token budget on a single oversized node, producing zero-recall
/// retrievals; capping the render-time string protects the budget
/// packer without losing the underlying data on the node itself.
///
/// Override with `MNEM_RENDER_SUMMARY_CAP_CHARS`. Measured in
/// `char` count (Unicode scalars) so multi-byte scripts don't hit
/// byte-boundary panics.
pub const DEFAULT_RENDER_SUMMARY_CAP_CHARS: usize = 8192;
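// A sketch of how the `MNEM_RENDER_SUMMARY_CAP_CHARS` override described
// above could be resolved; the helper name is an assumption, only the
// variable name and the default come from the doc comment.
fn render_summary_cap() -> usize {
    std::env::var("MNEM_RENDER_SUMMARY_CAP_CHARS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_RENDER_SUMMARY_CAP_CHARS)
}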
/// Truncate to at most `cap` chars. Does NOT split a Unicode scalar
/// mid-sequence because `.chars()` iterates by scalar. Appends a
/// trailing `" <...+N chars>"` marker when truncation happens so a
/// downstream LLM can see the chunk was clipped.
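// The helper itself is not present in this extract; this is a sketch of
// the behaviour documented above. The function name and signature are
// assumptions; the `" <...+N chars>"` marker format comes from the doc.
fn truncate_chars_sketch(s: &str, cap: usize) -> String {
    let total = s.chars().count();
    if total <= cap {
        return s.to_string();
    }
    // take() iterates by Unicode scalar, so no scalar is ever split
    let mut out: String = s.chars().take(cap).collect();
    out.push_str(&format!(" <...+{} chars>", total - cap));
    out
}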
/// Render a [`Node`] to a compact, deterministic text representation
/// suitable for LLM consumption.
///
/// The format is YAML-like and stable across versions:
///
/// ```text
/// ntype: <ntype>
/// id: <uuid>
/// context: <context_sentence>
/// summary: <summary>
/// <prop_key>: <prop_value>
/// ...
/// ```
///
/// - `ntype` and `id` are always present.
/// - `context` is emitted iff `node.context_sentence` is `Some`. Sits
/// BEFORE `summary` so an LLM reading the rendered node sees the
/// chunk's positional cue first (per the Anthropic 2024
/// contextual-retrieval recipe).
/// - `summary` is emitted iff `node.summary` is `Some`. Clipped at
/// [`DEFAULT_RENDER_SUMMARY_CAP_CHARS`] (8192) chars by default,
/// overridable via `MNEM_RENDER_SUMMARY_CAP_CHARS`. A 1 MiB
/// summary on a single node would otherwise consume the entire
/// token budget and starve every other item out of the result.
/// - Scalar props (`String`, `Integer`, `Float`, `Bool`) are emitted in
/// BTreeMap order (alphabetical). Non-scalar props (`Link`, `Map`,
/// `List`, `Bytes`, `Null`) are skipped - an agent chasing a link
/// should follow it with a separate `mnem_get_node` call.
/// - Opaque `content` bytes are never rendered.
///
/// Determinism: since `Node.props` is a `BTreeMap`, iteration order is
/// byte-stable and the rendered string is therefore also byte-stable.
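// The `render_node` definition itself is not part of this extract. The
// sketch below is a self-contained approximation of the format documented
// above; the function name, the parameter list, and the pre-stringified
// scalar props are all stand-ins, not this crate's types.
fn render_sketch(
    ntype: &str,
    id: &str,
    context: Option<&str>,
    summary: Option<&str>,
    props: &std::collections::BTreeMap<String, String>,
) -> String {
    let mut out = String::new();
    out.push_str(&format!("ntype: {ntype}\n"));
    out.push_str(&format!("id: {id}\n"));
    if let Some(c) = context {
        // positional cue is emitted before the summary
        out.push_str(&format!("context: {c}\n"));
    }
    if let Some(s) = summary {
        out.push_str(&format!("summary: {s}\n"));
    }
    // BTreeMap iteration is alphabetical, so the output is byte-stable;
    // the real renderer additionally skips non-scalar props and clips
    // the summary at the configured cap.
    for (k, v) in props {
        out.push_str(&format!("{k}: {v}\n"));
    }
    out
}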
/// Like [`render_node`] but augments the output with two graph
/// adjacency blocks derived from `repo`'s current commit:
///
/// ```text
/// ntype: Doc
/// id: ...
/// summary: ...
/// Outgoing:
/// tagged -> <topic_id>
/// Incoming:
/// authored <- <alice_id>
/// ```
///
/// Each block is capped at `per_direction_cap` entries; if the
/// bucket had more, a trailing `... (+N more)` line is emitted so
/// the reader knows the display was clipped. Blocks with no edges
/// are omitted entirely. Entry order is the adjacency bucket's
/// stored order (SPEC §4.9 `(label, src|dst, edge_cid)` sort),
/// so the rendered string is byte-stable.
///
/// Use this from CLI / MCP / agent-facing paths that benefit from
/// showing surrounding graph context. The hot `Retriever::execute`
/// render path keeps calling the cheaper [`render_node`] because
/// adjacency-aware rendering there would add a per-item O(log n)
/// bucket fetch for every ranked candidate, and callers that do
/// not need the blocks should not pay the cost.
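// Illustrative sketch of one adjacency block as described above; the
// helper name and the `(label, id)` edge representation are assumptions,
// only the header / arrow / "(+N more)" format comes from the docs.
fn render_adjacency_block(
    header: &str,               // "Outgoing" or "Incoming"
    arrow: &str,                // "->" for outgoing, "<-" for incoming
    edges: &[(String, String)], // (edge label, neighbor id), bucket order
    per_direction_cap: usize,
) -> String {
    if edges.is_empty() {
        return String::new(); // blocks with no edges are omitted entirely
    }
    let mut out = format!("{header}:\n");
    for (label, id) in edges.iter().take(per_direction_cap) {
        out.push_str(&format!("  {label} {arrow} {id}\n"));
    }
    if edges.len() > per_direction_cap {
        out.push_str(&format!("  ... (+{} more)\n", edges.len() - per_direction_cap));
    }
    out
}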