//! `Chunker` — async transform over a chunk sequence.
//!
//! Where [`TextSplitter`](crate::TextSplitter) is a pure-algorithm
//! slicer, `Chunker` is an async transform that may issue model
//! calls. The canonical use cases:
//!
//! - **Anthropic Contextual Retrieval (2024-09)** — for each
//! chunk, generate a 50-100 token contextual prefix grounded in
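//! the parent document, and prepend it to the chunk text.
//! Anthropic reports this cuts the top-20 retrieval failure rate
//! by about 35% (about 49% when paired with contextual BM25); it
//! requires one LLM call per chunk.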
//! - **HyDE (Hypothetical Document Embeddings)** — replace the
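//! chunk text with a generated answer-shaped paraphrase before
//! embedding, which can improve retrieval for short queries.
//! Published HyDE generates the hypothetical document from the
//! query at search time; this is the chunk-side adaptation.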
//! - **Chunk metadata enrichment** — extract entities / topics /
//! sentiment from the chunk and stamp them onto
//! [`Document::metadata`](crate::Document::metadata) for
//! downstream filtering.
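//!
//! A minimal, synchronous sketch of the contextual-prefix use case,
//! assuming a hypothetical `SketchChunk` type in place of
//! [`Document`](crate::Document) and a hypothetical, caller-supplied
//! `generate_context` closure standing in for the LLM call. The real
//! `Chunker` is async and its types differ.
//!
//! ```rust
//! // Hypothetical stand-in for a chunk; not the crate's `Document`.
//! #[derive(Debug, Clone, PartialEq)]
//! pub struct SketchChunk {
//!     pub text: String,
//! }
//!
//! // Prepends a generated context prefix to every chunk. The
//! // hypothetical `generate_context` closure receives the parent
//! // document and one chunk's text and returns the prefix.
//! pub fn contextualize<F>(
//!     parent: &str,
//!     chunks: Vec<SketchChunk>,
//!     mut generate_context: F,
//! ) -> Vec<SketchChunk>
//! where
//!     F: FnMut(&str, &str) -> String,
//! {
//!     chunks
//!         .into_iter()
//!         .map(|chunk| {
//!             // One model call per chunk, grounded in the parent text.
//!             let prefix = generate_context(parent, chunk.text.as_str());
//!             SketchChunk {
//!                 text: format!("{}\n\n{}", prefix, chunk.text),
//!             }
//!         })
//!         .collect()
//! }
//! ```
//!
//! Taking the closure as `FnMut` lets a caller count or rate-limit
//! model calls; the sketch makes exactly one call per input chunk.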
//!
//! Chunkers run *after* splitting and *before* embedding. The
//! transformed sequence may be **shorter than the input** — a
//! chunker may drop chunks that fail enrichment (see
//! [`ContextualChunker`]'s [`FailurePolicy::Skip`]) or that fail a
//! filter pass. Order is preserved: when chunk N survives, it
//! appears in the output before any surviving chunk M > N. A
//! chunker that needs to fan one chunk out into several (rare)
//! should be written as a separate splitter, not implemented on
//! this trait.
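//!
//! A minimal, synchronous sketch of that ordering contract, assuming
//! a hypothetical `enrich` closure that returns `None` for a chunk
//! that should be dropped (as a skip-on-failure policy would). The
//! helper `is_order_preserving_subsequence` is likewise hypothetical:
//! it checks that an output only removes items, never reorders or
//! adds them.
//!
//! ```rust
//! // Applies `enrich` to each chunk in input order; `None` drops the
//! // chunk. `filter_map` yields at most one output per input and
//! // never reorders, so surviving chunks keep their relative order.
//! pub fn transform_preserving_order<T, U, F>(input: Vec<T>, enrich: F) -> Vec<U>
//! where
//!     F: FnMut(T) -> Option<U>,
//! {
//!     input.into_iter().filter_map(enrich).collect()
//! }
//!
//! // True when `output` is `input` with zero or more items removed.
//! pub fn is_order_preserving_subsequence<T: PartialEq>(input: &[T], output: &[T]) -> bool {
//!     let mut remaining = input.iter();
//!     // Each output item must be found after the previous match;
//!     // `any` consumes the iterator up to and including the match.
//!     output
//!         .iter()
//!         .all(|item| remaining.any(|candidate| candidate == item))
//! }
//! ```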
pub use ;
use async_trait;
use ;
use crate::Document;
/// Async transform applied to a sequence of chunks after a
/// [`TextSplitter`](crate::TextSplitter) has run. Implementations may
/// issue LLM calls, embedding lookups, or external metadata
/// enrichment; the [`ExecutionContext`] supplies cancellation,
/// deadline, and any [`entelix_core::RunBudget`] caps the parent
/// pipeline configured.
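///
/// A minimal, synchronous sketch of honoring cancellation and a
/// deadline between per-chunk model calls. `SketchContext`, `Stop`,
/// and `enrich_all` are hypothetical stand-ins, not the
/// [`ExecutionContext`] API, and budget caps are not modeled.
///
/// ```rust
/// use std::sync::atomic::{AtomicBool, Ordering};
/// use std::time::Instant;
///
/// // Hypothetical stand-in for the cancellation and deadline parts
/// // of `ExecutionContext`.
/// pub struct SketchContext {
///     pub cancelled: AtomicBool,
///     pub deadline: Option<Instant>,
/// }
///
/// #[derive(Debug, PartialEq)]
/// pub enum Stop {
///     Cancelled,
///     DeadlineExceeded,
/// }
///
/// impl SketchContext {
///     // Cheap check run before each potentially slow model call.
///     pub fn check(&self) -> Result<(), Stop> {
///         if self.cancelled.load(Ordering::Relaxed) {
///             return Err(Stop::Cancelled);
///         }
///         match self.deadline {
///             Some(deadline) if Instant::now() >= deadline => Err(Stop::DeadlineExceeded),
///             _ => Ok(()),
///         }
///     }
/// }
///
/// // Calls the hypothetical `call_model` once per chunk, stopping at
/// // the first cancellation or expired deadline.
/// pub fn enrich_all<F>(
///     ctx: &SketchContext,
///     chunks: &[String],
///     mut call_model: F,
/// ) -> Result<Vec<String>, Stop>
/// where
///     F: FnMut(&str) -> String,
/// {
///     let mut out = Vec::with_capacity(chunks.len());
///     for chunk in chunks {
///         ctx.check()?;
///         out.push(call_model(chunk.as_str()));
///     }
///     Ok(out)
/// }
/// ```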
///
/// Stamps the chunker's identity onto every transformed chunk's
/// [`Lineage::chunker_chain`](crate::Lineage::chunker_chain) so
/// the audit trail records the order of transforms a leaf
/// underwent.
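///
/// A minimal sketch of that stamping, assuming a simplified,
/// hypothetical `SketchLineage` with only a `chunker_chain` field;
/// the crate's [`Lineage`](crate::Lineage) type may carry more state
/// and expose a different API.
///
/// ```rust
/// // Hypothetical, simplified lineage record.
/// #[derive(Debug, Clone, Default, PartialEq)]
/// pub struct SketchLineage {
///     pub chunker_chain: Vec<String>,
/// }
///
/// // Hypothetical leaf: chunk text plus its lineage.
/// #[derive(Debug, Clone, PartialEq)]
/// pub struct SketchLeaf {
///     pub text: String,
///     pub lineage: SketchLineage,
/// }
///
/// // Appends the chunker's name after earlier entries, so the chain
/// // lists transforms in the order they ran.
/// pub fn stamp(mut leaf: SketchLeaf, chunker_name: &str) -> SketchLeaf {
///     leaf.lineage.chunker_chain.push(chunker_name.to_string());
///     leaf
/// }
/// ```
///
/// Appending rather than overwriting is what keeps the transform
/// order recoverable from a single leaf.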