pub struct LateChunkingStrategy { /* private fields */ }
Context-aware chunking strategy for use with late-chunking embedding models.
Splits text using HierarchicalChunker and records precise byte-offset
spans in each chunk’s metadata. A late-chunking embedding provider
(Jina API or a local candle model) can then use these spans to extract
per-chunk representations from a single full-document forward pass.
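The span-based extraction described above can be sketched as mean pooling over token embeddings. This is a minimal illustration only: `TokenEmbedding` and `pool_span` are hypothetical names that are not part of graphrag_core, and mean pooling is an assumption about how a provider might combine the token vectors that fall inside a chunk's byte span.

```rust
/// One token's embedding plus the byte range it covers in the source text.
/// Hypothetical type for illustration; not part of graphrag_core.
pub struct TokenEmbedding {
    pub byte_start: usize,
    pub byte_end: usize,
    pub vector: Vec<f32>,
}

/// Mean-pool the vectors of all tokens that lie fully inside
/// the half-open byte span `[span_start, span_end)`.
/// Returns `None` when no token falls inside the span.
pub fn pool_span(
    tokens: &[TokenEmbedding],
    span_start: usize,
    span_end: usize,
) -> Option<Vec<f32>> {
    let mut sum: Vec<f32> = Vec::new();
    let mut count = 0usize;
    for tok in tokens {
        if tok.byte_start >= span_start && tok.byte_end <= span_end {
            // Size the accumulator from the first matching token.
            if sum.is_empty() {
                sum = vec![0.0; tok.vector.len()];
            }
            for (acc, v) in sum.iter_mut().zip(&tok.vector) {
                *acc += *v;
            }
            count += 1;
        }
    }
    if count == 0 {
        return None;
    }
    let n = count as f32;
    Some(sum.into_iter().map(|x| x / n).collect())
}
```

Because every token vector comes from the same full-document forward pass, each pooled chunk vector still reflects context from outside its own span.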
Examples
use graphrag_core::text::late_chunking::{LateChunkingStrategy, LateChunkingConfig};
use graphrag_core::core::{ChunkingStrategy, DocumentId};
let strategy = LateChunkingStrategy::with_defaults(DocumentId::new("doc-1".to_string()));
let chunks = strategy.chunk("First paragraph.\n\nSecond paragraph.");
for chunk in &chunks {
// position_in_document ∈ [0.0, 1.0] — used by embedding provider for pooling
assert!(chunk.metadata.position_in_document.is_some());
}
Implementations
impl LateChunkingStrategy
pub fn new(config: LateChunkingConfig, document_id: DocumentId) -> Self
Create a new late chunking strategy with explicit config
pub fn with_defaults(document_id: DocumentId) -> Self
Create with default config (8192 token limit, 512-char chunks)
pub fn with_max_doc_tokens(self, max_tokens: u32) -> Self
Set the maximum document token limit (choose based on embedding model)
8192 → Jina v3
32768 → gte-Qwen2-7B-instruct
pub fn estimate_tokens(text: &str) -> u32
Estimate token count from character count (1 token ≈ 4 chars)
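The 4-characters-per-token heuristic can be sketched as follows. `estimate_tokens_sketch` is a hypothetical stand-in, not the crate's function. Two details are assumptions because the documentation does not state them: the estimate rounds up, and it counts Unicode scalar values rather than bytes.

```rust
/// Rough token estimate: one token per four characters.
/// Assumptions (not confirmed by the docs): rounds up, and counts
/// `char`s (Unicode scalar values) rather than bytes.
pub fn estimate_tokens_sketch(text: &str) -> u32 {
    let chars = text.chars().count() as u32;
    (chars + 3) / 4
}
```

Under these assumptions, a 16-character string such as "First paragraph." is estimated at 4 tokens. Real tokenizers can deviate noticeably from this ratio, especially for non-English text or code.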
pub fn fits_in_context(&self, text: &str) -> bool
Returns true if the document fits within the model’s context window
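A plausible shape for this check is to compare the estimated token count against the configured limit. The sketch below uses a hypothetical `ContextBudget` type, since the real struct's fields are private. The comparison (`<=`) and the rounding mode are assumptions.

```rust
/// Stand-in for the strategy's token budget.
/// Hypothetical type; the real struct's fields are private.
pub struct ContextBudget {
    max_doc_tokens: u32,
}

impl ContextBudget {
    /// Starts from 8192 tokens, the documented default limit.
    pub fn with_defaults() -> Self {
        Self { max_doc_tokens: 8192 }
    }

    /// Builder-style setter, mirroring `with_max_doc_tokens`.
    pub fn with_max_doc_tokens(mut self, max_tokens: u32) -> Self {
        self.max_doc_tokens = max_tokens;
        self
    }

    /// Assumed check: the estimate (chars / 4, rounded up)
    /// must not exceed the budget.
    pub fn fits_in_context(&self, text: &str) -> bool {
        let estimated = (text.chars().count() as u32 + 3) / 4;
        estimated <= self.max_doc_tokens
    }
}
```

With a budget of 2 tokens, an 8-character string fits and a 9-character string does not.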
pub fn split_into_sections(&self, text: &str) -> Vec<String>
Split an oversized document into sections that fit within the context window
Sections are formed by grouping paragraphs (double-newline boundaries) until the next paragraph would exceed the limit. Each section can be embedded independently with late chunking applied within it.
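The grouping rule described above can be sketched as a greedy pass over paragraphs. `split_into_sections_sketch` is a hypothetical free function, not the crate's method. It takes the token limit as a parameter and reuses the assumed chars / 4 (rounded up) estimate. How the real implementation handles a single paragraph that exceeds the limit is not documented; this sketch emits it as its own oversized section.

```rust
/// Estimated tokens for a given character count (assumed: chars / 4, rounded up).
fn tokens_for(chars: usize) -> u32 {
    ((chars + 3) / 4) as u32
}

/// Greedily group paragraphs (split on "\n\n") into sections whose
/// estimated token count stays within `max_tokens`.
/// Limitation: a single paragraph larger than the limit is emitted unchanged.
pub fn split_into_sections_sketch(text: &str, max_tokens: u32) -> Vec<String> {
    let mut sections = Vec::new();
    let mut current = String::new();

    for para in text.split("\n\n").filter(|p| !p.trim().is_empty()) {
        if current.is_empty() {
            current.push_str(para);
            continue;
        }
        // Size of the section if this paragraph were appended,
        // including the two-character separator.
        let candidate_chars = current.chars().count() + 2 + para.chars().count();
        if tokens_for(candidate_chars) > max_tokens {
            // The next paragraph would exceed the limit: close the section.
            sections.push(std::mem::take(&mut current));
            current.push_str(para);
        } else {
            current.push_str("\n\n");
            current.push_str(para);
        }
    }
    if !current.is_empty() {
        sections.push(current);
    }
    sections
}
```

For example, with a limit of 3 tokens, the text "aaaa\n\nbbbb\n\ncccc" splits into two sections: the first two paragraphs together (10 characters, estimated at 3 tokens), then the third on its own.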
Trait Implementations
Auto Trait Implementations
impl Freeze for LateChunkingStrategy
impl RefUnwindSafe for LateChunkingStrategy
impl Send for LateChunkingStrategy
impl Sync for LateChunkingStrategy
impl Unpin for LateChunkingStrategy
impl UnsafeUnpin for LateChunkingStrategy
impl UnwindSafe for LateChunkingStrategy
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.