pub struct LateChunkingPooler { /* private fields */ }
Late chunking pooler: pools token embeddings into chunk embeddings.
This is the core operation of late chunking. Given token-level embeddings from a full document, it pools the tokens within each chunk boundary to create contextualized chunk embeddings.
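The pooling step described above can be sketched as a plain mean over a token range. This is a minimal illustration, not the crate's implementation: `mean_pool` is a hypothetical helper, and the zero-vector result for an empty range is an assumption made here.

```rust
/// Hypothetical helper (not part of this crate): mean-pool the token
/// embeddings in the half-open token range `start..end`.
/// Returns a zero vector of length `dim` when the range is empty
/// (an assumption of this sketch).
fn mean_pool(
    token_embeddings: &[Vec<f32>],
    start: usize,
    end: usize,
    dim: usize,
) -> Vec<f32> {
    // Clamp the end so an over-long range cannot index out of bounds.
    let end = end.min(token_embeddings.len());
    let mut pooled = vec![0.0f32; dim];
    if start >= end {
        return pooled;
    }
    for token in &token_embeddings[start..end] {
        assert_eq!(token.len(), dim, "inconsistent embedding dimension");
        // Accumulate each component of the token embedding.
        for (acc, value) in pooled.iter_mut().zip(token) {
            *acc += *value;
        }
    }
    // Divide the sums by the number of tokens in the range.
    let count = (end - start) as f32;
    for acc in pooled.iter_mut() {
        *acc /= count;
    }
    pooled
}
```

Because every token embedding was computed over the full document, the mean of a chunk's tokens carries context from outside the chunk, which is the point of pooling late.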
Implementations§
impl LateChunkingPooler
pub fn new(dim: usize) -> Self
Create a new late chunking pooler.
§Arguments
dim - Embedding dimension (e.g., 384 for all-MiniLM-L6-v2)
pub fn pool(
    &self,
    token_embeddings: &[Vec<f32>],
    chunks: &[Slab],
    doc_len: usize,
) -> Vec<Vec<f32>>
Pool token embeddings into chunk embeddings.
§Arguments
token_embeddings - Token-level embeddings from the full document. Shape: [n_tokens, dim]. Each token has “seen” the full document.
chunks - Chunk boundaries from any chunker.
doc_len - Total document length in bytes (for mapping).
§Returns
Contextualized chunk embeddings. Each chunk embedding is the mean of its constituent token embeddings.
§Panics
Panics if token embeddings have inconsistent dimensions.
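Since `pool` receives only `doc_len` and no per-token offsets, chunk byte ranges have to be mapped to token indices approximately. One way such a linear mapping could work is sketched below. This is an assumption about the approach, not the crate's actual code: `approx_token_range` is a hypothetical helper, and it assumes tokens are spread evenly across the document's bytes.

```rust
/// Hypothetical helper (not part of this crate): map a chunk's byte range
/// to an approximate half-open token range, assuming tokens are evenly
/// distributed over the document's bytes.
fn approx_token_range(
    byte_start: usize,
    byte_end: usize,
    doc_len: usize,
    n_tokens: usize,
) -> (usize, usize) {
    if doc_len == 0 || n_tokens == 0 {
        return (0, 0);
    }
    // Round the start down and the end up, so a token that is only
    // partly covered by the chunk is still included.
    let start = (byte_start * n_tokens / doc_len).min(n_tokens);
    let end = ((byte_end * n_tokens + doc_len - 1) / doc_len).min(n_tokens);
    (start, end.max(start))
}
```

With a 100-byte document and 10 tokens, the byte range 25..35 maps to tokens 2..4 under this scheme. Real tokenizers do not produce evenly sized tokens, so the boundaries can be off by a few tokens.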
pub fn pool_with_offsets(
    &self,
    token_embeddings: &[Vec<f32>],
    token_offsets: &[(usize, usize)],
    chunks: &[Slab],
) -> Vec<Vec<f32>>
Pool with exact token-to-character mappings.
Use this when you have exact token offsets from the tokenizer, rather than relying on linear approximation.
§Arguments
token_embeddings - Token-level embeddings [n_tokens, dim].
token_offsets - Character offset for each token [(start, end), …].
chunks - Chunk boundaries.
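A minimal sketch of offset-based pooling follows. The names are assumptions for illustration: `ChunkSpan` is a hypothetical stand-in for `Slab` (whose fields are not shown on this page), and `pool_by_offsets` is not the crate's method. The overlap rule and the zero vector for chunks with no matching tokens are also choices made here.

```rust
/// Hypothetical stand-in for a chunk boundary; the real `Slab` type
/// may expose its range differently.
struct ChunkSpan {
    start: usize,
    end: usize,
}

/// Hypothetical sketch: for each chunk, average the embeddings of all
/// tokens whose (start, end) span overlaps the chunk's half-open range.
fn pool_by_offsets(
    token_embeddings: &[Vec<f32>],
    token_offsets: &[(usize, usize)],
    chunks: &[ChunkSpan],
    dim: usize,
) -> Vec<Vec<f32>> {
    chunks
        .iter()
        .map(|chunk| {
            let mut pooled = vec![0.0f32; dim];
            let mut count = 0usize;
            for (embedding, &(tok_start, tok_end)) in
                token_embeddings.iter().zip(token_offsets)
            {
                // Two half-open spans overlap when each starts before
                // the other ends.
                if tok_start < chunk.end && tok_end > chunk.start {
                    for (acc, value) in pooled.iter_mut().zip(embedding) {
                        *acc += *value;
                    }
                    count += 1;
                }
            }
            // Chunks with no overlapping tokens keep the zero vector.
            if count > 0 {
                for acc in pooled.iter_mut() {
                    *acc /= count as f32;
                }
            }
            pooled
        })
        .collect()
}
```

Unlike the linear approximation, this uses the tokenizer's own spans, so a token is assigned to a chunk only when its text actually falls inside that chunk.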
Trait Implementations§
impl Clone for LateChunkingPooler
fn clone(&self) -> LateChunkingPooler
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for LateChunkingPooler
impl RefUnwindSafe for LateChunkingPooler
impl Send for LateChunkingPooler
impl Sync for LateChunkingPooler
impl Unpin for LateChunkingPooler
impl UnsafeUnpin for LateChunkingPooler
impl UnwindSafe for LateChunkingPooler
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.