Module semantic_chunking

Semantic chunking based on embedding similarity

Semantic Chunking for RAG

This module implements semantic chunking that splits text based on semantic similarity rather than fixed character/token counts.

Key innovation: Uses sentence embeddings and cosine similarity to determine natural breakpoints, creating semantically cohesive chunks.
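The idea can be sketched in a few lines of plain Rust. This is a minimal illustration, not this module's implementation: the function names `cosine_similarity`, `find_breakpoints`, and `chunk_sentences` are hypothetical, the embeddings are assumed to be precomputed `f32` vectors (one per sentence), and a fixed similarity threshold stands in for whatever rule `BreakpointStrategy` actually applies.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns 0.0 when either vector has zero norm (an assumption made here
/// to avoid dividing by zero).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical helper: returns every index `i` at which a new chunk starts,
/// i.e. where the similarity between sentence `i - 1` and sentence `i`
/// falls below `threshold`.
fn find_breakpoints(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    embeddings
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| cosine_similarity(&pair[0], &pair[1]) < threshold)
        .map(|(i, _)| i + 1)
        .collect()
}

/// Hypothetical helper: joins consecutive sentences into chunks, starting a
/// new chunk at each breakpoint.
fn chunk_sentences(sentences: &[&str], embeddings: &[Vec<f32>], threshold: f32) -> Vec<String> {
    assert_eq!(sentences.len(), embeddings.len());
    let mut chunks = Vec::new();
    let mut start = 0;
    for bp in find_breakpoints(embeddings, threshold) {
        chunks.push(sentences[start..bp].join(" "));
        start = bp;
    }
    if start < sentences.len() {
        chunks.push(sentences[start..].join(" "));
    }
    chunks
}
```

With four sentences whose embeddings form two tight clusters, the similarity drops sharply between the second and third sentence, so the sketch yields two chunks of two sentences each. A fixed threshold is the simplest choice; deriving the threshold from the distribution of similarities in the document is a common alternative.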

References: LangChain SemanticChunker; Greg Kamradt’s 5 Levels of Text Splitting

Structs

SemanticChunk
Chunk of semantically similar sentences
SemanticChunker
Semantic text chunker that splits based on embedding similarity
SemanticChunkerConfig
Configuration for semantic chunking

Enums

BreakpointStrategy
Strategy for determining chunk breakpoints