Semantic chunking based on embedding similarity.
Semantic Chunking for RAG
This module implements semantic chunking that splits text based on semantic similarity rather than fixed character/token counts.
Key innovation: Uses sentence embeddings and cosine similarity to determine natural breakpoints, creating semantically cohesive chunks.
Reference: LangChain SemanticChunker, Greg Kamradt’s 5 Levels of Text Splitting
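The breakpoint idea described above can be sketched in a few lines of plain Rust. This is a minimal illustration, not this module's implementation: `cosine_similarity`, `chunk_by_similarity`, and the fixed `threshold` parameter are hypothetical names introduced here, and the embeddings are assumed to be precomputed by some external model.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical helper: groups sentences into chunks, starting a new chunk
/// whenever the similarity between adjacent sentence embeddings drops
/// below `threshold`.
fn chunk_by_similarity(
    sentences: &[&str],
    embeddings: &[Vec<f32>],
    threshold: f32,
) -> Vec<Vec<String>> {
    assert_eq!(sentences.len(), embeddings.len());
    let mut chunks: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    for (i, sentence) in sentences.iter().enumerate() {
        // Compare each sentence with its predecessor; a low score marks a breakpoint.
        if i > 0 && cosine_similarity(&embeddings[i - 1], &embeddings[i]) < threshold {
            chunks.push(std::mem::take(&mut current));
        }
        current.push(sentence.to_string());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Called with three sentences whose toy embeddings are `[1.0, 0.0]`, `[0.9, 0.1]`, and `[0.0, 1.0]` and a threshold of `0.5`, this sketch keeps the first two sentences together and places the third in its own chunk, since only the second pair falls below the threshold.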
Structs§
- SemanticChunk - Chunk of semantically similar sentences
- SemanticChunker - Semantic text chunker that splits based on embedding similarity
- SemanticChunkerConfig - Configuration for semantic chunking
Enums§
- BreakpointStrategy - Strategy for determining chunk breakpoints
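This page does not list the variants of `BreakpointStrategy`, so the following is only a sketch of how a breakpoint strategy might turn a series of adjacent-sentence distances (1 minus cosine similarity) into split positions. The `SketchStrategy` enum, its two variants, and both functions are assumptions made for illustration, loosely modeled on the percentile-style thresholds used by the referenced LangChain splitter.

```rust
/// Hypothetical stand-in for a breakpoint strategy; the real
/// `BreakpointStrategy` variants are not shown on this page.
enum SketchStrategy {
    /// Break when the distance exceeds a fixed value.
    Absolute(f32),
    /// Break when the distance exceeds the given percentile (0..=100)
    /// of all observed distances.
    Percentile(f32),
}

/// Resolves a strategy to a concrete distance threshold.
fn distance_threshold(strategy: &SketchStrategy, distances: &[f32]) -> f32 {
    match strategy {
        SketchStrategy::Absolute(t) => *t,
        SketchStrategy::Percentile(p) => {
            if distances.is_empty() {
                return f32::INFINITY; // nothing to compare, so never break
            }
            let mut sorted = distances.to_vec();
            sorted.sort_by(|a, b| a.total_cmp(b));
            // Simple rounded-rank percentile over the sorted distances.
            let rank = ((p / 100.0) * (sorted.len() - 1) as f32).round() as usize;
            sorted[rank.min(sorted.len() - 1)]
        }
    }
}

/// Returns the indices of sentences that start a new chunk.
/// `distances[i]` is the distance between sentence `i` and sentence `i + 1`.
fn breakpoints(strategy: &SketchStrategy, distances: &[f32]) -> Vec<usize> {
    let threshold = distance_threshold(strategy, distances);
    distances
        .iter()
        .enumerate()
        .filter(|(_, d)| **d > threshold)
        .map(|(i, _)| i + 1)
        .collect()
}
```

A relative rule such as the percentile variant adapts to each document's own distance distribution, whereas a fixed value is simpler but depends on the embedding model's similarity scale.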