Module text

Text processing and chunking

Modules

analysis
Text analysis utilities for document structure detection
boundary_detection
Semantic boundary detection for Boundary-Aware Chunking (BAR-RAG)
chunk_enricher
Chunk enrichment pipeline
chunking
Text chunking utilities
chunking_strategies
Trait-based chunking strategy implementations
contextual_enricher
LLM-based contextual chunk enrichment (Anthropic Contextual Retrieval pattern)
document_structure
Document structure representation for hierarchical parsing
extractive_summarizer
Extractive summarization with sentence ranking
keyword_extraction
TF-IDF keyword extraction
late_chunking
Late chunking for context-preserving embeddings in RAG (Jina AI technique)
layout_parser
Layout parser trait and factory for document structure detection
parsers
Document layout parsers
semantic_chunking
Semantic chunking for RAG based on embedding similarity
semantic_coherence
Semantic coherence scoring for Boundary-Aware Chunking (BAR-RAG)
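The `keyword_extraction` module is summarized only as TF-IDF keyword extraction. As a rough illustration of that technique, and not of this crate's API, the sketch below scores the terms of one document against a small corpus using tf = term count / document length and idf = ln(N / df), with no smoothing. The functions `tokenize` and `top_keywords` are hypothetical names chosen for this example, and the tokenizer is assumed to be a simple lowercase split on non-alphanumeric characters.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical helper, not part of this crate: lowercase the text and
// split it on any non-alphanumeric character.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

// Hypothetical function: return the `k` highest-scoring terms of
// `docs[target]`, scored as tf * idf with tf = count / doc length and
// idf = ln(N / df), where df counts documents containing the term.
fn top_keywords(docs: &[&str], target: usize, k: usize) -> Vec<(String, f64)> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;

    // Document frequency: each term is counted at most once per document.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let unique: HashSet<&str> = doc.iter().map(|t| t.as_str()).collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    // Raw term counts in the target document.
    let doc = &tokenized[target];
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for term in doc {
        *counts.entry(term.as_str()).or_insert(0) += 1;
    }

    let len = doc.len().max(1) as f64;
    let mut scored: Vec<(String, f64)> = counts
        .into_iter()
        .map(|(term, count)| {
            let idf = (n / df[term] as f64).ln();
            (term.to_string(), (count as f64 / len) * idf)
        })
        .collect();

    // Highest score first; ties broken alphabetically so output is stable.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    scored.truncate(k);
    scored
}
```

For the corpus "the cat sat on the mat", "the dog sat on the log", "the cat chased the dog", `top_keywords(&docs, 0, 1)` returns "mat", the only term unique to the first document, while "the" scores zero because it occurs in every document. The crate's `TfIdfKeywordExtractor` may use a different tokenizer, weighting, or smoothing.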

Structs

Boundary
Represents a detected boundary in text
BoundaryAwareChunkingStrategy
Boundary-Aware Chunking Strategy (BAR-RAG)
BoundaryDetectionConfig
Configuration for boundary detection
BoundaryDetector
Boundary detector for semantic text segmentation
ChunkEnricher
Chunk enricher that adds semantic metadata to text chunks
CoherenceConfig
Configuration for semantic coherence scoring
ContextualEnricher
LLM-based contextual chunk enricher (Anthropic Contextual Retrieval pattern)
ContextualEnricherConfig
Configuration for contextual chunk enrichment
DocumentStructure
Complete document structure with headings and sections
EnrichmentStatistics
Statistics about chunk enrichment
ExtractiveSummarizer
Extractive summarizer using sentence scoring
Heading
A heading in a document (e.g., chapter, section, subsection)
HeadingHierarchy
Hierarchical structure of a document
HierarchicalChunkingStrategy
Hierarchical chunking strategy wrapper
JinaLateChunkingClient
Jina AI embeddings client with native late chunking support
LanguageDetector
Language detection utilities
LateChunkingConfig
Configuration for the late chunking strategy
LateChunkingStrategy
Context-aware chunking strategy for use with late-chunking embedding models
LayoutParserFactory
Factory for creating layout parsers based on document type
OptimalSplit
Result of split-point optimization
ScoredChunk
Represents a candidate chunk with a coherence score
Section
A section in a document, defined by a heading and its content range
SectionNumber
Parsed section number with format information
SemanticChunk
Chunk of semantically similar sentences
SemanticChunker
Semantic text chunker that splits based on embedding similarity
SemanticChunkerConfig
Configuration for semantic chunking
SemanticChunkingStrategy
Semantic chunking strategy wrapper
SemanticCoherenceScorer
Semantic coherence scorer using sentence embeddings
StructureStatistics
Statistics about document structure
TextAnalyzer
Text analyzer for structural analysis
TextProcessor
Text processing utilities for chunking and preprocessing
TextStats
Text statistics
TfIdfKeywordExtractor
TF-IDF based keyword extractor
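`SemanticChunker` is described as splitting text based on embedding similarity. A minimal sketch of that idea, assuming sentence embeddings have already been computed by some model (here they are supplied directly as `Vec<f32>`), groups consecutive sentences and opens a new chunk whenever the cosine similarity between adjacent sentences drops below a fixed threshold. The names `cosine` and `semantic_chunks` are hypothetical; the crate's `BreakpointStrategy` enum suggests it offers configurable ways of choosing breakpoints, of which a fixed threshold is only the simplest.

```rust
// Hypothetical helper: cosine similarity of two equal-length vectors,
// returning 0.0 when either vector has zero magnitude.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

// Hypothetical function: walk the sentences in order and start a new chunk
// whenever the similarity between a sentence and its predecessor drops
// below `threshold`. `embeddings[i]` must be the vector for `sentences[i]`.
fn semantic_chunks(sentences: &[&str], embeddings: &[Vec<f32>], threshold: f32) -> Vec<String> {
    assert_eq!(sentences.len(), embeddings.len(), "one embedding per sentence");
    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for (i, &sentence) in sentences.iter().enumerate() {
        if i > 0 && cosine(&embeddings[i - 1], &embeddings[i]) < threshold {
            // Breakpoint: close the chunk built so far.
            chunks.push(current.join(" "));
            current.clear();
        }
        current.push(sentence);
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}
```

With toy embeddings `[1.0, 0.0]`, `[0.9, 0.1]`, `[0.0, 1.0]` and a threshold of 0.5, the first two sentences stay together (similarity about 0.99) and the third starts a new chunk (similarity about 0.11).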

Enums

BoundaryType
Type of boundary detected
BreakpointStrategy
Strategy for determining chunk breakpoints
SectionNumberFormat
Section numbering format (e.g., “1.2.3”, “Chapter 1”, “I.A.1”)
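`SectionNumberFormat` lists three example styles: "1.2.3", "Chapter 1", and "I.A.1". The sketch below shows one plausible way to recognize those styles from a heading label. The `NumberFormat` enum and the `classify`, `is_roman`, and `is_digits` functions are hypothetical stand-ins, not the crate's types, and the outline rule assumes levels cycle through Roman numeral, uppercase letter, and Arabic digits.

```rust
// Hypothetical enum for illustration; the crate's `SectionNumberFormat`
// may have different variants and payloads.
#[derive(Debug, PartialEq)]
enum NumberFormat {
    Decimal(Vec<u32>), // "1.2.3"
    Chapter(u32),      // "Chapter 1"
    Outline,           // "I.A.1": Roman numeral, then letter, then digit
}

// Loose check: only verifies the character set, so "IIII" also passes.
fn is_roman(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| "IVXLCDM".contains(c))
}

fn is_digits(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
}

// Hypothetical classifier for a heading's numbering label.
fn classify(label: &str) -> Option<NumberFormat> {
    let label = label.trim();
    if let Some(rest) = label.strip_prefix("Chapter ") {
        return rest.trim().parse::<u32>().ok().map(NumberFormat::Chapter);
    }
    let parts: Vec<&str> = label.split('.').collect();
    if parts.iter().all(|p| is_digits(p)) {
        let mut levels = Vec::new();
        for p in &parts {
            levels.push(p.parse::<u32>().ok()?); // None on overflow
        }
        return Some(NumberFormat::Decimal(levels));
    }
    // Outline levels cycle: Roman numeral, uppercase letter, Arabic digits.
    let outline = parts.iter().enumerate().all(|(i, p)| match i % 3 {
        0 => is_roman(p),
        1 => p.len() == 1 && p.chars().all(|c| c.is_ascii_uppercase()),
        _ => is_digits(p),
    });
    if outline {
        Some(NumberFormat::Outline)
    } else {
        None
    }
}
```

Keeping the parsed levels in `Decimal` lets a caller derive nesting depth directly (three levels for "1.2.3"), which is the kind of information a `HeadingHierarchy` would need.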

Traits

LayoutParser
Trait for document layout parsers
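The module pairs a `LayoutParser` trait with a `LayoutParserFactory` that selects a parser by document type. The sketch below illustrates that trait-plus-factory pattern under stated assumptions: the trait here has a single hypothetical `headings` method, the two parsers use deliberately simplified heuristics, and `parser_for` dispatches on a file extension. None of these names or signatures are taken from the crate; its real trait very likely differs.

```rust
// Hypothetical trait for illustration only; the crate's real
// `LayoutParser` trait very likely has a different signature.
trait LayoutParser {
    fn headings(&self, text: &str) -> Vec<String>;
}

struct MarkdownParser;
struct PlainTextParser;

impl LayoutParser for MarkdownParser {
    // Simplified ATX headings: 1-6 leading '#' followed by a space.
    fn headings(&self, text: &str) -> Vec<String> {
        text.lines()
            .filter_map(|line| {
                let rest = line.trim_start_matches('#');
                let level = line.len() - rest.len();
                if (1..=6).contains(&level) && rest.starts_with(' ') {
                    Some(rest.trim().to_string())
                } else {
                    None
                }
            })
            .collect()
    }
}

impl LayoutParser for PlainTextParser {
    // Heuristic: a short line with letters but no lowercase is a heading.
    fn headings(&self, text: &str) -> Vec<String> {
        text.lines()
            .map(str::trim)
            .filter(|line| {
                line.len() <= 80
                    && line.chars().any(char::is_alphabetic)
                    && !line.chars().any(char::is_lowercase)
            })
            .map(String::from)
            .collect()
    }
}

// Hypothetical factory: pick a parser from a file extension.
fn parser_for(extension: &str) -> Box<dyn LayoutParser> {
    match extension.to_ascii_lowercase().as_str() {
        "md" | "markdown" => Box::new(MarkdownParser),
        _ => Box::new(PlainTextParser),
    }
}
```

Returning `Box<dyn LayoutParser>` lets callers treat every document type uniformly after the factory call, and adding a new format only requires a new impl and one more match arm.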