Module chunking

Chunking strategies for RLM-RS.

This module provides a trait-based system for splitting text content into segments that can be processed independently. The following strategies are available:

  • Fixed: Simple character-based chunking with configurable size and overlap
  • Semantic: Unicode-aware chunking respecting sentence/paragraph boundaries
  • Code: Language-aware chunking at function/class boundaries
  • Parallel: Orchestrator for parallel chunk processing
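
The size/overlap behavior of the fixed strategy can be illustrated with a standalone function. This is a hypothetical sketch, not the code of `FixedChunker`. The function name `fixed_chunks` is illustrative, and the sketch assumes that each window starts `size - overlap` characters after the previous one and that sizes are counted in Unicode scalar values (`char`s) rather than bytes.

```rust
/// Hypothetical helper (not the crate's `FixedChunker`): splits `text` into
/// windows of at most `size` characters, where consecutive windows share
/// `overlap` characters.
fn fixed_chunks(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    // Work on chars, not bytes, so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    // Each new window starts this many characters after the previous one.
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // this window reached the end of the input
        }
        start += step;
    }
    chunks
}
```

For example, `fixed_chunks("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the previous one, which is the context continuity that an overlap provides.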

Re-exports

pub use code::CodeChunker;
pub use fixed::FixedChunker;
pub use parallel::ParallelChunker;
pub use semantic::SemanticChunker;
pub use traits::ChunkMetadata as ChunkerMetadata;
pub use traits::Chunker;

Modules

code
Code-aware chunking strategy.
fixed
Fixed-size chunking strategy.
parallel
Parallel chunking orchestrator.
semantic
Semantic chunking strategy.
traits
Chunker trait definition.

Constants

DEFAULT_CHUNK_SIZE
Default chunk size in characters (~750 tokens at 4 chars/token). Sized for granular semantic search with embeddings.
DEFAULT_OVERLAP
Default overlap between consecutive chunks, in characters, to preserve context across chunk boundaries.
MAX_CHUNK_SIZE
Maximum allowed chunk size: 50,000 characters (~12,500 tokens at 4 chars/token).
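
The constant descriptions rely on a rough heuristic of 4 characters per token: 750 tokens corresponds to about 3,000 characters, and 50,000 characters to about 12,500 tokens. The sketch below shows that arithmetic together with a plausible size check. The names `CHARS_PER_TOKEN`, `estimate_tokens`, and `validate`, the local copy of `MAX_CHUNK_SIZE`, and the rule that the overlap must be smaller than the chunk size are all assumptions for illustration, not this module's API.

```rust
/// Rough heuristic used by the constant docs (assumption: 4 chars per token).
const CHARS_PER_TOKEN: usize = 4;
/// Hypothetical local mirror of the documented 50,000-character cap.
const MAX_CHUNK_SIZE: usize = 50_000;

/// Estimates tokens for a chunk of `chars` characters, rounding up.
fn estimate_tokens(chars: usize) -> usize {
    (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
}

/// Hypothetical validation: rejects a zero size, sizes above the cap, and
/// overlaps that would prevent a chunker from advancing through the text.
fn validate(size: usize, overlap: usize) -> Result<(), String> {
    if size == 0 || size > MAX_CHUNK_SIZE {
        return Err(format!("chunk size {size} must be in 1..={MAX_CHUNK_SIZE}"));
    }
    if overlap >= size {
        return Err(format!("overlap {overlap} must be smaller than chunk size {size}"));
    }
    Ok(())
}
```

Rounding up means that even a very short chunk is counted as at least one token.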

Functions

available_strategies
Lists available chunking strategy names.
create_chunker
Creates a chunker by name.
default_chunker
Creates the default chunker (semantic).
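
Together, these three functions suggest a name-based factory over trait objects. The following is a minimal sketch of that pattern. It uses a hypothetical `Strategy` trait in place of the real `Chunker` trait, whose method signatures are not shown here. It also assumes lowercase keys `"fixed"` and `"semantic"`, and it uses toy strategies (fixed character windows and blank-line paragraph splitting) rather than the crate's implementations.

```rust
/// Hypothetical stand-in for the crate's `Chunker` trait; the real trait's
/// methods and signatures may differ.
trait Strategy {
    fn name(&self) -> &'static str;
    fn chunk(&self, text: &str) -> Vec<String>;
}

/// Toy fixed strategy: consecutive windows of `size` characters, no overlap.
struct Fixed {
    size: usize,
}

impl Strategy for Fixed {
    fn name(&self) -> &'static str {
        "fixed"
    }
    fn chunk(&self, text: &str) -> Vec<String> {
        let chars: Vec<char> = text.chars().collect();
        chars.chunks(self.size).map(|c| c.iter().collect::<String>()).collect()
    }
}

/// Toy "semantic" strategy: splits on blank lines (paragraph boundaries).
struct Paragraphs;

impl Strategy for Paragraphs {
    fn name(&self) -> &'static str {
        "semantic"
    }
    fn chunk(&self, text: &str) -> Vec<String> {
        text.split("\n\n")
            .map(str::trim)
            .filter(|p| !p.is_empty())
            .map(String::from)
            .collect()
    }
}

/// Names this sketch recognizes (hypothetical analogue of `available_strategies`).
fn strategy_names() -> &'static [&'static str] {
    &["fixed", "semantic"]
}

/// Name-based factory (hypothetical analogue of `create_chunker`).
fn create(name: &str) -> Option<Box<dyn Strategy>> {
    match name {
        "fixed" => Some(Box::new(Fixed { size: 3_000 })),
        "semantic" => Some(Box::new(Paragraphs)),
        _ => None,
    }
}

/// Mirrors the documented default, which is the semantic strategy.
fn default_strategy() -> Box<dyn Strategy> {
    create("semantic").expect("semantic is always registered")
}
```

Returning `Option` for an unknown name is a choice made for this sketch only; the real `create_chunker` may report unknown names differently.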