Trait Chunker 

pub trait Chunker: Send + Sync {
    // Required methods
    fn chunk(
        &self,
        buffer_id: i64,
        text: &str,
        metadata: Option<&ChunkMetadata>,
    ) -> Result<Vec<Chunk>>;
    fn name(&self) -> &'static str;

    // Provided methods
    fn supports_parallel(&self) -> bool { ... }
    fn description(&self) -> &'static str { ... }
    fn validate(&self, metadata: Option<&ChunkMetadata>) -> Result<()> { ... }
}

Trait for chunking text into processable segments.

Implementations must be Send + Sync to support parallel processing. Each chunker should produce consistent, deterministic output for the same input.

§Examples

use rlm_rs::chunking::{Chunker, FixedChunker};

let chunker = FixedChunker::with_size(100);
let text = "Hello, world! ".repeat(20);
let chunks = chunker.chunk(1, &text, None).unwrap();
assert!(!chunks.is_empty());
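
Implementing the trait for a custom strategy might look roughly like the sketch below. It is self-contained, so `Chunk`, `ChunkMetadata`, `Result`, and the trait itself are local stand-ins whose fields and error type are assumptions, not the crate's actual definitions. `LineChunker` is a hypothetical strategy used only for illustration.

```rust
// Stand-ins for the crate's types. The real `Chunk`, `ChunkMetadata`, and
// `Result` in rlm_rs may carry different fields and error types.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub buffer_id: i64,
    pub start: usize, // byte offset, inclusive
    pub end: usize,   // byte offset, exclusive
    pub text: String,
}
pub struct ChunkMetadata;
pub type Result<T> = std::result::Result<T, String>;

// Local mirror of the trait's two required methods.
pub trait Chunker: Send + Sync {
    fn chunk(
        &self,
        buffer_id: i64,
        text: &str,
        metadata: Option<&ChunkMetadata>,
    ) -> Result<Vec<Chunk>>;
    fn name(&self) -> &'static str;
}

// Hypothetical strategy: one chunk per non-blank line.
pub struct LineChunker;

impl Chunker for LineChunker {
    fn chunk(
        &self,
        buffer_id: i64,
        text: &str,
        _metadata: Option<&ChunkMetadata>,
    ) -> Result<Vec<Chunk>> {
        let mut chunks = Vec::new();
        let mut start = 0;
        // `split_inclusive` keeps the trailing '\n', so the byte offsets
        // cover the input without gaps.
        for line in text.split_inclusive('\n') {
            let end = start + line.len();
            if !line.trim().is_empty() {
                chunks.push(Chunk { buffer_id, start, end, text: line.to_string() });
            }
            start = end;
        }
        Ok(chunks)
    }

    fn name(&self) -> &'static str {
        "line"
    }
}
```

Because `LineChunker` holds no state, it is automatically `Send + Sync`, and calling `chunk` twice on the same input yields identical output.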

Required Methods§

fn chunk(&self, buffer_id: i64, text: &str, metadata: Option<&ChunkMetadata>) -> Result<Vec<Chunk>>

Chunks the input text into segments.

§Arguments

  • buffer_id - ID of the source buffer.
  • text - The input text to chunk.
  • metadata - Optional metadata for context-aware chunking.

§Returns

A vector of chunks with byte offsets and metadata.

§Errors

Returns an error if chunking fails (e.g., invalid configuration).
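
Since chunks carry byte offsets, an implementation has to keep every cut on a UTF-8 character boundary, or slicing the text by those offsets will panic. The sketch below shows one way to compute such offsets for fixed-size chunks. `fixed_byte_ranges` is a hypothetical helper, not part of rlm_rs, and it returns plain `(start, end)` pairs instead of the crate's `Chunk` type.

```rust
/// Splits `text` into byte ranges of at most `size` bytes, moving each cut
/// back to the nearest UTF-8 character boundary.
/// Hypothetical helper for illustration; not part of rlm_rs.
pub fn fixed_byte_ranges(text: &str, size: usize) -> Result<Vec<(usize, usize)>, String> {
    if size == 0 {
        return Err("chunk size must be greater than zero".to_string());
    }
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < text.len() {
        // Tentative cut, clamped to the end of the text.
        let mut end = (start + size).min(text.len());
        // Back up until the cut no longer splits a multi-byte character.
        while end > start && !text.is_char_boundary(end) {
            end -= 1;
        }
        if end == start {
            // `size` is smaller than the next character: take it whole.
            end = start + text[start..].chars().next().map_or(1, char::len_utf8);
        }
        ranges.push((start, end));
        start = end;
    }
    Ok(ranges)
}
```

Each returned range can be used directly as `&text[start..end]`. A chunk may exceed `size` only when a single character is wider than `size` bytes.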

fn name(&self) -> &'static str

Returns the name of the chunking strategy.

Provided Methods§

fn supports_parallel(&self) -> bool

Returns whether this chunker supports parallel processing.

The default implementation returns false. Chunkers that benefit from parallelization should override this method to return true.
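
A caller could consult this flag to decide between sequential and threaded execution. The following is a minimal sketch under simplifying assumptions: the local `Chunker` trait mirrors only the parts needed here, `chunk` is reduced to returning byte ranges, and `WholeText`, `ParallelWhole`, and `chunk_buffers` are hypothetical names. How rlm_rs actually schedules parallel work is not shown in this page.

```rust
use std::thread;

// Simplified local mirror of the trait: `chunk` returns byte ranges only.
pub trait Chunker: Send + Sync {
    fn chunk(&self, text: &str) -> Vec<(usize, usize)>;
    // Same default as documented above.
    fn supports_parallel(&self) -> bool {
        false
    }
}

// Hypothetical chunker that keeps the default (sequential).
pub struct WholeText;
impl Chunker for WholeText {
    fn chunk(&self, text: &str) -> Vec<(usize, usize)> {
        vec![(0, text.len())]
    }
}

// Hypothetical chunker that opts in to parallel processing.
pub struct ParallelWhole;
impl Chunker for ParallelWhole {
    fn chunk(&self, text: &str) -> Vec<(usize, usize)> {
        vec![(0, text.len())]
    }
    fn supports_parallel(&self) -> bool {
        true
    }
}

// Chunks every buffer, using one scoped thread per buffer only when the
// chunker opts in. Results keep the order of `buffers` either way.
pub fn chunk_buffers(chunker: &dyn Chunker, buffers: &[&str]) -> Vec<Vec<(usize, usize)>> {
    if !chunker.supports_parallel() {
        return buffers.iter().map(|b| chunker.chunk(b)).collect();
    }
    thread::scope(|s| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = buffers
            .iter()
            .map(|b| s.spawn(move || chunker.chunk(b)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("chunking thread panicked"))
            .collect()
    })
}
```

The `Send + Sync` bound on the trait is what allows `&dyn Chunker` to be shared across the scoped threads. One thread per buffer keeps the sketch short; a real scheduler would more likely use a bounded pool.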

fn description(&self) -> &'static str

Returns a description of the chunking strategy.

fn validate(&self, metadata: Option<&ChunkMetadata>) -> Result<()>

Validates configuration before chunking.

§Arguments

  • metadata - Optional metadata to validate.

§Returns

Ok(()) if configuration is valid, error otherwise.

§Errors

Returns an error if the chunk size is zero or the overlap exceeds the chunk size.
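
The two error conditions named above can be expressed as a small standalone check. In this sketch, `ChunkConfig`, `ConfigError`, and the free function `validate_config` are hypothetical stand-ins. The crate's own configuration and error types are not shown on this page and may differ.

```rust
// Hypothetical configuration and error types, for illustration only.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    ZeroChunkSize,
    OverlapExceedsChunkSize { overlap: usize, chunk_size: usize },
}

pub struct ChunkConfig {
    pub chunk_size: usize,
    pub overlap: usize,
}

// Rejects the two cases the documentation lists: a zero chunk size and an
// overlap larger than the chunk size.
pub fn validate_config(config: &ChunkConfig) -> Result<(), ConfigError> {
    if config.chunk_size == 0 {
        return Err(ConfigError::ZeroChunkSize);
    }
    if config.overlap > config.chunk_size {
        return Err(ConfigError::OverlapExceedsChunkSize {
            overlap: config.overlap,
            chunk_size: config.chunk_size,
        });
    }
    Ok(())
}
```

Running a check like this before `chunk` surfaces configuration problems early instead of partway through a large buffer.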

Implementors§