
Crate chunkedrs


§chunkedrs

AI-native text chunking — split long documents into token-accurate pieces for embedding and retrieval. Built on tiktoken for precise token counting.

§Design: if it's used at all, it should be good to use

Three strategies, each done right:

| Strategy            | Use case                                                       | Speed              |
|---------------------|----------------------------------------------------------------|--------------------|
| Recursive (default) | General text — paragraphs, sentences, words                    | Fastest            |
| Markdown            | Documents with `#` headers — preserves section metadata        | Fast               |
| Semantic            | High-quality RAG — splits at meaning boundaries via embeddings | Slower (API calls) |
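
The recursive strategy's fallback order can be pictured as: try the coarsest separator first, and only descend to finer ones when a piece is still too large. Below is a minimal standalone sketch of that idea, not the crate's actual implementation; it uses character counts as a stand-in for real token counts.

```rust
// Illustrative sketch of recursive splitting: fall back from paragraph
// breaks to sentence breaks to single spaces. `max_len` counts characters
// here; the crate counts tokens.
fn split_recursive(text: &str, max_len: usize) -> Vec<String> {
    let separators = ["\n\n", ". ", " "];
    fn go(piece: &str, seps: &[&str], max_len: usize, out: &mut Vec<String>) {
        if piece.len() <= max_len || seps.is_empty() {
            if !piece.trim().is_empty() {
                out.push(piece.trim().to_string());
            }
            return;
        }
        // Too big: split on the current separator and recurse with finer ones.
        for part in piece.split(seps[0]) {
            go(part, &seps[1..], max_len, out);
        }
    }
    let mut out = Vec::new();
    go(text, &separators, max_len, &mut out);
    out
}

fn main() {
    let text = "First paragraph here.\n\nSecond paragraph is quite a bit longer. It has two sentences.";
    for c in split_recursive(text, 40) {
        println!("{} chars: {c:?}", c.len());
    }
}
```

The first paragraph fits as-is; the second exceeds the limit, so it is split again at the sentence boundary.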

§Quick start

// split with defaults: recursive, 512 max tokens, no overlap
let chunks = chunkedrs::chunk("your long text here...").split();
for chunk in &chunks {
    println!("[{}] {} tokens", chunk.index, chunk.token_count);
}

§Token-accurate splitting

let chunks = chunkedrs::chunk("your long text here...")
    .max_tokens(256)
    .overlap(50)
    .model("gpt-4o")
    .split();

// every chunk is guaranteed to have <= 256 tokens
assert!(chunks.iter().all(|c| c.token_count <= 256));
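
The `overlap` setting can be pictured as a sliding window whose step is `max - overlap`. A minimal standalone sketch of that arithmetic, using whitespace-separated words as a stand-in for real token counts:

```rust
// Illustrative overlap sketch: windows of `max` items that advance by
// `max - overlap`, so each chunk repeats the tail of the previous one.
fn windows(words: &[&str], max: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < max, "overlap must be smaller than the window");
    let step = max - overlap;
    let mut out = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max).min(words.len());
        out.push(words[start..end].join(" "));
        if end == words.len() {
            break;
        }
        start += step;
    }
    out
}

fn main() {
    let words: Vec<&str> = "a b c d e f g h".split_whitespace().collect();
    // With max = 4 and overlap = 1, each window starts on the last word
    // of the previous window.
    println!("{:?}", windows(&words, 4, 1));
}
```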

§Markdown-aware splitting

let markdown = "# Intro\n\nSome text.\n\n## Details\n\nMore text here.\n";
let chunks = chunkedrs::chunk(markdown).markdown().split();

// each chunk knows which section it belongs to
assert_eq!(chunks[0].section.as_deref(), Some("# Intro"));
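
The `section` metadata can be pictured as carrying the most recent heading along with each chunk. A standalone sketch of that idea (not the crate's actual splitter):

```rust
// Illustrative sketch: walk the markdown line by line, remember the last
// `#` heading seen, and attach it to each body chunk.
fn split_by_headers(md: &str) -> Vec<(Option<String>, String)> {
    let mut out = Vec::new();
    let mut current: Option<String> = None;
    let mut body = String::new();
    for line in md.lines() {
        if line.starts_with('#') {
            if !body.trim().is_empty() {
                out.push((current.clone(), body.trim().to_string()));
            }
            current = Some(line.to_string());
            body.clear();
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    if !body.trim().is_empty() {
        out.push((current, body.trim().to_string()));
    }
    out
}

fn main() {
    let md = "# Intro\n\nSome text.\n\n## Details\n\nMore text here.\n";
    for (section, text) in split_by_headers(md) {
        println!("{section:?}: {text}");
    }
}
```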

§Semantic splitting

With the semantic feature enabled, split at meaning boundaries using embeddings:

let client = embedrs::openai("sk-...");
let chunks = chunkedrs::chunk("your long text here...")
    .semantic(&client)
    .split_async()
    .await?;
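
A common way to find "meaning boundaries" is to embed consecutive sentences and cut wherever the cosine similarity between neighbors drops below a threshold. The standalone sketch below illustrates that approach with hardcoded vectors standing in for real API embeddings; the threshold of 0.5 is an arbitrary choice for illustration, and this is not necessarily how the crate implements it.

```rust
// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Indices where a new chunk should start: wherever adjacent embeddings
// diverge (similarity below the threshold).
fn boundaries(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    embeddings
        .windows(2)
        .enumerate()
        .filter(|(_, w)| cosine(&w[0], &w[1]) < threshold)
        .map(|(i, _)| i + 1) // cut before sentence i + 1
        .collect()
}

fn main() {
    // Two "topics": the third vector points in a different direction,
    // so the cut lands before it.
    let embs = vec![vec![1.0, 0.0], vec![0.9, 0.1], vec![0.0, 1.0]];
    println!("cut points: {:?}", boundaries(&embs, 0.5));
}
```

Each boundary check needs an embedding per sentence, which is why this strategy is the slowest of the three.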

Structs§

Chunk
A piece of text produced by splitting a larger document.
ChunkBuilder
Builder for configuring text chunking.

Enums§

Error
Error types for chunkedrs operations.

Functions§

chunk
Create a chunk builder for the given text.

Type Aliases§

Result
Result type for chunkedrs operations.