§chunkedrs
AI-native text chunking — split long documents into token-accurate pieces for embedding and retrieval. Built on tiktoken for precise token counting.
§Design: if you're going to use it, it should be good to use
Three strategies, each done right:
| Strategy | Use case | Speed |
|---|---|---|
| Recursive (default) | General text — paragraphs, sentences, words | Fastest |
| Markdown | Documents with # headers — preserves section metadata | Fast |
| Semantic | High-quality RAG — splits at meaning boundaries via embeddings | Slower (API calls) |
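The default recursive strategy can be pictured with a short self-contained sketch. This is illustrative only, not chunkedrs's implementation: the function names `recursive_split` and `split_with` are made up, and it counts bytes instead of tokens. The idea is to try the largest separator first (paragraph breaks) and fall back to smaller ones whenever a piece is still too long:

```rust
// Illustrative sketch of a recursive splitter (byte-based, not token-based).
fn recursive_split(text: &str, max_len: usize) -> Vec<String> {
    // Try paragraph breaks first, then line breaks, then spaces.
    let separators = ["\n\n", "\n", " "];
    split_with(text, max_len, &separators)
}

fn split_with(text: &str, max_len: usize, seps: &[&str]) -> Vec<String> {
    if text.len() <= max_len {
        return vec![text.to_string()];
    }
    let Some((&sep, rest)) = seps.split_first() else {
        // No separators left: hard-cut at max_len (byte-based for simplicity).
        return text
            .as_bytes()
            .chunks(max_len)
            .map(|c| String::from_utf8_lossy(c).into_owned())
            .collect();
    };
    let mut out = Vec::new();
    let mut current = String::new();
    for piece in text.split(sep) {
        // Flush the current chunk before it would exceed the limit.
        if !current.is_empty() && current.len() + sep.len() + piece.len() > max_len {
            out.extend(split_with(&current, max_len, rest));
            current.clear();
        }
        if !current.is_empty() {
            current.push_str(sep);
        }
        current.push_str(piece);
    }
    if !current.is_empty() {
        out.extend(split_with(&current, max_len, rest));
    }
    out
}

fn main() {
    let text = "First paragraph here.\n\nSecond paragraph, a bit longer than the first.";
    let chunks = recursive_split(text, 40);
    // Every chunk respects the limit, falling back to word boundaries as needed.
    assert!(chunks.iter().all(|c| c.len() <= 40));
    println!("{} chunks", chunks.len());
}
```

The real crate applies the same cascade but measures sizes in tiktoken tokens rather than bytes.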
§Quick start
```rust
// split with defaults: recursive, 512 max tokens, no overlap
let chunks = chunkedrs::chunk("your long text here...").split();
for chunk in &chunks {
    println!("[{}] {} tokens", chunk.index, chunk.token_count);
}
```

§Token-accurate splitting
```rust
let chunks = chunkedrs::chunk("your long text here...")
    .max_tokens(256)
    .overlap(50)
    .model("gpt-4o")
    .split();
// every chunk is guaranteed to have <= 256 tokens
assert!(chunks.iter().all(|c| c.token_count <= 256));
```

§Markdown-aware splitting
```rust
let markdown = "# Intro\n\nSome text.\n\n## Details\n\nMore text here.\n";
let chunks = chunkedrs::chunk(markdown).markdown().split();
// each chunk knows which section it belongs to
assert_eq!(chunks[0].section.as_deref(), Some("# Intro"));
```

§Semantic splitting
With the `semantic` feature enabled, split at meaning boundaries using embeddings:
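Semantic strategies typically embed each sentence and start a new chunk wherever adjacent embeddings diverge. The toy sketch below illustrates that idea only: the names `cosine` and `semantic_group` are made up, and the hand-written two-dimensional vectors stand in for the embedding-API calls the real strategy makes.

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Group sentences into chunks, starting a new chunk whenever adjacent
/// sentence embeddings fall below `threshold` similarity.
fn semantic_group(sentences: &[&str], embeddings: &[Vec<f32>], threshold: f32) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = vec![sentences[0]];
    for i in 1..sentences.len() {
        if cosine(&embeddings[i - 1], &embeddings[i]) < threshold {
            chunks.push(current.join(" "));
            current = Vec::new();
        }
        current.push(sentences[i]);
    }
    chunks.push(current.join(" "));
    chunks
}

fn main() {
    let sentences = ["Cats purr.", "Cats meow.", "Rust compiles fast."];
    // Pretend embeddings: the first two are similar, the third points elsewhere.
    let embeddings = vec![vec![1.0, 0.1], vec![0.9, 0.2], vec![0.0, 1.0]];
    let chunks = semantic_group(&sentences, &embeddings, 0.8);
    // The topic shift between sentence 2 and 3 becomes the chunk boundary.
    assert_eq!(chunks, vec!["Cats purr. Cats meow.", "Rust compiles fast."]);
}
```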
```rust
let client = embedrs::openai("sk-...");
let chunks = chunkedrs::chunk("your long text here...")
    .semantic(&client)
    .split_async()
    .await?;
```

§Structs
- `Chunk`: A piece of text produced by splitting a larger document.
- `ChunkBuilder`: Builder for configuring text chunking.

§Enums

- `Error`: Error types for chunkedrs operations.

§Functions

- `chunk`: Create a chunk builder for the given text.

§Type Aliases

- `Result`: Result type for chunkedrs operations.