§chunkedrs
AI-native text chunking — split long documents into token-accurate pieces for embedding and retrieval. Built on tiktoken for precise token counting.
§Design: if you're going to use it, it should be good to use
Three strategies, each done right:
| Strategy | Use case | Speed |
|---|---|---|
| Recursive (default) | General text — paragraphs, sentences, words | Fastest |
| Markdown | Documents with # headers — preserves section metadata | Fast |
| Semantic | High-quality RAG — splits at meaning boundaries via embeddings | Slower (API calls) |
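The default recursive strategy can be pictured with a short self-contained sketch. This is illustrative only, not chunkedrs's implementation: the function names `recursive_split` and `split_with` are made up, and it counts bytes instead of tokens. The idea is to try the largest separator first (paragraph breaks) and fall back to smaller ones whenever a piece is still too long:

```rust
// Illustrative sketch of a recursive splitter (byte-based, not token-based).
fn recursive_split(text: &str, max_len: usize) -> Vec<String> {
    // Try paragraph breaks first, then line breaks, then spaces.
    let separators = ["\n\n", "\n", " "];
    split_with(text, max_len, &separators)
}

fn split_with(text: &str, max_len: usize, seps: &[&str]) -> Vec<String> {
    if text.len() <= max_len {
        return vec![text.to_string()];
    }
    let Some((&sep, rest)) = seps.split_first() else {
        // No separators left: hard-cut at max_len (byte-based for simplicity).
        return text
            .as_bytes()
            .chunks(max_len)
            .map(|c| String::from_utf8_lossy(c).into_owned())
            .collect();
    };
    let mut out = Vec::new();
    let mut current = String::new();
    for piece in text.split(sep) {
        // Flush the current chunk before it would exceed the limit.
        if !current.is_empty() && current.len() + sep.len() + piece.len() > max_len {
            out.extend(split_with(&current, max_len, rest));
            current.clear();
        }
        if !current.is_empty() {
            current.push_str(sep);
        }
        current.push_str(piece);
    }
    if !current.is_empty() {
        out.extend(split_with(&current, max_len, rest));
    }
    out
}

fn main() {
    let text = "First paragraph here.\n\nSecond paragraph, a bit longer than the first.";
    let chunks = recursive_split(text, 40);
    // Every chunk respects the limit, falling back to word boundaries as needed.
    assert!(chunks.iter().all(|c| c.len() <= 40));
    println!("{} chunks", chunks.len());
}
```

The real crate applies the same cascade but measures sizes in tiktoken tokens rather than bytes.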
§Quick start
```rust
// split with defaults: recursive, 512 max tokens, no overlap
let chunks = chunkedrs::chunk("your long text here...").split();
for chunk in &chunks {
    println!("[{}] {} tokens", chunk.index, chunk.token_count);
}
```

§Token-accurate splitting
```rust
let chunks = chunkedrs::chunk("your long text here...")
    .max_tokens(256)
    .overlap(50)
    .model("gpt-4o")
    .split();
// every chunk is guaranteed to have <= 256 tokens
assert!(chunks.iter().all(|c| c.token_count <= 256));
```

§Markdown-aware splitting
```rust
let markdown = "# Intro\n\nSome text.\n\n## Details\n\nMore text here.\n";
let chunks = chunkedrs::chunk(markdown).markdown().split();
// each chunk knows which section it belongs to
assert_eq!(chunks[0].section.as_deref(), Some("# Intro"));
```

§Semantic splitting
With the `semantic` feature enabled, split at meaning boundaries using embeddings:
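Semantic strategies typically embed each sentence and start a new chunk wherever adjacent embeddings diverge. The toy sketch below illustrates that idea only: the names `cosine` and `semantic_group` are made up, and the hand-written two-dimensional vectors stand in for the embedding-API calls the real strategy makes.

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Group sentences into chunks, starting a new chunk whenever adjacent
/// sentence embeddings fall below `threshold` similarity.
fn semantic_group(sentences: &[&str], embeddings: &[Vec<f32>], threshold: f32) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = vec![sentences[0]];
    for i in 1..sentences.len() {
        if cosine(&embeddings[i - 1], &embeddings[i]) < threshold {
            chunks.push(current.join(" "));
            current = Vec::new();
        }
        current.push(sentences[i]);
    }
    chunks.push(current.join(" "));
    chunks
}

fn main() {
    let sentences = ["Cats purr.", "Cats meow.", "Rust compiles fast."];
    // Pretend embeddings: the first two are similar, the third points elsewhere.
    let embeddings = vec![vec![1.0, 0.1], vec![0.9, 0.2], vec![0.0, 1.0]];
    let chunks = semantic_group(&sentences, &embeddings, 0.8);
    // The topic shift between sentence 2 and 3 becomes the chunk boundary.
    assert_eq!(chunks, vec!["Cats purr. Cats meow.", "Rust compiles fast."]);
}
```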
```rust
let client = embedrs::openai("sk-...");
let chunks = chunkedrs::chunk("your long text here...")
    .semantic(&client)
    .split_async()
    .await?;
```

§Structs
- `Chunk`: A piece of text produced by splitting a larger document.
- `ChunkBuilder`: Builder for configuring text chunking.

§Enums

- `Error`: Error types for chunkedrs operations.

§Functions

- `chunk`: Create a chunk builder for the given text.

§Type Aliases

- `Result`: Result type for chunkedrs operations.