# Chunker Guide
Julienne exposes several chunking strategies because no single boundary model is
right for every input type.
## `CharacterTextSplitter`
Splits on one configured separator, then merges pieces up to `chunk_size` with
optional overlap.
Use it when boundaries are simple and predictable, for example newline-delimited
records or paragraphs already normalized by an upstream process.
```rust
use julienne::CharacterTextSplitter;
let splitter = CharacterTextSplitter::new("\n", 200, 20);
let chunks = splitter.split_text("alpha\nbeta\ngamma");
```
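To make the split-then-merge behavior concrete, here is a minimal standalone sketch. It is not Julienne's implementation: `split_and_merge` is a hypothetical helper, sizes are counted in characters, and overlap is carried forward as whole pieces, all of which are assumptions.

```rust
/// Hypothetical helper (not part of Julienne): split `text` on `separator`,
/// then greedily merge pieces into chunks of at most `chunk_size` characters.
/// Trailing pieces of a finished chunk are reused as overlap when they fit
/// within `overlap` characters. A single piece longer than `chunk_size` is
/// emitted as an oversized chunk rather than cut.
fn split_and_merge(text: &str, separator: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let pieces: Vec<&str> = text.split(separator).filter(|p| !p.is_empty()).collect();
    let sep_len = separator.chars().count();
    let mut chunks = Vec::new();
    let mut window: Vec<&str> = Vec::new(); // pieces in the chunk being built
    let mut window_len = 0usize; // character length of the joined window

    for piece in pieces {
        let piece_len = piece.chars().count();
        let joined_len = if window.is_empty() {
            piece_len
        } else {
            window_len + sep_len + piece_len
        };
        if joined_len > chunk_size && !window.is_empty() {
            chunks.push(window.join(separator));
            // Drop leading pieces until the remainder fits the overlap budget
            // and still leaves room for the incoming piece.
            while !window.is_empty()
                && (window_len > overlap || window_len + sep_len + piece_len > chunk_size)
            {
                let removed = window.remove(0);
                window_len -= removed.chars().count();
                if !window.is_empty() {
                    window_len -= sep_len;
                }
            }
        }
        if !window.is_empty() {
            window_len += sep_len;
        }
        window_len += piece_len;
        window.push(piece);
    }
    if !window.is_empty() {
        chunks.push(window.join(separator));
    }
    chunks
}
```

With a `chunk_size` of 10 and an `overlap` of 5, the input `"alpha\nbeta\ngamma"` yields two chunks in this sketch, and `beta` appears in both because it fits the overlap budget.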
## `RecursiveCharacterTextSplitter`
Tries a separator hierarchy from coarse to fine, then falls back to smaller
units when needed.
Use it when you want LangChain-style behavior and the input may contain mixed
paragraph, line, and word boundaries.
```rust
use julienne::RecursiveCharacterTextSplitter;
let splitter = RecursiveCharacterTextSplitter::new(500, 50);
let chunks = splitter.split_text("First paragraph.\n\nSecond paragraph.");
```
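The coarse-to-fine fallback can be sketched as a short recursive function. This is an illustration under stated assumptions, not the crate's code: `recursive_split` and its separator list are hypothetical, sizes are in characters, and the merge-with-overlap step is left out to keep the recursion visible.

```rust
/// Hypothetical helper (not part of Julienne): split on the coarsest
/// separator first and recurse with finer separators only for pieces that
/// still exceed `chunk_size`. When no separators remain, fall back to
/// fixed-size character windows.
fn recursive_split(text: &str, separators: &[&str], chunk_size: usize) -> Vec<String> {
    let chunk_size = chunk_size.max(1); // guard against a zero-sized window
    if text.chars().count() <= chunk_size {
        return if text.is_empty() {
            Vec::new()
        } else {
            vec![text.to_string()]
        };
    }
    match separators.split_first() {
        // No separators left: hard-cut into character windows.
        None => {
            let chars: Vec<char> = text.chars().collect();
            chars
                .chunks(chunk_size)
                .map(|c| c.iter().collect::<String>())
                .collect()
        }
        // Split on the current separator and recurse with the finer ones.
        Some((sep, finer)) => text
            .split(*sep)
            .filter(|p| !p.is_empty())
            .flat_map(|p| recursive_split(p, finer, chunk_size))
            .collect(),
    }
}
```

Calling `recursive_split(text, &["\n\n", "\n", " "], 500)` would try paragraph breaks first, then lines, then words. A real splitter would also merge small neighbors back together up to the size limit.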
## `SentenceChunker`
Builds chunks from sentence-like units and preserves sentence boundaries where
possible.
Use it when cutting through a sentence would hurt more than packing chunks
slightly less tightly than a separator-based splitter would.
```rust
use julienne::SentenceChunker;
let splitter = SentenceChunker::new(300, 30);
let chunks = splitter.split_text("One sentence. Another sentence. A final sentence.");
```
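As a rough illustration of sentence-preserving packing, the sketch below detects sentence ends naively and never splits inside a sentence. The helpers `split_sentences` and `pack_sentences` are hypothetical, the boundary rule (terminal punctuation followed by whitespace) is an assumption that mis-handles abbreviations, and overlap is omitted.

```rust
/// Hypothetical helper: treat `.`, `!`, or `?` followed by whitespace as a
/// sentence end. Real sentence detection is more involved.
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;
    let mut prev_terminal = false;
    for (i, ch) in text.char_indices() {
        if prev_terminal && ch.is_whitespace() {
            let sentence = text[start..i].trim();
            if !sentence.is_empty() {
                sentences.push(sentence);
            }
            start = i;
        }
        prev_terminal = matches!(ch, '.' | '!' | '?');
    }
    let tail = text[start..].trim();
    if !tail.is_empty() {
        sentences.push(tail);
    }
    sentences
}

/// Hypothetical helper: pack whole sentences into chunks of at most
/// `chunk_size` characters. A sentence longer than the limit becomes its own
/// oversized chunk instead of being cut.
fn pack_sentences(text: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for sentence in split_sentences(text) {
        // Length if this sentence were appended, including the joining space.
        let needed = current.chars().count()
            + usize::from(!current.is_empty())
            + sentence.chars().count();
        if needed > chunk_size && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(sentence);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

The trade-off is visible in the output: chunks end on sentence boundaries, so some are noticeably shorter than the limit.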
## `SemchunkSplitter`
Uses a punctuation-aware delimiter hierarchy and adaptive packing. This is the
recommended default for prose and mixed natural-language text.
```rust
use julienne::SemchunkSplitter;
let splitter = SemchunkSplitter::new(500, 50);
let chunks = splitter.split_text("A paragraph, with clauses; and useful punctuation.");
```
## `SemanticChunker`
Detects topic boundaries with embeddings. Configure a batch embedder for
production use. Without an embedder, it falls back to sentence-based packing.
```rust
use julienne::SemanticChunker;
let chunker = SemanticChunker::builder()
    .chunk_size(500)
    .chunk_overlap(50)
    .embedding_fn(std::sync::Arc::new(|text: &str| vec![text.len() as f32]))
    .build()
    .unwrap();
let chunks = chunker.split_text("Topic A. Topic B.");
```
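One common way to detect topic boundaries with embeddings is to compare adjacent sentences and start a new group when their similarity drops. The sketch below shows that idea only. Whether `SemanticChunker` uses cosine similarity or a fixed threshold is an assumption, and `cosine` and `topic_groups` are hypothetical helpers.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical helper: group consecutive sentences, starting a new group
/// whenever similarity to the previous sentence falls below `threshold`.
fn topic_groups<'a>(
    sentences: &[&'a str],
    embed: impl Fn(&str) -> Vec<f32>,
    threshold: f32,
) -> Vec<Vec<&'a str>> {
    let mut groups: Vec<Vec<&'a str>> = Vec::new();
    let mut prev: Option<Vec<f32>> = None;
    for &sentence in sentences {
        let vector = embed(sentence);
        let same_topic = prev
            .as_ref()
            .map_or(false, |p| cosine(p, &vector) >= threshold);
        if same_topic {
            // `prev` is only set after a group has been pushed.
            groups
                .last_mut()
                .expect("a group exists once a previous sentence was seen")
                .push(sentence);
        } else {
            groups.push(vec![sentence]);
        }
        prev = Some(vector);
    }
    groups
}
```

A production setup would embed sentences in batches and then pack each group up to the size limit; this sketch embeds one sentence at a time for clarity.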
Provider-backed embeddings can fail. When those failures should surface as a
`ChunkError` rather than as a panic from a convenience API, use
`try_split_text` or `try_split_chunks`.
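The difference between a fallible entry point and a panicking convenience wrapper can be sketched without the crate. Everything here is hypothetical: `SketchError` stands in for an error type such as `ChunkError`, and `try_embed_all` and `embed_all` are illustrative names, not Julienne functions.

```rust
/// Stand-in error type for this sketch only.
#[derive(Debug, PartialEq)]
enum SketchError {
    Embedding(String),
}

/// Fallible variant: the first embedding failure is returned to the caller.
fn try_embed_all(
    sentences: &[&str],
    embed: impl Fn(&str) -> Result<Vec<f32>, String>,
) -> Result<Vec<Vec<f32>>, SketchError> {
    sentences
        .iter()
        .map(|&s| embed(s).map_err(SketchError::Embedding))
        .collect()
}

/// Convenience variant: same work, but a failure becomes a panic.
fn embed_all(
    sentences: &[&str],
    embed: impl Fn(&str) -> Result<Vec<f32>, String>,
) -> Vec<Vec<f32>> {
    try_embed_all(sentences, embed).expect("embedding failed")
}
```

Callers that talk to a remote provider would generally prefer the `try_` form so that a timeout or quota error can be retried or reported instead of aborting the process.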
## Structure-Aware Chunkers
Use `MarkdownChunker`, `HtmlChunker`, `XmlChunker`, and the feature-gated
`CodeChunker` when syntax or document structure is a better boundary source than
plain punctuation.
See [Structure-aware chunking](structure-aware-chunking.md).
## `TokenChunker`
Uses an explicit token-boundary provider and returns fixed token windows with
overlap.
Use it when chunk size must be measured in tokens rather than in characters,
words, or semantic units.
See [Sizing and token windows](sizing-and-tokens.md).
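
Fixed token windows with overlap amount to a sliding window whose step is `window - overlap`. The sketch below assumes tokens are already available as a slice; `token_windows` is a hypothetical helper, not the `TokenChunker` API, and it does not model the token-boundary provider.

```rust
/// Hypothetical helper: slide a window of `window` tokens over `tokens`,
/// advancing by `window - overlap` each step. The final window may be
/// shorter than `window`.
fn token_windows<T: Clone>(tokens: &[T], window: usize, overlap: usize) -> Vec<Vec<T>> {
    assert!(
        window > 0 && overlap < window,
        "window must be positive and overlap must be smaller than window"
    );
    let step = window - overlap;
    let mut windows = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + window).min(tokens.len());
        windows.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break; // the last token has been covered
        }
        start += step;
    }
    windows
}
```

Requiring `overlap < window` keeps the step positive, so the loop always makes progress.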