# Getting Started
Julienne cuts already-extracted text into chunks. It is useful when you need
stable input ranges for retrieval, embeddings, indexing, search snippets, or
context windows.
## Install
```toml
[dependencies]
julienne = "0.1"
```
Enable optional integrations only when you need them:
```toml
[dependencies]
julienne = { version = "0.1", features = ["unicode-segmentation", "tiktoken-rs"] }
```
## Basic Usage
`SemchunkSplitter` is the recommended general-purpose starting point for prose
and mixed natural-language text.
```rust
use julienne::SemchunkSplitter;
let input = "Intro paragraph. More detail follows. Final note.";
let splitter = SemchunkSplitter::new(120, 20);
let chunks = splitter.split_text(input);
assert!(!chunks.is_empty());
```
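The example above does not name the two constructor arguments. If they are read as a maximum chunk size and an overlap between neighboring chunks (an assumption here, so check the API docs), the general idea of size-plus-overlap windows can be pictured with the standalone sketch below. `sliding_windows` is a hypothetical helper, not part of Julienne, and it does not reflect how `SemchunkSplitter` picks its boundaries.

```rust
/// Hypothetical helper, not part of Julienne: splits `text` into windows of at
/// most `size` characters, each starting `size - overlap` characters after the
/// previous one.
pub fn sliding_windows(text: &str, size: usize, overlap: usize) -> Vec<&str> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");

    // Byte offset of every character start, plus the end of the string, so
    // that slicing never lands inside a multi-byte character.
    let mut bounds: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
    bounds.push(text.len());
    let n_chars = bounds.len() - 1;

    let step = size - overlap;
    let mut windows = Vec::new();
    let mut start = 0;
    while start < n_chars {
        let end = (start + size).min(n_chars);
        windows.push(&text[bounds[start]..bounds[end]]);
        if end == n_chars {
            break; // The last window already reaches the end of the text.
        }
        start += step;
    }
    windows
}
```

For example, `sliding_windows("abcdefghij", 4, 1)` yields `abcd`, `defg`, `ghij`: each window repeats the last character of the previous one.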
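Use `split_text` when owned strings are enough. Use `split_chunks` when
downstream code needs offsets into the original source: each `TextChunk`
carries `start_byte` and `end_byte`, and slicing the input with them yields the
chunk's `text`.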
```rust
use julienne::{RecursiveCharacterTextSplitter, TextChunk};
let input = "Intro.\n\nDetails with cafe.";
let splitter = RecursiveCharacterTextSplitter::new(80, 10);
let chunks: Vec<TextChunk<'_>> = splitter.split_chunks(input);
for chunk in chunks {
    // Each chunk's byte range slices back to exactly its text.
    assert_eq!(&input[chunk.start_byte..chunk.end_byte], chunk.text);
}
```
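To show why byte offsets are worth keeping, here is a minimal standalone sketch that does not use Julienne. `Span` and `paragraph_spans` are hypothetical names for illustration only. The sketch splits on blank lines and records byte ranges, so every chunk can be sliced back out of the input, including when the text contains multi-byte characters.

```rust
/// Hypothetical stand-in for an offset-carrying chunk; not Julienne's `TextChunk`.
#[derive(Debug, PartialEq)]
pub struct Span<'a> {
    pub start_byte: usize,
    pub end_byte: usize,
    pub text: &'a str,
}

/// Splits `input` on blank lines ("\n\n") and records each piece's byte range.
/// Empty pieces are skipped.
pub fn paragraph_spans(input: &str) -> Vec<Span<'_>> {
    let mut spans = Vec::new();
    let mut start = 0;
    for piece in input.split("\n\n") {
        // `len()` counts bytes, which is what string slicing expects.
        let end = start + piece.len();
        if !piece.is_empty() {
            spans.push(Span { start_byte: start, end_byte: end, text: piece });
        }
        // Skip past the two-byte separator that `split` consumed.
        start = end + 2;
    }
    spans
}
```

Because the ranges are byte offsets rather than character counts, `&input[span.start_byte..span.end_byte]` always equals `span.text`, which is the same invariant the `split_chunks` example asserts.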
## Choose A Chunker
Start with the entry that matches your input or goal:
- `SemchunkSplitter`: general-purpose prose and mixed text.
- `RecursiveCharacterTextSplitter`: predictable LangChain-style separator
fallback.
- `SentenceChunker`: sentence boundaries are more important than separator
hierarchy.
- `SemanticChunker`: you have a domain-relevant embedder and want topic-shift
boundaries.
- `MarkdownChunker`: Markdown block structure matters.
- `HtmlChunker` / `XmlChunker`: already-extracted markup strings carry useful
block structure.
- `CodeChunker`: Rust or Python source should be split by AST nodes.
- `TokenChunker`: fixed token windows are the desired unit.
See [Chunker guide](chunkers.md) for tradeoffs and examples.
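The list above can also be written down as a small lookup, which is handy if an ingestion pipeline selects a chunker from configuration. This is only a sketch: the `Need` enum and `suggested_chunker` function are hypothetical and not part of Julienne; only the returned type names come from the list.

```rust
/// Hypothetical description of what the caller cares about; not part of Julienne.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Need {
    GeneralProse,
    SeparatorFallback,
    SentenceBoundaries,
    /// Requires a domain-relevant embedder.
    TopicShifts,
    MarkdownBlocks,
    HtmlBlocks,
    XmlBlocks,
    /// Rust or Python source split by AST nodes.
    SourceCodeAst,
    FixedTokenWindows,
}

/// Maps each need to the chunker type name suggested in the list above.
pub fn suggested_chunker(need: Need) -> &'static str {
    match need {
        Need::GeneralProse => "SemchunkSplitter",
        Need::SeparatorFallback => "RecursiveCharacterTextSplitter",
        Need::SentenceBoundaries => "SentenceChunker",
        Need::TopicShifts => "SemanticChunker",
        Need::MarkdownBlocks => "MarkdownChunker",
        Need::HtmlBlocks => "HtmlChunker",
        Need::XmlBlocks => "XmlChunker",
        Need::SourceCodeAst => "CodeChunker",
        Need::FixedTokenWindows => "TokenChunker",
    }
}
```

The exhaustive `match` means that adding a new `Need` variant will not compile until it has been mapped to a chunker.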
## Validate Locally
```bash
prek run --all-files
```
This command runs the project's canonical check gate: formatting, feature
combinations, tests, clippy, dependency policy, docs, and package verification.