Crate julienne

Julienne is a Rust library for cutting text into range-preserving chunks.

It provides simple separator splitters, recursive and sentence-aware splitters, semantic chunking, token-window chunking, and structure-aware chunkers for Markdown, HTML/XML, and (optionally, backed by tree-sitter) source code.

Structured chunk APIs return TextChunk values whose text field is a zero-copy slice of the original input. The offset invariant for every structured chunk is:

&input[chunk.start_byte..chunk.end_byte] == chunk.text
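To illustrate the offset invariant outside the library, here is a minimal, self-contained sketch. The Chunk struct and split_with_offsets function are hypothetical stand-ins, not julienne's TextChunk or any of its splitters; they only show how borrowing slices and tracking byte offsets makes the invariant hold by construction, including for multi-byte UTF-8 text.

```rust
/// Hypothetical stand-in for a structured chunk (not julienne's TextChunk):
/// `text` borrows from the input instead of owning a copy.
#[derive(Debug, PartialEq)]
struct Chunk<'a> {
    text: &'a str,
    start_byte: usize,
    end_byte: usize,
}

/// Splits `input` on a non-empty `sep`, recording the byte range of each
/// non-empty piece so it can be traced back to the original string.
fn split_with_offsets<'a>(input: &'a str, sep: &str) -> Vec<Chunk<'a>> {
    assert!(!sep.is_empty(), "separator must be non-empty");
    let mut chunks = Vec::new();
    let mut start = 0;
    for piece in input.split(sep) {
        let end = start + piece.len();
        if !piece.is_empty() {
            // `piece` is already a borrowed slice of `input`; nothing is copied.
            chunks.push(Chunk { text: piece, start_byte: start, end_byte: end });
        }
        // Skip over the separator that followed this piece.
        start = end + sep.len();
    }
    chunks
}

fn main() {
    let input = "naïve café\n\nsecond paragraph";
    for chunk in split_with_offsets(input, "\n\n") {
        // The offset invariant: the byte offsets fall on char boundaries,
        // and slicing the input with them reproduces the chunk text.
        assert_eq!(&input[chunk.start_byte..chunk.end_byte], chunk.text);
        println!("{}..{} {:?}", chunk.start_byte, chunk.end_byte, chunk.text);
    }
}
```

Because offsets are measured in bytes, not characters, the accented letters shift every later offset by their extra UTF-8 bytes, and slicing still lands on valid boundaries.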

APIs named chunks return iterators that stream structured chunks, yielding them incrementally wherever the algorithm allows. split_chunks collects the same chunks eagerly, and split_text projects them into owned strings for convenience.
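The streaming, collecting, and owned variants can be sketched with three hypothetical free functions that mirror that naming pattern. They chunk by line (via str::split_inclusive) purely as a stand-in algorithm; julienne's own chunks, split_chunks, and split_text may differ in signature and behavior.

```rust
/// Hypothetical chunk with byte offsets into the borrowed input.
#[derive(Debug, Clone, PartialEq)]
struct Chunk<'a> {
    text: &'a str,
    start_byte: usize,
    end_byte: usize,
}

/// Hypothetical streaming variant: lazily yields one chunk per line.
/// Nothing is computed until the caller pulls the next item.
fn chunks(input: &str) -> impl Iterator<Item = Chunk<'_>> + '_ {
    input.split_inclusive('\n').scan(0usize, |offset, line| {
        let start = *offset;
        *offset += line.len();
        Some(Chunk { text: line, start_byte: start, end_byte: *offset })
    })
}

/// Hypothetical eager variant: collect every streamed chunk.
fn split_chunks(input: &str) -> Vec<Chunk<'_>> {
    chunks(input).collect()
}

/// Hypothetical convenience variant: owned strings, offsets dropped.
fn split_text(input: &str) -> Vec<String> {
    chunks(input).map(|c| c.text.to_string()).collect()
}

fn main() {
    let input = "alpha\nbeta\ngamma";
    // Streaming: take only the first chunk; the rest are never computed.
    let first = chunks(input).next().unwrap();
    println!("first: {:?}", first);
    // Eager: every chunk, offsets included.
    for c in split_chunks(input) {
        assert_eq!(&input[c.start_byte..c.end_byte], c.text);
    }
    // Owned: plain strings, no borrow of `input`.
    println!("{:?}", split_text(input));
}
```

Since the eager and owned variants are built on the same iterator in this sketch, all three views agree on chunk boundaries; only laziness and ownership differ.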

Quick start

use julienne::SemchunkSplitter;

let splitter = SemchunkSplitter::new(200, 40);
let chunks = splitter.split_text("Julienne keeps chunking small, explicit, and provenance-safe.");
assert!(!chunks.is_empty());

Re-exports

pub use character::CharacterTextSplitter;
pub use chunk::ChunkMetadata;
pub use chunk::TextChunk;
pub use chunk::TextChunkIter;
pub use error::ChunkError;
pub use recursive::RecursiveCharacterTextSplitter;
pub use semantic::SemanticChunker;
pub use semchunk::SemchunkSplitter;
pub use sentence::SentenceChunker;
pub use sizing::ByteSizer;
pub use sizing::CharSizer;
pub use sizing::ChunkConfig;
pub use sizing::ChunkSizer;
pub use sizing::FunctionSizer;
pub use sizing::WordSizer;
pub use split::KeepSeparator;
pub use structure::HtmlChunker;
pub use structure::MarkdownChunker;
pub use structure::XmlChunker;
pub use token::TokenBoundaryProvider;
pub use token::TokenChunker;
pub use token::TokenSpan;
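The sizing re-exports suggest that chunk size can be measured in different units (bytes, characters, words). The hypothetical Sizer trait below sketches that pattern in plain Rust under stated assumptions: it is not the definition of julienne's ChunkSizer, and the counting rules here (for example, str::split_whitespace for words) are simple choices that the real sizers may not share.

```rust
/// Hypothetical sizing trait (not julienne's ChunkSizer):
/// measures a piece of text in one unit.
trait Sizer {
    fn size(&self, text: &str) -> usize;
}

struct Bytes;
struct Chars;
struct Words;

impl Sizer for Bytes {
    // UTF-8 byte length.
    fn size(&self, text: &str) -> usize {
        text.len()
    }
}

impl Sizer for Chars {
    // Unicode scalar values (Rust `char`s).
    fn size(&self, text: &str) -> usize {
        text.chars().count()
    }
}

impl Sizer for Words {
    // Whitespace-separated words.
    fn size(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// A chunker that is generic over the sizer only needs this one question answered.
fn fits<S: Sizer>(sizer: &S, text: &str, max: usize) -> bool {
    sizer.size(text) <= max
}

fn main() {
    let text = "crème brûlée, julienned";
    println!(
        "bytes={} chars={} words={}",
        Bytes.size(text),
        Chars.size(text),
        Words.size(text)
    );
    println!("fits in 24 chars: {}", fits(&Chars, text, 24));
    println!("fits in 24 bytes: {}", fits(&Bytes, text, 24));
}
```

In this sketch the same string measures 26 bytes, 23 characters, or 3 words, so a chunk-size limit is only meaningful together with the unit it is measured in.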

Modules

character
chunk
error
merge
recursive
semantic
semchunk
sentence
sizing
split
structure
token

Traits

Embedder

Functions

char_len
Default length function: counts Unicode characters.
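In plain Rust, counting Unicode characters usually means counting char values (Unicode scalar values), which differs from both byte length and visible glyphs. The sketch below assumes that definition; it is an illustration, not julienne's source for char_len.

```rust
/// Sketch of a character-counting length function, assuming "character"
/// means a Unicode scalar value (one Rust `char`).
fn char_len(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let accented = "r\u{e9}sum\u{e9}"; // "résumé" with precomposed é
    assert_eq!(accented.len(), 8); // UTF-8 bytes: each é takes two
    assert_eq!(char_len(accented), 6); // characters

    // A base letter plus a combining accent renders as one glyph
    // but counts as two chars.
    let combining = "e\u{301}";
    assert_eq!(char_len(combining), 2);
}
```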

Type Aliases

EmbedderHandle
EmbeddingFn
LengthFn
A custom length function for text splitting (e.g. token counting).
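A custom length function lets the caller decide what "length" means, for example a token count from a tokenizer. The sketch below is hypothetical: approx_tokens is a whitespace word counter standing in for a real tokenizer, pack_words is an illustrative greedy packer rather than any julienne splitter, and its &dyn Fn(&str) -> usize parameter merely plays the role a LengthFn would; the actual LengthFn alias may have a different type.

```rust
/// Crude stand-in for a tokenizer (hypothetical): one "token" per
/// whitespace-separated word. A real token counter would call a tokenizer.
fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedily packs words into pieces whose measured length stays within
/// `max_len`, using whatever length function the caller supplies.
/// A single word longer than `max_len` still becomes its own piece.
fn pack_words(text: &str, max_len: usize, len: &dyn Fn(&str) -> usize) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let candidate = if current.is_empty() {
            word.to_string()
        } else {
            format!("{current} {word}")
        };
        if len(candidate.as_str()) > max_len && !current.is_empty() {
            // Adding this word would overflow: emit the current piece.
            pieces.push(std::mem::take(&mut current));
            current = word.to_string();
        } else {
            current = candidate;
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    let text = "one two three four five";
    // Same algorithm, two different notions of length.
    println!("{:?}", pack_words(text, 2, &approx_tokens));
    println!("{:?}", pack_words(text, 9, &|s: &str| s.chars().count()));
}
```

Swapping the length function changes the packing without touching the algorithm: with approx_tokens and a limit of 2 the text packs as ["one two", "three four", "five"], while counting characters with a limit of 9 gives ["one two", "three", "four five"].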