julienne 0.1.0

Range-preserving Rust text chunkers for retrieval and embedding pipelines
# Structure-Aware Chunking

Structure-aware chunkers split on syntax or markup boundaries first, then fall
back to range-preserving merging.
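
To make "range-preserving merging" concrete, here is a minimal standalone sketch that uses only the standard library. It is not julienne's implementation: `paragraph_ranges` and `merge_ranges` are hypothetical helper names, and the greedy size budget is an assumption made for illustration. The point it shows is that every merged chunk remains an exact byte range of the original input.

```rust
use std::ops::Range;

/// Split text into paragraph byte ranges separated by blank lines ("\n\n").
/// Hypothetical helper for this sketch, not part of julienne.
fn paragraph_ranges(text: &str) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for (idx, _) in text.match_indices("\n\n") {
        if idx > start {
            ranges.push(start..idx);
        }
        start = idx + 2;
    }
    if start < text.len() {
        ranges.push(start..text.len());
    }
    ranges
}

/// Greedily merge adjacent ranges while the merged span stays within
/// `max_len` bytes. A single piece longer than `max_len` is kept as-is.
/// Assumes `pieces` is sorted and non-overlapping.
fn merge_ranges(pieces: &[Range<usize>], max_len: usize) -> Vec<Range<usize>> {
    let mut merged: Vec<Range<usize>> = Vec::new();
    for piece in pieces {
        let fits = merged
            .last()
            .map_or(false, |last| piece.end - last.start <= max_len);
        if fits {
            // Extend the current chunk; separators between pieces are kept,
            // so the chunk is still a contiguous slice of the source.
            if let Some(last) = merged.last_mut() {
                last.end = piece.end;
            }
        } else {
            merged.push(piece.clone());
        }
    }
    merged
}
```

Because each result is a `Range<usize>` into the input, a caller can recover the chunk text with `&text[range.clone()]` and still map it back to its exact source position.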

## Markdown

`MarkdownChunker` preserves headings, paragraphs, lists, and fenced code blocks.

```rust
use julienne::MarkdownChunker;

let input = "# Title\n\nA paragraph.\n\n```rust\nfn main() {}\n```";
let chunks = MarkdownChunker::new(500, 50).unwrap().split_chunks(input);
```
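
As a rough illustration of why fenced code blocks need special handling, the sketch below splits Markdown into blocks on blank lines while treating everything between a pair of fence lines as one block. It is a simplified standalone example, not how `MarkdownChunker` is implemented: `markdown_blocks` is a hypothetical name, and only backtick fences of length three are recognized.

```rust
use std::ops::Range;

// Three backticks, written with escapes so this example can itself sit
// inside a fenced block.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Return the byte range of each blank-line-separated block, keeping blank
/// lines inside a fenced code block as part of that block.
/// Hypothetical helper for this sketch, not part of julienne.
fn markdown_blocks(text: &str) -> Vec<Range<usize>> {
    let mut blocks = Vec::new();
    let mut block_start: Option<usize> = None;
    let mut block_end = 0;
    let mut in_fence = false;
    let mut offset = 0;

    for line in text.split_inclusive('\n') {
        let line_start = offset;
        offset += line.len();
        // Line content without its trailing newline characters.
        let content = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        let is_fence = content.trim_start().starts_with(FENCE);
        let is_blank = content.trim().is_empty();

        if is_blank && !in_fence {
            // A blank line outside a fence closes the current block.
            if let Some(start) = block_start.take() {
                blocks.push(start..block_end);
            }
            continue;
        }
        if block_start.is_none() {
            block_start = Some(line_start);
        }
        block_end = line_start + content.len();
        if is_fence {
            // Opening and closing fence lines toggle the fence state.
            in_fence = !in_fence;
        }
    }
    if let Some(start) = block_start {
        blocks.push(start..block_end);
    }
    blocks
}
```

A naive blank-line splitter would cut a code sample in two at its first empty line; tracking the fence state keeps the whole sample in a single range.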

## HTML and XML

`HtmlChunker` and `XmlChunker` work on already-extracted markup strings. They
use block-level tag boundaries such as sections, headings, paragraphs, list
items, pre/code blocks, and tables.

```rust
use julienne::HtmlChunker;

let input = "<section><h1>Title</h1><p>Body</p></section>";
let chunks = HtmlChunker::new(500, 50).unwrap().split_text(input);
```

These chunkers do not fetch URLs, sanitize markup, remove boilerplate, or apply
readability extraction.
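
The idea of block-level tag boundaries can be sketched without any parser. The example below is an assumption-laden illustration, not the crate's implementation: `BLOCK_TAGS` and `block_boundaries` are hypothetical names, the tag list is a small sample, and the scan does not handle comments, scripts, or `<` characters inside attribute values.

```rust
/// A small, illustrative set of block-level tag names (an assumption for
/// this sketch, not julienne's actual list).
const BLOCK_TAGS: &[&str] = &["section", "h1", "h2", "h3", "p", "li", "pre", "table"];

/// Return the byte offset of every opening tag whose name is in `BLOCK_TAGS`.
/// Closing tags are skipped because the character after `<` is `/`, which
/// yields an empty tag name.
fn block_boundaries(html: &str) -> Vec<usize> {
    let mut offsets = Vec::new();
    for (idx, _) in html.match_indices('<') {
        let rest = &html[idx + 1..];
        // The tag name is the leading run of ASCII letters and digits.
        let name_len = rest
            .find(|c: char| !c.is_ascii_alphanumeric())
            .unwrap_or(rest.len());
        let name = rest[..name_len].to_ascii_lowercase();
        if BLOCK_TAGS.contains(&name.as_str()) {
            offsets.push(idx);
        }
    }
    offsets
}
```

For the input used above, `<section><h1>Title</h1><p>Body</p></section>`, the offsets mark the start of `<section>`, `<h1>`, and `<p>`. Slicing the input between consecutive offsets gives candidate pieces that are still exact ranges of the original string.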

## Code

`CodeChunker` is available with the `code` feature. It uses tree-sitter parsers
for Rust and Python and chunks by semantic AST units such as functions, methods,
impl blocks, classes, structs, enums, modules, and attached comments.

```rust
use julienne::{CodeChunker, CodeLanguage};

let source = "fn main() { println!(\"hello\"); }";
let chunks = CodeChunker::new(CodeLanguage::Rust, 500, 50)
    .unwrap()
    .try_split_chunks(source)
    .unwrap();
```

Unsupported languages, parse failures, and oversized semantic units return
explicit `ChunkError` values. There is no heuristic fallback that pretends a
parser-backed chunk succeeded.
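
The "explicit error, no silent fallback" policy can be illustrated with a standalone sketch. The `SketchError` enum, its variants, and the `check_units` and `describe` functions are all hypothetical and exist only for this example; they do not describe the real `ChunkError` type, whose variants should be taken from the crate's API documentation.

```rust
/// Hypothetical error type for this sketch; not julienne's `ChunkError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    UnsupportedLanguage(String),
    OversizedUnit { len: usize, max_len: usize },
}

/// Accept pre-extracted semantic units only if the language is supported and
/// every unit fits in `max_len` bytes. An oversized unit is reported as an
/// error instead of being split at an arbitrary position.
fn check_units<'a>(
    language: &str,
    units: &[&'a str],
    max_len: usize,
) -> Result<Vec<&'a str>, SketchError> {
    if language != "rust" && language != "python" {
        return Err(SketchError::UnsupportedLanguage(language.to_string()));
    }
    for unit in units {
        if unit.len() > max_len {
            return Err(SketchError::OversizedUnit {
                len: unit.len(),
                max_len,
            });
        }
    }
    Ok(units.to_vec())
}

/// Caller-side policy: each failure is handled deliberately, so the caller
/// decides whether to skip the file, raise the limit, or use another chunker.
fn describe(result: &Result<Vec<&str>, SketchError>) -> String {
    match result {
        Ok(units) => format!("{} units accepted", units.len()),
        Err(SketchError::UnsupportedLanguage(lang)) => {
            format!("skip file: no parser for {}", lang)
        }
        Err(SketchError::OversizedUnit { len, max_len }) => {
            format!("unit is {} bytes, limit is {}", len, max_len)
        }
    }
}
```

Returning an error here keeps the decision with the caller, who knows whether a plain-text chunker is an acceptable substitute for a failed parse.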