recoco-splitters 0.2.0

Text splitters/parsers for ReCoco, an all-Rust fork of CocoIndex with greater flexibility.

ReCoco Splitters

Intelligent text splitting and parsing for ReCoco.

This crate implements text splitting strategies for ReCoco. Most of them use Tree-sitter to perform syntax-aware chunking of source code and structured documents.

🚀 Why Tree-sitter?

Standard text splitters often break code in the middle of functions or classes, destroying context. recoco-splitters understands the syntax of the language it is processing, ensuring that chunks respect logical boundaries (e.g., keeping a whole function together).

📦 Supported Languages

To minimize binary size, every language parser is feature-gated. Enable only what you need in your Cargo.toml.

[dependencies]
recoco-splitters = { version = "...", features = ["python", "rust"] }

Feature      Language
c            C
c-sharp      C#
cpp          C++
css          CSS
go           Go
html         HTML
java         Java
javascript   JavaScript
json         JSON
kotlin       Kotlin
markdown     Markdown
php          PHP
python       Python
ruby         Ruby
rust         Rust
sql          SQL
typescript   TypeScript
yaml         YAML
... (See Cargo.toml for full list)
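
As a rough illustration of what feature gating means in practice, the sketch below uses Rust's built-in `cfg!` macro to report which languages were compiled in. The function `enabled_languages` is hypothetical and is not part of the recoco-splitters API. Only the feature names `python` and `rust` come from the table above.

```rust
/// Hypothetical helper (not part of recoco-splitters): lists the languages
/// whose Cargo features were enabled when this code was compiled.
fn enabled_languages() -> Vec<&'static str> {
    let mut langs = Vec::new();
    // `cfg!` evaluates to a compile-time boolean, so a disabled feature
    // contributes nothing to the returned list.
    if cfg!(feature = "python") {
        langs.push("Python");
    }
    if cfg!(feature = "rust") {
        langs.push("Rust");
    }
    langs
}
```

A crate that gates whole parsers would more typically put `#[cfg(feature = "...")]` on modules and optional dependencies, so the disabled code is never compiled. The `cfg!` form is used here only to keep the sketch short.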

🧩 Splitter Strategies

  • Recursive Character Splitter: Standard splitting by separators (paragraphs, newlines, etc.).
  • Recursive Syntax Splitter: Tree-sitter based splitting that respects code blocks and syntax nodes.
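
The recursive character strategy can be sketched as follows. This is a minimal illustration of the general technique, not the crate's implementation: the names `recursive_split` and `hard_split`, the byte-based `max_len` limit, and the separator order are all assumptions made for the example.

```rust
/// Hypothetical sketch (not the recoco-splitters API).
/// Splits `text` into chunks of at most `max_len` bytes, trying the coarsest
/// separator first and recursing with finer ones for oversized pieces.
fn recursive_split(text: &str, separators: &[&str], max_len: usize) -> Vec<String> {
    // Base case: the text already fits in one chunk.
    if text.len() <= max_len {
        return if text.trim().is_empty() {
            Vec::new()
        } else {
            vec![text.to_string()]
        };
    }
    // No separators left: fall back to a hard cut on character boundaries.
    let Some((&sep, finer)) = separators.split_first() else {
        return hard_split(text, max_len);
    };

    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in text.split(sep) {
        // Length of `current` if `piece` were appended to it.
        let joined_len = if current.is_empty() {
            piece.len()
        } else {
            current.len() + sep.len() + piece.len()
        };
        if joined_len <= max_len {
            // Greedily merge neighbouring pieces while they still fit.
            if !current.is_empty() {
                current.push_str(sep);
            }
            current.push_str(piece);
        } else {
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
            }
            if piece.len() <= max_len {
                current.push_str(piece);
            } else {
                // The piece alone is too large: retry with finer separators.
                chunks.extend(recursive_split(piece, finer, max_len));
            }
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    // Drop chunks that contain only whitespace.
    chunks.retain(|c| !c.trim().is_empty());
    chunks
}

/// Last-resort split that cuts between characters, never inside one.
/// A single character wider than `max_len` bytes still forms its own chunk.
fn hard_split(text: &str, max_len: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if !current.is_empty() && current.len() + ch.len_utf8() > max_len {
            chunks.push(std::mem::take(&mut current));
        }
        current.push(ch);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Calling `recursive_split(text, &["\n\n", "\n", " "], 30)` tries paragraph breaks first and falls back to line breaks, then spaces, only for pieces that exceed 30 bytes. A syntax-aware splitter follows the same recursive shape but would descend into child syntax nodes where this sketch moves to a finer separator.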

📄 License

Apache-2.0. See the main repository for details.