recoco-splitters 0.2.2

Text splitters/parsers for Recoco, an all-Rust fork of CocoIndex with greater flexibility.

Recoco Splitters

Intelligent text splitting and parsing for Recoco.

This crate implements text splitting strategies, chiefly using Tree-sitter to perform syntax-aware chunking of source code and structured documents.

🚀 Why Tree-sitter?

Standard text splitters often break code in the middle of functions or classes, destroying context. recoco-splitters understands the syntax of the language it is processing, ensuring that chunks respect logical boundaries (e.g., keeping a whole function together).
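Here is a minimal sketch of the idea. It uses only the standard library and does not call Tree-sitter or this crate's API. The hypothetical `Node` type stands in for a parsed syntax tree, with each node holding a byte range and its children. The hypothetical `split_by_syntax` emits a node whole when it fits within `max_len` bytes. Otherwise it recurses into the node's children and packs adjacent small siblings together, so every cut falls on a node boundary.

```rust
/// Hypothetical stand-in for a parsed syntax-tree node (e.g. a Tree-sitter node):
/// a byte range `start..end` into the source plus the node's children.
struct Node {
    start: usize,
    end: usize,
    children: Vec<Node>,
}

/// Split `src` into chunks of at most `max_len` bytes, cutting only at node
/// boundaries. A leaf that is still too large is emitted whole rather than cut.
fn split_by_syntax(src: &str, root: &Node, max_len: usize) -> Vec<String> {
    let mut ranges = Vec::new();
    collect(root, max_len, &mut ranges);
    ranges.into_iter().map(|(s, e)| src[s..e].to_string()).collect()
}

fn collect(node: &Node, max_len: usize, out: &mut Vec<(usize, usize)>) {
    // A node that fits, or that has no children to split on, becomes one chunk.
    if node.end - node.start <= max_len || node.children.is_empty() {
        out.push((node.start, node.end));
        return;
    }
    // Otherwise pack consecutive children into ranges of at most `max_len` bytes,
    // recursing into any child that is too large on its own. In this sketch, text
    // that belongs to no child and falls between packed ranges (e.g. blank lines)
    // is dropped.
    let mut current: Option<(usize, usize)> = None;
    for child in &node.children {
        if child.end - child.start > max_len {
            if let Some(r) = current.take() {
                out.push(r);
            }
            collect(child, max_len, out);
            continue;
        }
        current = match current {
            // Extend the open range if it still fits with this child included.
            Some((s, _)) if child.end - s <= max_len => Some((s, child.end)),
            // Otherwise close the open range and start a new one at this child.
            Some(r) => {
                out.push(r);
                Some((child.start, child.end))
            }
            None => Some((child.start, child.end)),
        };
    }
    if let Some(r) = current {
        out.push(r);
    }
}
```

Consider a source holding two 16-byte functions separated by a blank line (`"fn a() {\n    1\n}\n\nfn b() {\n    2\n}\n"`). With `max_len = 20`, this sketch yields one chunk per function. Cutting the same text every 20 bytes would instead end the first chunk with the `fn` keyword of the second function.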

📦 Supported Languages

To minimize binary size, Recoco feature-gates every language parser. Enable only what you need in your Cargo.toml.

[dependencies]
recoco-splitters = { version = "...", features = ["python", "rust"] }

Feature      Language
all          All languages
c            C
c-sharp      C#
cpp          C++
css          CSS
fortran      Fortran
go           Go
html         HTML
java         Java
javascript   JavaScript
json         JSON
kotlin       Kotlin
markdown     Markdown
php          PHP
python       Python
r            R
ruby         Ruby
rust         Rust
scala        Scala
solidity     Solidity
sql          SQL
swift        Swift
toml         TOML
typescript   TypeScript
xml          XML
yaml         YAML
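This kind of gating is usually built from Cargo features plus `#[cfg(feature = "...")]` attributes. The sketch below is hypothetical and is not this crate's source. The helper `enabled_languages` reports which parsers were compiled in, using feature names from the table above. In a real crate, each parser dependency would presumably be declared `optional = true` and enabled by its feature. An aggregate feature such as `all` would simply list the individual language features.

```rust
/// Hypothetical helper (not this crate's source): report which language
/// parsers were compiled in. Each `#[cfg]`-gated statement is removed at
/// compile time unless its Cargo feature is enabled, so a disabled language
/// adds no code to the binary.
#[allow(unused_mut)]
fn enabled_languages() -> Vec<&'static str> {
    let mut langs = Vec::new();
    #[cfg(feature = "python")]
    langs.push("python");
    #[cfg(feature = "rust")]
    langs.push("rust");
    // ...one gated entry per feature in the table above.
    langs
}
```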

🧩 Splitter Strategies

  • Recursive Character Splitter: standard splitting on separators (paragraphs, newlines, etc.).
  • Recursive Syntax Splitter: Tree-sitter-based splitting that respects code blocks and syntax nodes.
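To illustrate the first strategy, here is a hedged, stdlib-only sketch of recursive character splitting. The function names `recursive_split` and `hard_split` and the separator list are assumptions, not this crate's API. The sketch splits on the coarsest separator first and greedily re-joins pieces up to `max_len` bytes. It recurses with finer separators only on pieces that are still too long, and as a last resort it cuts at `char` boundaries.

```rust
/// Hypothetical sketch of recursive character splitting (not this crate's API).
/// Lengths are measured in bytes via `str::len`.
fn recursive_split(text: &str, separators: &[&str], max_len: usize) -> Vec<String> {
    if text.len() <= max_len {
        return vec![text.to_string()];
    }
    // No separators left: fall back to hard cuts on `char` boundaries.
    let Some((&sep, finer)) = separators.split_first() else {
        return hard_split(text, max_len);
    };
    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in text.split(sep).filter(|p| !p.is_empty()) {
        if piece.len() > max_len {
            // Too long on its own: flush the open chunk, then retry with finer separators.
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
            }
            chunks.extend(recursive_split(piece, finer, max_len));
        } else if current.is_empty() {
            current.push_str(piece);
        } else if current.len() + sep.len() + piece.len() <= max_len {
            // Greedily re-join neighbouring pieces while they still fit.
            current.push_str(sep);
            current.push_str(piece);
        } else {
            chunks.push(std::mem::take(&mut current));
            current.push_str(piece);
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Last resort: cut into pieces of at most `max_len` bytes without splitting a `char`
/// (a single `char` wider than `max_len` still gets its own chunk).
fn hard_split(text: &str, max_len: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if !current.is_empty() && current.len() + ch.len_utf8() > max_len {
            chunks.push(std::mem::take(&mut current));
        }
        current.push(ch);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

For example, with separators `["\n\n", "\n", " "]`, the text `"Alpha beta.\n\nGamma delta epsilon.\n\nZeta."` with `max_len = 25` comes back as one chunk per paragraph. With `max_len = 10`, the longer paragraphs fall back to word-level chunks.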

📄 License

Apache-2.0. See the main repository for details.