Recoco Splitters
Intelligent text splitting and parsing for Recoco.
This crate implements sophisticated text splitting strategies, primarily leveraging Tree-sitter to perform syntax-aware chunking of source code and structured documents.
🚀 Why Tree-sitter?
Standard text splitters often break code in the middle of functions or classes, destroying context. recoco-splitters understands the syntax of the language it is processing, ensuring that chunks respect logical boundaries (e.g., keeping a whole function together).
📦 Supported Languages
To minimize binary size, Recoco feature-gates every language parser. Enable only what you need in your Cargo.toml.
[dependencies]
recoco-splitters = { version = "...", features = ["python", "rust"] }
| Feature | Language |
|---|---|
| all | All languages |
| c | C |
| c-sharp | C# |
| cpp | C++ |
| css | CSS |
| fortran | Fortran |
| go | Go |
| html | HTML |
| java | Java |
| javascript | JavaScript |
| json | JSON |
| kotlin | Kotlin |
| markdown | Markdown |
| php | PHP |
| python | Python |
| r | R |
| ruby | Ruby |
| rust | Rust |
| scala | Scala |
| solidity | Solidity |
| sql | SQL |
| swift | Swift |
| toml | TOML |
| typescript | TypeScript |
| xml | XML |
| yaml | YAML |
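For readers new to Cargo features, the sketch below shows how feature gating of this kind generally works in Rust. It is an illustration only: `enabled_languages` is a hypothetical helper, not part of the recoco-splitters API, and the crate's real gating code may look quite different.

```rust
/// Hypothetical helper (not part of recoco-splitters): reports which
/// language features were switched on when this code was compiled.
pub fn enabled_languages() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut languages = Vec::new();

    // Each statement below is compiled in only when the matching
    // Cargo feature is enabled; otherwise it is removed entirely.
    #[cfg(feature = "python")]
    languages.push("python");

    #[cfg(feature = "rust")]
    languages.push("rust");

    languages
}
```

Because disabled branches are removed at compile time, a parser whose feature is off adds nothing to the final binary.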
🧩 Splitter Strategies
- Recursive Character Splitter: Standard splitting by separators (paragraphs, newlines, etc.).
- Recursive Syntax Splitter: Tree-sitter based splitting that respects code blocks and syntax nodes.
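The recursive character strategy can be sketched in a few lines of plain Rust. This is a minimal illustration of the general technique, assuming a byte-length limit and a caller-supplied separator list ordered from coarse to fine; `recursive_split` and `hard_split` are hypothetical names, not the crate's API.

```rust
/// Sketch: split `text` into chunks of at most `max_len` bytes, trying
/// coarse separators first and falling back to finer ones as needed.
fn recursive_split(text: &str, separators: &[&str], max_len: usize) -> Vec<String> {
    // Small enough already: keep the piece whole (skip blank pieces).
    if text.len() <= max_len {
        return if text.trim().is_empty() { Vec::new() } else { vec![text.to_string()] };
    }
    // No separators left: cut at character boundaries.
    let Some((&sep, finer)) = separators.split_first() else {
        return hard_split(text, max_len);
    };

    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in text.split(sep) {
        let needed = if current.is_empty() {
            piece.len()
        } else {
            current.len() + sep.len() + piece.len()
        };
        // Greedily merge neighbouring pieces while they still fit.
        if needed <= max_len {
            if !current.is_empty() {
                current.push_str(sep);
            }
            current.push_str(piece);
            continue;
        }
        // Flush the accumulated chunk before handling this piece.
        if current.trim().is_empty() {
            current.clear();
        } else {
            chunks.push(std::mem::take(&mut current));
        }
        if piece.len() <= max_len {
            current.push_str(piece);
        } else {
            // Piece is still too large: retry with the finer separators.
            chunks.extend(recursive_split(piece, finer, max_len));
        }
    }
    if !current.trim().is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Last resort: cut every `max_len` bytes without splitting a character.
fn hard_split(text: &str, max_len: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if !current.is_empty() && current.len() + ch.len_utf8() > max_len {
            chunks.push(std::mem::take(&mut current));
        }
        current.push(ch);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Called with separators such as `["\n\n", "\n", " "]`, the function prefers paragraph breaks, then line breaks, then spaces. A syntax-aware splitter follows the same recursive idea but descends through syntax nodes instead of separator strings.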
📄 License
Apache-2.0. See the main repository for details.