Open Code Search Parser
opencodesearchparser is a Rust library for parsing source files into top-level code segments using Tree-sitter.
Public API
The parsing entry points are parse_str, parse_file, and parse_dir.
thread_num == 0 is treated as 1 thread internally.
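The zero-thread fallback can be pictured with a one-line helper. This is only a sketch: the name effective_thread_count and the usize parameter type are assumptions made for illustration, not the crate's actual internals.

```rust
/// Hypothetical helper (not part of the crate): maps a requested thread
/// count to the count that would actually be used.
fn effective_thread_count(thread_num: usize) -> usize {
    // A request for 0 threads falls back to a single thread.
    thread_num.max(1)
}
```

Under this assumption, callers never need to special-case 0 before passing thread_num along.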
Current Language Support
| Area | Supported now |
|---|---|
| parse_str / parse_file parsing | C, Cpp, Python, JavaScript, Js, Rust |
| parse_dir extension filtering | C (.c), Cpp (.cpp), Python (.py), JavaScript/Js (.js), Rust (.rs) |
| Other CodeLanguage variants | Present in the enum, but currently return an error in parsing and/or directory mapping |
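To make the extension filtering concrete, here is a minimal sketch of an extension-to-language mapping that rejects anything outside the supported set. The Lang enum and the language_for_path function are hypothetical stand-ins written for this example; the crate's own CodeLanguage enum and error type may look different.

```rust
use std::path::Path;

/// Hypothetical stand-in for the crate's language enum; only the languages
/// the table lists as supported are modelled here.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Lang {
    C,
    Cpp,
    Python,
    JavaScript,
    Rust,
}

/// Maps a path to a language by its extension, returning an error for
/// missing or unsupported extensions.
fn language_for_path(path: &Path) -> Result<Lang, String> {
    // A missing or non-UTF-8 extension is treated as the empty string.
    let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
    match ext {
        "c" => Ok(Lang::C),
        "cpp" => Ok(Lang::Cpp),
        "py" => Ok(Lang::Python),
        "js" => Ok(Lang::JavaScript),
        "rs" => Ok(Lang::Rust),
        other => Err(format!("unsupported extension: {:?}", other)),
    }
}
```

A directory walker built on such a mapping would simply skip every path for which it returns an error.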
Segmentation Behavior (Current)
- C/C++: keeps top-level functions, declarations, struct/class/enum/union/type definitions, and preprocessor nodes (include, define, macro function define, conditional directives, and preprocessor calls like #pragma).
- C/C++ struct/class/enum/union declarations are emitted with trailing ; when it is a separate sibling node.
- Python: keeps top-level function definitions, class definitions, expression statements, assignments, and global statements.
- JavaScript/Js: keeps top-level function declarations, class declarations, lexical/variable declarations, and expression statements.
- Rust: keeps top-level nodes whose kinds end with _item or _definition.
- Top-level comment nodes and empty/whitespace-only segments are skipped.
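The Rust rule and the skip rule above can be sketched without Tree-sitter by filtering (kind, text) pairs. The function names and the pair representation are assumptions for illustration; the real library works on Tree-sitter nodes, and the kind strings in the usage note are sample values only.

```rust
/// Returns true for node kinds the Rust rule keeps: names ending in
/// `_item` or `_definition`.
fn keeps_rust_kind(kind: &str) -> bool {
    kind.ends_with("_item") || kind.ends_with("_definition")
}

/// Hypothetical filter over (kind, text) pairs standing in for top-level
/// nodes. Drops kinds the rule rejects (which includes comment kinds) and
/// segments whose text is empty or whitespace-only.
fn keep_segments<'a>(nodes: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    nodes
        .iter()
        .filter(|(kind, text)| keeps_rust_kind(kind) && !text.trim().is_empty())
        .map(|(_, text)| *text)
        .collect()
}
```

For example, given a function_item, a line_comment, and a struct_item with whitespace-only text, only the function's text would survive the filter.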
Parallelism
- parse_str uses a Rayon thread pool (thread_num) for segment materialization.
- parse_file reads one file, then calls parse_str with the same thread_num.
- parse_dir walks directories with walkdir, filters by extension, then parses matching files in parallel with Rayon.
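The fan-out pattern described here can be illustrated with the standard library alone. Note the substitution: this sketch uses std::thread::scope instead of Rayon so that it has no dependencies, and parse_one and parse_all are hypothetical names, with parse_one merely counting non-empty lines in place of real parsing.

```rust
use std::thread;

/// Hypothetical stand-in for parsing one input: counts non-empty lines.
fn parse_one(source: &str) -> usize {
    source.lines().filter(|l| !l.trim().is_empty()).count()
}

/// Processes all inputs on up to `thread_num` scoped threads and returns
/// the results in input order. A `thread_num` of 0 is treated as 1.
fn parse_all(sources: &[&str], thread_num: usize) -> Vec<usize> {
    let threads = thread_num.max(1);
    // Ceiling division; at least 1 so `chunks` never receives 0.
    let chunk_len = ((sources.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        // Spawn one worker per chunk before joining any of them.
        let handles: Vec<_> = sources
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|s| parse_one(s)).collect::<Vec<usize>>())
            })
            .collect();
        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

A work-stealing pool such as Rayon balances uneven inputs better than the fixed chunks used here; the sketch only shows the shape of the computation.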
RecursiveCharacterTextSplitter
recursive_character_text_splitter::RecursiveCharacterTextSplitter provides recursive chunking with configurable separators, chunk size, and overlap.
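As a rough illustration of recursive chunking, the sketch below tries each separator in order, recurses into pieces that are still too long, and falls back to a hard character split when no separators remain. It is a simplified stand-in, not this crate's implementation: split_recursive is a hypothetical function, separators are assumed to be non-empty strings, sizes are measured in characters, and overlap is deliberately left out.

```rust
/// Counts characters rather than bytes so multi-byte text is measured
/// consistently.
fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// Splits `text` into chunks of at most `chunk_size` characters.
/// Hypothetical sketch: no overlap, separators assumed non-empty.
fn split_recursive(text: &str, separators: &[&str], chunk_size: usize) -> Vec<String> {
    let chunk_size = chunk_size.max(1); // guard against a zero limit
    if text.is_empty() {
        return Vec::new();
    }
    if char_len(text) <= chunk_size {
        return vec![text.to_string()];
    }
    let (sep, rest) = match separators.split_first() {
        Some((sep, rest)) => (*sep, rest),
        None => {
            // No separators left: cut every `chunk_size` characters.
            let chars: Vec<char> = text.chars().collect();
            return chars
                .chunks(chunk_size)
                .map(|c| c.iter().collect::<String>())
                .collect();
        }
    };
    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in text.split(sep).filter(|p| !p.is_empty()) {
        if char_len(piece) > chunk_size {
            // Flush what has been merged so far, then retry this piece
            // with the finer-grained separators.
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
            }
            chunks.extend(split_recursive(piece, rest, chunk_size));
        } else if current.is_empty() {
            current.push_str(piece);
        } else if char_len(&current) + char_len(sep) + char_len(piece) <= chunk_size {
            // The piece still fits: merge it back with its separator.
            current.push_str(sep);
            current.push_str(piece);
        } else {
            chunks.push(std::mem::take(&mut current));
            current.push_str(piece);
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Ordering separators from coarse to fine (for instance blank lines, then newlines, then spaces) keeps related text together for as long as the size limit allows.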
Key constructors include from_language, which takes a CodeLanguage.
Usage Examples
Parse a string
use Result;
use ;
Parse a file
use Result;
use ;
Parse a directory
use Result;
use ;
Use the recursive splitter
use RecursiveCharacterTextSplitter;
use CodeLanguage;
let splitter = from_language;
let chunks = splitter.split_text;
assert!;