§Tree Parser Library
A Rust library that uses tree-sitter to parse source files in multiple programming languages and to search the code elements they contain. It provides tools for static code analysis, code search, and AST manipulation.
§Features
- Multi-language Support: Parse Python, Rust, JavaScript, TypeScript, Java, C, C++, Go, and more
- High Performance: Parses multiple files concurrently using async/await
- Advanced Search: Find functions, classes, structs, and interfaces, optionally filtering names with regex patterns
- Flexible Filtering: Custom file filters and parsing options
- Rich Metadata: Extract detailed information about code constructs
- Type Safety: Full Rust type safety with comprehensive error handling
- Configurable: Extensive configuration options for different use cases
§Quick Start
```rust
use tree_parser::{parse_file, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse a single file
    let parsed_file = parse_file("src/main.rs", Language::Rust).await?;
    println!("Found {} constructs", parsed_file.constructs.len());

    for construct in &parsed_file.constructs {
        if let Some(name) = &construct.name {
            println!("{}: {} (lines {}-{})",
                construct.node_type, name,
                construct.start_line, construct.end_line);
        }
    }

    Ok(())
}
```
§Finding Code Constructs
The library provides several ways to search for specific code constructs:
§1. Search by Node Type
```rust
use tree_parser::{parse_file, search_by_node_type, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("example.py", Language::Python).await?;

    // Find all function definitions
    let functions = search_by_node_type(&parsed_file, "function_definition", None);

    // Find test functions using regex
    let test_functions = search_by_node_type(&parsed_file, "function_definition", Some(r"^test_.*"));

    println!("Found {} functions, {} are tests", functions.len(), test_functions.len());
    Ok(())
}
```
§2. Search by Multiple Node Types
```rust
use tree_parser::{parse_file, search_by_multiple_node_types, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("example.js", Language::JavaScript).await?;

    // Find all function-like constructs
    let functions = search_by_multiple_node_types(
        &parsed_file,
        &["function_declaration", "function_expression", "arrow_function"],
        None
    );

    println!("Found {} function-like constructs", functions.len());
    Ok(())
}
```
§3. Advanced Search with Tree-sitter Queries
```rust
use tree_parser::{parse_file, search_by_query, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("example.py", Language::Python).await?;

    // Find all class definitions with their methods
    let query = r#"
        (class_definition
            name: (identifier) @class_name
            body: (block
                (function_definition
                    name: (identifier) @method_name)))
    "#;

    let classes_with_methods = search_by_query(&parsed_file, query)?;
    println!("Found {} classes with methods", classes_with_methods.len());
    Ok(())
}
```
§Finding Node Types
To effectively search for code constructs, you need to know the correct node types. Here are the most common node types by language:
§Python
- function_definition - Function definitions
- class_definition - Class definitions
- import_statement - Import statements
- decorated_definition - Functions/classes with decorators
- assignment - Variable assignments
§Rust
- function_item - Function definitions
- struct_item - Struct definitions
- impl_item - Implementation blocks
- trait_item - Trait definitions
- enum_item - Enum definitions
- mod_item - Module definitions
§JavaScript/TypeScript
- function_declaration - Function declarations
- function_expression - Function expressions
- arrow_function - Arrow functions
- method_definition - Class methods
- class_declaration - Class declarations
§Java
- method_declaration - Method definitions
- class_declaration - Class declarations
- interface_declaration - Interface declarations
- constructor_declaration - Constructor definitions
For a complete list of node types, inspect your parsed files or consult the tree-sitter grammar documentation for your target language.
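The node-type lists above can also be kept as data. With that table in place, a "find all function-like constructs" search becomes a set-membership test over node types, similar in spirit to search_by_multiple_node_types. The sketch below uses only std. The Lang enum, the SimpleConstruct struct, and both functions are hypothetical stand-ins, not this crate's API. The crate's own get_supported_node_types may be a more authoritative source of node types.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's `Language` enum (subset only).
#[derive(Clone, Copy, Debug)]
enum Lang {
    Python,
    Rust,
    JavaScript,
    Java,
}

/// Hypothetical, simplified construct record (not the crate's `CodeConstruct`).
#[derive(Debug)]
struct SimpleConstruct {
    node_type: String,
    name: Option<String>,
}

/// Function-like node types per language, taken from the lists above.
fn function_like_node_types(lang: Lang) -> &'static [&'static str] {
    match lang {
        Lang::Python => &["function_definition"],
        Lang::Rust => &["function_item"],
        Lang::JavaScript => &[
            "function_declaration",
            "function_expression",
            "arrow_function",
            "method_definition",
        ],
        Lang::Java => &["method_declaration", "constructor_declaration"],
    }
}

/// Keep only constructs whose node type is in the language's function-like set.
fn function_like(constructs: &[SimpleConstruct], lang: Lang) -> Vec<&SimpleConstruct> {
    let wanted: HashSet<&str> = function_like_node_types(lang).iter().copied().collect();
    constructs
        .iter()
        .filter(|c| wanted.contains(c.node_type.as_str()))
        .collect()
}

fn main() {
    let constructs = vec![
        SimpleConstruct { node_type: "class_declaration".into(), name: Some("App".into()) },
        SimpleConstruct { node_type: "method_definition".into(), name: Some("render".into()) },
        SimpleConstruct { node_type: "arrow_function".into(), name: None },
    ];
    let found = function_like(&constructs, Lang::JavaScript);
    for c in &found {
        println!("{} {:?}", c.node_type, c.name);
    }
    println!("{} function-like constructs", found.len());
}
```

Keeping the mapping in one function means the per-language differences live in a single `match`. Adding a language then forces a compile-time decision about which node types count as function-like.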
§Discovering Node Types
```rust
use tree_parser::{parse_file, Language};
use std::collections::BTreeSet;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("your_file.py", Language::Python).await?;

    // Collect the unique node types among the extracted constructs
    // (BTreeSet keeps them sorted, so the output order is stable)
    let mut node_types: BTreeSet<String> = BTreeSet::new();
    for construct in &parsed_file.constructs {
        node_types.insert(construct.node_type.clone());
    }

    println!("Available node types:");
    for node_type in &node_types {
        println!("  - {}", node_type);
    }
    Ok(())
}
```
§Online Tree-sitter Playground
Use the Tree-sitter Playground to:
- Paste your code
- Select the appropriate language
- Explore the generated syntax tree
- Identify the exact node types you need
§Best Practices
§Performance Optimization
- Increase max_concurrent_files for better performance on multi-core systems
- Use file filters to exclude unnecessary files (node_modules, target, .git, etc.)
- Set appropriate max_file_size_mb limits to skip very large files
- Enable caching with enable_caching: true for repeated operations
- Use LanguageDetection::ByExtension for faster processing
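To make the filtering advice concrete, here is a std-only sketch of the kinds of checks a file filter performs. It skips ignored directories, keeps only known extensions, and enforces a size limit in megabytes. SimpleFileFilter and its fields are hypothetical and do not mirror the crate's FileFilter or ParseOptions. In real use, prefer parse_directory_with_filter with the crate's own types.

```rust
use std::path::Path;

/// Hypothetical filter settings for this sketch; not the crate's `FileFilter`.
struct SimpleFileFilter {
    /// Directory names to skip wherever they appear in a path.
    ignored_dirs: Vec<&'static str>,
    /// Allowed extensions, without the leading dot.
    extensions: Vec<&'static str>,
    /// Maximum file size in megabytes (1 MB = 1024 * 1024 bytes here).
    max_file_size_mb: u64,
}

impl SimpleFileFilter {
    /// Decide whether a file should be parsed, given its path and size in bytes.
    fn should_parse(&self, path: &Path, size_bytes: u64) -> bool {
        // Skip anything under an ignored directory such as `node_modules` or `.git`.
        let in_ignored_dir = path.components().any(|c| {
            c.as_os_str()
                .to_str()
                .map_or(false, |s| self.ignored_dirs.iter().any(|d| *d == s))
        });
        if in_ignored_dir {
            return false;
        }
        // Require one of the allowed extensions.
        let ext_ok = path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| self.extensions.iter().any(|x| *x == e));
        // Enforce the size limit last; it is the cheapest check to skip.
        ext_ok && size_bytes <= self.max_file_size_mb * 1024 * 1024
    }
}

fn main() {
    let filter = SimpleFileFilter {
        ignored_dirs: vec!["node_modules", "target", ".git"],
        extensions: vec!["rs", "py", "js", "ts"],
        max_file_size_mb: 1,
    };
    for (path, size) in [
        ("src/main.rs", 4_096),
        ("web/node_modules/a/index.js", 512),
        ("data/huge.py", 5 * 1024 * 1024),
    ] {
        println!("{} -> {}", path, filter.should_parse(Path::new(path), size));
    }
}
```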
§Memory Management
- Set syntax_tree: None after extracting constructs if you don't need the tree
- Process files in batches rather than loading entire projects
- Use streaming approaches for very large codebases
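A minimal sketch of batch processing follows. Paths are parsed a fixed number at a time, and each file's syntax tree is dropped once its constructs have been counted. As a result, peak memory tracks the batch size rather than the project size. FakeParsedFile, parse_stub, and the batch size are illustrative assumptions standing in for the crate's ParsedFile and parse_file. The stub does no real parsing.

```rust
/// Hypothetical stand-in for a parsed file; `syntax_tree` models the large, optional tree.
struct FakeParsedFile {
    path: String,
    construct_count: usize,
    syntax_tree: Option<Vec<u8>>,
}

/// Stub "parser": allocates a fake tree and reports one construct per 10 tree bytes.
fn parse_stub(path: &str) -> FakeParsedFile {
    let tree = vec![0u8; path.len() * 10];
    FakeParsedFile {
        path: path.to_string(),
        construct_count: tree.len() / 10,
        syntax_tree: Some(tree),
    }
}

/// Parse `paths` in batches of `batch_size`, keeping only lightweight summaries.
fn process_in_batches(paths: &[&str], batch_size: usize) -> Vec<(String, usize)> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut summaries = Vec::new();
    for batch in paths.chunks(batch_size) {
        // Only this batch's parsed files are held in memory at once.
        let mut parsed: Vec<FakeParsedFile> = batch.iter().map(|p| parse_stub(p)).collect();
        // Drop syntax trees once constructs are extracted (cf. `syntax_tree: None`).
        for file in &mut parsed {
            file.syntax_tree = None;
        }
        // Consume the batch; nothing but the summaries outlives this iteration.
        summaries.extend(parsed.into_iter().map(|f| (f.path, f.construct_count)));
    }
    summaries
}

fn main() {
    let paths = ["a.py", "bb.py", "c.rs", "dddd.go"];
    for (path, count) in process_in_batches(&paths, 2) {
        println!("{}: {} constructs", path, count);
    }
}
```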
§Error Handling
- Always check project.error_files for individual file parsing errors
- Handle each ErrorType variant appropriately
- Propagate errors with the ? operator
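The error-handling advice can be sketched with std types only. Per-file failures are collected rather than aborting the whole run, and a categorizing enum is matched exhaustively. SketchErrorType, its variants, FileFailure, and both functions are hypothetical. The crate's actual ErrorType variants and FileError fields may differ, so check their documentation before matching on them.

```rust
use std::fmt;

/// Hypothetical error categories; the crate's real `ErrorType` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchErrorType {
    Io,
    Parse,
    UnsupportedLanguage,
}

/// Hypothetical per-file failure record, loosely modeled on `FileError`.
#[derive(Debug)]
struct FileFailure {
    path: String,
    kind: SketchErrorType,
    message: String,
}

impl fmt::Display for FileFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {:?} ({})", self.path, self.kind, self.message)
    }
}

/// Split results into successes and failures instead of stopping at the first error.
fn partition_results(results: Vec<Result<String, FileFailure>>) -> (Vec<String>, Vec<FileFailure>) {
    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for r in results {
        match r {
            Ok(path) => ok.push(path),
            Err(e) => failed.push(e),
        }
    }
    (ok, failed)
}

/// Suggest an action per category; the exhaustive `match` forces new variants to be handled.
fn advice(kind: SketchErrorType) -> &'static str {
    match kind {
        SketchErrorType::Io => "check the path and permissions",
        SketchErrorType::Parse => "check for syntax errors or unsupported language features",
        SketchErrorType::UnsupportedLanguage => "enable the matching feature flag in Cargo.toml",
    }
}

fn main() {
    let results = vec![
        Ok("src/lib.rs".to_string()),
        Err(FileFailure {
            path: "legacy.py".into(),
            kind: SketchErrorType::Parse,
            message: "unexpected token".into(),
        }),
    ];
    let (ok, failed) = partition_results(results);
    println!("{} parsed, {} failed", ok.len(), failed.len());
    for e in &failed {
        println!("{} -> {}", e, advice(e.kind));
    }
}
```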
§Troubleshooting
Common Issues:
- “Unsupported language” error: Enable the correct feature flags in Cargo.toml
- “Parse error” for valid code: Check for syntax errors or unsupported language features
- Poor performance: Increase concurrency, use filters, enable caching
- Memory issues: Drop syntax trees after use, process in batches
- Missing constructs: Verify node type names, check nesting, use tree-sitter queries
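The "check nesting" advice matters because constructs such as methods live inside other constructs. A search therefore has to walk the whole tree rather than only its top-level items. The std-only sketch below uses a hypothetical Node type, not tree-sitter's Node. A plain prefix check stands in for the regex ^test_.* used earlier. The function performs a depth-first walk and collects every matching node at any depth.

```rust
/// Hypothetical, simplified syntax node; not tree-sitter's `Node`.
struct Node {
    kind: &'static str,
    name: Option<&'static str>,
    children: Vec<Node>,
}

/// Depth-first search: collect names of nodes of `kind` whose name satisfies `pred`.
fn find_named<F: Fn(&str) -> bool>(node: &Node, kind: &str, pred: &F, out: &mut Vec<&'static str>) {
    if node.kind == kind {
        if let Some(name) = node.name {
            if pred(name) {
                out.push(name);
            }
        }
    }
    // Recurse so nested constructs (e.g. methods inside classes) are not missed.
    for child in &node.children {
        find_named(child, kind, pred, out);
    }
}

/// Build a childless named node.
fn leaf(kind: &'static str, name: &'static str) -> Node {
    Node { kind, name: Some(name), children: Vec::new() }
}

fn main() {
    // module > [function test_top_level, class TestParser > [test_nested_method, helper]]
    let tree = Node {
        kind: "module",
        name: None,
        children: vec![
            leaf("function_definition", "test_top_level"),
            Node {
                kind: "class_definition",
                name: Some("TestParser"),
                children: vec![
                    leaf("function_definition", "test_nested_method"),
                    leaf("function_definition", "helper"),
                ],
            },
        ],
    };
    let mut found = Vec::new();
    find_named(&tree, "function_definition", &|n: &str| n.starts_with("test_"), &mut found);
    println!("{:?}", found); // prints ["test_top_level", "test_nested_method"]
}
```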
§Structs
- CodeConstruct - Represents a parsed code construct (function, class, struct, etc.)
- ConstructMetadata - Metadata associated with a code construct
- FileError - Represents an error that occurred while processing a specific file
- FileFilter - Filter criteria for selecting which files to parse
- Parameter - Represents a function or method parameter
- ParseOptions - Configuration options for parsing operations
- ParsedFile - Represents a successfully parsed source code file
- ParsedProject - Represents the results of parsing an entire project or directory
- Point - A position in a multi-line text document, in terms of rows and columns.
- Range - A range of positions in a multi-line text document, both in terms of bytes and of rows and columns.
§Enums
- Error - Main error type for the tree parser library
- ErrorType - Categorizes different types of errors for easier handling
- Language - Supported programming languages
- LanguageDetection - Methods for detecting the programming language of a file
§Functions
- detect_language - Combined language detection using multiple methods
- detect_language_by_content - Detect language by file content patterns
- detect_language_by_extension - Detect language by file extension
- detect_language_by_shebang - Detect language by shebang line
- format_duration - Format duration in human-readable format
- format_file_size - Format file size in human-readable format
- get_file_extension - Extract the file extension from a file path
- get_file_name_without_extension - Extract the file name without its extension
- get_supported_extensions - Get a list of all file extensions supported by the parser
- get_supported_node_types - Get supported node types for a language
- get_tree_sitter_language - Get the tree-sitter language for a given Language enum
- is_supported_extension - Check if a file extension is supported by the parser
- is_valid_directory_path - Validate that a directory path exists
- is_valid_file_path - Validate that a file path exists
- language_from_string - Convert a string representation to a Language enum
- language_to_string - Convert a Language enum to its string representation
- matches_ignore_patterns - Check if a path matches any of the specified ignore patterns
- parse_directory - Parse an entire project directory recursively
- parse_directory_with_filter - Parse a project directory with custom file filtering
- parse_file - Parse a single source code file and extract code constructs
- sanitize_path - Sanitize a file path for safe usage
- search_by_multiple_node_types - Search for code constructs matching any of the specified node types
- search_by_node_type - Search for code constructs by their tree-sitter node type
- search_by_query - Execute a custom tree-sitter query for advanced searching
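The listing mentions detect_language_by_extension and LanguageDetection::ByExtension. Extension-based detection is typically a lookup on the lowercased file extension, which is cheap because it never reads the file. The DetectedLang enum and the detect_by_extension function below are hypothetical illustrations only. The crate's actual extension mapping and return type may differ.

```rust
use std::path::Path;

/// Hypothetical language enum for this sketch; not the crate's `Language`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DetectedLang {
    Python,
    Rust,
    JavaScript,
    TypeScript,
    Java,
    C,
    Cpp,
    Go,
}

/// Map a path's extension (case-insensitively) to a language, if known.
fn detect_by_extension(path: &Path) -> Option<DetectedLang> {
    // `?` returns None for paths without a (UTF-8) extension, e.g. "Makefile".
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    let lang = match ext.as_str() {
        "py" => DetectedLang::Python,
        "rs" => DetectedLang::Rust,
        "js" | "mjs" | "cjs" | "jsx" => DetectedLang::JavaScript,
        "ts" | "tsx" => DetectedLang::TypeScript,
        "java" => DetectedLang::Java,
        // `.h` is ambiguous between C and C++; this sketch simply picks C.
        "c" | "h" => DetectedLang::C,
        "cpp" | "cc" | "cxx" | "hpp" | "hh" => DetectedLang::Cpp,
        "go" => DetectedLang::Go,
        _ => return None,
    };
    Some(lang)
}

fn main() {
    for p in ["src/main.rs", "app/Index.TSX", "Makefile", "lib/util.h"] {
        println!("{}: {:?}", p, detect_by_extension(Path::new(p)));
    }
}
```

Content- or shebang-based detection (the other listed detectors) would be a fallback for files such as extensionless scripts, where this lookup returns None.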