§Tree Parser Library
A Rust library for parsing and searching code elements across multiple programming languages using tree-sitter. It provides tools for static code analysis, code search, and AST manipulation.
§Features
- Multi-language Support: Parse Python, Rust, JavaScript, TypeScript, Java, C, C++, Go, and more
- High Performance: Concurrent parsing with async/await for maximum efficiency
- Advanced Search: Find functions, classes, structs, interfaces with regex pattern matching
- Flexible Filtering: Custom file filters and parsing options
- Rich Metadata: Extract detailed information about code constructs
- Type Safety: Full Rust type safety with comprehensive error handling
- Configurable: Extensive configuration options for different use cases
§Quick Start
    use tree_parser::{parse_file, Language};

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Parse a single file
        let parsed_file = parse_file("src/main.rs", Language::Rust).await?;
        println!("Found {} constructs", parsed_file.constructs.len());

        // Print every named construct with its line range
        for construct in &parsed_file.constructs {
            if let Some(name) = &construct.name {
                println!(
                    "{}: {} (lines {}-{})",
                    construct.node_type, name, construct.start_line, construct.end_line
                );
            }
        }
        Ok(())
    }
§Finding Code Constructs
The library provides three ways to search for specific code constructs:
§1. Search by Node Type
    use tree_parser::{parse_file, search_by_node_type, Language};

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let parsed_file = parse_file("example.py", Language::Python).await?;

        // Find all function definitions
        let functions = search_by_node_type(&parsed_file, "function_definition", None);

        // Find test functions using regex
        let test_functions =
            search_by_node_type(&parsed_file, "function_definition", Some(r"^test_.*"));

        println!(
            "Found {} functions, {} are tests",
            functions.len(),
            test_functions.len()
        );
        Ok(())
    }
§2. Search by Multiple Node Types
    use tree_parser::{parse_file, search_by_multiple_node_types, Language};

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let parsed_file = parse_file("example.js", Language::JavaScript).await?;

        // Find all function-like constructs
        let functions = search_by_multiple_node_types(
            &parsed_file,
            &["function_declaration", "function_expression", "arrow_function"],
            None,
        );

        println!("Found {} function-like constructs", functions.len());
        Ok(())
    }
§3. Advanced Search with Tree-sitter Queries
    use tree_parser::{parse_file, search_by_query, Language};

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let parsed_file = parse_file("example.py", Language::Python).await?;

        // Find all class definitions with their methods
        let query = r#"
            (class_definition
              name: (identifier) @class_name
              body: (block
                (function_definition
                  name: (identifier) @method_name)))
        "#;

        let classes_with_methods = search_by_query(&parsed_file, query)?;
        println!("Found {} classes with methods", classes_with_methods.len());
        Ok(())
    }
§Finding Node Types
To effectively search for code constructs, you need to know the correct node types. Here are the most common node types by language:
§Python
- function_definition: Function definitions
- class_definition: Class definitions
- import_statement: Import statements
- decorated_definition: Functions/classes with decorators
- assignment: Variable assignments
§Rust
- function_item: Function definitions
- struct_item: Struct definitions
- impl_item: Implementation blocks
- trait_item: Trait definitions
- enum_item: Enum definitions
- mod_item: Module definitions
§JavaScript/TypeScript
- function_declaration: Function declarations
- function_expression: Function expressions
- arrow_function: Arrow functions
- method_definition: Class methods
- class_declaration: Class declarations
§Java
- method_declaration: Method definitions
- class_declaration: Class declarations
- interface_declaration: Interface declarations
- constructor_declaration: Constructor definitions
For a complete list of node types, inspect your parsed files or consult the tree-sitter grammar documentation for your target language.
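The per-language names above can be gathered into a small lookup helper so that calling code does not scatter string literals. The sketch below is illustrative only: `Lang` and `function_like_node_types` are hypothetical names defined locally (they are not part of this crate), and the lists cover only the node types mentioned above.

```rust
/// Hypothetical stand-in for the languages listed above
/// (this is not the crate's `Language` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Python,
    Rust,
    JavaScript,
    Java,
}

/// Returns the function-like node types listed above for each language.
fn function_like_node_types(lang: Lang) -> &'static [&'static str] {
    match lang {
        Lang::Python => &["function_definition"],
        Lang::Rust => &["function_item"],
        Lang::JavaScript => &[
            "function_declaration",
            "function_expression",
            "arrow_function",
            "method_definition",
        ],
        Lang::Java => &["method_declaration", "constructor_declaration"],
    }
}

fn main() {
    for lang in [Lang::Python, Lang::Rust, Lang::JavaScript, Lang::Java] {
        println!("{:?}: {:?}", lang, function_like_node_types(lang));
    }
}
```

A slice like this has the same shape as the literal passed to search_by_multiple_node_types in the earlier example.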
§Discovering Node Types
    use std::collections::HashSet;
    use tree_parser::{parse_file, Language};

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let parsed_file = parse_file("your_file.py", Language::Python).await?;

        // Collect all unique node types
        let mut node_types: HashSet<String> = HashSet::new();
        for construct in &parsed_file.constructs {
            node_types.insert(construct.node_type.clone());
        }

        println!("Available node types:");
        for node_type in &node_types {
            println!(" - {}", node_type);
        }
        Ok(())
    }
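A HashSet iterates in an unspecified order, so the listing above can print node types in a different order on each run. If you want stable output and a count per node type, a BTreeMap is one option. The sketch below uses only the standard library; `Construct` is a hypothetical local stand-in that models just the `node_type` field used in the examples above, and `count_node_types` is not a function of this crate.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a parsed construct; only the `node_type`
/// field used in the examples above is modelled here.
struct Construct {
    node_type: String,
}

/// Counts how often each node type occurs, keyed and sorted by node type name.
fn count_node_types(constructs: &[Construct]) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for construct in constructs {
        *counts.entry(construct.node_type.as_str()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let constructs: Vec<Construct> = [
        "function_definition",
        "class_definition",
        "function_definition",
    ]
    .iter()
    .map(|node_type| Construct { node_type: node_type.to_string() })
    .collect();

    // Entries come out in alphabetical order of node type.
    for (node_type, count) in count_node_types(&constructs) {
        println!("{node_type}: {count}");
    }
}
```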
§Online Tree-sitter Playground
Use the Tree-sitter Playground to:
- Paste your code
- Select the appropriate language
- Explore the generated syntax tree
- Identify the exact node types you need
§Best Practices
§Performance Optimization
- Increase max_concurrent_files for better performance on multi-core systems
- Use file filters to exclude unnecessary files (node_modules, target, .git, etc.)
- Set appropriate max_file_size_mb limits to skip very large files
- Enable caching with enable_caching: true for repeated operations
- Use LanguageDetection::ByExtension for faster processing
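To make the exclusion advice concrete, here is a minimal standard-library sketch of the kind of check a file filter performs. `EXCLUDED_DIRS` and `is_excluded` are hypothetical names for illustration; the crate's own FileFilter and matches_ignore_patterns are not called here because their signatures are not shown on this page.

```rust
use std::path::{Component, Path};

/// Directory names to skip; these are the examples mentioned above.
const EXCLUDED_DIRS: [&str; 3] = ["node_modules", "target", ".git"];

/// Returns true if any component of `path` is exactly an excluded directory name.
fn is_excluded(path: &Path) -> bool {
    path.components().any(|component| match component {
        Component::Normal(name) => name
            .to_str()
            .map_or(false, |name| EXCLUDED_DIRS.contains(&name)),
        // Root, prefix, `.` and `..` components never match.
        _ => false,
    })
}

fn main() {
    let candidates = [
        "src/main.rs",
        "node_modules/pkg/index.js",
        "target/debug/build.rs",
        ".git/config",
    ];
    for candidate in candidates {
        let action = if is_excluded(Path::new(candidate)) { "skip" } else { "parse" };
        println!("{candidate}: {action}");
    }
}
```

Matching whole path components rather than substrings avoids skipping files such as src/target_utils.rs by accident.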
§Memory Management
- Set syntax_tree: None after extracting constructs if you don’t need the tree
- Process files in batches rather than loading entire projects
- Use streaming approaches for very large codebases
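Batch processing can be sketched with the standard library alone. In the example below, `process_in_batches` is a hypothetical helper and the closure stands in for whatever per-batch parsing you do; nothing here is part of the crate's API.

```rust
/// Calls `process_batch` on consecutive slices of at most `batch_size` paths
/// and returns the number of batches processed.
fn process_in_batches<F>(paths: &[String], batch_size: usize, mut process_batch: F) -> usize
where
    F: FnMut(&[String]),
{
    let mut batches = 0;
    // `chunks` panics on a size of zero, so clamp to at least one.
    for batch in paths.chunks(batch_size.max(1)) {
        // Values created inside the closure go out of scope when it returns,
        // unless the closure stores them somewhere that outlives the call.
        process_batch(batch);
        batches += 1;
    }
    batches
}

fn main() {
    let paths: Vec<String> = (0..5).map(|i| format!("src/file_{i}.rs")).collect();
    let batches = process_in_batches(&paths, 2, |batch| {
        println!("processing {} files: {:?}", batch.len(), batch);
    });
    println!("{batches} batches");
}
```

Keeping per-batch results inside the closure, and retaining only the summary data you need, bounds peak memory by the batch size rather than the project size.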
§Error Handling
- Always check project.error_files for individual file parsing errors
- Handle different ErrorType variants appropriately
- Use proper error propagation with the ? operator
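One way to act on per-file errors is to group them by category and handle each group differently. The fields of FileError and the variants of ErrorType are not shown on this page, so the sketch below uses local stand-ins: `ErrorKind`, its variants, `FileFailure`, and `group_by_kind` are all hypothetical, and the crate's real types may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical error categories; the crate's `ErrorType` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ErrorKind {
    Io,
    Parse,
    UnsupportedLanguage,
}

/// Hypothetical per-file error record, standing in for an entry of `error_files`.
struct FileFailure {
    path: String,
    kind: ErrorKind,
}

/// Groups failed paths by error category, preserving input order within a group.
fn group_by_kind(failures: &[FileFailure]) -> BTreeMap<ErrorKind, Vec<&str>> {
    let mut grouped: BTreeMap<ErrorKind, Vec<&str>> = BTreeMap::new();
    for failure in failures {
        grouped.entry(failure.kind).or_default().push(failure.path.as_str());
    }
    grouped
}

fn main() {
    let failures = vec![
        FileFailure { path: "a.py".to_string(), kind: ErrorKind::Parse },
        FileFailure { path: "b.xyz".to_string(), kind: ErrorKind::UnsupportedLanguage },
        FileFailure { path: "c.py".to_string(), kind: ErrorKind::Parse },
    ];

    for (kind, paths) in group_by_kind(&failures) {
        match kind {
            ErrorKind::Io => eprintln!("could not read: {paths:?}"),
            ErrorKind::Parse => eprintln!("failed to parse: {paths:?}"),
            ErrorKind::UnsupportedLanguage => eprintln!("unsupported language: {paths:?}"),
        }
    }
}
```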
§Troubleshooting
Common Issues:
- “Unsupported language” error: Enable correct feature flags in Cargo.toml
- “Parse error” for valid code: Check for syntax errors or unsupported language features
- Poor performance: Increase concurrency, use filters, enable caching
- Memory issues: Drop syntax trees after use, process in batches
- Missing constructs: Verify node type names, check nesting, use tree-sitter queries
Structs§
- CodeConstruct - Represents a parsed code construct (function, class, struct, etc.)
- ConstructMetadata - Metadata associated with a code construct
- FileError - Represents an error that occurred while processing a specific file
- FileFilter - Filter criteria for selecting which files to parse
- Parameter - Represents a function or method parameter
- ParseOptions - Configuration options for parsing operations
- ParsedFile - Represents a successfully parsed source code file
- ParsedProject - Represents the results of parsing an entire project or directory
- Point - A position in a multi-line text document, in terms of rows and columns.
- Range - A range of positions in a multi-line text document, both in terms of bytes and of rows and columns.
Enums§
- Error - Main error type for the tree parser library
- ErrorType - Categorizes different types of errors for easier handling
- Language - Supported programming languages
- LanguageDetection - Methods for detecting the programming language of a file
Functions§
- detect_language - Combined language detection using multiple methods
- detect_language_by_content - Detect language by file content patterns
- detect_language_by_extension - Detect language by file extension
- detect_language_by_shebang - Detect language by shebang line
- format_duration - Format duration in human-readable format
- format_file_size - Format file size in human-readable format
- get_file_extension - Extract the file extension from a file path
- get_file_name_without_extension - Extract the file name without its extension
- get_supported_extensions - Get a list of all file extensions supported by the parser
- get_supported_node_types - Get supported node types for a language
- get_tree_sitter_language - Get the tree-sitter language for a given Language enum
- is_supported_extension - Check if a file extension is supported by the parser
- is_valid_directory_path - Validate that a directory path exists
- is_valid_file_path - Validate that a file path exists
- language_from_string - Convert a string representation to a Language enum
- language_to_string - Convert a Language enum to its string representation
- matches_ignore_patterns - Check if a path matches any of the specified ignore patterns
- parse_directory - Parse an entire project directory recursively
- parse_directory_with_filter - Parse a project directory with custom file filtering
- parse_file - Parse a single source code file and extract code constructs
- sanitize_path - Sanitize a file path for safe usage
- search_by_multiple_node_types - Search for code constructs matching any of the specified node types
- search_by_node_type - Search for code constructs by their tree-sitter node type
- search_by_query - Execute a custom tree-sitter query for advanced searching