Crate tree_parser

§Tree Parser Library

A Rust library for parsing and searching code elements across multiple programming languages using tree-sitter. It provides tools for static code analysis, code search, and AST manipulation.

§Features

  • Multi-language Support: Parse Python, Rust, JavaScript, TypeScript, Java, C, C++, Go, and more
  • High Performance: Concurrent, async/await-based parsing of multiple files
  • Advanced Search: Find functions, classes, structs, interfaces with regex pattern matching
  • Flexible Filtering: Custom file filters and parsing options
  • Rich Metadata: Extract detailed information about code constructs
  • Type Safety: Full Rust type safety with comprehensive error handling
  • Configurable: Extensive configuration options for different use cases

§Quick Start

use tree_parser::{parse_file, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse a single file
    let parsed_file = parse_file("src/main.rs", Language::Rust).await?;
     
    println!("Found {} constructs", parsed_file.constructs.len());
    for construct in &parsed_file.constructs {
        if let Some(name) = &construct.name {
            println!("{}: {} (lines {}-{})", 
                construct.node_type, name, 
                construct.start_line, construct.end_line);
        }
    }
     
    Ok(())
}

§Finding Code Constructs

The library provides three ways to search for specific code constructs:

§1. Search by Node Type

use tree_parser::{parse_file, search_by_node_type, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("example.py", Language::Python).await?;
     
    // Find all function definitions
    let functions = search_by_node_type(&parsed_file, "function_definition", None);
     
    // Find test functions using regex
    let test_functions = search_by_node_type(&parsed_file, "function_definition", Some(r"^test_.*"));
     
    println!("Found {} functions, {} are tests", functions.len(), test_functions.len());
    Ok(())
}

§2. Search by Multiple Node Types

use tree_parser::{parse_file, search_by_multiple_node_types, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("example.js", Language::JavaScript).await?;
     
    // Find all function-like constructs
    let functions = search_by_multiple_node_types(
        &parsed_file,
        &["function_declaration", "function_expression", "arrow_function"],
        None
    );
     
    println!("Found {} function-like constructs", functions.len());
    Ok(())
}

§3. Advanced Search with Tree-sitter Queries

use tree_parser::{parse_file, search_by_query, Language};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("example.py", Language::Python).await?;
     
    // Find all class definitions with their methods
    let query = r#"
        (class_definition
          name: (identifier) @class_name
          body: (block
            (function_definition
              name: (identifier) @method_name)))
    "#;
     
    let classes_with_methods = search_by_query(&parsed_file, query)?;
    println!("Found {} query matches", classes_with_methods.len());
    Ok(())
}

§Finding Node Types

To effectively search for code constructs, you need to know the correct node types. Here are the most common node types by language:

§Python

  • function_definition - Function definitions
  • class_definition - Class definitions
  • import_statement - Import statements
  • decorated_definition - Functions/classes with decorators
  • assignment - Variable assignments

§Rust

  • function_item - Function definitions
  • struct_item - Struct definitions
  • impl_item - Implementation blocks
  • trait_item - Trait definitions
  • enum_item - Enum definitions
  • mod_item - Module definitions

§JavaScript/TypeScript

  • function_declaration - Function declarations
  • function_expression - Function expressions
  • arrow_function - Arrow functions
  • method_definition - Class methods
  • class_declaration - Class declarations

§Java

  • method_declaration - Method definitions
  • class_declaration - Class declarations
  • interface_declaration - Interface declarations
  • constructor_declaration - Constructor definitions

For a complete list of node types, inspect your parsed files or consult the tree-sitter grammar documentation for your target language.
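The lists above can be folded into a small lookup table when the same search has to run across several languages. The following is a minimal sketch using only the standard library; `function_like_node_types` is a hypothetical helper written for illustration and is not part of this crate. Its result could be passed as the node type list to a multi-type search.

```rust
/// Hypothetical helper (not part of tree_parser): maps a language name to the
/// node types listed above that describe function-like constructs.
fn function_like_node_types(language: &str) -> &'static [&'static str] {
    match language.to_ascii_lowercase().as_str() {
        "python" => &["function_definition"],
        "rust" => &["function_item"],
        "javascript" | "typescript" => &[
            "function_declaration",
            "function_expression",
            "arrow_function",
            "method_definition",
        ],
        "java" => &["method_declaration", "constructor_declaration"],
        // Unknown language: return an empty list rather than guess.
        _ => &[],
    }
}

fn main() {
    for language in ["python", "rust", "javascript", "java", "cobol"] {
        println!("{language}: {:?}", function_like_node_types(language));
    }
}
```

Returning an empty slice for unknown languages keeps the caller's loop simple: a search over no node types finds nothing.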

§Discovering Node Types

use tree_parser::{parse_file, Language};
use std::collections::HashSet;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parsed_file = parse_file("your_file.py", Language::Python).await?;
     
    // Collect all unique node types
    let mut node_types: HashSet<String> = HashSet::new();
    for construct in &parsed_file.constructs {
        node_types.insert(construct.node_type.clone());
    }
     
    println!("Available node types:");
    for node_type in &node_types {
        println!("  - {}", node_type);
    }
     
    Ok(())
}
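A variation on the example above counts how often each node type occurs, which helps when deciding which types are worth searching for. This sketch is standalone: `Construct` is a stand-in that models only the `node_type` field used above, not the crate's real type. A `BTreeMap` is used so the listing comes out sorted, whereas iterating a `HashSet` yields an arbitrary order.

```rust
use std::collections::BTreeMap;

/// Stand-in for a parsed construct; only the `node_type` field is modeled.
struct Construct {
    node_type: String,
}

/// Counts how often each node type occurs. BTreeMap keeps the keys sorted,
/// so the printed listing is stable between runs.
fn count_node_types(constructs: &[Construct]) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for construct in constructs {
        *counts.entry(construct.node_type.as_str()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let constructs: Vec<Construct> =
        ["function_definition", "class_definition", "function_definition"]
            .iter()
            .map(|node_type| Construct { node_type: node_type.to_string() })
            .collect();

    for (node_type, count) in count_node_types(&constructs) {
        println!("{count:>4}  {node_type}");
    }
}
```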

§Online Tree-sitter Playground

Use the Tree-sitter Playground to:

  1. Paste your code
  2. Select the appropriate language
  3. Explore the generated syntax tree
  4. Identify the exact node types you need

§Best Practices

§Performance Optimization

  • Increase max_concurrent_files for better performance on multi-core systems
  • Use file filters to exclude unnecessary files (node_modules, target, .git, etc.)
  • Set appropriate max_file_size_mb limits to skip very large files
  • Enable caching with enable_caching: true for repeated operations
  • Use LanguageDetection::ByExtension for faster processing
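To make the filtering and size-limit tips concrete, here is a minimal sketch of the kind of check a file filter performs. It uses only the standard library; `should_parse` and `IGNORED_DIRS` are hypothetical names for illustration and do not reflect the actual fields of `FileFilter` or `ParseOptions`.

```rust
use std::path::{Component, Path};

/// Directories named in the tips above that rarely contain code worth parsing.
const IGNORED_DIRS: [&str; 3] = ["node_modules", "target", ".git"];

/// Hypothetical pre-filter (not the crate's FileFilter): returns true when the
/// file sits outside every ignored directory and is no larger than the limit.
fn should_parse(path: &Path, size_bytes: u64, max_file_size_mb: u64) -> bool {
    // Only the parent directories are checked, so a file that merely happens
    // to be named "target" is still accepted.
    let in_ignored_dir = path.parent().map_or(false, |dir| {
        dir.components().any(|component| match component {
            Component::Normal(name) => IGNORED_DIRS
                .iter()
                .any(|ignored| name.to_str() == Some(*ignored)),
            _ => false,
        })
    });
    let max_bytes = max_file_size_mb.saturating_mul(1024 * 1024);
    !in_ignored_dir && size_bytes <= max_bytes
}

fn main() {
    let candidates = [
        ("src/main.rs", 12_000_u64),
        ("web/node_modules/lib/index.js", 4_000),
        ("assets/generated.rs", 9 * 1024 * 1024),
    ];
    for (path, size) in candidates {
        let verdict = if should_parse(Path::new(path), size, 5) { "parse" } else { "skip" };
        println!("{verdict}: {path}");
    }
}
```

Rejecting files before they are read avoids both the I/O and the parse cost, which is usually where most of the time goes on large repositories.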

§Memory Management

  • Set syntax_tree: None after extracting constructs if you don’t need the tree
  • Process files in batches rather than loading entire projects
  • Use streaming approaches for very large codebases
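The batching advice can be sketched as follows. This is a standalone illustration under stated assumptions: `summarize_in_batches` and `FileSummary` are hypothetical, and the `parse` closure stands in for whatever parsing call is actually used. The point is that heavy per-file results are dropped at the end of each batch and only a small summary survives.

```rust
/// Small per-file record kept after a batch; the heavy data is dropped.
struct FileSummary {
    path: String,
    construct_count: usize,
}

/// Processes `paths` in batches of `batch_size`, keeping only a summary per file.
/// `parse` is a stand-in for a real parser and returns the heavy per-file result.
fn summarize_in_batches<F>(paths: &[&str], batch_size: usize, mut parse: F) -> Vec<FileSummary>
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut summaries = Vec::with_capacity(paths.len());
    // `chunks` panics on a size of zero, so clamp to at least one file per batch.
    for batch in paths.chunks(batch_size.max(1)) {
        let parsed: Vec<(&str, Vec<String>)> =
            batch.iter().map(|path| (*path, parse(*path))).collect();
        for (path, constructs) in &parsed {
            summaries.push(FileSummary {
                path: path.to_string(),
                construct_count: constructs.len(),
            });
        }
        // `parsed` goes out of scope here, before the next batch is loaded.
    }
    summaries
}

fn main() {
    let paths = ["src/lib.rs", "src/main.rs", "src/util.rs"];
    // Stand-in parser: pretends every file contains two constructs.
    let fake_parse = |path: &str| vec![format!("{path}::first"), format!("{path}::second")];
    for summary in summarize_in_batches(&paths, 2, fake_parse) {
        println!("{}: {} constructs", summary.path, summary.construct_count);
    }
}
```

Peak memory is then bounded by the largest batch rather than by the whole project.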

§Error Handling

  • Always check project.error_files for individual file parsing errors
  • Handle different ErrorType variants appropriately
  • Propagate errors with the ? operator instead of unwrapping
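The pattern behind these points can be sketched without the crate. Everything in the block below is a placeholder: `ErrorKind`, `FileFailure`, and `ProjectReport` are hypothetical types whose names and variants are invented for illustration, so consult `ErrorType`, `FileError`, and `ParsedProject` for the real definitions. The sketch treats some per-file failures as warnings and propagates the rest with `?`.

```rust
/// Placeholder error categories; the crate's ErrorType variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    Io,
    Parse,
    UnsupportedLanguage,
}

/// Placeholder for a per-file error record.
struct FileFailure {
    path: String,
    kind: ErrorKind,
    message: String,
}

/// Placeholder for a project-level result with its per-file errors.
struct ProjectReport {
    parsed_files: usize,
    error_files: Vec<FileFailure>,
}

/// Turns recoverable failures into warnings and reports I/O failures as errors.
fn warning_for(failure: &FileFailure) -> Result<String, String> {
    match failure.kind {
        ErrorKind::UnsupportedLanguage => {
            Ok(format!("skipped {} (unsupported language)", failure.path))
        }
        ErrorKind::Parse => Ok(format!("partial results for {}: {}", failure.path, failure.message)),
        ErrorKind::Io => Err(format!("cannot read {}: {}", failure.path, failure.message)),
    }
}

/// Checks every per-file error; the first fatal one is propagated with `?`.
fn triage(report: &ProjectReport) -> Result<Vec<String>, String> {
    let mut warnings = Vec::new();
    for failure in &report.error_files {
        warnings.push(warning_for(failure)?);
    }
    Ok(warnings)
}

fn main() {
    let report = ProjectReport {
        parsed_files: 41,
        error_files: vec![FileFailure {
            path: "legacy/old.py".to_string(),
            kind: ErrorKind::Parse,
            message: "unexpected token".to_string(),
        }],
    };
    match triage(&report) {
        Ok(warnings) => {
            println!("{} files parsed, {} warnings", report.parsed_files, warnings.len());
            for warning in warnings {
                println!("  warning: {warning}");
            }
        }
        Err(fatal) => eprintln!("fatal: {fatal}"),
    }
}
```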

§Troubleshooting

Common Issues:

  • “Unsupported language” error: Enable correct feature flags in Cargo.toml
  • “Parse error” for seemingly valid code: Check for subtle syntax errors, or for language features the grammar does not support
  • Poor performance: Increase concurrency, use filters, enable caching
  • Memory issues: Drop syntax trees after use, process in batches
  • Missing constructs: Verify node type names, check nesting, use tree-sitter queries

Structs§

  • CodeConstruct - Represents a parsed code construct (function, class, struct, etc.)
  • ConstructMetadata - Metadata associated with a code construct
  • FileError - Represents an error that occurred while processing a specific file
  • FileFilter - Filter criteria for selecting which files to parse
  • Parameter - Represents a function or method parameter
  • ParseOptions - Configuration options for parsing operations
  • ParsedFile - Represents a successfully parsed source code file
  • ParsedProject - Represents the results of parsing an entire project or directory
  • Point - A position in a multi-line text document, in terms of rows and columns
  • Range - A range of positions in a multi-line text document, both in terms of bytes and of rows and columns

Enums§

  • Error - Main error type for the tree parser library
  • ErrorType - Categorizes different types of errors for easier handling
  • Language - Supported programming languages
  • LanguageDetection - Methods for detecting the programming language of a file

Functions§

  • detect_language - Combined language detection using multiple methods
  • detect_language_by_content - Detect language by file content patterns
  • detect_language_by_extension - Detect language by file extension
  • detect_language_by_shebang - Detect language by shebang line
  • format_duration - Format duration in human-readable format
  • format_file_size - Format file size in human-readable format
  • get_file_extension - Extract the file extension from a file path
  • get_file_name_without_extension - Extract the file name without its extension
  • get_supported_extensions - Get a list of all file extensions supported by the parser
  • get_supported_node_types - Get supported node types for a language
  • get_tree_sitter_language - Get the tree-sitter language for a given Language enum
  • is_supported_extension - Check if a file extension is supported by the parser
  • is_valid_directory_path - Validate that a directory path exists
  • is_valid_file_path - Validate that a file path exists
  • language_from_string - Convert a string representation to a Language enum
  • language_to_string - Convert a Language enum to its string representation
  • matches_ignore_patterns - Check if a path matches any of the specified ignore patterns
  • parse_directory - Parse an entire project directory recursively
  • parse_directory_with_filter - Parse a project directory with custom file filtering
  • parse_file - Parse a single source code file and extract code constructs
  • sanitize_path - Sanitize a file path for safe usage
  • search_by_multiple_node_types - Search for code constructs matching any of the specified node types
  • search_by_node_type - Search for code constructs by their tree-sitter node type
  • search_by_query - Execute a custom tree-sitter query for advanced searching