thread-language
Language definitions and tree-sitter parsers for the Thread AST analysis toolkit.
Overview
thread-language provides unified language support for AST-based code analysis and transformation. Built on tree-sitter grammars, it implements consistent Rust traits (Language and LanguageExt) across 24+ programming languages.
This crate is a fork of ast-grep-language, enhanced with improved performance, better feature organization, and streamlined language detection.
Supported Languages
The crate supports two categories of languages:
Languages with Custom Pattern Processing
These languages require special handling for metavariables because they don't accept $ as a valid identifier character:
- C/C++ - Uses µ as expando character
- C# - Uses µ as expando character
- CSS - Uses _ as expando character
- Elixir - Uses µ as expando character
- Go - Uses µ as expando character
- Haskell - Uses µ as expando character
- HTML - Uses z as expando character with injection support
- Kotlin - Uses µ as expando character
- PHP - Uses µ as expando character
- Python - Uses µ as expando character
- Ruby - Uses µ as expando character
- Rust - Uses µ as expando character
- Swift - Uses µ as expando character
Standard Languages
These languages accept $ in identifiers and use standard pattern processing:
- Bash
- Java
- JavaScript
- JSON
- Lua
- Scala
- TypeScript
- TSX
- YAML
Features
Core Features
- Unified API - All languages implement the same Language and LanguageExt traits
- Pattern Matching - Advanced AST pattern matching with metavariable support. Requires the matching feature flag (enabled by default).
- Language Detection - Automatic language detection from file extensions
- Fast Parser Access - Cached tree-sitter parsers for zero-cost repeated access
- Injection Support - Extract embedded languages (JavaScript in HTML, CSS in HTML). Requires the html-embedded flag.
Feature Flags
- matching - Enables advanced AST pattern matching and replacement with metavariable support.
Parser Groups
- all-parsers (default) - Includes all language parsers
- napi-environment - Includes only NAPI-compatible (WASM for Node.js environments) parsers (CSS, HTML, JavaScript, TypeScript)
Individual Languages
Each language can be enabled individually:
[dependencies]
thread-language = { version = "0.1", default-features = false, features = ["rust", "javascript"] }
Available language features:
bash, c, cpp, csharp, css, elixir, go, haskell, html, html-embedded, java, javascript, json, kotlin, lua, php, python, ruby, rust, scala, swift, typescript, tsx, yaml
Usage
Basic Language Detection
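use thread_language::SupportLang;
use std::path::Path;
// Detect language from file extension ("src/main.rs" is an illustrative path)
let lang = SupportLang::from_path(Path::new("src/main.rs")).unwrap();
assert_eq!(lang, SupportLang::Rust);
// Parse from string
let lang: SupportLang = "javascript".parse().unwrap();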
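To make the detection flow concrete, here is a minimal, std-only sketch of extension-based and name-based lookup. `DemoLang` and its extension table are hypothetical stand-ins for illustration; they are not the crate's `SupportLang` and do not reflect its actual extension list.

```rust
use std::path::Path;
use std::str::FromStr;

/// Hypothetical stand-in for a language enum; not the crate's `SupportLang`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DemoLang {
    Rust,
    JavaScript,
    Python,
}

impl DemoLang {
    /// Map a file extension to a language; `None` for unknown or missing extensions.
    fn from_path(path: &Path) -> Option<Self> {
        match path.extension()?.to_str()? {
            "rs" => Some(Self::Rust),
            "js" | "mjs" | "cjs" => Some(Self::JavaScript),
            "py" => Some(Self::Python),
            _ => None,
        }
    }
}

impl FromStr for DemoLang {
    type Err = String;

    /// Parse a language name, ignoring ASCII case.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name.to_ascii_lowercase().as_str() {
            "rust" => Ok(Self::Rust),
            "javascript" => Ok(Self::JavaScript),
            "python" => Ok(Self::Python),
            other => Err(format!("unsupported language: {other}")),
        }
    }
}

fn main() {
    assert_eq!(DemoLang::from_path(Path::new("src/main.rs")), Some(DemoLang::Rust));
    assert_eq!(DemoLang::from_path(Path::new("README")), None);
    let lang: DemoLang = "JavaScript".parse().unwrap();
    println!("{lang:?}");
}
```

Returning `Option` from the path lookup and `Result` from the name parser keeps "unknown file type" (a normal case when walking a directory) distinct from "invalid user input".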
Pattern Matching
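use thread_language::{Rust, SupportLang};
// The Language and LanguageExt traits must also be in scope; their import path is not shown in this listing.
// Using specific language type
let rust = Rust;
let source = r#"fn main() { println!("Hello"); }"#;
let tree = rust.ast_grep(source);
// Using enum for runtime language selection
let lang = SupportLang::Rust;
let tree = lang.ast_grep(source);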
Working with Metavariables
For languages that don't support $ in identifiers, the crate automatically handles pattern preprocessing:
use thread_language::Python;
let python = Python;
// Pattern uses $ for metavariables
let pattern = "def $FUNC($ARGS): $BODY";
// Automatically converted to use µ internally
let processed = python.pre_process_pattern(pattern);
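The preprocessing idea can be sketched without the crate. The helper below, `replace_meta_char`, is a hypothetical name and a simplified rule (a `$` counts as a metavariable start when followed by an uppercase ASCII letter, `_`, or another `$`); the crate's real implementation may differ in both signature and edge cases.

```rust
use std::borrow::Cow;

/// Hypothetical helper: swap `$` for `expando` wherever it starts a metavariable.
/// Returns the input unchanged (borrowed) when there is nothing to replace.
fn replace_meta_char(pattern: &str, expando: char) -> Cow<'_, str> {
    if !pattern.contains('$') {
        return Cow::Borrowed(pattern);
    }
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    for (i, &c) in chars.iter().enumerate() {
        // Simplified rule: `$` followed by an uppercase letter, `_`, or `$`.
        let starts_metavar = c == '$'
            && chars
                .get(i + 1)
                .map_or(false, |n| n.is_ascii_uppercase() || *n == '_' || *n == '$');
        out.push(if starts_metavar { expando } else { c });
    }
    Cow::Owned(out)
}

fn main() {
    let processed = replace_meta_char("def $FUNC($ARGS): $BODY", 'µ');
    println!("{processed}"); // prints "def µFUNC(µARGS): µBODY"
}
```

Returning a `Cow` avoids allocating for patterns that contain no `$` at all, which is the common case for plain code snippets.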
HTML with Embedded Languages
use thread_language::Html;
let html = Html;
let source = r#"
<script>console.log('hello');</script>
<style>.class { color: red; }</style>
"#;
let tree = html.ast_grep(source);
// Pass the parsed root node so embedded regions can be located
let injections = html.extract_injections(tree.root());
// injections contains JavaScript and CSS code ranges
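To show the shape of an injection result (a map from embedded language name to byte ranges), here is a deliberately naive sketch. It scans for literal `<script>` and `<style>` tags with string search, which is an assumption made purely for illustration: the crate works on the tree-sitter parse tree, and `naive_injections` is a hypothetical function, not part of its API.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical, naive scan: find the byte ranges between attribute-free
/// `<script>`/`<style>` tags. Real HTML needs a parser, not string search.
fn naive_injections(source: &str) -> HashMap<&'static str, Vec<Range<usize>>> {
    let tags = [("script", "javascript"), ("style", "css")];
    let mut found: HashMap<&'static str, Vec<Range<usize>>> = HashMap::new();
    for (tag, lang) in tags {
        let open = format!("<{tag}>");
        let close = format!("</{tag}>");
        let mut cursor = 0;
        while let Some(start) = source[cursor..].find(&open) {
            let body_start = cursor + start + open.len();
            // Stop at an unterminated element instead of guessing its end.
            let Some(len) = source[body_start..].find(&close) else { break };
            found.entry(lang).or_default().push(body_start..body_start + len);
            cursor = body_start + len + close.len();
        }
    }
    found
}

fn main() {
    let source = "<script>console.log('hello');</script>\n<style>.class { color: red; }</style>";
    let injections = naive_injections(source);
    for (lang, ranges) in &injections {
        for range in ranges {
            println!("{lang}: {}", &source[range.clone()]);
        }
    }
}
```

Keeping ranges rather than copied strings lets a caller hand each region to the matching parser while preserving positions in the original document.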
Architecture
Core Modules
- lib.rs - Main module with language definitions and SupportLang enum
- parsers.rs - Tree-sitter parser initialization and caching
- html.rs - Special HTML implementation with language injection support
Language Implementation Patterns
The crate uses two macros to implement languages:
- impl_lang! - For standard languages that accept $ in identifiers
- impl_lang_expando! - For languages requiring custom expando characters
Both macros generate the same Language and LanguageExt trait implementations but with different pattern preprocessing behavior.
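The two-macro approach can be illustrated with a small self-contained sketch. Everything here is hypothetical: `DemoLanguage`, `demo_impl_lang!`, and `demo_impl_lang_expando!` only mimic the idea of generating the same trait implementation with a different expando character, and the blanket `$` replacement is a simplification of real pattern preprocessing.

```rust
/// Metavariable prefix accepted as-is by "standard" languages.
const STANDARD_META_CHAR: char = '$';

/// Hypothetical trait standing in for the crate's language traits.
trait DemoLanguage {
    fn expando_char(&self) -> char;

    /// Simplified preprocessing: rewrite every `$` to the expando character.
    fn pre_process_pattern(&self, query: &str) -> String {
        query.replace(STANDARD_META_CHAR, &self.expando_char().to_string())
    }
}

/// Standard languages keep `$`, so preprocessing leaves the pattern unchanged.
macro_rules! demo_impl_lang {
    ($name:ident) => {
        #[derive(Clone, Copy, Debug)]
        struct $name;
        impl DemoLanguage for $name {
            fn expando_char(&self) -> char {
                STANDARD_META_CHAR
            }
        }
    };
}

/// Expando languages substitute a character their grammar accepts in identifiers.
macro_rules! demo_impl_lang_expando {
    ($name:ident, $expando:expr) => {
        #[derive(Clone, Copy, Debug)]
        struct $name;
        impl DemoLanguage for $name {
            fn expando_char(&self) -> char {
                $expando
            }
        }
    };
}

demo_impl_lang!(DemoJavaScript);
demo_impl_lang_expando!(DemoPython, 'µ');

fn main() {
    println!("{}", DemoJavaScript.pre_process_pattern("console.log($MSG)"));
    println!("{}", DemoPython.pre_process_pattern("def $FUNC($ARGS): $BODY"));
}
```

Because both macros emit the same trait implementation, callers can treat every language uniformly; only the preprocessing step differs.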
Performance
- Cached Parsers - Tree-sitter languages are initialized once and cached using OnceLock
- Fast Path Optimizations - Common file extensions and language names use fast-path matching
- Zero-Cost Abstractions - Language traits compile to direct function calls
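The initialize-once caching described above can be sketched with `std::sync::OnceLock`. `DemoGrammar` and `demo_grammar` are hypothetical placeholders for an expensive-to-build parser language; the counter exists only to show that initialization runs a single time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (pretend) expensive initialization ran.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical placeholder for a tree-sitter language object.
struct DemoGrammar {
    name: &'static str,
}

/// Return the cached grammar, building it on first access only.
fn demo_grammar() -> &'static DemoGrammar {
    static CACHE: OnceLock<DemoGrammar> = OnceLock::new();
    CACHE.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::Relaxed);
        DemoGrammar { name: "rust" }
    })
}

fn main() {
    let first = demo_grammar();
    let second = demo_grammar();
    // Both calls hand back the same cached instance.
    assert!(std::ptr::eq(first, second));
    println!(
        "{} grammar initialized {} time(s)",
        first.name,
        INIT_COUNT.load(Ordering::Relaxed)
    );
}
```

`OnceLock::get_or_init` is thread-safe, so concurrent first calls still run the initializer once and later calls reduce to a cheap read.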
Examples
File Type Detection
use thread_language::SupportLang;
// Get file types for a language
let types = SupportLang::Rust.file_types();
// Use with ignore crate for file filtering
Pattern Building
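use thread_language::JavaScript;
// The pattern builder type must also be in scope; its import path is not shown in this listing.
let js = JavaScript;
// Construct the builder with its `new` constructor; the type name and arguments are not shown in this listing.
let builder = new;
// build_pattern returns a Result; unwrap panics if the pattern is invalid
let pattern = js.build_pattern(&builder).unwrap();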
Contributing
When adding a new language:
- Add the tree-sitter dependency to Cargo.toml
- Add the parser function to parsers.rs
- Choose the appropriate macro (impl_lang! or impl_lang_expando!) in lib.rs
- Add the language to the SupportLang enum and related functions
- Add tests in a separate module file
License
Licensed under AGPL-3.0-or-later AND MIT. See license files for details.