Auto LSP Codegen
To generate an AST, simply provide a Tree-sitter node-types.json and LanguageFn of any language to the generate function of the auto_lsp_codegen crate.
[!NOTE] Although auto_lsp_codegen is a standalone crate, the generated code depends on the main auto_lsp crate.
Usage
The auto_lsp_codegen crate exposes a single generate function, which takes:
- A node-types.json
- A LanguageFn
- A HashMap<&str, &str> to rename tokens (see Custom Tokens)

It returns a TokenStream.
How you choose to use the TokenStream is up to you.
The most common setup is to call it from a build.rs script and write the generated code to a Rust file.
Note, however, that the output can be quite large—for example, Python’s AST results in ~11,000 lines of code.
use auto_lsp_codegen::generate;
use ;
You can also invoke it from your own CLI or tool if needed.
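As a rough sketch of the build.rs setup described above, the helper below writes already-generated code (the TokenStream converted to a string) to a Rust file. The helper name write_generated and the file name generated.rs are illustrative assumptions, not part of auto_lsp_codegen. The call to generate itself is left out because its arguments depend on your grammar crate.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes generated source text to `<dir>/generated.rs` and returns the path.
/// `write_generated` is a hypothetical helper, not part of auto_lsp_codegen.
pub fn write_generated(dir: &Path, code: &str) -> io::Result<PathBuf> {
    // Make sure the target directory exists before writing.
    fs::create_dir_all(dir)?;
    let path = dir.join("generated.rs");
    // In a real build script, `code` would be the TokenStream's `to_string()` output.
    fs::write(&path, code)?;
    Ok(path)
}
```

In a build script, dir would typically be the crate's src directory or the OUT_DIR that Cargo provides. Because the output can be large, it may be worth regenerating it only on demand.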
How Codegen Works
The generated code structure depends on the Tree-sitter grammar.
Structs for Rules
Each rule in node-types.json becomes a dedicated Rust struct. For example, given the rule:
function_definition: $ =>
The generated struct would look like this:
Field Matching
To match fields, codegen uses the field_id() method from the Tree-sitter cursor.
From the above example, the generated builder might look like this:
builder.builder;
Each u16 represents the unique field ID assigned by the Tree-sitter language parser.
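To illustrate the idea of dispatching on field IDs, here is a minimal, self-contained sketch. It does not reproduce the generated builder: the struct name, the field names, and the numeric IDs are all hypothetical, and a plain Option<u16> stands in for the value read from the Tree-sitter cursor.

```rust
/// Hypothetical field IDs; real values are assigned by the Tree-sitter language.
const FIELD_NAME: u16 = 1;
const FIELD_BODY: u16 = 2;

/// Collects the text of each field of a hypothetical `function_definition` node.
#[derive(Debug, Default, PartialEq)]
pub struct FunctionDefinitionBuilder {
    pub name: Option<String>,
    pub body: Option<String>,
}

impl FunctionDefinitionBuilder {
    /// Routes a child to a field based on its field ID.
    /// Returns `false` when the child has no field ID or the ID is unknown.
    pub fn on_child(&mut self, field_id: Option<u16>, text: &str) -> bool {
        match field_id {
            Some(FIELD_NAME) => {
                self.name = Some(text.to_string());
                true
            }
            Some(FIELD_BODY) => {
                self.body = Some(text.to_string());
                true
            }
            _ => false,
        }
    }
}
```

Matching on integer IDs instead of field names avoids string comparisons during traversal, which is presumably why the generated code works with u16 values.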
Handling Children
If a node has no named fields, a children enum is generated to represent all possible variants.
- If the children are unnamed, a generic "Operator_" enum is generated.
- If the children are named, the enum will be a concatenation of all possible child node types with underscores, using sanitized Rust-friendly names.
For example, given the rule:
_statement: $ =>
The generated enum would look like this:
[!NOTE] If the generated enum name becomes too long, consider using a Tree-sitter supertype to group nodes together.
The kind_id() method is used to determine child kinds during traversal.
The AstNode::contains method relies on this to check whether a node kind belongs to a specific struct or enum variant.
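The kind-based check can be pictured with the following sketch. It is an assumption-laden illustration and not the generated code: the enum name, its variants, and the kind IDs are made up, and the contains function here takes a bare u16 and is not the AstNode trait method.

```rust
/// Hypothetical kind IDs; real values come from the Tree-sitter language.
const KIND_IF_STATEMENT: u16 = 10;
const KIND_RETURN_STATEMENT: u16 = 11;

/// A children enum named by joining its child node types with an underscore.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub enum IfStatement_ReturnStatement {
    IfStatement,
    ReturnStatement,
}

impl IfStatement_ReturnStatement {
    /// Reports whether `kind_id` belongs to one of the enum's variants.
    pub fn contains(kind_id: u16) -> bool {
        matches!(kind_id, KIND_IF_STATEMENT | KIND_RETURN_STATEMENT)
    }

    /// Picks the variant for a child based on its kind ID.
    pub fn from_kind_id(kind_id: u16) -> Option<Self> {
        match kind_id {
            KIND_IF_STATEMENT => Some(Self::IfStatement),
            KIND_RETURN_STATEMENT => Some(Self::ReturnStatement),
            _ => None,
        }
    }
}
```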
Vec and Option Fields
repeat and repeat1 in the grammar will generate a Vec field.
optional(...) will generate an Option<T> field.
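A struct shaped by these two rules might look like the sketch below. The type and field names are hypothetical and only show how repetition and optionality map onto Rust types. The real generated fields may wrap their contents differently.

```rust
/// Hypothetical node types standing in for generated structs.
#[derive(Debug, PartialEq)]
pub struct Parameter(pub String);

#[derive(Debug, PartialEq)]
pub struct ReturnType(pub String);

/// A rule with one repeated child and one optional child.
#[derive(Debug, Default, PartialEq)]
pub struct FunctionSignature {
    /// From `repeat(...)` or `repeat1(...)`: any number of occurrences.
    pub parameters: Vec<Parameter>,
    /// From `optional(...)`: present or absent.
    pub return_type: Option<ReturnType>,
}
```

Note that a Vec does not encode the "at least one" guarantee of repeat1, so in this sketch both repetition forms share the same field type.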
Token Naming
Unnamed tokens are mapped to Rust enums using a built-in token map. For instance:
,
,
,
,
,
Generates:
Tokens with regular identifiers are converted to PascalCase.
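For identifier-like tokens, the PascalCase conversion can be approximated as follows. This is a simplified illustration written for this document; the function name to_pascal_case is an assumption, and the crate's own conversion may handle more cases.

```rust
/// Converts an identifier-like token such as `not_in` to PascalCase.
/// Non-alphanumeric characters are treated as word separators and dropped.
pub fn to_pascal_case(token: &str) -> String {
    token
        .split(|c: char| !c.is_alphanumeric())
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first character and keep the rest unchanged.
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect()
}
```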
Custom Tokens
If your grammar defines additional unnamed tokens not covered by the default map, you can provide a custom token mapping to generate appropriate Rust enum names.
use auto_lsp_codegen::generate;
let _result = generate;
Tokens that are not in the map will be added, and tokens that already exist in the map will be overwritten.
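The add-or-overwrite behavior described above matches what HashMap::extend does. The sketch below shows that semantics in isolation. The function name merge_tokens and the sample entries are assumptions, and the crate's internal merging may be implemented differently.

```rust
use std::collections::HashMap;

/// Merges user-supplied token names into a default token map.
/// Keys missing from `defaults` are added; keys present in both take the custom value.
pub fn merge_tokens<'a>(
    defaults: &HashMap<&'a str, &'a str>,
    custom: &HashMap<&'a str, &'a str>,
) -> HashMap<&'a str, &'a str> {
    let mut merged = defaults.clone();
    // `extend` inserts new keys and replaces the values of existing keys.
    merged.extend(custom.iter().map(|(token, name)| (*token, *name)));
    merged
}
```

With hypothetical defaults of "+" mapped to "Plus" and "-" mapped to "Minus", a custom map containing "+" as "Add" and "@" as "At" would yield three entries, with "+" now named "Add".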
Super Types
Tree-sitter supports supertypes, which allow grouping related nodes under a common type.
For example, in the Python grammar:
,
This becomes a Rust enum:
[!NOTE] Some supertypes contain other supertypes, in which case the generated enum flattens the hierarchy.
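Flattening nested supertypes can be sketched as a recursive expansion. The function below is an illustration only: the name flatten, the map layout, and the assumption that the supertype graph has no cycles are not taken from the crate.

```rust
use std::collections::{BTreeSet, HashMap};

/// Recursively expands `name` into the concrete node types it covers.
/// `supertypes` maps each supertype to its direct subtypes.
/// Assumes the supertype graph contains no cycles.
pub fn flatten<'a>(
    name: &'a str,
    supertypes: &HashMap<&'a str, Vec<&'a str>>,
    out: &mut BTreeSet<&'a str>,
) {
    match supertypes.get(name) {
        // A supertype: descend into each of its subtypes.
        Some(subtypes) => {
            for &sub in subtypes {
                flatten(sub, supertypes, out);
            }
        }
        // A concrete node type: keep it as a variant.
        None => {
            out.insert(name);
        }
    }
}
```

For instance, if a supertype expression lists primary_expression and not_operator, and primary_expression itself lists identifier and call, the flattened set contains call, identifier, and not_operator.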