normalize-languages 0.3.2

Tree-sitter language support and dynamic grammar loading
# normalize-languages/src/queries

Tree-sitter `.scm` query files for symbol extraction, call graph, complexity, imports, and type analysis.

395 files covering 84 languages. Jinja2 has tags + complexity; JSON, TOML, YAML, CSS, SCSS, HTML, XML have tags for data/markup symbol extraction; Thrift and Dockerfile have tags for IDL/container symbol extraction.

Query types per language:

- `<lang>.tags.scm`: symbol definitions (functions, classes, types)
- `<lang>.calls.scm`: function call sites
- `<lang>.complexity.scm`: cyclomatic complexity nodes
- `<lang>.imports.scm`: import/require statements
- `<lang>.types.scm`: type definitions
- `<lang>.decorations.scm`: doc comments, attributes, and annotations that precede a symbol and should move with it
- `<lang>.test_regions.scm`: byte ranges of test-only code, captured as `@test_region`. The syntax-rules runner uses them to skip findings for rules where `applies_in_tests = false`. Rust is currently the only language with one; it captures inline `#[cfg(test)] mod ...` blocks.
- `<lang>.cfg.scm`: control flow graph nodes (if/for/while/switch/try/break/continue/return/throw), used by `normalize-cfg`; available for languages including rust, python, go, typescript, tsx, javascript, java

Not every language has all query types; coverage varies by what the grammar models. These files drive the index extraction in `normalize-facts`: node classification stays in `.scm`, while name/field extraction from matched nodes stays in Rust. All `.tags.scm` and `.types.scm` files are registered in `grammar_loader.rs` via `bundled_tags_query` / `bundled_types_query`.

`rust.calls.scm` uses `@call.write` captures alongside `@call` to tag write-context calls (calls on the RHS of `assignment_expression` or `compound_assignment_expr`); `collect_calls_with_query` in `normalize-facts/src/symbols.rs` maps `@call.write` to `access = "write"`.
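
As an illustration of how `@test_region` byte ranges could be used to skip findings, here is a minimal sketch. It is not the syntax-rules runner's actual code: the `Finding` and `Rule` types and the `filter_findings` function are hypothetical names invented for this example, and only the `applies_in_tests` flag comes from the description above. It also assumes a finding counts as test code only when it lies entirely inside a captured range.

```rust
use std::ops::Range;

/// A finding located by byte range in the source file (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    rule_id: String,
    byte_range: Range<usize>,
}

/// Minimal rule metadata (hypothetical); `applies_in_tests` mirrors the
/// setting named in the text.
struct Rule {
    id: String,
    applies_in_tests: bool,
}

/// True if `finding` lies entirely inside one of the `@test_region` ranges.
fn in_test_region(finding: &Range<usize>, regions: &[Range<usize>]) -> bool {
    regions
        .iter()
        .any(|r| r.start <= finding.start && finding.end <= r.end)
}

/// Drop findings inside test regions for rules that opt out of test code.
fn filter_findings(
    rule: &Rule,
    findings: Vec<Finding>,
    regions: &[Range<usize>],
) -> Vec<Finding> {
    if rule.applies_in_tests {
        return findings;
    }
    findings
        .into_iter()
        .filter(|f| !in_test_region(&f.byte_range, regions))
        .collect()
}

fn main() {
    // Pretend a `#[cfg(test)] mod tests { ... }` block spans bytes 100..200.
    let regions = vec![100..200];
    let rule = Rule { id: "no-unwrap".to_string(), applies_in_tests: false };
    let findings = vec![
        Finding { rule_id: rule.id.clone(), byte_range: 10..20 },
        Finding { rule_id: rule.id.clone(), byte_range: 120..130 },
    ];
    for f in filter_findings(&rule, findings, &regions) {
        println!("{} at {:?}", f.rule_id, f.byte_range); // prints: no-unwrap at 10..20
    }
}
```

Keeping the region capture in `.scm` and the filtering in Rust matches the split described above: the query decides which nodes are test code, and the Rust side only compares byte offsets.
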

`*.imports.scm` files support an `@import.reexport` capture to mark re-export statements (`pub use` in Rust, `export { } from` / `export * from` in TypeScript/JavaScript). The `is_reexport` flag is stored in the `imports` table and used by `FileIndex::trace_reexports()` to follow re-export chains to their ultimate source files.

`<lang>.cfg.scm` files now cover 76 languages, including:

- all C-family: C, C++, ObjC, C#, Kotlin, Swift, Dart
- JVM/functional: Scala, Groovy, VB, Haskell, OCaml, F#, Elixir, Erlang, Clojure, Gleam, ReScript, Idris, Agda, Lean, CommonLisp, Scheme, Elisp
- scripting: Ruby, Lua, PHP, Perl, Bash, Fish, Awk, Zsh, PowerShell, Batch, Vim
- systems: Zig, Ada, D, Prolog, R, Julia, MATLAB, GLSL, HLSL, Verilog, VHDL
- domain/config: Nix, HCL, Starlark, Elm, Jinja2, Svelte, Vue, CMake, Meson, TLA+, jq

Still deferred: asm and x86asm (assembly; grammar not inspectable) and uiua (array language, no standard query files).
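
The re-export chain following mentioned above can be sketched roughly as below. This is an assumption-laden illustration, not the real `FileIndex::trace_reexports()`: the `ImportRow` shape, the `row` helper, and the `trace_reexport` function are hypothetical, and only the `is_reexport` flag is taken from the text. It follows named re-exports only and does not model wildcard forms such as `export * from`.

```rust
use std::collections::{HashMap, HashSet};

/// One row of a simplified imports table (hypothetical shape).
struct ImportRow {
    file: String,        // file containing the import statement
    name: String,        // imported symbol name
    source_file: String, // file the import resolves to
    is_reexport: bool,   // set when the statement matched `@import.reexport`
}

/// Convenience constructor for the example rows (hypothetical helper).
fn row(file: &str, name: &str, source_file: &str, is_reexport: bool) -> ImportRow {
    ImportRow {
        file: file.to_string(),
        name: name.to_string(),
        source_file: source_file.to_string(),
        is_reexport,
    }
}

/// Follow re-exports of `name` starting at `file` until reaching a file
/// that does not re-export it. A visited set stops cyclic chains.
fn trace_reexport(rows: &[ImportRow], file: &str, name: &str) -> String {
    // Index only the re-export rows by (file, symbol name).
    let index: HashMap<(&str, &str), &str> = rows
        .iter()
        .filter(|r| r.is_reexport)
        .map(|r| ((r.file.as_str(), r.name.as_str()), r.source_file.as_str()))
        .collect();

    let mut current = file;
    let mut seen = HashSet::new();
    while seen.insert(current) {
        match index.get(&(current, name)) {
            Some(&next) => current = next,
            None => break,
        }
    }
    current.to_string()
}

fn main() {
    let rows = vec![
        // lib.rs: `pub use crate::api::parse;`
        row("lib.rs", "parse", "api.rs", true),
        // api.rs: `pub use crate::parser::parse;`
        row("api.rs", "parse", "parser.rs", true),
        // parser.rs: `use crate::lexer::Token;` is a plain import, not followed
        row("parser.rs", "Token", "lexer.rs", false),
    ];
    println!("{}", trace_reexport(&rows, "lib.rs", "parse")); // prints: parser.rs
}
```

Plain imports are left out of the lookup index, so only rows flagged as re-exports extend the chain.
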