# normalize-languages
Tree-sitter language support for ~98 programming languages, plus per-language `ModuleResolver` implementations for cross-file import resolution.
Each language is a zero-sized struct (e.g., `Python`, `Rust`, `Go`, `TypeScript`) implementing the `Language` trait, which provides symbol extraction, import parsing, visibility detection, docstring extraction, test file globs, embedded content support, and (for languages with module systems) `module_resolver()` returning a `&dyn ModuleResolver`. Grammars are loaded dynamically from compiled shared libraries via `GrammarLoader` (backed by libloading), with query files (`.scm`) loaded from `src/queries/`. The crate also provides `support_for_path`, `support_for_extension`, `support_for_grammar`, and `supported_languages` registry functions. All 98 languages are individually feature-gated under `lang-*` flags grouped into `langs-core`, `langs-functional`, `langs-config`, `langs-data`, `langs-markup`, `langs-hardware`, and `langs-misc`.
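
The zero-sized-struct plus registry pattern described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual API: the trimmed-down `Language` trait, its `name`/`extensions` methods, and the `LANGUAGES` table are assumptions made for the example; only the names `Python`, `support_for_extension`, and `support_for_path` come from the description.

```rust
use std::path::Path;

/// Hypothetical, trimmed-down stand-in for the crate's `Language` trait.
/// The real trait exposes far more (symbols, imports, visibility, ...).
pub trait Language: Sync {
    fn name(&self) -> &'static str;
    fn extensions(&self) -> &'static [&'static str];
}

/// Each language is a zero-sized struct: it carries no data, only behavior.
pub struct Python;
pub struct Json;

impl Language for Python {
    fn name(&self) -> &'static str { "Python" }
    fn extensions(&self) -> &'static [&'static str] { &["py", "pyi"] }
}

impl Language for Json {
    fn name(&self) -> &'static str { "JSON" }
    fn extensions(&self) -> &'static [&'static str] { &["json"] }
}

/// Assumed registry table; the real crate would gate entries behind `lang-*` features.
static LANGUAGES: &[&dyn Language] = &[&Python, &Json];

/// Look up a language by file extension (case-insensitive, without the dot).
pub fn support_for_extension(ext: &str) -> Option<&'static dyn Language> {
    LANGUAGES
        .iter()
        .copied()
        .find(|lang| lang.extensions().iter().any(|e| e.eq_ignore_ascii_case(ext)))
}

/// Look up a language from a path by delegating to the extension lookup.
pub fn support_for_path(path: &Path) -> Option<&'static dyn Language> {
    let ext = path.extension()?.to_str()?;
    support_for_extension(ext)
}
```

Because the structs hold no fields, a `&dyn Language` is effectively just a vtable pointer, which keeps a static registry like this cheap to build and search.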
Phase 0 `ModuleResolver` implementations (26 languages): Rust (Cargo workspaces), TypeScript/TSX (tsconfig.json paths, .js→.ts elision), JavaScript (jsconfig.json, ESM/CJS), Python (relative imports, src/ layout), Go (go.mod), Ruby (require_relative), Java/Kotlin/Groovy/Scala (Maven/Gradle src/main layout), C#/VB/F# (.NET namespace→file), Swift (SPM Sources targets), Dart (pubspec.yaml package: imports), Zig (@import), Elixir (Mix CamelCase↔snake_case), Erlang (1:1 module=file), Haskell (Cabal hs-source-dirs), OCaml (capitalized stem), Lua (require dot-path), PHP (composer.json PSR-4), Perl (lib/ :: path), Clojure (src/ dot-namespace), Common Lisp (workspace stem), Scheme (R7RS .sld/.scm), Gleam (gleam.toml src/), ReScript (bsconfig.json). Deferred (require design work or toolchain info): C/C++/ObjC, Elm, D, R, Julia, MATLAB, Nix, Ada, Agda, Idris, Lean, Prolog.
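
To make two of the mappings above concrete, here is a hedged sketch of how a Lua `require` dot-path and an Elixir module name might be turned into candidate file paths. The function names, the `lib/` root for Elixir, the `init.lua` fallback, and the simplified snake_case rule (no special handling of acronyms) are assumptions for illustration; the crate's resolvers may differ.

```rust
use std::path::{Path, PathBuf};

/// Candidate files for a Lua-style `require("a.b.c")` under `root`:
/// `a/b/c.lua`, then `a/b/c/init.lua` (assumed search order).
pub fn lua_require_candidates(root: &Path, module: &str) -> Vec<PathBuf> {
    let rel: PathBuf = module.split('.').collect();
    vec![
        root.join(&rel).with_extension("lua"),
        root.join(&rel).join("init.lua"),
    ]
}

/// Simplified CamelCase -> snake_case: an underscore is inserted before an
/// uppercase letter only when it follows a lowercase letter or a digit.
fn snake_case(segment: &str) -> String {
    let mut out = String::new();
    let mut prev_lower_or_digit = false;
    for ch in segment.chars() {
        if ch.is_ascii_uppercase() {
            if prev_lower_or_digit {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
            prev_lower_or_digit = false;
        } else {
            out.push(ch);
            prev_lower_or_digit = ch.is_ascii_lowercase() || ch.is_ascii_digit();
        }
    }
    out
}

/// Map an Elixir module such as `MyApp.UserController` to a path under
/// `lib/` (an assumed Mix layout), one directory per module segment.
pub fn elixir_module_path(module: &str) -> PathBuf {
    let rel: PathBuf = module.split('.').map(snake_case).collect();
    Path::new("lib").join(rel).with_extension("ex")
}
```

A real resolver would then check which candidate exists on disk; the sketch stops at path construction so it stays independent of the filesystem.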
`.cfg.scm` query files (76 languages) define CFG capture vocabularies — `@cfg.branch`, `@cfg.loop`, `@cfg.match`, `@cfg.try`, `@cfg.exit.*`, `@cfg.def`, `@cfg.use`, `@cfg.effect.*` — for the `normalize-cfg` builder. Phase 4 adds `@cfg.exit.throw.type` (thrown exception type) and `@cfg.try.catch.type` (caught type) to Java, Python, JS/TS/TSX, C++, C#.
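
One way a consumer might interpret the capture vocabulary above is to classify capture names into a small enum. The sketch below is an assumption about how such a classifier could look, not the `normalize-cfg` implementation: the `CfgCapture` type and `classify` function are hypothetical, and only the capture names themselves come from the list above.

```rust
/// Hypothetical classification of `@cfg.*` capture names.
#[derive(Debug, PartialEq, Eq)]
pub enum CfgCapture<'a> {
    Branch,
    Loop,
    Match,
    /// `@cfg.try`, optionally with a sub-path such as `catch.type`.
    Try(Option<&'a str>),
    /// `@cfg.exit.*`, carrying the part after `exit.` (e.g. `throw.type`).
    Exit(&'a str),
    Def,
    Use,
    /// `@cfg.effect.*`, carrying the part after `effect.`.
    Effect(&'a str),
}

/// Classify a capture name, with or without the leading `@`.
/// Returns `None` for anything outside the `cfg.` vocabulary.
pub fn classify(capture: &str) -> Option<CfgCapture<'_>> {
    let rest = capture
        .strip_prefix('@')
        .unwrap_or(capture)
        .strip_prefix("cfg.")?;
    // Split the first segment from the remainder, if there is one.
    let (head, tail) = match rest.split_once('.') {
        Some((h, t)) => (h, Some(t)),
        None => (rest, None),
    };
    match (head, tail) {
        ("branch", None) => Some(CfgCapture::Branch),
        ("loop", None) => Some(CfgCapture::Loop),
        ("match", None) => Some(CfgCapture::Match),
        ("try", t) => Some(CfgCapture::Try(t)),
        ("exit", Some(t)) if !t.is_empty() => Some(CfgCapture::Exit(t)),
        ("def", None) => Some(CfgCapture::Def),
        ("use", None) => Some(CfgCapture::Use),
        ("effect", Some(t)) if !t.is_empty() => Some(CfgCapture::Effect(t)),
        _ => None,
    }
}
```

Keeping the sub-path as a borrowed string means newer captures such as `@cfg.exit.throw.type` flow through without a new enum variant for each suffix.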
390 `.scm` query files across 6 types (tags, calls, complexity, imports, types, decorations) cover 69+ languages. `<lang>.decorations.scm` files (45 languages) define `@decoration` captures for doc comments, attributes, decorators, and annotations that immediately precede a symbol; `normalize edit move` uses them to carry decorations along when relocating symbols. All `*.tags.scm` and `*.types.scm` files are fully registered in `grammar_loader.rs`.
Symbol extraction for data and markup formats (JSON, TOML, YAML, CSS, HTML, XML) is implemented via `tags.scm` plus `Language` trait methods (`refine_kind`, `node_name`, `container_body`, `build_signature`). TOML inline table pairs are filtered out; CSS selectors, at-rules, and declarations are extracted; HTML/XML element trees are extracted with attribute signatures.
Import extraction was added on 2026-03-11 for AWK (`@include`/`@load` via the `directive` node), HTML (`<script src>`, `<link href>`), and Jinja2 (`extends`/`import`/`from`/`include`). `groovy.imports.scm` was fixed to use the `import: (qualified_name) @import.path` field accessor; it previously used the invalid `dotted_identifier` node kind, which caused the query to fail compilation. A live test, `groovy_imports_live`, was added.
Integration fixture tests for query correctness live in `tests/query_fixtures.rs`. They skip gracefully when `target/grammars/` is absent, so run `cargo xtask build-grammars` first.
`GrammarLoadError` is now a typed error enum (replacing `anyhow::Error`) exposed in the public API. Grammar validation utilities (`validate_unused_kinds_audit`, `cross_check_node_kinds`) were added in `registry.rs`, and their `.ok().flatten()` patterns were corrected to `.map_err()` because `GrammarLoader::get` returns `Result<Language, GrammarLoadError>`, not `Result<Option<Language>, _>`.
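
The typed-error change and the `.map_err()` correction can be illustrated with a small stand-alone sketch. Everything except the names `GrammarLoadError`, `GrammarLoader`, and `get` is an assumption: the enum variants, the `Grammar` stand-in for the returned language handle, the in-memory loader, and the `audit_target` helper are all hypothetical.

```rust
use std::fmt;

/// Typed error enum; the variants here are hypothetical examples.
#[derive(Debug, PartialEq, Eq)]
pub enum GrammarLoadError {
    NotFound(String),
    LoadFailed { name: String, reason: String },
}

impl fmt::Display for GrammarLoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotFound(name) => write!(f, "grammar `{name}` not found"),
            Self::LoadFailed { name, reason } => {
                write!(f, "grammar `{name}` failed to load: {reason}")
            }
        }
    }
}

impl std::error::Error for GrammarLoadError {}

/// Stand-in for the loaded grammar handle returned by the real loader.
#[derive(Debug, PartialEq, Eq)]
pub struct Grammar {
    pub name: String,
}

/// In-memory stand-in for `GrammarLoader`; the real one loads shared libraries.
pub struct GrammarLoader {
    available: Vec<String>,
}

impl GrammarLoader {
    pub fn new(available: &[&str]) -> Self {
        Self { available: available.iter().map(|s| s.to_string()).collect() }
    }

    /// Returns `Result<T, E>`, not `Result<Option<T>, E>`, so callers cannot
    /// write `.ok().flatten()`: after `.ok()` there is no nested `Option`.
    pub fn get(&self, name: &str) -> Result<Grammar, GrammarLoadError> {
        if self.available.iter().any(|n| n == name) {
            Ok(Grammar { name: name.to_string() })
        } else {
            Err(GrammarLoadError::NotFound(name.to_string()))
        }
    }
}

/// Hypothetical caller: `.map_err()` converts the typed error and keeps its message.
pub fn audit_target(loader: &GrammarLoader, name: &str) -> Result<Grammar, String> {
    loader.get(name).map_err(|e| format!("audit skipped: {e}"))
}
```

Compared with discarding the error through `.ok()`, mapping it keeps the failure reason available to whoever reports the audit result.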