normalize-languages 0.3.1

Tree-sitter language support and dynamic grammar loading
# normalize-languages

Tree-sitter language support for ~98 programming languages.

Each language is a zero-sized struct (e.g., `Python`, `Rust`, `Go`, `TypeScript`) implementing the `Language` trait, which provides symbol extraction, import parsing, visibility detection, docstring extraction, test file globs, and embedded content support. Grammars are loaded dynamically from compiled shared libraries via `GrammarLoader` (backed by libloading), with query files (`.scm`) loaded from `src/queries/`. The crate also provides `support_for_path`, `support_for_extension`, `support_for_grammar`, and `supported_languages` registry functions. All 98 languages are individually feature-gated under `lang-*` flags grouped into `langs-core`, `langs-functional`, `langs-config`, `langs-data`, `langs-markup`, `langs-hardware`, and `langs-misc`.
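
The zero-sized-struct-plus-registry design described above can be illustrated with a minimal sketch. Everything in it is a stand-in written for illustration: `LanguageSketch`, `sketch_support_for_extension`, and `sketch_support_for_path` are hypothetical names, the trait is reduced to two methods, and the real `Language` trait and registry functions may have different signatures.

```rust
use std::path::Path;

/// Hypothetical, reduced stand-in for the crate's `Language` trait.
/// `Sync` is a supertrait so trait objects can live in a `static`.
trait LanguageSketch: Sync {
    fn name(&self) -> &'static str;
    fn extensions(&self) -> &'static [&'static str];
}

// Zero-sized structs: they carry no data, only behaviour.
struct Python;
struct Rust;

impl LanguageSketch for Python {
    fn name(&self) -> &'static str {
        "Python"
    }
    fn extensions(&self) -> &'static [&'static str] {
        &["py", "pyi"]
    }
}

impl LanguageSketch for Rust {
    fn name(&self) -> &'static str {
        "Rust"
    }
    fn extensions(&self) -> &'static [&'static str] {
        &["rs"]
    }
}

/// Illustrative registry; the real crate gates each entry behind a `lang-*` feature.
static REGISTRY: &[&dyn LanguageSketch] = &[&Python, &Rust];

/// Hypothetical analogue of `support_for_extension`: case-insensitive lookup.
fn sketch_support_for_extension(ext: &str) -> Option<&'static dyn LanguageSketch> {
    REGISTRY
        .iter()
        .copied()
        .find(|lang| lang.extensions().iter().any(|e| e.eq_ignore_ascii_case(ext)))
}

/// Hypothetical analogue of `support_for_path`: delegates to the extension lookup.
fn sketch_support_for_path(path: &Path) -> Option<&'static dyn LanguageSketch> {
    let ext = path.extension()?.to_str()?;
    sketch_support_for_extension(ext)
}
```

Because the structs are zero-sized, the registry is just a table of vtable pointers, and a lookup such as `sketch_support_for_path(Path::new("src/main.rs"))` returns a trait object without allocating.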

The crate ships 390 `.scm` query files of 6 types (tags, calls, complexity, imports, types, decorations), covering 69+ languages. `<lang>.decorations.scm` files (45 languages) define `@decoration` captures for the doc comments, attributes, decorators, and annotations that immediately precede a symbol; `normalize edit move` uses them to include decorations when relocating symbols.

Symbol extraction for data and markup formats (JSON, TOML, YAML, CSS, HTML, XML) is implemented through `tags.scm` queries combined with `Language` trait methods (`refine_kind`, `node_name`, `container_body`, `build_signature`). TOML inline table pairs are filtered out; CSS selectors, at-rules, and declarations are extracted; HTML and XML produce element trees with attribute signatures.

Import extraction was added on 2026-03-11 for AWK (`@include`/`@load` via the `directive` node), HTML (`<script src>`, `<link href>`), and Jinja2 (`extends`/`import`/`from`/`include`).

All `*.tags.scm` and `*.types.scm` files are fully registered in `grammar_loader.rs`. Integration fixture tests for query correctness live in `tests/query_fixtures.rs`; they skip gracefully when `target/grammars/` is absent, so run `cargo xtask build-grammars` first. `groovy.imports.scm` was fixed to use the `import: (qualified_name) @import.path` field accessor; it previously used the invalid `dotted_identifier` node kind, which caused the query to fail compilation. A live test, `groovy_imports_live`, was added for it.

`GrammarLoadError` is now a proper typed error enum (replacing `anyhow::Error`) exposed in the public API. Grammar validation utilities (`validate_unused_kinds_audit`, `cross_check_node_kinds`) were added in `registry.rs`, with `.ok().flatten()` patterns corrected to `.map_err()`, since `GrammarLoader::get` returns `Result<Language, GrammarLoadError>`, not `Result<Option<Language>, _>`.
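
The switch from `.ok().flatten()` to `.map_err()` follows from the return type: `.ok()` on a `Result<T, E>` yields `Option<T>`, and `.flatten()` only exists on `Option<Option<T>>`, so the old pattern fits a `Result<Option<_>, _>` but not a plain `Result<_, _>`. The sketch below shows the corrected shape using stand-in types. `LoadErrorSketch`, `GrammarSketch`, `get_sketch`, and `audit_sketch` are hypothetical names invented for illustration; the real `GrammarLoadError` variants and the `GrammarLoader::get` signature are not reproduced here.

```rust
use std::fmt;

/// Hypothetical stand-in for a typed load error such as `GrammarLoadError`.
#[derive(Debug, PartialEq)]
enum LoadErrorSketch {
    NotFound(String),
}

impl fmt::Display for LoadErrorSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadErrorSketch::NotFound(name) => write!(f, "grammar `{name}` not found"),
        }
    }
}

impl std::error::Error for LoadErrorSketch {}

/// Hypothetical stand-in for a loaded grammar handle.
#[derive(Debug)]
struct GrammarSketch {
    name: String,
}

/// Returns `Result<T, E>` rather than `Result<Option<T>, E>`:
/// a missing grammar is reported as an error, not as `Ok(None)`.
fn get_sketch(name: &str) -> Result<GrammarSketch, LoadErrorSketch> {
    match name {
        "python" | "rust" => Ok(GrammarSketch { name: name.to_string() }),
        other => Err(LoadErrorSketch::NotFound(other.to_string())),
    }
}

/// A caller that keeps the failure reason by mapping the typed error
/// into its own error type instead of discarding it with `.ok()`.
fn audit_sketch(name: &str) -> Result<String, String> {
    let grammar = get_sketch(name).map_err(|e| format!("audit skipped: {e}"))?;
    Ok(format!("audited {}", grammar.name))
}
```

With a typed enum, callers can also match on individual variants (here, `LoadErrorSketch::NotFound`) instead of inspecting an opaque error's message.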