# normalize-facts/src
Source files for fact extraction and storage.
- `lib.rs` — public API re-exports
- `ca_cache.rs` — `CaCache`: content-addressed extraction cache (inlined from the former `normalize-ca-cache` crate); SQLite-backed, keyed by `(blake3(file_bytes), extractor_version, grammar)`; stored at `~/.config/normalize/ca-cache.sqlite`; the `symbol_cache()` global singleton provides a shared `CaCache` instance for `Extractor` (symbol extraction cache, keyed by `"symbols-v1-{all|public}"`); `gc_stale_versions` skips `"symbols-*"` entries; `gc_stale_symbol_versions` garbage-collects outdated symbol cache entries; `busy_timeout=5000ms` makes a connection wait up to 5 s for a lock instead of failing immediately when the daemon and CLI share the DB
- `extract.rs` — `Extractor`, `ExtractOptions`, `ExtractResult`, `InterfaceResolver`, `OnDemandResolver`; drives per-file extraction using tree-sitter grammars and language trait hooks; uses `GrammarLoader::get_compiled_query()` for cached query compilation (tags, complexity); `collect_symbols_from_tags` supports arbitrary-depth container nesting via two-phase assembly (build symbols, then assemble the tree bottom-up); nodes for which `node_name()` returns `None` are skipped rather than aborting the whole extraction; `extract_with_support` checks `ca_cache::symbol_cache()` before parsing (blake3 content hash, version key `"symbols-v1-{all|public}"`) and stores results on a cache miss, so `normalize view` and other single-file commands avoid re-parsing unchanged files; the cross-file resolver path skips the cache because its result is not fully content-addressed
- `index.rs` — `FileIndex` (SQLite-backed store, schema v12 + CA cache integration), `CallGraphStats`, `ChangedFiles`, `SymbolMatch`; all index queries (`find_callers`, `find_callees`, `resolve_all_imports`, `resolve_all_calls`, `trace_reexports`, etc.)
  - Call graph: `find_callers` and `find_callees` return an `access: Option<String>` field (`"write"` when the call result is assigned, `None` otherwise) sourced from the `access` column of the `calls` table; `callee_resolved_file` in the `calls` table enables cross-module disambiguation
  - Refresh: `update_file()` performs single-file incremental reindexing (used by LSP on save); `set_progress(true)` enables indicatif progress bars for `refresh()` and `refresh_call_graph()` (TTY-aware, hidden when stderr is not a terminal); `incremental_refresh()` returns `Vec<PathBuf>` (absolute changed paths) instead of `usize`
  - Import graph: `all_resolved_import_edges()` and `resolved_imports_for_file()` query resolved import edges (used by the daemon to compute affected sets transiently on each refresh); `find_import_path(from, to, all_paths, path_limit, max_depth)` finds the shortest import chain(s) between two files via BFS (`all_paths=false`) or DFS (`all_paths=true`) over the resolved import graph, with depth capped at 10 so cyclic imports cannot cause unbounded traversal; `import_fan_out_by_file()` returns `Vec<(file, count)>` (per-file count of distinct resolved import targets, used by the `high-fan-out` native rule); `import_fan_in_by_file()` returns `Vec<(file, count)>` (per-file count of distinct importers, used by the `high-fan-in` native rule)
  - Re-exports: schema v12 adds `is_reexport INTEGER NOT NULL DEFAULT 0` to the `imports` table (tracks `pub use` in Rust, `export...from` in TS/JS); `trace_reexports()` follows re-export chains (up to depth 10) after `resolve_all_imports()` so imports point to the ultimate source files rather than intermediate re-export files
  - Diagnostics: `save_diagnostics_json(engine, json)` / `load_diagnostics_json(engine)` persist per-engine issue JSON to the `daemon_diagnostics` table so the daemon holds no diagnostics in heap between refreshes; a separate per-file diagnostics table, `(path PRIMARY KEY, issues_blob BLOB, config_hash, updated_at)`, serves per-file daemon pulls and is accessed via `save_diagnostics_per_file(upserts, deletes, config_hash)` (single-transaction upsert+delete), `load_diagnostics_for_file(path, expected_hash)` / `load_diagnostics_for_files(paths, expected_hash)` (return `None` / skip rows on hash mismatch, which invalidates the cache across daemon restarts), `list_diagnostic_paths()`, and `clear_all_diagnostics()` (drops every row from both diagnostic tables; used by the daemon's `.normalize/config.toml` live-reload path); `save_diagnostics_blob(engine, blob, config_hash)` / `load_diagnostics_blob(engine, expected_hash)` carry the same hash on the per-engine table
  - Co-change: `rebuild_co_change_edges(since_commit)` walks git history via gix and populates `co_change_edges(file_a, file_b, count)` (per-file fanout cap=20, skip commits >50 files, threshold ≥2, incremental via the `co_change_last_commit` meta key); `query_co_change_edges(min_count)` returns index edges or `None` when empty
  - Docstrings: `docstring` from `FlatSymbol` is stored as a `doc:<text>` row in `symbol_attributes` at index time
  - CA cache: `FileIndex.ca_cache` uses `ca_cache::CaCache`; `refresh_call_graph` does a CA pre-pass before the rayon par-iter (serial blake3 hash + cache lookup, so only uncached files go through extraction); `reindex_files` checks the CA cache per file on the incremental path
  - Missing grammars: warnings are emitted by `parsers::report_missing_grammar` (called automatically from `try_get_grammar`/`parse_with_grammar`/`parser_for` in `normalize-languages`); extraction paths skip files whose grammar `.so` is missing rather than indexing them as empty, and `service/facts.rs::rebuild_data` drains `take_missing_grammars()` into the `RebuildReport` summary
- `parsers.rs` — thin re-export shim over `normalize_languages::parsers`; provides `grammar_loader`, `parser_for`, `parse_with_grammar`, `available_external_grammars`, `try_get_grammar`, `report_missing_grammar`, `take_missing_grammars`, `peek_missing_grammars`, and `MissingGrammar` as convenience aliases so call sites inside this crate don't import from a sibling crate directly; the canonical `GrammarLoader` singleton lives in `normalize_languages::parsers` (single instance shared across the whole process)
- `symbols.rs` — `SymbolParser`: converts raw tree-sitter tag matches into `Symbol`/`FlatSymbol` records using `Language` trait hooks; `parse_file` returns `Option<Vec<FlatSymbol>>` where `None` = grammar recognized but `.so` not loaded (callers skip the file and warn), `Some(vec)` = grammar loaded (vec may be empty for files with no symbols); `flatten_symbol` preserves `Symbol.docstring` into `FlatSymbol.docstring` so it reaches the index; `find_type_refs()` extracts type-to-type relationships (field_type, param_type, return_type, extends, implements, generic_bound, type_alias) for Rust, TypeScript/TSX, Python, Go, Java, C#, Kotlin, Swift, C++, and Ruby
- `main.rs` — binary entry point for the standalone `normalize-facts` CLI (gated behind `cli` feature)
- `service.rs` — `FactsCliService` with `#[cli]` impl: `rebuild`, `stats`, `files` subcommands (gated behind `cli` feature)
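
The content-addressed lookup described for `ca_cache.rs` and `extract.rs` can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: `CacheKey`, `SketchCache`, `content_hash`, and `get_or_extract` are hypothetical names, the store is an in-memory `HashMap` rather than SQLite, and the standard-library hasher stands in for blake3 so the example has no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key mirroring the documented triple
/// `(content hash, extractor_version, grammar)`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct CacheKey {
    pub content_hash: u64, // stand-in: the real cache uses a blake3 digest
    pub extractor_version: String,
    pub grammar: String,
}

/// Stand-in content hash using the standard-library hasher (NOT blake3).
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// In-memory stand-in for the SQLite-backed store.
#[derive(Default)]
pub struct SketchCache {
    entries: HashMap<CacheKey, Vec<String>>,
    /// Number of times `extract` actually had to run.
    pub misses: usize,
}

impl SketchCache {
    /// Return cached symbols, or run `extract` on a miss and store the result.
    pub fn get_or_extract<F>(
        &mut self,
        bytes: &[u8],
        version: &str,
        grammar: &str,
        extract: F,
    ) -> Vec<String>
    where
        F: FnOnce(&[u8]) -> Vec<String>,
    {
        let key = CacheKey {
            content_hash: content_hash(bytes),
            extractor_version: version.to_string(),
            grammar: grammar.to_string(),
        };
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.misses += 1;
        let symbols = extract(bytes);
        self.entries.insert(key, symbols.clone());
        symbols
    }
}
```

Because the version string is part of the key, switching between `"symbols-v1-all"` and `"symbols-v1-public"` for the same bytes produces separate entries, so neither result can shadow the other.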
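
The BFS mode of `find_import_path` in `index.rs` can be illustrated with a small depth-capped search. The sketch below is an assumption-laden stand-in: `shortest_import_path` is a hypothetical function, the graph is an in-memory adjacency map rather than the SQLite index, and it returns a single shortest chain (the `all_paths` and `path_limit` behaviour is not modelled).

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first search for one shortest import chain from `from` to `to`.
/// `graph` maps a file to the files it imports (resolved edges only).
/// Returns `None` if no chain of at most `max_depth` edges exists.
pub fn shortest_import_path(
    graph: &HashMap<String, Vec<String>>,
    from: &str,
    to: &str,
    max_depth: usize,
) -> Option<Vec<String>> {
    if from == to {
        return Some(vec![from.to_string()]);
    }
    // The visited set keeps cyclic imports from being re-queued.
    let mut visited: HashSet<&str> = HashSet::new();
    visited.insert(from);
    let mut queue: VecDeque<Vec<&str>> = VecDeque::new();
    queue.push_back(vec![from]);

    while let Some(path) = queue.pop_front() {
        // A path with n nodes has n - 1 edges; stop expanding at the cap.
        if path.len() - 1 >= max_depth {
            continue;
        }
        let last = *path.last()?;
        for next in graph.get(last).into_iter().flatten() {
            if !visited.insert(next.as_str()) {
                continue; // already reached by a path that is no longer
            }
            let mut extended = path.clone();
            extended.push(next.as_str());
            if next.as_str() == to {
                return Some(extended.into_iter().map(String::from).collect());
            }
            queue.push_back(extended);
        }
    }
    None
}
```

BFS explores chains in order of length, so the first chain that reaches `to` is a shortest one. The depth cap is a second safeguard alongside the visited set.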
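
The co-change counting behind `rebuild_co_change_edges` can be sketched in a few lines. This is a simplified illustration, not the crate's code: `co_change_edges` here is a hypothetical free function that takes commits as in-memory file lists instead of walking git history via gix, and it models only the large-commit skip and the count threshold (the per-file fanout cap and incremental meta key are left out).

```rust
use std::collections::BTreeMap;

/// Count how often each unordered pair of files changes in the same commit.
/// Commits touching more than `max_files_per_commit` files are skipped, and
/// pairs seen fewer than `min_count` times are dropped from the result.
pub fn co_change_edges(
    commits: &[Vec<&str>],
    max_files_per_commit: usize,
    min_count: u32,
) -> BTreeMap<(String, String), u32> {
    let mut counts = BTreeMap::new();
    for files in commits {
        if files.len() > max_files_per_commit {
            continue; // very large commits carry little co-change signal
        }
        // Sort and dedup so each pair is stored once as (smaller, larger).
        let mut sorted: Vec<&str> = files.clone();
        sorted.sort_unstable();
        sorted.dedup();
        for (i, a) in sorted.iter().enumerate() {
            for b in &sorted[i + 1..] {
                *counts.entry((a.to_string(), b.to_string())).or_insert(0) += 1;
            }
        }
    }
    counts.retain(|_, count| *count >= min_count);
    counts
}
```

With the documented settings this would be called with `max_files_per_commit = 50` and `min_count = 2`.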