# normalize-facts
Code fact extraction and storage — extracts symbols, imports, call graph data, and access-annotated calls from source files using tree-sitter and stores them in a SQLite database (via libsql).
Key exports: `FileIndex` (open/query the SQLite index), `Extractor` (walk a project and populate the index), `SymbolParser` (flatten tree-sitter parse results into `Symbol`/`Import` records), `ExtractOptions`/`ExtractResult`, `InterfaceResolver`/`OnDemandResolver` (import resolution strategies), `CallGraphStats`, `ChangedFiles`, `IndexedFile`, `SymbolMatch`. Re-exports all `normalize-facts-core` types for convenience. Depends on `normalize-languages` for grammar loading (via `grammar_loader`/`parser_for`/`parse_with_grammar`), `normalize-local-deps` for package discovery, and `indicatif` for progress bars. Test fixtures in `tests/fixtures/` cover symbol and import extraction across 30+ language samples.
Extraction is memoized via `ca_cache::CaCache` (an inlined module, formerly the `normalize-ca-cache` crate). `refresh_call_graph` runs a serial CA pre-pass (blake3 hash plus a CA cache lookup) before the rayon parallel iteration, so only uncached files are extracted. `reindex_files` (the incremental path) checks and populates the CA cache per file. The CA cache is keyed by `(blake3(file_bytes), EXTRACTOR_VERSION, grammar_name)` and stored at `~/.config/normalize/ca-cache.sqlite`.
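
The pre-pass described above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `CacheKey`, `cache_key`, and `partition_by_cache` are hypothetical names, the `HashMap` stands in for the SQLite-backed cache, the `EXTRACTOR_VERSION` value is made up, and `DefaultHasher` replaces blake3 only to keep the sketch dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Assumed placeholder value; the real constant lives in the crate.
const EXTRACTOR_VERSION: &str = "v1";

/// The three key components named in the text.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    content_hash: u64, // the real cache uses blake3(file_bytes)
    extractor_version: &'static str,
    grammar_name: String,
}

fn cache_key(file_bytes: &[u8], grammar_name: &str) -> CacheKey {
    // DefaultHasher is NOT blake3; it is a stand-in for illustration only.
    let mut hasher = DefaultHasher::new();
    file_bytes.hash(&mut hasher);
    CacheKey {
        content_hash: hasher.finish(),
        extractor_version: EXTRACTOR_VERSION,
        grammar_name: grammar_name.to_string(),
    }
}

/// Serial pre-pass: split `(path, bytes, grammar)` inputs into cache hits
/// (path plus cached facts) and misses (paths that still need extraction).
fn partition_by_cache<'a>(
    files: &'a [(&'a str, &'a [u8], &'a str)],
    cache: &HashMap<CacheKey, String>,
) -> (Vec<(&'a str, String)>, Vec<&'a str>) {
    let mut hits = Vec::new();
    let mut misses = Vec::new();
    for (path, bytes, grammar) in files {
        match cache.get(&cache_key(bytes, grammar)) {
            Some(facts) => hits.push((*path, facts.clone())),
            None => misses.push(*path),
        }
    }
    (hits, misses)
}
```

Only the `misses` list would then be handed to the parallel extraction step. Because the grammar name is part of the key, the same bytes parsed with a different grammar never share an entry.
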
The `Extractor` (used by `normalize view`, `normalize rank`, single-file analysis, etc.) also uses the CA cache, through a separate `symbol_cache()` global singleton. Symbol cache entries use the version keys `"symbols-v1-all"` and `"symbols-v1-public"` (selected by `include_private`), which keeps them distinct from index extraction entries. `gc_stale_versions` preserves `"symbols-*"` entries; `gc_stale_symbol_versions` handles symbol cache GC separately. Cross-file resolver results (TypeScript/JavaScript interface resolution) are never cached, because they depend on other files.
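
As a rough sketch of the version-key scheme above: the two key strings come from the text, while `symbol_version_key` and `survives_index_gc` are hypothetical helpers, and the real signature of `gc_stale_versions` is not shown here.

```rust
/// Version key for a symbol cache entry, selected by `include_private`.
fn symbol_version_key(include_private: bool) -> &'static str {
    if include_private {
        "symbols-v1-all"
    } else {
        "symbols-v1-public"
    }
}

/// Assumed retention rule for index-extraction GC: keep entries written by
/// the current extractor version, and never touch symbol cache entries.
fn survives_index_gc(entry_version: &str, current_version: &str) -> bool {
    entry_version == current_version || entry_version.starts_with("symbols-")
}
```

Keeping the `symbols-` prefix out of index GC is what lets the two caches share one store without one pass evicting the other's entries.
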
`CallEntry.access` is populated from the call graph index with read/write distinction when the language supports it. `ChangedFiles` tracks which files changed between index refreshes for incremental fact-rule evaluation via the daemon.
The `cli` feature adds a standalone `FactsCliService` (`src/service.rs`) with `rebuild`, `stats`, and `files` subcommands. Output types (`RebuildReport`, `StructureStatsReport`, `StructureFilesReport`) implement `OutputFormatter`. Note: function parameters are not extracted as facts (no "parameter" `SymbolKind`); parameter-level analysis is handled by `normalize-scope` via `locals.scm` queries.
The schema includes a `config_hash` column on diagnostic blob tables for cross-restart cache validity (the hash covers the binary version, `.normalize/config.toml`, and the rule files). When a grammar fails to load, the file is skipped with a loud warning rather than silently returning empty results. Fact-rule evaluation runs in parallel via rayon (~2× speedup) and includes a fix for CA cache poisoning.
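
The `config_hash` validity check could look roughly like this. It is a sketch under stated assumptions: the function names and the `u64` hash type are hypothetical, and `DefaultHasher` is used only to avoid dependencies. Its output is not guaranteed stable across Rust releases, so a real cross-restart hash would need a stable algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Combine the three inputs named in the text into one hash.
/// `rule_files` holds hypothetical `(path, contents)` pairs.
fn config_hash(binary_version: &str, config_toml: &str, rule_files: &[(&str, &str)]) -> u64 {
    // Sort so the result does not depend on directory listing order.
    let mut sorted: Vec<&(&str, &str)> = rule_files.iter().collect();
    sorted.sort();

    let mut hasher = DefaultHasher::new();
    binary_version.hash(&mut hasher);
    config_toml.hash(&mut hasher);
    for (path, contents) in sorted {
        path.hash(&mut hasher);
        contents.hash(&mut hasher);
    }
    hasher.finish()
}

/// A stored diagnostic blob is reusable only if its hash matches the current one.
fn blob_is_valid(stored_hash: u64, current_hash: u64) -> bool {
    stored_hash == current_hash
}
```

Any change to the binary version, the config file, or a rule file yields a different hash, so stale blobs fail the check after a restart.
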
Schema version 14 adds the `cfg_effects` table: `(file, function_qname, function_start_line, block_id, kind, byte_offset, line, label)` for CFG Phase 3 effect tracking (await, defer, yield, acquire, release, send, receive). `FileIndex` gains an `all_cfg_effects()` query method, and effect extraction is wired into the `refresh_call_graph`, `reindex_files`, and `build_cfg_data_for_file` paths.
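
To make the row shape concrete, here is a hypothetical Rust mirror of a `cfg_effects` row. The column names and the seven effect kinds come from the text. The field types, the nullable `label`, the string encoding of `kind`, and the names `EffectKind` and `CfgEffectRow` are assumptions for illustration.

```rust
/// The seven effect kinds listed for CFG Phase 3.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EffectKind {
    Await,
    Defer,
    Yield,
    Acquire,
    Release,
    Send,
    Receive,
}

impl EffectKind {
    const ALL: [EffectKind; 7] = [
        EffectKind::Await,
        EffectKind::Defer,
        EffectKind::Yield,
        EffectKind::Acquire,
        EffectKind::Release,
        EffectKind::Send,
        EffectKind::Receive,
    ];

    /// Assumed on-disk spelling: the lowercase names used in the text.
    fn as_str(self) -> &'static str {
        match self {
            EffectKind::Await => "await",
            EffectKind::Defer => "defer",
            EffectKind::Yield => "yield",
            EffectKind::Acquire => "acquire",
            EffectKind::Release => "release",
            EffectKind::Send => "send",
            EffectKind::Receive => "receive",
        }
    }

    fn parse(s: &str) -> Option<EffectKind> {
        EffectKind::ALL.iter().copied().find(|k| k.as_str() == s)
    }
}

/// One row of `cfg_effects`; all field types are guesses.
#[derive(Clone, Debug, PartialEq)]
struct CfgEffectRow {
    file: String,
    function_qname: String,
    function_start_line: u32,
    block_id: u32,
    kind: EffectKind,
    byte_offset: usize,
    line: u32,
    label: Option<String>,
}
```

Parsing through `EffectKind::parse` rather than matching raw strings at each call site means an unknown kind surfaces as `None` instead of being silently ignored.
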