Module code_index

Code index host capability.

Deterministic trigram/word index plus live workspace state (agent registry, advisory locks, append-only version log, file id assignment, cached reads). The capability owns one SharedIndex cell per instance; cloning the capability shares state with every Harn VM that has been wired against it.

Surface — every builtin is locked by schemas/code_index/<method>.json:

Workspace queries (the original 5)

| Builtin | What it does |
| --- | --- |
| hostlib_code_index_query | Trigram-accelerated literal substring search. |
| hostlib_code_index_rebuild | Walk a workspace and (re)build the in-memory index. |
| hostlib_code_index_stats | Count files/trigrams/words + last rebuild timestamp. |
| hostlib_code_index_imports_for | Imports declared by a single file (with resolutions). |
| hostlib_code_index_importers_of | Reverse lookup: who imports the given module/path? |
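
To illustrate what "trigram-accelerated literal substring search" usually means, here is a minimal sketch. It is not the crate's implementation: ToyIndex, trigrams, add, and query are hypothetical names, and the byte-level trigram choice is an assumption. The idea is to intersect the posting lists of the needle's trigrams to get candidate files, then confirm each candidate with a real substring check.

```rust
use std::collections::{HashMap, HashSet};

/// Every overlapping 3-byte window of `text` (assumption: byte-level trigrams).
fn trigrams(text: &str) -> HashSet<[u8; 3]> {
    text.as_bytes()
        .windows(3)
        .map(|w| [w[0], w[1], w[2]])
        .collect()
}

/// Hypothetical toy index: trigram -> file ids, plus contents for verification.
#[derive(Default)]
struct ToyIndex {
    postings: HashMap<[u8; 3], HashSet<u32>>,
    files: HashMap<u32, String>,
}

impl ToyIndex {
    fn add(&mut self, id: u32, content: &str) {
        for t in trigrams(content) {
            self.postings.entry(t).or_default().insert(id);
        }
        self.files.insert(id, content.to_string());
    }

    /// Returns the sorted ids of files that contain `needle` literally.
    fn query(&self, needle: &str) -> Vec<u32> {
        let mut candidates: Option<HashSet<u32>> = None;
        for g in &trigrams(needle) {
            // A trigram nobody contains means no file can match.
            let posting = match self.postings.get(g) {
                Some(p) => p,
                None => return Vec::new(),
            };
            candidates = Some(match candidates {
                None => posting.clone(),
                Some(c) => c.intersection(posting).copied().collect(),
            });
        }
        // Needles shorter than 3 bytes have no trigrams: scan every file.
        let ids: Vec<u32> = match candidates {
            Some(c) => c.into_iter().collect(),
            None => self.files.keys().copied().collect(),
        };
        // Trigram overlap is necessary but not sufficient, so verify.
        let mut hits: Vec<u32> = ids
            .into_iter()
            .filter(|id| self.files[id].contains(needle))
            .collect();
        hits.sort_unstable();
        hits
    }
}
```

The verification pass matters because a file can contain every trigram of the needle without containing the needle itself.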

Live workspace state (added in #776)

  • Agents: agent_register, agent_heartbeat, agent_unregister, current_agent_id, status.
  • Locks: lock_try, lock_release.
  • Change log: current_seq, changes_since, version_record.
  • File table: path_to_id, id_to_path, file_ids, file_meta, file_hash.
  • Cached reads: read_range, reindex_file, trigram_query, extract_trigrams, word_get, deps_get, outline_get.
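
As a rough picture of how advisory per-file locks with expiry can behave, consider the sketch below. Everything in it is an assumption for illustration: ToyLockTable, its ttl field, the numeric agent id, and the exact rules (re-entrant for the owner, stealable after expiry) are hypothetical and are not taken from the real lock_try / lock_release builtins.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical agent id type, for this sketch only.
type ToyAgentId = u64;

/// Hypothetical advisory lock table: path -> (owner, time acquired).
struct ToyLockTable {
    ttl: Duration,
    held: HashMap<String, (ToyAgentId, Instant)>,
}

impl ToyLockTable {
    fn new(ttl: Duration) -> Self {
        ToyLockTable { ttl, held: HashMap::new() }
    }

    /// Succeeds if the path is free, already owned by `agent`,
    /// or the previous owner's lock is older than `ttl`.
    fn lock_try(&mut self, path: &str, agent: ToyAgentId, now: Instant) -> bool {
        let blocked = self.held.get(path).map_or(false, |(owner, taken)| {
            *owner != agent && now.duration_since(*taken) < self.ttl
        });
        if blocked {
            return false;
        }
        self.held.insert(path.to_string(), (agent, now));
        true
    }

    /// Only the current owner can release; returns whether a lock was dropped.
    fn lock_release(&mut self, path: &str, agent: ToyAgentId) -> bool {
        let owned = self
            .held
            .get(path)
            .map_or(false, |(owner, _)| *owner == agent);
        if owned {
            self.held.remove(path);
        }
        owned
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; the locks are advisory in the sense that nothing stops an agent that never calls lock_try from editing the file.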

Typed symbol graph (added in #2434)

  • cypher: read-only Cypher executor over the typed graph (SymbolGraph) — MATCH ... WHERE ... RETURN with typed nodes (Function|Type|Module|Import|CallSite|Macro), typed edges (CALLS|REFS|IMPORTS|CONTAINS|OVERRIDES, plus _BY inverses), and variable-length hops up to depth 4.
  • branch_overlay: per-branch CDC overlay that layers a delta on top of the base graph; reuses ≥95% of the main index in storage/CPU for untouched files. See BranchOverlay.
  • freshness: per-file hash + mtime comparison against the indexed snapshot; consumers detect staleness without forcing a rebuild.
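
The depth cap on variable-length hops can be pictured as a bounded breadth-first walk over edges of one kind. The sketch below is only an illustration under stated assumptions: ToyGraph, ToyEdgeKind, and reachable are hypothetical, the real executor's traversal and deduplication rules are not documented here, and the depth limit is passed in rather than hard-coded.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical subset of edge kinds, for this sketch only.
#[derive(Clone, Copy, PartialEq, Eq)]
enum ToyEdgeKind {
    Calls,
    Imports,
}

type ToyNodeId = u32;

/// Hypothetical adjacency list: node -> outgoing (kind, target) edges.
struct ToyGraph {
    edges: HashMap<ToyNodeId, Vec<(ToyEdgeKind, ToyNodeId)>>,
}

impl ToyGraph {
    /// Nodes reachable from `start` over 1..=max_depth edges of `kind`,
    /// each reported once, sorted by id.
    fn reachable(&self, start: ToyNodeId, kind: ToyEdgeKind, max_depth: usize) -> Vec<ToyNodeId> {
        let mut seen = HashSet::new();
        seen.insert(start);
        let mut queue = VecDeque::new();
        queue.push_back((start, 0usize));
        let mut out = Vec::new();
        while let Some((node, depth)) = queue.pop_front() {
            if depth == max_depth {
                continue; // hop budget exhausted on this path
            }
            if let Some(list) = self.edges.get(&node) {
                for &(k, next) in list {
                    // `insert` returns false for already-visited nodes.
                    if k == kind && seen.insert(next) {
                        out.push(next);
                        queue.push_back((next, depth + 1));
                    }
                }
            }
        }
        out.sort_unstable();
        out
    }
}
```

With a call chain 1 -> 2 -> 3 -> 4 -> 5 -> 6 and a limit of 4, node 6 is out of reach from node 1, which is the practical effect of a fixed hop cap.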

Concurrency model

All ops serialise through a single Arc<Mutex<Option<IndexState>>> so the IDE editor, eval, and live agent all see one consistent view. The capability is Send + Sync so embedders can share it across threads, but the mutex still serialises actual work.
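
The sharing pattern described above can be sketched with standard library types alone. ToyState and ToyCapability below are hypothetical stand-ins for IndexState and the capability handle, and the methods are invented for illustration; only the Arc<Mutex<Option<...>>> shape comes from the text.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the real index state.
struct ToyState {
    file_count: usize,
}

/// Cloning the handle clones the Arc, so every clone sees the same slot.
#[derive(Clone, Default)]
struct ToyCapability {
    index: Arc<Mutex<Option<ToyState>>>,
}

impl ToyCapability {
    /// A rebuild replaces the whole slot while holding the lock.
    fn rebuild(&self, file_count: usize) {
        *self.index.lock().unwrap() = Some(ToyState { file_count });
    }

    /// A mutating op needs an index to exist; returns false otherwise.
    fn record_file(&self) -> bool {
        let mut guard = self.index.lock().unwrap();
        match guard.as_mut() {
            Some(state) => {
                state.file_count += 1;
                true
            }
            None => false,
        }
    }

    fn file_count(&self) -> Option<usize> {
        self.index.lock().unwrap().as_ref().map(|s| s.file_count)
    }
}

/// Threads may share the handle, but the mutex serialises each op.
fn fan_out(cap: &ToyCapability, workers: usize) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let c = cap.clone();
            thread::spawn(move || {
                c.record_file();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

The Option in the slot models "no index built yet": ops that need an index can report that cleanly instead of panicking.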

Structs

AgentInfo
One row in the registry. Public so embedders that want to surface a status panel can read the lifecycle state without going through the host builtins.
AgentRegistry
Per-workspace agent registry plus advisory per-file lock table.
BranchOverlay
A delta layered on top of the base SymbolGraph. The overlay owns the rebuilt slice for every changed file — the base graph is never mutated when an overlay is activated.
BuildOutcome
Summary returned from IndexState::build_from_root.
ChangeRecord
Public denormalised form returned by changes_since.
CodeIndexCapability
Code-index capability handle.
CodeIndexSnapshot
Persistent on-disk form of the entire workspace index.
DepGraph
Forward + reverse import graph plus the side-table of unresolved import strings (raw text we couldn’t map back to a known file).
Edge
One directed edge.
IndexState
In-memory index for one workspace. Composed from the per-file table, the trigram + word sub-indexes, the dep graph, the append-only version log, and the agent registry.
IndexedFile
Per-file metadata persisted in the index.
IndexedSymbol
Outline-style symbol entry. Populated during the index rebuild from the same tree-sitter parse that backs the typed symbol graph (issue #2456); files whose extension doesn’t map to a known grammar leave IndexedFile::symbols empty.
Node
One typed node in the symbol graph. line is 1-based to match the rest of the host-builtin wire format; path is workspace-relative.
OverlayState
Holder for the active overlay (if any). Threaded through the index state so every Cypher query can opt into the per-branch view.
RegistryConfig
Registry config for agent liveness and lock expiry.
SnapshotMeta
On-disk metadata header. Small and cheap to read so embedders can peek at a snapshot without parsing the whole thing.
SymbolGraph
Typed symbol graph for a single workspace.
TrigramIndex
Trigram posting list: trigram -> set of file ids that contain it, plus a per-file reverse map for cheap re-indexing.
VersionEntry
One entry in the per-file history.
VersionLog
Append-only log keyed by path. Both forward query patterns — “everything since X” and “the latest entry for this path” — are served from the same map.
WordHit
Single occurrence of an identifier-shaped token: which file it landed in and on which 1-based line number.
WordIndex
Inverted word index keyed on identifier-shaped tokens.

Enums

AgentState
Lifecycle state of one tracked agent.
CypherError
Error variants the parser/executor raise. The host wraps these in crate::error::HostlibError before surfacing to scripts. Each variant carries a free-text message identifying the offending token or position.
CypherValue
One projected value in a row.
EdgeKind
Coarse typed edge kinds defined in issue #2434.
EditOp
Edit-classification for one record. The string forms ride out to Harn scripts and the cross-repo schema so callers can switch on them.
NodeKind
Coarse typed node kinds defined in issue #2434.

Constants

HISTORY_LIMIT
Maximum number of entries kept per path. Older entries roll off the front in FIFO order.
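
A capped per-path history with FIFO roll-off might look like the sketch below. It is an illustration under assumptions: TOY_HISTORY_LIMIT is a made-up value (the real HISTORY_LIMIT is not stated here), the global sequence counter is assumed, and ToyEntry, ToyVersionLog, record, latest, and changes_since are hypothetical shapes rather than the crate's API.

```rust
use std::collections::{HashMap, VecDeque};

/// Made-up cap for this sketch; the real HISTORY_LIMIT value is not given.
const TOY_HISTORY_LIMIT: usize = 3;

/// Hypothetical log entry.
struct ToyEntry {
    seq: u64,
    hash: u64,
}

/// Hypothetical append-only log keyed by path.
#[derive(Default)]
struct ToyVersionLog {
    next_seq: u64,
    by_path: HashMap<String, VecDeque<ToyEntry>>,
}

impl ToyVersionLog {
    /// Appends an entry and drops the oldest ones past the cap.
    fn record(&mut self, path: &str, hash: u64) -> u64 {
        self.next_seq += 1;
        let seq = self.next_seq;
        let history = self.by_path.entry(path.to_string()).or_default();
        history.push_back(ToyEntry { seq, hash });
        while history.len() > TOY_HISTORY_LIMIT {
            history.pop_front(); // FIFO: oldest entry rolls off the front
        }
        seq
    }

    /// "The latest entry for this path."
    fn latest(&self, path: &str) -> Option<&ToyEntry> {
        self.by_path.get(path).and_then(|h| h.back())
    }

    /// "Everything since X", as (path, seq) pairs ordered by seq.
    fn changes_since(&self, seq: u64) -> Vec<(String, u64)> {
        let mut out = Vec::new();
        for (path, history) in &self.by_path {
            for e in history {
                if e.seq > seq {
                    out.push((path.clone(), e.seq));
                }
            }
        }
        out.sort_by_key(|(_, s)| *s);
        out
    }
}
```

One consequence of the cap is that a reader asking for changes since a very old sequence number only sees the entries that have not rolled off yet.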

Type Aliases

AgentId
Stable identifier for an agent in the registry.
CypherRow
One projected row.
FileId
Monotonically-assigned identifier for a file in the index. Stable across re-indexes of the same path so sub-indexes can key on FileId without invalidating string keys.
NodeId
Typed node identifier. Stable across rebuild_file calls that don’t touch the file (id assignment is per-file deterministic — see SymbolGraph::rebuild_file).
SharedIndex
Shared, mutable cell carrying the (at most one) live workspace index. Mutex rather than RwLock because rebuilds flip the slot wholesale and every mutating op (record_edit, agent_register, lock_try, etc.) needs exclusive access. Single-threaded VM scripts pay no real cost from the choice; embedders that fan out across threads are still safe because the mutex serialises everyone.
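
The FileId description above (monotonic assignment, stable across re-indexes of the same path) can be sketched as a two-way table. ToyFileTable, ToyFileId, assign_id, and path_of are hypothetical names chosen for illustration; the integer width and the starting value are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical id type; the real FileId's underlying type is not stated.
type ToyFileId = u32;

/// Hypothetical two-way table between paths and ids.
#[derive(Default)]
struct ToyFileTable {
    next: ToyFileId,
    by_path: HashMap<String, ToyFileId>,
    by_id: HashMap<ToyFileId, String>,
}

impl ToyFileTable {
    /// Returns the existing id for a known path, or hands out the next one.
    fn assign_id(&mut self, path: &str) -> ToyFileId {
        if let Some(&id) = self.by_path.get(path) {
            return id; // re-indexing the same path keeps its id
        }
        let id = self.next;
        self.next += 1;
        self.by_path.insert(path.to_string(), id);
        self.by_id.insert(id, path.to_string());
        id
    }

    fn path_of(&self, id: ToyFileId) -> Option<&str> {
        self.by_id.get(&id).map(String::as_str)
    }
}
```

Because ids never change for a known path, sub-indexes such as posting lists can store the small integer instead of the path string.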