Code index host capability.
Deterministic trigram/word index plus live workspace state (agent
registry, advisory locks, append-only version log, file id assignment,
cached reads). The capability owns one SharedIndex cell per
instance; cloning the capability shares state with every Harn VM that
has been wired against it.
Surface — every builtin is locked by schemas/code_index/<method>.json:
§Workspace queries (the original 5)
| Builtin | What it does |
|---|---|
| hostlib_code_index_query | Trigram-accelerated literal substring search. |
| hostlib_code_index_rebuild | Walk a workspace and (re)build the in-memory index. |
| hostlib_code_index_stats | Count files/trigrams/words + last rebuild timestamp. |
| hostlib_code_index_imports_for | Imports declared by a single file (with resolutions). |
| hostlib_code_index_importers_of | Reverse lookup: who imports the given module/path? |
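To illustrate what "trigram-accelerated literal substring search" can mean, here is a minimal sketch. It is not the crate's implementation: `TinyTrigramIndex`, its fields, and the byte-window trigram definition are all assumptions made for the example. The idea is that posting lists narrow the candidate files, and a final `contains` check removes false positives.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified index (not the crate's TrigramIndex):
/// trigram -> set of file ids, plus the raw file contents for verification.
#[derive(Default)]
pub struct TinyTrigramIndex {
    postings: HashMap<[u8; 3], BTreeSet<u32>>,
    contents: HashMap<u32, String>,
}

/// Every overlapping 3-byte window of `text` (assumed trigram definition).
fn trigrams(text: &str) -> impl Iterator<Item = [u8; 3]> + '_ {
    text.as_bytes().windows(3).map(|w| [w[0], w[1], w[2]])
}

impl TinyTrigramIndex {
    pub fn add_file(&mut self, id: u32, text: &str) {
        for t in trigrams(text) {
            self.postings.entry(t).or_default().insert(id);
        }
        self.contents.insert(id, text.to_string());
    }

    /// Literal substring search: intersect the posting lists of the needle's
    /// trigrams, then confirm each candidate with a real substring check.
    pub fn query(&self, needle: &str) -> Vec<u32> {
        let mut candidates: Option<BTreeSet<u32>> = None;
        for t in trigrams(needle) {
            let posting = self.postings.get(&t).cloned().unwrap_or_default();
            candidates = Some(match candidates {
                None => posting,
                Some(acc) => acc.intersection(&posting).copied().collect(),
            });
        }
        // Needles shorter than three bytes have no trigrams: scan every file.
        let candidates =
            candidates.unwrap_or_else(|| self.contents.keys().copied().collect());
        candidates
            .into_iter()
            .filter(|id| self.contents[id].contains(needle))
            .collect()
    }
}
```

Results come back in ascending file-id order because the candidates live in a `BTreeSet`, which is one simple way to keep the output deterministic.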
§Live workspace state (added in #776)
- Agents: agent_register, agent_heartbeat, agent_unregister, current_agent_id, status.
- Locks: lock_try, lock_release.
- Change log: current_seq, changes_since, version_record.
- File table: path_to_id, id_to_path, file_ids, file_meta, file_hash.
- Cached reads: read_range, reindex_file, trigram_query, extract_trigrams, word_get, deps_get, outline_get.
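As a rough illustration of how an advisory per-file lock with expiry might behave, consider the sketch below. Everything in it is hypothetical: the `LockTable` type, the `u64` agent id, the millisecond clock passed in by the caller, and the re-entrant refresh rule are assumptions for the example, not the signatures of the real `lock_try` and `lock_release` builtins.

```rust
use std::collections::HashMap;

/// Assumption: the real AgentId alias may wrap a different type.
pub type AgentId = u64;

/// Hypothetical advisory lock table: path -> (owner, time the lock was taken).
pub struct LockTable {
    ttl_ms: u64,
    held: HashMap<String, (AgentId, u64)>,
}

impl LockTable {
    pub fn new(ttl_ms: u64) -> Self {
        Self { ttl_ms, held: HashMap::new() }
    }

    /// Returns true if `agent` holds the lock after the call. A lock held by
    /// another agent blocks only while it is younger than the TTL.
    pub fn lock_try(&mut self, path: &str, agent: AgentId, now_ms: u64) -> bool {
        let blocked = matches!(
            self.held.get(path),
            Some(&(owner, taken_at))
                if owner != agent && now_ms.saturating_sub(taken_at) < self.ttl_ms
        );
        if blocked {
            return false;
        }
        // Free, expired, or already ours: (re)take it and refresh the timestamp.
        self.held.insert(path.to_string(), (agent, now_ms));
        true
    }

    /// Only the current owner can release; returns whether a lock was dropped.
    pub fn lock_release(&mut self, path: &str, agent: AgentId) -> bool {
        let owned = matches!(self.held.get(path), Some(&(owner, _)) if owner == agent);
        if owned {
            self.held.remove(path);
        }
        owned
    }
}
```

Because the locks are advisory, nothing in a design like this stops an agent from editing a file it has not locked; the table only records intent so cooperating agents can avoid stepping on each other.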
§Concurrency model
All ops serialise through a single Arc<Mutex<Option<IndexState>>> so
the IDE editor, eval, and live agent all see one consistent view. The
capability is Send + Sync so embedders can share it across threads,
but the mutex still serialises actual work.
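The sharing behaviour described above can be sketched with stand-in types. `SketchIndexState`, `SketchSharedIndex`, and `SketchCapability` are hypothetical names that only mirror the `Arc<Mutex<Option<...>>>` shape; the real capability's methods are not shown here.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the real IndexState (assumption: one counter is enough here).
#[derive(Default)]
pub struct SketchIndexState {
    pub file_count: usize,
}

/// Same shape as the cell described above: at most one live index.
pub type SketchSharedIndex = Arc<Mutex<Option<SketchIndexState>>>;

/// Cloning the handle clones the Arc, so every clone sees the same slot.
#[derive(Clone, Default)]
pub struct SketchCapability {
    index: SketchSharedIndex,
}

impl SketchCapability {
    /// Replace the slot wholesale, as a rebuild would.
    pub fn rebuild(&self, file_count: usize) {
        *self.index.lock().unwrap() = Some(SketchIndexState { file_count });
    }

    /// Read through the mutex; None means no index has been built yet.
    pub fn file_count(&self) -> Option<usize> {
        self.index.lock().unwrap().as_ref().map(|s| s.file_count)
    }

    /// A mutating op: needs exclusive access to the state.
    pub fn bump(&self) {
        if let Some(state) = self.index.lock().unwrap().as_mut() {
            state.file_count += 1;
        }
    }
}

/// Fan out across threads; the mutex serialises each bump.
pub fn bump_from_threads(cap: &SketchCapability, threads: usize) {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let clone = cap.clone();
            thread::spawn(move || clone.bump())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

A rebuild through one clone is immediately visible through every other clone, and concurrent mutations never interleave, at the cost of running one at a time.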
Structs§
- AgentInfo: One row in the registry. Public so embedders that want to surface a status panel can read the lifecycle state without going through the host builtins.
- AgentRegistry: Per-workspace agent registry plus advisory per-file lock table.
- BuildOutcome: Summary returned from IndexState::build_from_root.
- ChangeRecord: Public denormalised form returned by changes_since.
- CodeIndexCapability: Code-index capability handle.
- CodeIndexSnapshot: Persistent on-disk form of the entire workspace index.
- DepGraph: Forward + reverse import graph plus the side-table of unresolved import strings (raw text we couldn’t map back to a known file).
- IndexState: In-memory index for one workspace. Composed from the per-file table, the trigram + word sub-indexes, the dep graph, the append-only version log, and the agent registry.
- IndexedFile: Per-file metadata persisted in the index.
- IndexedSymbol: Outline-style symbol entry. Reserved for AST integration; the code-index importer leaves IndexedFile::symbols empty, but the shape is kept stable so storage upgrades won’t have to re-key.
- RegistryConfig: Registry config for agent liveness and lock expiry.
- SnapshotMeta: On-disk metadata header. Small and cheap to read so embedders can peek at a snapshot without parsing the whole thing.
- TrigramIndex: Trigram posting list: trigram -> set of file ids that contain it, plus a per-file reverse map for cheap re-indexing.
- VersionEntry: One entry in the per-file history.
- VersionLog: Append-only log keyed by path. Both forward query patterns (“everything since X” and “the latest entry for this path”) are served from the same map.
- WordHit: Single occurrence of an identifier-shaped token: which file it landed in and on which 1-based line number.
- WordIndex: Inverted word index keyed on identifier-shaped tokens.
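The relationship between the word index and its hits can be sketched as follows. The types below are hypothetical stand-ins, and the tokenisation rule (ASCII letters, digits, and underscores, not starting with a digit) is an assumption about what "identifier-shaped" means, not the crate's actual definition.

```rust
use std::collections::HashMap;

/// Stand-in for WordHit: which file, and which 1-based line.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchWordHit {
    pub file: u32,
    pub line: u32,
}

/// Stand-in for WordIndex: token -> every place it occurs.
#[derive(Default)]
pub struct SketchWordIndex {
    hits: HashMap<String, Vec<SketchWordHit>>,
}

impl SketchWordIndex {
    pub fn index_file(&mut self, file: u32, text: &str) {
        for (i, line) in text.lines().enumerate() {
            // Split on anything that cannot be part of an identifier.
            let tokens = line.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'));
            for tok in tokens {
                // Skip empty pieces and tokens that start with a digit.
                let starts_ok = tok
                    .chars()
                    .next()
                    .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
                if starts_ok {
                    self.hits
                        .entry(tok.to_string())
                        .or_default()
                        .push(SketchWordHit { file, line: i as u32 + 1 });
                }
            }
        }
    }

    /// All recorded occurrences of `word`, in insertion order.
    pub fn get(&self, word: &str) -> &[SketchWordHit] {
        self.hits.get(word).map(Vec::as_slice).unwrap_or(&[])
    }
}
```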
Enums§
- AgentState: Lifecycle state of one tracked agent.
- EditOp: Edit classification for one record. The string forms ride out to Harn scripts and the cross-repo schema so callers can switch on them.
Constants§
- HISTORY_LIMIT: Maximum number of entries kept per path. Older entries roll off the front in FIFO order.
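A bounded, append-only per-path history of this kind can be sketched with a `VecDeque`. The names below are hypothetical, and `SKETCH_HISTORY_LIMIT = 3` is an arbitrary value chosen for the example; the real value of HISTORY_LIMIT is not stated here.

```rust
use std::collections::{HashMap, VecDeque};

/// Assumption: arbitrary small limit for illustration only.
pub const SKETCH_HISTORY_LIMIT: usize = 3;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchEntry {
    pub seq: u64,
    pub path: String,
}

/// Stand-in for a version log: one bounded history per path.
#[derive(Default)]
pub struct SketchVersionLog {
    next_seq: u64,
    by_path: HashMap<String, VecDeque<SketchEntry>>,
}

impl SketchVersionLog {
    /// Append an entry; the oldest entry for that path rolls off the front
    /// once the per-path limit is exceeded.
    pub fn record(&mut self, path: &str) -> u64 {
        self.next_seq += 1;
        let seq = self.next_seq;
        let history = self.by_path.entry(path.to_string()).or_default();
        history.push_back(SketchEntry { seq, path: path.to_string() });
        if history.len() > SKETCH_HISTORY_LIMIT {
            history.pop_front();
        }
        seq
    }

    pub fn current_seq(&self) -> u64 {
        self.next_seq
    }

    /// "The latest entry for this path."
    pub fn latest(&self, path: &str) -> Option<&SketchEntry> {
        self.by_path.get(path)?.back()
    }

    /// "Everything since X": retained entries with a higher sequence number,
    /// sorted so the result does not depend on hash-map iteration order.
    pub fn changes_since(&self, seq: u64) -> Vec<SketchEntry> {
        let mut out: Vec<SketchEntry> = self
            .by_path
            .values()
            .flatten()
            .filter(|e| e.seq > seq)
            .cloned()
            .collect();
        out.sort_by_key(|e| e.seq);
        out
    }
}
```

Both query patterns read from the same map in this sketch; the trade-off is that a caller asking for changes since a very old sequence number may miss entries that have already rolled off.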
Type Aliases§
- AgentId: Stable identifier for an agent in the registry.
- FileId: Monotonically-assigned identifier for a file in the index. Stable across re-indexes of the same path so sub-indexes can key on FileId without invalidating string keys.
- SharedIndex: Shared, mutable cell carrying the (at most one) live workspace index. Mutex rather than RwLock because rebuilds flip the slot wholesale and every mutating op (record_edit, agent_register, lock_try, etc.) needs exclusive access. Single-threaded VM scripts pay no real cost from the choice; embedders that fan out across threads are still safe because the mutex serialises everyone.
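The FileId stability property can be illustrated with a small two-way table. This is a sketch under assumptions: `SketchFileTable`, the `u32` id type, and ids starting at 1 are all hypothetical, and the method names only echo the `path_to_id` and `id_to_path` builtins rather than reproduce their signatures.

```rust
use std::collections::HashMap;

/// Assumption: the real FileId alias may wrap a different integer type.
pub type SketchFileId = u32;

/// Hypothetical file table: ids are handed out once per path and never reused.
#[derive(Default)]
pub struct SketchFileTable {
    next: SketchFileId,
    by_path: HashMap<String, SketchFileId>,
    by_id: HashMap<SketchFileId, String>,
}

impl SketchFileTable {
    /// Return the existing id for `path`, or assign the next one.
    /// Re-indexing a known path therefore keeps its id.
    pub fn assign(&mut self, path: &str) -> SketchFileId {
        if let Some(&id) = self.by_path.get(path) {
            return id;
        }
        self.next += 1;
        let id = self.next;
        self.by_path.insert(path.to_string(), id);
        self.by_id.insert(id, path.to_string());
        id
    }

    pub fn path_to_id(&self, path: &str) -> Option<SketchFileId> {
        self.by_path.get(path).copied()
    }

    pub fn id_to_path(&self, id: SketchFileId) -> Option<&str> {
        self.by_id.get(&id).map(String::as_str)
    }
}
```

Sub-indexes that key on the integer id stay valid when a file's contents are re-indexed, since only the postings for that id need to be refreshed.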