rgx — a candidate-file index in front of ripgrep.
The crate is split so each piece is testable in isolation (see CLAUDE.md):
- `trigram`: the atomic index unit and extraction helpers.
- `query`: turning a regex into a sound boolean trigram query.
- `index`: the trigram inverted index (build, candidate selection, incremental update, snapshot).
- `confirm`: ripgrep’s own engine over a candidate file set (the matching authority).
- `proto` / `server` / `client` / `paths`: the per-project daemon and its wire protocol.
search ties the search path together: pattern → trigram query → candidate files → ripgrep
confirm, transparently falling back to a full scan when the query carries no usable constraint.
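The search path described above can be sketched with a toy in-memory index. This is a minimal illustration, not rgx's implementation: `ToyIndex`, `trigrams`, `candidates`, and `search` are hypothetical names, the query is a plain literal rather than a regex, and a substring check stands in for the ripgrep confirm step.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical toy index (not rgx's real type): trigram -> file IDs, plus a file table.
pub struct ToyIndex {
    paths: Vec<String>,    // file table: file ID -> path
    contents: Vec<String>, // kept in memory only so the toy confirm step can run
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
}

/// Every overlapping 3-byte window of `bytes`.
pub fn trigrams(bytes: &[u8]) -> BTreeSet<[u8; 3]> {
    bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

impl ToyIndex {
    /// Build the index from (path, content) pairs; the file ID is the position in `docs`.
    pub fn build(docs: &[(&str, &str)]) -> Self {
        let mut index = ToyIndex {
            paths: Vec::new(),
            contents: Vec::new(),
            postings: HashMap::new(),
        };
        for (id, (path, text)) in docs.iter().enumerate() {
            index.paths.push(path.to_string());
            index.contents.push(text.to_string());
            for t in trigrams(text.as_bytes()) {
                index.postings.entry(t).or_default().insert(id);
            }
        }
        index
    }

    /// File IDs that may contain `literal`: a superset of the true matches.
    fn candidate_ids(&self, literal: &str) -> Vec<usize> {
        let needed = trigrams(literal.as_bytes());
        if needed.is_empty() {
            // Fallback: no usable trigram constraint, so every file is a candidate.
            return (0..self.paths.len()).collect();
        }
        // Intersect the posting lists of every required trigram.
        let mut acc: Option<BTreeSet<usize>> = None;
        for t in &needed {
            let ids = self.postings.get(t).cloned().unwrap_or_default();
            acc = Some(match acc {
                None => ids,
                Some(prev) => prev.intersection(&ids).copied().collect(),
            });
        }
        acc.unwrap_or_default().into_iter().collect()
    }

    /// Candidate paths in file-ID order.
    pub fn candidates(&self, literal: &str) -> Vec<String> {
        self.candidate_ids(literal)
            .into_iter()
            .map(|id| self.paths[id].clone())
            .collect()
    }

    /// Candidate selection followed by a confirm step (here: a substring check).
    pub fn search(&self, literal: &str) -> Vec<String> {
        self.candidate_ids(literal)
            .into_iter()
            .filter(|&id| self.contents[id].contains(literal))
            .map(|id| self.paths[id].clone())
            .collect()
    }
}
```

The index only narrows the file set. A file holding `abc` and `bcd` in separate places is a candidate for `abcd` without matching it, which is why the confirm step stays the matching authority.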
Modules
- client
- Client side: connect to the project’s daemon, spawning it on first use.
- compact
- Token-savings presentation: reshape ripgrep’s `path:line:text` stream into a compact, paginated view for agents. This is a pure transform over already-rendered output; matching stays 100% ripgrep (see `confirm`). The contract is deliberately weaker than the byte-for-byte CLI: the match set is identical to `rg` (nothing is ever dropped; pagination is the only volume control), but the presentation differs: the path is printed once per file, results are paged, and pathologically long lines are center-truncated on the match (one `Read` from full content).
- config
- User config, loaded from a TOML file.
- confirm
- The confirm step: run ripgrep’s own engine over a set of files and emit `path:line:text`.
- cursor
- Opaque pagination cursor for the compact view. A cursor carries the entire query (pattern +
options) plus a keyset resume position, so page N is provably the same search as page 1 (no flag
can be dropped between pages) and a changed result set is detectable via the stored fingerprint.
The encoded blob is parked in the daemon’s `crate::pagination` store, which hands the caller a short token to echo back, so the blob itself never has to be small or text-safe. Encoding mirrors the hand-rolled, length-prefixed style in `proto` and reuses its exact options bit-layout.
- index
- The in-memory candidate index: trigram -> set of file IDs, plus a file table.
- mcp
- Minimal MCP stdio server exposing rgx’s search to AI agents.
- pagination
- Server-side pagination store: the daemon keeps the small cursor blob (query + keyset position) for
a couple of minutes and hands the client a short opaque token in its place, so the printed `--cursor` is tiny instead of a base64 blob. The blob is exactly what the self-contained cursor used to carry; paging still re-runs the search, so memory is a few dozen bytes per live page, not the result set. Tokens are stamped with a per-process session so a restarted daemon’s old tokens miss cleanly (the client just re-runs the search). `take` is single-use: following a page deletes the token it came from.
- paths
- Where a project’s daemon keeps its socket and index snapshot.
- proto
- The daemon wire protocol: length-prefixed frames over a Unix socket.
- query
- Turn a regex pattern into a boolean trigram query that every matching file must satisfy.
- server
- The per-project daemon: holds the index resident in RAM, keeps it fresh, and answers queries over
a local IPC endpoint (an AF_UNIX socket on Unix, a loopback-TCP port on Windows; see `crate::transport`). Owning that endpoint is the single-instance lock: a second daemon that loses the race exits. The daemon serves immediately: a warm start loads the snapshot and answers at once; a cold start answers via a full ripgrep scan (the correct fallback) until the first build finishes. See `docs/indexing.md` and `docs/index-and-storage.md`.
- skill
- `rgx --agent install|uninstall|list|skill`: wire rgx into AI coding agents.
- status
- Shared rendering for `rgx --server status` (and `watch`). Used by the daemon with live in-RAM stats, and by the CLI when no daemon is running, in which case it still reports the on-disk index location, size, and age, read straight from the snapshot file.
- transport
- The daemon transport, abstracted so the same protocol runs over the best local IPC per platform.
- trigram
- Byte trigrams: the atomic unit of the candidate index.
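The token behaviour described in the `pagination` entry above (short opaque token, per-process session stamp, short lifetime, single-use `take`) can be sketched as follows. This is a minimal sketch under stated assumptions: `PageStore`, `put`, the `session-id` hex token format, and the expiry check are all hypothetical and do not reflect the crate's actual types or wire format.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the daemon's pagination store.
pub struct PageStore {
    session: u64, // per-process stamp; a restarted daemon would pick a new one
    next_id: u64,
    ttl: Duration,
    entries: HashMap<u64, (Instant, Vec<u8>)>,
}

impl PageStore {
    pub fn new(session: u64, ttl: Duration) -> Self {
        PageStore { session, next_id: 0, ttl, entries: HashMap::new() }
    }

    /// Park a cursor blob and return a short token to print in its place.
    /// The "session-id" hex format is an assumption made for this sketch.
    pub fn put(&mut self, blob: Vec<u8>) -> String {
        let id = self.next_id;
        self.next_id += 1;
        self.entries.insert(id, (Instant::now(), blob));
        format!("{:x}-{:x}", self.session, id)
    }

    /// Single-use lookup: a hit removes the entry, so a token works only once.
    /// Malformed, foreign-session, unknown, and expired tokens all miss with `None`.
    pub fn take(&mut self, token: &str) -> Option<Vec<u8>> {
        let (session, id) = token.split_once('-')?;
        let session = u64::from_str_radix(session, 16).ok()?;
        let id = u64::from_str_radix(id, 16).ok()?;
        if session != self.session {
            return None; // token was issued by another daemon process
        }
        let (stored_at, blob) = self.entries.remove(&id)?;
        if stored_at.elapsed() > self.ttl {
            return None; // too old; the caller re-runs the search instead
        }
        Some(blob)
    }
}
```

Because every miss looks the same to the caller, the client needs only one recovery path: re-run the search from the first page.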
Functions
- candidate_paths
- Resolve the candidate files for `pattern` as owned paths, so a caller holding the index lock can release it before the (potentially long) ripgrep confirm + output streaming. Never hold the index lock across blocking I/O. A fallback pattern yields every live file.
- collect_search
- Run a content search and buffer the whole `path:line:text` output, for callers that need the entire result at once (the compact/paged view) rather than a stream. Trigram-accelerable patterns go through the daemon (emitted in index file-id order, NOT path order); fallback patterns scan in-process in nondeterministic order. Neither is guaranteed sorted, so the compact view sorts the matches itself (see `compact::format`); the fallback block-sort here is a cheap extra that keeps even the raw buffered bytes deterministic across runs.
- effective_pattern
- The pattern actually handed to the regex engine: escaped when `-F` (fixed strings) is set.
- is_fallback
- Whether `pattern` has no usable trigram constraint (so every file is a candidate). The CLI uses this to scan such queries in-process (one process streamed straight to stdout, like ripgrep) instead of paying the daemon round-trip to ship a potentially huge result set back.
- search
- Collecting convenience over `stream_search` (used in tests).
- stream_full_scan
- Pipelined full-tree walk+search (matching ripgrep’s model), streaming through `sink`. Used by the CLI for fallback queries (no usable trigram) and by the daemon’s cold start before the first build finishes, both fully in-process. Once the index is ready, `stream_search` handles trigram-accelerable patterns.
- stream_search
- Stream a content search against a (ready) index, emitting `path:line:text` chunks via `emit`.
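The locking rule behind `candidate_paths` (copy out owned paths, release the index lock, then do the slow work) can be sketched like this. It is an illustration only: `Index`, `live_files`, `candidate_paths_sketch`, and the closure-based filter are hypothetical, and the crate's real lock type and signatures may differ.

```rust
use std::path::{Path, PathBuf};
use std::sync::RwLock;

/// Hypothetical index: only the file table matters for this sketch.
pub struct Index {
    pub live_files: Vec<PathBuf>,
}

/// Copy the selected paths out while holding the read lock, then release it,
/// so the caller never holds the lock across blocking I/O.
pub fn candidate_paths_sketch(
    index: &RwLock<Index>,
    is_candidate: impl Fn(&Path) -> bool,
) -> Vec<PathBuf> {
    let guard = index.read().unwrap();
    let owned: Vec<PathBuf> = guard
        .live_files
        .iter()
        .filter(|p| is_candidate(p.as_path()))
        .cloned()
        .collect();
    drop(guard); // lock released here; `owned` borrows nothing from the index
    owned
}
```

A caller would run the confirm step and output streaming over the returned `Vec<PathBuf>`. Because the paths are owned, an index update can take the write lock while that work is still in progress.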