
Crate rgx


rgx — a candidate-file index in front of ripgrep.

The crate is split so each piece is testable in isolation (see CLAUDE.md):

  • trigram — the atomic index unit and extraction helpers.
  • query — turning a regex into a sound boolean trigram query.
  • index — the trigram inverted index: build, candidate selection, incremental update, snapshot.
  • confirm — ripgrep’s own engine over a candidate file set (the matching authority).
  • proto/server/client/paths — the per-project daemon and its wire protocol.
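As a rough illustration of the trigram and index pieces listed above, the sketch below extracts byte trigrams and maintains a trigram -> file-ID map alongside a file table. The names Trigram, Index, add_file, remove_file, and files_with are hypothetical and not taken from the crate; the real index may store and update its postings quite differently.

```rust
use std::collections::{BTreeSet, HashMap};

/// A byte trigram: three consecutive bytes of a file's contents.
type Trigram = [u8; 3];

/// Collect the distinct byte trigrams of `bytes` (empty for inputs under 3 bytes).
fn trigrams(bytes: &[u8]) -> BTreeSet<Trigram> {
    bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// Hypothetical in-memory index: a file table plus trigram -> file IDs.
#[derive(Default)]
struct Index {
    /// File ID -> path; `None` marks a removed file.
    files: Vec<Option<String>>,
    /// Trigram -> IDs of the files that contain it.
    postings: HashMap<Trigram, BTreeSet<u32>>,
}

impl Index {
    /// Register a file and index every trigram it contains.
    fn add_file(&mut self, path: &str, contents: &[u8]) -> u32 {
        let id = self.files.len() as u32;
        self.files.push(Some(path.to_string()));
        for t in trigrams(contents) {
            self.postings.entry(t).or_default().insert(id);
        }
        id
    }

    /// Incremental update: drop a file from the table and from every posting set.
    fn remove_file(&mut self, id: u32) {
        if let Some(slot) = self.files.get_mut(id as usize) {
            *slot = None;
        }
        for ids in self.postings.values_mut() {
            ids.remove(&id);
        }
    }

    /// Paths of live files whose contents include trigram `t`, in file-ID order.
    fn files_with(&self, t: Trigram) -> Vec<&str> {
        self.postings
            .get(&t)
            .into_iter()
            .flatten()
            .filter_map(|&id| self.files[id as usize].as_deref())
            .collect()
    }
}
```

Scanning every posting set on removal keeps the sketch short; a production index would more likely tombstone the ID and compact later.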

search ties the search path together: pattern → trigram query → candidate files → ripgrep confirm, transparently falling back to a full scan when the query carries no usable constraint.
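The pipeline above can be sketched end to end for the simplest case, a literal pattern. This is an illustration under stated assumptions, not the crate's code: Query, literal_query, build_index, candidates, and search_sketch are hypothetical names, real regex analysis is replaced by splitting a literal into trigrams, and a substring test stands in for ripgrep's engine in the confirm step.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical query shape: no constraint, or "all of these trigrams".
enum Query {
    /// No usable trigram: every file is a candidate (full-scan fallback).
    All,
    /// A matching file must contain every listed trigram.
    And(Vec<[u8; 3]>),
}

/// Build a query for a literal pattern (a stand-in for real regex analysis).
fn literal_query(pattern: &str) -> Query {
    let grams: Vec<[u8; 3]> = pattern
        .as_bytes()
        .windows(3)
        .map(|w| [w[0], w[1], w[2]])
        .collect();
    if grams.is_empty() { Query::All } else { Query::And(grams) }
}

/// Map each trigram to the IDs (slice positions) of the files containing it.
fn build_index(files: &[(&str, &str)]) -> HashMap<[u8; 3], BTreeSet<usize>> {
    let mut index: HashMap<[u8; 3], BTreeSet<usize>> = HashMap::new();
    for (id, (_, text)) in files.iter().enumerate() {
        for w in text.as_bytes().windows(3) {
            index.entry([w[0], w[1], w[2]]).or_default().insert(id);
        }
    }
    index
}

/// Candidate selection: intersect the posting sets, or take every file on fallback.
fn candidates(
    query: &Query,
    index: &HashMap<[u8; 3], BTreeSet<usize>>,
    file_count: usize,
) -> BTreeSet<usize> {
    match query {
        Query::All => (0..file_count).collect(),
        Query::And(grams) => {
            let mut result: Option<BTreeSet<usize>> = None;
            for g in grams {
                let ids = index.get(g).cloned().unwrap_or_default();
                result = Some(match result {
                    None => ids,
                    Some(acc) => acc.intersection(&ids).copied().collect(),
                });
            }
            result.unwrap_or_default()
        }
    }
}

/// Confirm step: scan only the candidates and emit `path:line:text`.
fn search_sketch(pattern: &str, files: &[(&str, &str)]) -> Vec<String> {
    let index = build_index(files);
    let query = literal_query(pattern);
    let mut out = Vec::new();
    for id in candidates(&query, &index, files.len()) {
        let (path, text) = files[id];
        for (n, line) in text.lines().enumerate() {
            if line.contains(pattern) {
                out.push(format!("{}:{}:{}", path, n + 1, line));
            }
        }
    }
    out
}
```

The index only ever narrows the file set; the confirm step decides what matches. A two-byte pattern such as "fn" yields no trigram, so the sketch falls back to scanning every file and still returns the correct result.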

Modules

client
Client side: connect to the project’s daemon, spawning it on first use.
compact
Token-savings presentation: reshape ripgrep’s path:line:text stream into a compact, paginated view for agents. This is a pure transform over already-rendered output; matching stays 100% ripgrep (see confirm). The contract is deliberately weaker than the byte-for-byte CLI. The match set is identical to rg: nothing is ever dropped, and pagination is the only volume control. Only the presentation differs: the path is printed once per file, results are paged, and pathologically long lines are center-truncated on the match (the full content is one Read away).
config
User config, loaded from a TOML file.
confirm
The confirm step: run ripgrep’s own engine over a set of files and emit path:line:text.
cursor
Opaque pagination cursor for the compact view. A cursor carries the entire query (pattern + options) plus a keyset resume position, so page N is provably the same search as page 1 (no flag can be dropped between pages) and a changed result set is detectable via the stored fingerprint. The encoded blob is parked in the daemon’s crate::pagination store, which hands the caller a short token to echo back, so the blob itself never has to be small or text-safe. Encoding mirrors the hand-rolled, length-prefixed style in proto and reuses its exact options bit-layout.
index
The in-memory candidate index: trigram -> set of file IDs, plus a file table.
mcp
Minimal MCP stdio server exposing rgx’s search to AI agents.
pagination
Server-side pagination store: the daemon keeps the small cursor blob (query + keyset position) for a couple of minutes and hands the client a short opaque token in its place, so the printed --cursor is tiny instead of a base64 blob. The blob is exactly what the self-contained cursor used to carry — paging still re-runs the search, so memory is a few dozen bytes per live page, not the result set. Tokens are stamped with a per-process session so a restarted daemon’s old tokens miss cleanly (the client just re-runs the search). take is single-use: following a page deletes the token it came from.
paths
Where a project’s daemon keeps its socket and index snapshot.
proto
The daemon wire protocol: length-prefixed frames over a Unix socket.
query
Turn a regex pattern into a boolean trigram query that every matching file must satisfy.
server
The per-project daemon: holds the index resident in RAM, keeps it fresh, and answers queries over a local IPC endpoint (an AF_UNIX socket on Unix, a loopback-TCP port on Windows — see crate::transport). Owning that endpoint is the single-instance lock — a second daemon that loses the race exits. The daemon serves immediately: a warm start loads the snapshot and answers at once; a cold start answers via a full ripgrep scan (the correct fallback) until the first build finishes. See docs/indexing.md and docs/index-and-storage.md.
skill
rgx --agent install|uninstall|list|skill: wire rgx into AI coding agents.
status
Shared rendering for rgx --server status (and watch). Used by the daemon with live in-RAM stats, and by the CLI when no daemon is running — in which case it still reports the on-disk index location, size, and age, read straight from the snapshot file.
transport
The daemon transport, abstracted so the same protocol runs over the best local IPC per platform.
trigram
Byte trigrams: the atomic unit of the candidate index.
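The pagination module's behavior (short tokens in place of cursor blobs, a per-process session stamp, a limited lifetime, and a single-use take) might look roughly like the sketch below. PageStore, put, the hex "session-id" token format, and the explicit now parameter are all assumptions made for illustration; only the take name and the overall behavior come from the description above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical token store: short opaque tokens standing in for cursor blobs.
struct PageStore {
    /// Per-process stamp; a restarted daemon would pick a different value.
    session: u64,
    /// Counter used to mint the next token.
    next_id: u64,
    /// How long a parked blob stays valid.
    ttl: Duration,
    /// Token ID -> (cursor blob, time it was parked).
    entries: HashMap<u64, (Vec<u8>, Instant)>,
}

impl PageStore {
    fn new(session: u64, ttl: Duration) -> Self {
        PageStore { session, next_id: 0, ttl, entries: HashMap::new() }
    }

    /// Park a cursor blob and return the short token to print in its place.
    fn put(&mut self, blob: Vec<u8>, now: Instant) -> String {
        // Drop expired entries first so memory stays bounded.
        let ttl = self.ttl;
        self.entries.retain(|_, (_, parked)| now.duration_since(*parked) < ttl);
        let id = self.next_id;
        self.next_id += 1;
        self.entries.insert(id, (blob, now));
        format!("{:x}-{:x}", self.session, id)
    }

    /// Single-use lookup: a hit removes the entry; a token from another
    /// session, an expired token, or a malformed token misses.
    fn take(&mut self, token: &str, now: Instant) -> Option<Vec<u8>> {
        let (session, id) = token.split_once('-')?;
        if u64::from_str_radix(session, 16).ok()? != self.session {
            return None; // issued by a previous daemon process
        }
        let id = u64::from_str_radix(id, 16).ok()?;
        let (blob, parked) = self.entries.remove(&id)?;
        if now.duration_since(parked) < self.ttl { Some(blob) } else { None }
    }
}
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping. On any miss, the caller would simply re-run the search, as the module description notes.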

Functions

candidate_paths
Resolve the candidate files for pattern as owned paths, so a caller holding the index lock can release it before the (potentially long) ripgrep confirm + output streaming — never hold the index lock across blocking I/O. A fallback pattern yields every live file.
collect_search
Run a content search and buffer the whole path:line:text output, for callers that need the entire result at once (the compact/paged view) rather than a stream. Trigram-accelerable patterns go through the daemon (emitted in index file-id order, NOT path order); fallback patterns scan in-process in nondeterministic order. Neither is guaranteed sorted, so the compact view sorts the matches itself (see compact::format); the fallback block-sort here is a cheap extra that keeps even the raw buffered bytes deterministic across runs.
effective_pattern
The pattern actually handed to the regex engine: escaped when -F (fixed strings) is set.
is_fallback
Whether pattern has no usable trigram constraint (so every file is a candidate). The CLI uses this to scan such queries in-process — one process streamed straight to stdout, like ripgrep — instead of paying the daemon round-trip to ship a potentially huge result set back.
search
Convenience wrapper over stream_search that collects the emitted output instead of streaming it (used in tests).
stream_full_scan
Pipelined full-tree walk+search (matching ripgrep’s model), streaming through sink. Used by the CLI for fallback queries (no usable trigram) and by the daemon’s cold start before the first build finishes — both fully in-process. Once the index is ready, stream_search handles trigram-accelerable patterns.
stream_search
Stream a content search against a (ready) index, emitting path:line:text chunks via emit.
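The locking rule stated for candidate_paths (copy the candidates out as owned paths, then release the index lock before any blocking I/O) can be illustrated with a small sketch. Everything here is hypothetical: the Index stand-in holds only a list of live files, candidate_paths_sketch is not the crate's function, and the ids parameter is an invented way to pass in an already-computed candidate set.

```rust
use std::path::PathBuf;
use std::sync::Mutex;

/// Hypothetical stand-in for the index: just the live file paths.
struct Index {
    live_files: Vec<PathBuf>,
}

/// Resolve candidates to owned paths while holding the lock, then release it.
/// `ids` is `None` for a fallback pattern, which yields every live file.
fn candidate_paths_sketch(index: &Mutex<Index>, ids: Option<&[usize]>) -> Vec<PathBuf> {
    let guard = index.lock().unwrap();
    let paths: Vec<PathBuf> = match ids {
        None => guard.live_files.clone(),
        Some(ids) => ids
            .iter()
            .filter_map(|&i| guard.live_files.get(i).cloned())
            .collect(),
    };
    drop(guard); // release the lock before the caller starts any blocking I/O
    paths
}
```

Because the returned paths are owned, nothing borrowed from the index escapes the function, and the caller can run the slow confirm and output steps without blocking index updates.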