trusty-search
Machine-wide, blazingly fast hybrid code search service. One install per machine, one always-on daemon, unlimited named indexes.
📚 Documentation
Full documentation lives at the workspace top level in docs/trusty-search/: the research and regression-testing indexes, engineering sessions, and the example multi-index config.
This README and the rustdoc stay in-crate; everything else lives under docs/.
System requirements
- Rust 1.75+ (for source builds)
- 16 GB RAM minimum (default), hard-checked at daemon startup. The daemon exits with an actionable error message on under-spec hosts. Set `TRUSTY_SKIP_RAM_CHECK=1` in the daemon environment to bypass this check for small workloads where peak RAM is known to stay well under the memory limit. Bypass at your own risk on large corpora: the default exists because realistic indexing OOMs without it.
- macOS 12+ or Linux (Windows: not yet supported)
- ~2 GB disk for model cache (downloaded on first run to `~/Library/Caches/trusty-search/` on macOS or `$XDG_DATA_HOME/trusty-search/` on Linux)
Install
From crates.io (recommended)
From source
Apple Silicon
CoreML GPU acceleration is enabled automatically on M1/M2/M3/M4. No flags or extra installs are needed. The startup log confirms the active provider:
embedder initialized: model=AllMiniLML6V2(Q) dim=384 provider=CoreML (Metal GPU / ANE)
NVIDIA GPU (CUDA)
Requires CUDA toolkit installed on the host. See CLAUDE.md for ORT_DYLIB_PATH setup on Amazon Linux 2023 and other glibc 2.34 hosts.
Quick start
The following five steps take you from zero to a running search in under five minutes.
Step 1 — Start the daemon
Expected output:
trusty-search daemon starting on http://127.0.0.1:<port>
embedder initialized: model=AllMiniLML6V2(Q) dim=384 provider=CoreML (Metal GPU / ANE)
daemon ready
The daemon auto-selects a free port and writes it to ~/Library/Application Support/trusty-search/port.lock.
Step 2 — Index a project
Expected output:
Registered index "myproj" at /Users/me/Projects/myproj
⟳ Indexing myproj [████████░░] 1204/1520 files — 12s remaining
✓ Indexed 14 823 chunks in 142s
Re-running is safe — unchanged files are skipped via content fingerprints. Use --force to rebuild from scratch.
Step 3 — Run a search
Expected output:
1. src/auth.rs:42 — authenticate (hybrid+kg, score=0.018)
fn authenticate(ctx: &Context) -> Result<Token> {
2. src/middleware.rs:17 — verify_token (hybrid, score=0.011)
...
Add --json for machine-readable output.
Step 4 — Open the admin UI
Opens http://127.0.0.1:<port>/ui in your browser. The UI provides search, index management, and an OpenRouter-backed chat panel (requires OPENROUTER_API_KEY).
Step 5 — Check status at any time
Using with Claude Code
Add trusty-search as an MCP server in your Claude Code config (~/.claude/claude_desktop_config.json or via claude mcp add):
stdio (recommended)
HTTP/SSE
Then add http://127.0.0.1:7879/sse as an SSE MCP endpoint in your Claude Code config.
Once connected, Claude Code can call search, index_file, list_indexes, and 15 other tools directly (18 total). The daemon must be running independently (trusty-search start) before Claude Code connects.
Features
- Machine-wide daemon: single install (`cargo install trusty-search`), one process, unlimited registered indexes via `DashMap<IndexId, IndexHandle>`
- Hybrid search: BM25 (lexical, zero-dep port with camelCase / snake_case splitting) + HNSW vector (usearch 2.25, all-MiniLM-L6-v2 INT8) + Knowledge Graph 1–2 hop expansion, fused via Reciprocal Rank Fusion (k = 60, always-on)
- Query intent routing: sub-ms regex classifier routes every query to one of 5 intents and adjusts α / β weights and KG gating per query
- Branch-aware search: pass `branch_files` (or just `branch: "feature/foo"`) to `POST /indexes/:id/search`; chunks from your branch get a configurable score boost (default 1.5×) and every result carries `on_branch: bool`
- KG symbol graph: petgraph-backed `SymbolGraph` derived from tree-sitter parses, with `EdgeKind` (CALLS / IMPORTS / INHERITS / CONTAINS) score multipliers; KG expansion is intent-gated (Usage only)
- Auto-tuned memory tiers: 5 tiers (Tiny / Small / Medium / Large / XLarge) from < 8 GB up to 64+ GB; chunk caps, batch sizes, cache sizes, and BM25 / KG limits computed at daemon startup from detected RAM
- macOS CoreML auto-detection: on Apple Silicon the ONNX session registers the CoreML execution provider automatically (no `--features` flag needed since v0.3.13)
- Multi-index repo support: drop a `trusty-search.yaml` at the repo root to define per-directory named indexes; `trusty-search index` reads it automatically (see `docs/trusty-search/examples/trusty-search.yaml`)
- Incremental reindex: sha2 content fingerprints skip unchanged files across daemon restarts; `--force` triggers a full rebuild
- Zero cold-start queries: HNSW kept hot (`Duration::MAX` cool-after), LRU embedding cache (256+ entries) skips re-embedding on repeat queries
- Native multi-request: `Arc<SearchAppState>`, reader-priority `RwLock`, axum HTTP/2; many concurrent searches against the same index never block
- MCP server: stdio + HTTP/SSE transports, 18 tools (per `src/mcp/tools.rs`), drop-in for Claude Code
- Embedded Svelte 5 admin UI: Collections, Search, Chat, Admin panels compiled into the binary via `include_dir!`; open with `trusty-search ui`
- Migration path: `trusty-search convert` reads `mcp-vector-search` configs and re-registers each project as a named index
Code quality analysis: Complexity hotspots, smell detection, and quality grades have moved to trusty-analyze. The `complexity_hotspots`, `smells`, and `quality` HTTP endpoints are not served from this binary as of v0.2.0.
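The Reciprocal Rank Fusion step named in the feature list can be illustrated with a minimal sketch. It shows only the textbook formula, where each ranked list contributes 1 / (k + rank) per chunk, with k = 60. The function name `rrf_fuse`, the string chunk IDs, and the unweighted sum are assumptions made for illustration; the crate's real pipeline also applies per-intent α / β weights and KG expansion, which are not modelled here.

```rust
use std::collections::HashMap;

/// Textbook Reciprocal Rank Fusion over several ranked lists.
/// Ranks start at 1; each list adds 1 / (k + rank) to a chunk's score.
/// `rrf_fuse` is a hypothetical helper, not part of trusty-search.
pub fn rrf_fuse(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by ID so the order is deterministic.
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap()
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

Calling `rrf_fuse(&[bm25_hits, vector_hits], 60.0)` with two ranked ID lists returns one merged ranking; a chunk that appears in both lists accumulates both contributions.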
Stage 1 IS a daemonized ripgrep
A lexical_only index skips embedding entirely. You get BM25 ranking plus
grep-speed pattern matching via a persistent HTTP daemon — no ONNX, no GPU,
no model download.
Certified performance on a 1,155-file Rust workspace (trusty-tools, May 2026):
| Metric | Value |
|---|---|
| Reindex time | 5.3 s (5,289 ms) |
| Throughput | 4,445 chunks/sec |
| Peak daemon RSS | 698 MB |
| `/grep` P50 latency | 8 ms (vs ripgrep 9 ms, parity) |
Full measurement details: docs/trusty-search/regression-testing/v0.14.0-stage1-cert-2026-05-27.md
When to use lexical-only: when you want a daemonized BM25 + ripgrep with HTTP/MCP integration but do not need semantic similarity queries. Reindex is 63× faster than a full hybrid reindex (no embedding), and the daemon fits comfortably in 700 MB.
How to enable — pass lexical_only: true in the index create payload:
Or use the --lexical-only flag with the CLI:
Skip-KG mode (--no-kg) — issue #313
A skip_kg index runs Stages 1 and 2 (BM25 + vector embed) normally but
permanently skips the Phase 3 Knowledge Graph rebuild (tree-sitter symbol
extraction + petgraph construction). Useful for large documentation-heavy or
generated-code sub-indexes in polyrepos where call-chain navigation is never
needed.
Savings per index: ~50–100 MB heap (symbol graph not allocated), ~400 ms per reindex (tree-sitter extraction pass skipped).
503 contract: GET /indexes/:id/call_chain returns a structured 503 error
when skip_kg=true:
Callers must handle 503 and not assume 404 (index absent).
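A client might separate the two cases along the following lines. This is a sketch only: the enum and function names are hypothetical, and treating every 503 as "KG disabled" is a simplifying assumption; a real client should also inspect the structured error body.

```rust
/// Hypothetical classification of `GET /indexes/:id/call_chain` responses.
/// Only the 503-versus-404 distinction comes from the text above.
#[derive(Debug, PartialEq, Eq)]
pub enum CallChainOutcome {
    /// 200: the call chain was returned.
    Ok,
    /// 503: the index exists but was created with skip_kg=true.
    KgDisabled,
    /// 404: the index is absent.
    IndexMissing,
    /// Any other status, passed through for the caller to handle.
    Other(u16),
}

pub fn classify_call_chain_status(status: u16) -> CallChainOutcome {
    match status {
        200 => CallChainOutcome::Ok,
        503 => CallChainOutcome::KgDisabled,
        404 => CallChainOutcome::IndexMissing,
        other => CallChainOutcome::Other(other),
    }
}
```

Keeping `KgDisabled` distinct from `IndexMissing` lets a caller fall back to plain search instead of trying to re-register an index that already exists.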
Three ways to enable:
CLI (--no-kg — orthogonal to --lexical-only):
YAML (trusty-search.yaml):
version: 1
indexes:
- name: docs
paths:
skip_kg: true
HTTP API:
Machine-wide default (TRUSTY_NO_KG=1 env var applies to every new index):
skip_kg and lexical_only are orthogonal (D1) — setting both suppresses
both the embedder (Stage 2) and the KG rebuild (Stage 3), leaving only BM25.
Memory tiers (auto-tuned at startup)
MEMORY_LIMIT_MB is computed dynamically as 25% of detected system RAM, clamped to 1–64 GB. It is not a fixed tier value. The env var TRUSTY_MEMORY_LIMIT_MB overrides it. All other limits below are tier-based.
| Tier | Total RAM | `MEMORY_LIMIT_MB` | `MAX_CHUNKS` | `EMBEDDING_CACHE` | `MAX_BATCH_SIZE` | `BM25_CORPUS_CAP` | `MAX_KG_NODES` |
|---|---|---|---|---|---|---|---|
| Tiny | < 8 GB | 25% of RAM (≥ 1 GB) | 50 000 | 500 | 64 | 20 000 | 30 000 |
| Small | 8–15 GB | 25% of RAM | 100 000 | 1 000 | 128 | 50 000 | 75 000 |
| Medium | 16–31 GB | 25% of RAM | 200 000 | 5 000 | 256 | 100 000 | 150 000 |
| Large | 32–63 GB | 25% of RAM | 400 000 | 10 000 | 512 | 200 000 | 300 000 |
| XLarge | ≥ 64 GB | 25% of RAM (≤ 64 GB) | 800 000 | 20 000 | 512 | 400 000 | 500 000 |
Env vars (TRUSTY_MAX_CHUNKS, TRUSTY_EMBEDDING_CACHE, TRUSTY_MAX_BATCH_SIZE,
TRUSTY_BM25_CORPUS_CAP, TRUSTY_MAX_KG_NODES, TRUSTY_MEMORY_LIMIT_MB,
TRUSTY_COREML_BATCH_SIZE, TRUSTY_COREML_TRIPWIRE_MB)
always override the tier default. Precedence: shell env > daemon.env >
tier default. The resolved tier and all limits are logged at daemon startup.
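The tier boundaries, the 25% memory-limit rule, and the override precedence described in this section can be sketched as follows. All names (`Tier`, `tier_for_ram_gb`, `memory_limit_mb`, `resolve_limit`) are hypothetical, and the sketch assumes 1 GB = 1024 MB; it illustrates the documented rules rather than reproducing the daemon's code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Tiny,
    Small,
    Medium,
    Large,
    XLarge,
}

/// Pick a tier from total RAM in whole GB, following the table's boundaries.
pub fn tier_for_ram_gb(total_gb: u64) -> Tier {
    match total_gb {
        0..=7 => Tier::Tiny,
        8..=15 => Tier::Small,
        16..=31 => Tier::Medium,
        32..=63 => Tier::Large,
        _ => Tier::XLarge,
    }
}

/// MAX_CHUNKS default for each tier, copied from the table.
pub fn max_chunks(tier: Tier) -> u64 {
    match tier {
        Tier::Tiny => 50_000,
        Tier::Small => 100_000,
        Tier::Medium => 200_000,
        Tier::Large => 400_000,
        Tier::XLarge => 800_000,
    }
}

/// MEMORY_LIMIT_MB: 25% of detected RAM, clamped to 1–64 GB.
/// Assumes 1 GB = 1024 MB.
pub fn memory_limit_mb(total_ram_mb: u64) -> u64 {
    (total_ram_mb / 4).clamp(1024, 64 * 1024)
}

/// Precedence for any limit: shell env > daemon.env > computed default.
pub fn resolve_limit(shell_env: Option<u64>, daemon_env: Option<u64>, default: u64) -> u64 {
    shell_env.or(daemon_env).unwrap_or(default)
}
```

For a 16 GB host this gives the Medium tier, a 200 000-chunk cap, and a 4096 MB memory limit unless one of the env vars overrides it.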
Apple Silicon CoreML batch sizing
On Apple Silicon (M1–M4), the ONNX Runtime CoreML execution provider batches are optimised separately from CPU and GPU tiers:
- `DEFAULT_COREML_BATCH_SIZE = 32`: optimal for Apple Neural Engine (ANE). Benchmark results on a 19k-chunk corpus show that larger batches (64, 128) consume 7–10% more time and 1.2–9.7 GB additional peak RSS with zero throughput gain. The ANE has a fixed dispatch budget; batch size scales unified-memory allocation but not per-call throughput.
- `TRUSTY_COREML_TRIPWIRE_MB = 4096`: safety net for RSS spikes. If a single CoreML embedding batch increases RSS by >4 GB, the batch size is automatically halved (floor: 1) and a warning is logged. Fires once per reindex. Override with the `TRUSTY_COREML_TRIPWIRE_MB` env var if your host has different memory pressure characteristics.
- Non-fatal RSS probes: failure to read `/proc/self/status` returns 0, disabling the tripwire gracefully rather than crashing.
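The tripwire behaviour described above (halve on a large RSS jump, floor of 1, fire once per reindex, ignore failed probes) can be sketched as a small state machine. The `Tripwire` type and its method names are hypothetical, and the sketch takes RSS readings as plain arguments rather than probing the OS.

```rust
/// Hypothetical model of the CoreML RSS tripwire; not the daemon's actual type.
pub struct Tripwire {
    pub batch_size: usize,
    pub tripwire_mb: u64,
    fired: bool,
}

impl Tripwire {
    pub fn new(batch_size: usize, tripwire_mb: u64) -> Self {
        Tripwire { batch_size, tripwire_mb, fired: false }
    }

    /// Call after each embedding batch with RSS readings in MB.
    /// Returns true if this call halved the batch size.
    pub fn observe(&mut self, rss_before_mb: u64, rss_after_mb: u64) -> bool {
        // A failed RSS probe reports 0, which disables the check instead of crashing.
        if self.fired || rss_before_mb == 0 || rss_after_mb == 0 {
            return false;
        }
        let delta = rss_after_mb.saturating_sub(rss_before_mb);
        if delta > self.tripwire_mb {
            // Halve the batch size, never going below 1.
            self.batch_size = (self.batch_size / 2).max(1);
            // Fire at most once per reindex.
            self.fired = true;
            eprintln!(
                "warning: RSS grew by {} MB in one batch; batch size now {}",
                delta, self.batch_size
            );
            return true;
        }
        false
    }
}
```

Starting from `Tripwire::new(32, 4096)`, a batch that grows RSS by 5000 MB drops the batch size to 16, and later spikes in the same reindex leave it unchanged.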
Query intent → routing weights
| Intent | α (vector) | β (BM25) | KG-first |
|---|---|---|---|
| Definition | 0.3 | 0.7 | false |
| Usage | 0.5 | 0.5 | true |
| Conceptual | 0.8 | 0.2 | false |
| BugDebt | 0.1 | 0.9 | false |
| Unknown | 0.6 | 0.4 | false |
The classifier is a sub-ms regex over the query text. KG expansion is gated
to Usage intent only — caller/callee chains are scored at 70% of the
trigger chunk's RRF score.
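The routing table above maps directly onto a lookup function. The sketch below encodes only the documented weights and the 70% KG neighbour scoring; the type and function names are hypothetical, and the regex classifier that picks the intent is deliberately left out because its patterns are not documented here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    Definition,
    Usage,
    Conceptual,
    BugDebt,
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Routing {
    /// Weight of the vector lane (α).
    pub alpha: f32,
    /// Weight of the BM25 lane (β).
    pub beta: f32,
    /// Whether KG expansion runs for this intent.
    pub kg_first: bool,
}

/// Weights copied from the routing table; names are illustrative.
pub fn routing_for(intent: Intent) -> Routing {
    match intent {
        Intent::Definition => Routing { alpha: 0.3, beta: 0.7, kg_first: false },
        Intent::Usage => Routing { alpha: 0.5, beta: 0.5, kg_first: true },
        Intent::Conceptual => Routing { alpha: 0.8, beta: 0.2, kg_first: false },
        Intent::BugDebt => Routing { alpha: 0.1, beta: 0.9, kg_first: false },
        Intent::Unknown => Routing { alpha: 0.6, beta: 0.4, kg_first: false },
    }
}

/// KG-expanded caller/callee chunks score at 70% of the trigger chunk's RRF score.
pub fn kg_neighbour_score(trigger_rrf_score: f32) -> f32 {
    trigger_rrf_score * 0.7
}
```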
CLI
# enables isolated daemon instances; each instance
# gets its own data dir, port, and index registry
# (also: TRUSTY_NO_AUTO_DISCOVER=1)
# daemon serves only already-registered indexes
# auto-detects ./trusty-search.yaml
# Aliases preserved for backward compatibility:
MCP tools
The MCP server registers 18 tools (authoritative source: src/mcp/tools.rs
tool_definitions):
| Tool | Description |
|---|---|
| `search` | Hybrid search (BM25 + HNSW + KG, RRF-fused) |
| `search_kg` | KG-first graph-walk search; accepts optional `refine_query` (see below) |
| `search_semantic` | Vector-only semantic search lane |
| `search_lexical` | BM25/token lexical search lane |
| `search_all` | Fan-out search across every registered index |
| `search_similar` | Code-to-code similarity from a seed file/function |
| `index_file` | Add or replace a single file in the index |
| `remove_file` | Remove a file and all its chunks |
| `list_indexes` | Enumerate all registered indexes |
| `create_index` | Register a new (empty) index |
| `delete_index` | Drop an index from the registry |
| `reindex` | Fire-and-forget full reindex (SSE progress) |
| `index_status` | Per-index stats including walk diagnostics (see below) |
| `list_chunks` | Paginated enumeration of chunks (file, start_line) |
| `get_call_chain` | KG caller/callee chain for a symbol |
| `grep` | Literal/regex grep fallback over the corpus |
| `search_health` | Daemon liveness probe |
| `chat` | OpenRouter Q&A with auto-injected search context |
search_kg — refine_query parameter (issue #147)
search_kg performs a graph-walk expanding the KG neighbourhood of each top
hit. When the seed chunk is a weak or wrong match, the unfiltered neighbourhood
can compound the error with unrelated results.
Pass an optional refine_query string to describe the target concept in
natural language. The daemon embeds both the refine_query and every
KG-expanded neighbour, then discards neighbours whose cosine similarity against
refine_query is below 0.4. Surviving neighbours are re-ranked by cosine
score so the strongest semantic match appears first. Seeds from the primary
fused list are never filtered.
When refine_query is absent the behaviour is identical to the previous version
(fully backward-compatible).
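The filter-then-rerank step can be sketched with plain vectors. The function names and the `(id, embedding)` tuple layout are assumptions for illustration; only the 0.4 cosine threshold and the descending re-rank come from the description above, and seeds are assumed to be handled elsewhere since they are never filtered.

```rust
use std::cmp::Ordering;

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Drop KG neighbours whose similarity to the refine embedding is below
/// `threshold`, then order the survivors by similarity, strongest first.
/// Hypothetical helper; the daemon's internal types are not shown in this README.
pub fn refine_neighbours<'a>(
    refine: &[f32],
    neighbours: &[(&'a str, Vec<f32>)],
    threshold: f32,
) -> Vec<(&'a str, f32)> {
    let mut kept: Vec<(&'a str, f32)> = neighbours
        .iter()
        .map(|(id, emb)| (*id, cosine(refine, emb)))
        .filter(|(_, score)| *score >= threshold)
        .collect();
    kept.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    kept
}
```

With a threshold of 0.4, a neighbour orthogonal to the refine embedding is discarded, while partially aligned neighbours survive and are returned in order of similarity.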
index_status — walk diagnostic fields (issue #280)
GET /indexes/:id/status (and the index_status MCP tool) now include four
fields that let operators diagnose why a reindex produced zero chunks:
| Field | Type | Description |
|---|---|---|
| `last_walk_started_at` | `string \| null` | RFC 3339 timestamp of the most recent walk start |
| `last_walk_files_seen` | `number` | Files discovered by the walk (after gitignore/extension filtering) |
| `last_walk_files_skipped` | `number` | Directories skipped (gitignore, build artefacts, etc.) |
| `last_walk_error` | `string \| null` | Set when the walk found zero indexable files; describes probable cause |
These fields are populated every time a reindex task runs. On a healthy index
with chunks you will see last_walk_error: null and last_walk_files_seen > 0.
Stack
| Component | Choice |
|---|---|
| Language | Rust 2021 |
| Async runtime | tokio (full features) |
| HTTP | axum 0.7 + tower-http (CORS, trace, gzip), HTTP/2 |
| Vector store | usearch 2.25 (HNSW, in-memory, Arc<RwLock<>>) |
| Embeddings | fastembed 5.x (ONNX, all-MiniLM-L6-v2 INT8, 384-dim) |
| Lexical | Custom BM25 (zero-dep port, camelCase splitting) |
| KV store | redb 2.6 |
| Knowledge graph | petgraph 0.6 (SymbolGraph) |
| File watching | notify 6 + notify-debouncer-mini 0.4 (500 ms) |
| Code parsing | tree-sitter 0.26 (14 grammars) |
| Concurrency | dashmap 5, lru 0.12, rayon 1 |
| HTTP client | reqwest 0.12 (rustls-tls) |
| CLI | clap 4 (derive) |
| UI | Svelte 5, embedded via include_dir! |
| Hashing | sha2 (incremental reindex fingerprints) |
Troubleshooting
Daemon won't start
Run trusty-search doctor for a 6-check diagnostic. Common causes:
- Another daemon already running: `trusty-search stop` then `trusty-search start`
- Stale PID lockfile: `trusty-search doctor --fix` removes it automatically
- Less than 16 GB RAM: the daemon performs a hard RAM check and exits with an actionable error. Set `TRUSTY_SKIP_RAM_CHECK=1` in the daemon environment to bypass for small workloads; not recommended on large corpora (risk of OOM during indexing)
Embedder stuck on "initializing"
The ONNX Runtime initializes the model on first start and may take 30–60 seconds on slower machines. If it hangs indefinitely, increase the timeout:
TRUSTY_EMBEDDER_INIT_TIMEOUT_SECS=120
High memory usage during reindex
The daemon has a soft RSS ceiling (TRUSTY_MEMORY_LIMIT_MB). When hit, it skips remaining batches and logs a warning. Already-committed chunks stay searchable. To lower pressure:
TRUSTY_MEMORY_LIMIT_MB=2048
Or wait for the soft cap to trip — the partial index is usable immediately.
Reindex produced zero chunks
If index_status shows chunk_count: 0 after a reindex, check the walk
diagnostic fields:
# Via CLI (pipe through jq if available)
# Via HTTP
Look for last_walk_error. Common causes and fixes:
| `last_walk_error` message | Cause | Fix |
|---|---|---|
| `root path does not exist: /…` | Index was registered with a path that no longer exists | Re-register with the correct path: `trusty-search index /new/path --name myproj` |
| `walk produced zero files … check gitignore rules` | All discovered files were excluded by `.gitignore`, extension allow-list, or `path_filter` | Check `.gitignore` for overly broad rules; ensure at least one supported extension (`.rs`, `.py`, `.ts`, etc.) exists under the root path |
If last_walk_error is null but chunk_count is still 0, the walk found
files but the chunker produced no output — this usually means all files are
binary or exceed the size limit. Check RUST_LOG=debug trusty-search start for
per-file warnings.
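The decision procedure in this section can be summarised in a few lines. The struct below is a hypothetical mirror of three fields from the status response (`chunk_count`, `last_walk_files_seen`, `last_walk_error`); the `Diagnosis` enum and `diagnose` function are illustrative names, not part of the crate.

```rust
/// Hypothetical subset of the index status payload.
pub struct IndexStatus {
    pub chunk_count: u64,
    pub last_walk_files_seen: u64,
    pub last_walk_error: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Diagnosis {
    /// The index has chunks; nothing to fix.
    Healthy,
    /// The walk itself failed or found nothing; the message says why.
    WalkFailed(String),
    /// The walk found files but the chunker emitted nothing
    /// (usually binary or oversized files).
    ChunkerProducedNothing,
}

pub fn diagnose(status: &IndexStatus) -> Diagnosis {
    if status.chunk_count > 0 {
        return Diagnosis::Healthy;
    }
    match &status.last_walk_error {
        Some(msg) => Diagnosis::WalkFailed(msg.clone()),
        None => Diagnosis::ChunkerProducedNothing,
    }
}
```

A `WalkFailed` result points at the table above, while `ChunkerProducedNothing` is the cue to restart with `RUST_LOG=debug` and read the per-file warnings.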
Port conflict
The daemon auto-selects a free port on each start. The live port is written to:
- macOS: `~/Library/Application Support/trusty-search/port.lock`
- Linux: `$XDG_DATA_HOME/trusty-search/port.lock`
If trusty-search status reports the wrong port, stop and restart the daemon.
Device flag not persisting across restarts
Use trusty-search start --device cpu to force CPU mode. The flag is persisted to daemon.env so it survives daemon restarts.
Architecture and HTTP API
See CLAUDE.md for the full HTTP endpoint catalogue, query pipeline, multi-request design, memory tuning reference, and release process.
Documentation
- CLAUDE.md — full architecture + HTTP API reference
- CHANGELOG.md — release history
- docs/trusty-search/examples/trusty-search.yaml — multi-index repo config
- docs/trusty-search/research/ — design + comparison documents
License
Elastic License 2.0 (ELv2) — free for internal use; you may not provide trusty-search as a hosted or managed service to third parties without a commercial agreement. See LICENSE for the full terms.