heliosdb-codekb-mcp
MCP stdio server for code+docs knowledge bases. Embeds
HeliosDB-Nano as a Rust
library (code-graph, graph-rag, mcp-endpoint, code-embed
features) and exposes its LSP-shaped + GraphRAG tools to Claude Code,
Cursor, Codex, Aider, and any other MCP-aware agent — over plain stdio
JSON-RPC, no ports, no auth dance, all local.
v0.2.1 headline (qwen3-coder:30b on /home/gpc/HDB/Full):
−37.2 % model tokens vs no-MCP across 15 dev questions (full report
in MCP_ECOSYSTEM_BENCHMARK_REPORT_2026-05-27.md).
Biggest single-question wins: −76 k, −69 k, −47 k tokens on
broad-architecture queries. See Honest caveats for
the workloads where Read+Grep is still cheaper.
Why it improves answer quality, not just token count
The bench measures cost. But the reason MCP wins on broad questions isn't just compression — it's that the wrapper layer gives the model better-grounded raw material to reason from. Every quality lever also reduces tokens; both directions point the same way.
| Quality lever | What changes for the agent |
|---|---|
Pre-distilled answer cards (helios.answer_card.v1) |
Every wrapper response carries a summary + evidence array (file:line citations, qualified symbol names, doc-section IDs) + omitted metadata. The model doesn't have to remember where it found something across N turns — the citation rides with the answer. Less hallucinated provenance, fewer "I think this was in some file…" lapses. |
helios_ask question router |
One entry point that inspects the question and picks the right sub-wrapper (repo summary / outline-first / symbol card / doc drill). Stops the model from going down the wrong path (e.g. grep when it should outline-first). On the Full bench this picked correctly on most questions; the report calls out where it didn't and what would fix it. |
| LLM-distilled symbol summaries (Phase 2 ingest) | Each public symbol gets a one-sentence purpose summary generated ONCE at ingest by a code-aware model (e.g. qwen3-coder:30b), stored in _hdb_plugin_symbol_cards.llm_summary, reused across every agent query. A Haiku 4.5 query against a qwen-distilled card often produces a more accurate answer than Haiku reading the raw body and re-deriving the purpose itself. |
Cross-modal MENTIONS edges |
When the agent searches "FastEmbedder", the response includes the symbol AND the doc passages that name it. The agent verifies the doc claim against the actual code in one round-trip instead of having to grep both surfaces independently and stitch results. |
| PageRank-ranked file index | helios_repo_summary orders files by the symbol-graph PageRank, so the agent investigates the load-bearing implementation files before tests or examples. Without this, "describe the architecture" agents reliably waste 5–10 turns walking peripheral code. |
| Typed AST diff for "what changed" | helios_ast_diff / helios_git_summary return structured {added,removed,moved,signature_changed} rows, not raw line diffs. The model parses 200 bytes of structured rows instead of pages of +/- lines, with the same information density. |
| Bounded-by-design responses | Wrappers cap result counts (max_callers=5, max_chunks=10, fields=[…] projection). The model never gets a half-truncated response with …[truncated] and has to guess what came next — every response is complete relative to its declared bounds. |
| Compact one-tool surface | helios(action, args) default replaces the 12-tool catalogue. The model sees ~720 bytes of tool descriptions per turn instead of ~6 KB. Less "what tool should I call?" deliberation, more answer reasoning. |
When the plugin earns its keep
| Scenario | What you get |
|---|---|
| Broad architecture / cross-cutting questions | "Describe the WAL flushing path", "where does foreign-key validation happen", "what modules depend on storage" — these are the questions where MCP saved 47-76 k tokens per question on the Full bench. Read+Grep here sends the agent on a 20+ turn grep tour; MCP lands in 3-6 turns with citations. |
Cross-modal queries ("which doc section mentions the FastEmbedder symbol?") |
Read+Grep literally can't answer in one shot. The plugin's text → code MENTIONS edges traverse both sides in one tool call. |
| "What did this look like before" (time-travel / branch / commit diff) | helios_ast_diff, heliosdb_branch_*, heliosdb_time_travel answer "what did this symbol look like at commit X / on branch Y" with typed AST deltas. Read+Grep would need a checkout + reindex + grep cycle. |
| Catastrophe prevention on multi-turn tours | On the canonical Haiku bench, Opus hit its budget cap on 5/10 questions WITHOUT MCP (no answer); Haiku burned 22-32 grep+read turns on the same questions. WITH MCP, the same questions land in 3-6 turns. This stabilises the tail, not always the median. |
| Doc-heavy workflows | Heading-chunked .md ingest means helios_graphrag_search returns the matching DocSection instead of the full file. Doc retrieval shrinks to one section per question. |
Honest caveats
- Direct file/symbol lookups where the agent knows the path can still be cheaper through raw Read+Grep. The Full bench: MCP won 8/15 questions, lost 7/15. Wins were broad; losses were targeted lookups (e.g. "show me the public types in crate X" — one Read of the lib.rs beats a wrapper round-trip).
- Symbol-card population is content-hash-gated. If
helios_symbol_cardreturns{"status": "not_found"}, re-runingestto backfill the cards. The Full benchmark report has a "Recommended Next Work" section calling out the cases where card coverage matters most. - Cold-start cost is high on benchmarks because each WITH-MCP run launches a fresh
servesubprocess. Real agent sessions keep the server warm; the per-question cold start ratio improves with longer sessions. - Engine-side FRs are queued. Four engine improvements would unlock another step-change in both quality and cost — tracked in
ENGINE_FRS_FROM_CODEKB_2026-05-26.md. FR #1 (FK validation throughput) is the prerequisite for benching on/home/gpc/HDB-scale (10 k+ files) corpora; FR #4 (tools/list verbose=false) would compose with the existing--mega-toolfor even smaller catalogue payloads.
Use the plugin when the cross-modal / time-travel / catastrophe-prevention / answer-grounding value matters. The aggregate token win on broad workloads (−37 % on Full) is a bonus; the per-answer quality lift is the durable value.
Install
Step 1 — install the heliosdb-codekb-mcp binary
# binary lands at ~/.cargo/bin/heliosdb-codekb-mcp
This is the recommended path for every platform (Linux x86_64,
Linux aarch64, macOS Intel, macOS Apple Silicon). Latest published
version: v0.2.1.
First build pulls the engine (heliosdb-nano) and is slow (~10 min);
subsequent updates are cached.
A pre-built Linux x86_64 binary exists at the v0.1.0 release page only.
It is stale vs the current crates.io publish (no v0.2.x binary
yet), so cargo install is preferred when you can run it. If you
really need a binary:
Verify with the matching .sha256 from the v0.1.0 release page. Pre-
built v0.2.x binaries are tracked at issue tk.
# binary: ./target/release/heliosdb-codekb-mcp
Step 2 — index your project
Once per source-tree you want indexed:
co-located puts the KB at <project>/.helios-kb (auto-gitignored).
See KB-location modes for global / hybrid.
Step 3 — wire it into your agent
The binary is now ready. Pick your agent below.
Claude Code (claude-code)
Per-project .mcp.json (recommended — server scope follows the
project automatically):
Drop this at <project>/.mcp.json:
${CLAUDE_PROJECT_DIR} is expanded by Claude Code to the project
root, so the same .mcp.json works across machines. --mega-tool is
the default since v0.2.0 — fresh installs get the compact one-tool
surface automatically. Add --no-mega-tool if you want the full
multi-tool catalogue.
As a Claude Code plugin (gets you the /codekb-setup,
/codekb-ingest, /codekb-status slash commands + the
codekb-pro-features skill on top of the MCP server):
# One-time install from the repo's plugin manifest
Or, after a plugin marketplace publish:
/plugin install heliosdb-codekb-mcp. First run /codekb-setup
walks through binary install / embeddings / ingest in one flow.
OpenAI CODEX (codex CLI)
Add the server to ~/.codex/config.toml:
[]
= "heliosdb-codekb-mcp"
= [
"serve",
"--source", "/abs/path/to/your/project",
"--wrapper-cache-size", "128",
]
CODEX doesn't expand a ${CLAUDE_PROJECT_DIR}-style variable, so use
an absolute project path. To switch projects, edit the path (or run
several named servers — helios-foo, helios-bar, …). Verify the
server is registered with:
Then start a session as usual; the helios(action, args) tool
should appear in the tool list. If CODEX reports "MCP server not
found", confirm heliosdb-codekb-mcp is on the PATH the
codex process sees (which heliosdb-codekb-mcp from your shell;
if cargo install-ed, ~/.cargo/bin must be on PATH).
Cursor / Continue / other MCP clients (HTTP transport)
For clients that don't speak stdio:
Then point the client at:
| Route | Method | Purpose |
|---|---|---|
POST / |
JSON-RPC 2.0 | Standard MCP request channel |
GET /ws |
WebSocket upgrade | Bidirectional stream |
GET /sse |
Server-sent events | Progress notifications |
GET /info |
One-shot discovery + cache stats |
The HTTP gateway honours --mega-tool, --profile, and
--strip-tool-descriptions the same way the stdio path does.
Step 4 — verify
You should see kb-on-disk : exists and a non-zero row count. If
status reports an interrupted ingest, just re-run
heliosdb-codekb-mcp ingest --source <path> — the checkpoint resumes
from the last completed phase.
What it is
Three things at once:
- A user-level config tool. Run
init --source <PATH> --mode <co-located|global|hybrid>once per source-directory you want indexed. The choice persists in${XDG_CONFIG_HOME:-~/.config}/heliosdb-codekb-mcp/config.toml. - A KB resolver. Given a source path, finds the right KB directory using the persisted config (with longest-prefix match for hybrid setups that span multiple sub-trees).
- The MCP stdio server.
serve --source <PATH>opens anEmbeddedDatabaserooted at that source's KB and runsheliosdb_nano::mcp::McpServer::new(db).run().await. All query tools (helios_lsp_*,helios_graphrag_search,helios_ast_diff, …) come from the engine library — this binary owns transport and config, not tool surface.
KB-location modes
| Mode | Where the KB lives | Use when |
|---|---|---|
co-located |
<source>/.helios-kb (auto-added to .gitignore) |
Single repo, KB travels with the code. |
global |
${XDG_DATA_HOME}/helios-kb/<slug> (slug = source path) |
Many independent projects on one machine. |
hybrid |
An explicit --kb <PATH> you can register many sources to |
Multi-repo aggregate (e.g. ~/Helios/*). |
Document ingestion (default tier — no Docling)
Out of the box, ingestion uses pure-Rust crates. Single static binary, no Python, no Docker. Covers ~80% of real-world content:
| Format | Backend |
|---|---|
.rs / .py / .ts / .tsx / .js / .go / .sql |
tree-sitter via the engine's code-graph |
.md / .txt |
tree-sitter Markdown / generic text |
.pdf (born-digital) |
pdf-extract |
.docx |
docx-rs |
.xlsx |
calamine |
Document ingestion (--features docling)
Build with --features docling and the binary additionally routes:
| Format | Backend (via engine graph_rag::ingest_*) |
|---|---|
| Scanned / OCR-required PDFs | Docling layout + OCR |
| Multi-column scientific PDFs | Docling table & formula extraction |
.pptx / complex Office |
Docling Office pipeline |
Audio (.mp3 / .wav / .m4a) |
Docling Whisper-pipeline |
Images (.png / .jpg / .tiff) |
Docling OCR |
This tier requires a docling-serve HTTP endpoint to be running
(host-managed, or via the helios-code-graph plugin's bundled
compose stack).
Day-2 operations
After Install:
# Refresh the KB after meaningful source changes
# Inspect / sanity-check
# Re-ingest with LLM distillation (one-sentence symbol summaries via
# a self-hosted Ollama / qwen3-coder endpoint). Opt-in; slow first
# pass, incremental after.
Ingest tiers
Three knobs, picked at init --ingest or ingest time, scaling
quality vs. wall time on the pilot corpus
(~/Helios/Nano, 666 files / 18 k symbols / 115 k refs):
| Tier | Flag | Pilot wall | What lights up |
|---|---|---|---|
| fast (default) | (none) | 26 s | BM25 + hop-distance ranking on helios_graphrag_search |
| quality (blocking) | --with-embeddings |
3 m 15 s | Adds in-process FastEmbedder body vectors → paraphrase queries |
| background-quality | --background-quality |
26 s parent + ~2 m 50 s detached child | User-wait stays at 26 s; paraphrase quality lifts when the child finishes |
Recommended for repos >~1 000 files: --background-quality.
Track via status --source X ("quality phase : running pid X" →
"complete — took Y").
Resume on interrupt
ingest writes <kb_dir>/.ingest-state.json at each phase
transition (walk → code_index → graph_rag). If a process is
killed mid-flight (Ctrl-C, OOM, reboot), the next ingest reads
the file at startup and skips already-completed phases — the
walk is skipped if phase >= code_index. Per-file resume inside
code_index is the engine's content-hash gate. Cleared on
successful completion; surfaced by status --source X as
ingest resume : interrupted at phase = ... until then.
Generated-file skip
Two layers, both honoured during the walk:
@generatedcontent-marker scan of the first 4 KiB (Facebook / Google / Bazel / Go convention).<root>/.gitattributeslinguist-generated globs — patterns flagged withlinguist-generated,linguist-generated=true, orlinguist-generated=setare matched against relative paths and skipped. Long-tail coverage of generated files (*.pb.rs,codegen/**, vendored bundles) that don't carry an in-file marker.
CLI surface
heliosdb-codekb-mcp <SUBCOMMAND>
init [--source PATH] [--mode co-located|global|hybrid] [--kb PATH]
[--ingest] [--include-binary-docs BOOL]
[--force] [--durable-writes]
[--with-embeddings] [--background-quality]
ingest [--source PATH] [--include-binary-docs BOOL]
[--force] [--durable-writes]
[--with-embeddings] [--background-quality]
serve [--source PATH] [--http <ADDR>]
status [--source PATH] [--mcp-url <URL>]
config show | set-default-mode <MODE> | path
Why a separate binary
HeliosDB-Nano is a generic database; baking one specific MCP transport into its binary would adapt the engine to one consumer. This crate keeps that boundary clean: the engine stays generic and publishable to crates.io for any downstream consumer; this binary holds the MCP packaging, transport, and per-source KB-location ergonomics.
License
Apache-2.0.