dci-tool
Direct Corpus Interaction for Rig Agents
dci-tool is a utility suite for Rig that gives AI agents the ability to interact directly with raw textual corpora (codebases, logs, directory structures) using sandboxed commands like search, find, read, and list.
Table of Contents
- When should I use this?
- Features
- Installation
- Quickstart
- Feature Flags
- CLI Usage
- MCP Client Integration
- Safety Model
- Development
- License
When should I use this?
Use this when vector embeddings are the wrong fit.
Standard RAG embeds text into a vector database, which discards structural context (such as file organization) and struggles with precise lexical matches (such as error codes, IP addresses, or exact variable names).
Use dci-tool when you need an agent to:
- Investigate massive codebases or logs by iterating with exact-match regex and path globs.
- Produce deterministic evidence citations mapped exactly to path:line format.
- Examine local directory structures directly without requiring a pre-indexing step.
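To make the path:line idea concrete, here is a minimal sketch of exact-match search over an in-memory corpus. It is not the crate's API: the Citation type and the search_exact function are hypothetical names invented for illustration, and plain substring matching stands in for the regex engine the crate actually uses.

```rust
/// One piece of evidence: where a match was found and what the line says.
/// (Hypothetical type for illustration; not part of dci-tool.)
#[derive(Debug, PartialEq)]
pub struct Citation {
    pub path: String,
    pub line: usize, // 1-based, the way grep and editors report it
    pub text: String,
}

impl Citation {
    /// Render the citation as `path:line`.
    pub fn locator(&self) -> String {
        format!("{}:{}", self.path, self.line)
    }
}

/// Scan `(path, contents)` pairs for an exact substring and return
/// one citation per matching line, in corpus order.
pub fn search_exact(corpus: &[(&str, &str)], needle: &str) -> Vec<Citation> {
    let mut hits = Vec::new();
    for (path, contents) in corpus {
        for (idx, line) in contents.lines().enumerate() {
            if line.contains(needle) {
                hits.push(Citation {
                    path: (*path).to_string(),
                    line: idx + 1,
                    text: line.trim().to_string(),
                });
            }
        }
    }
    hits
}
```

Because matching is lexical and the walk order is fixed, the same corpus and query always yield the same citations, which is the property that embedding-based retrieval does not guarantee.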
Features
- In-process Ripgrep Engine: Powered natively by the grep, ignore, and globset crates. Zero subprocesses means zero shell-injection risks.
- Sandboxed CorpusRoot: Path traversal, symlink escapes, and operations outside the strict corpus root are natively denied. Enforces caps on duration, hits, files walked, and bytes returned.
- Rig Tools: Strongly typed SearchTool, FindTool, ReadTool, and ListTool implement rig_core::tool::Tool.
- DCI Agent: Brings everything together under DciAgent using any rig-compatible CompletionModel.
- MCP Server (optional): Exposes the full agent over the Model Context Protocol as a stateful capability with session continuity.
- Evaluation Harness (optional): Re-uses rig-retrieval-evals to run deterministic head-to-head benchmarking of lexical DCI vs Semantic Vector Retrieval against BEIR-style qrels files.
- Telemetry (optional): Automatic latency, tool success, and token consumption observability via rig-tap.
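The caps listed under Sandboxed CorpusRoot (duration, hits, files walked, bytes returned) can be pictured with a short sketch. Everything here is an assumption made for illustration: Limits, Stop, and bounded_search are hypothetical names, the corpus is in memory, and the crate's real limit types and semantics may differ.

```rust
use std::time::{Duration, Instant};

/// Hypothetical caps; field names are illustrative, not dci-tool's.
#[derive(Debug, Clone, Copy)]
pub struct Limits {
    pub max_duration: Duration,
    pub max_hits: usize,
    pub max_files: usize,
    pub max_bytes: usize,
}

/// Which cap cut the search short.
#[derive(Debug, PartialEq)]
pub enum Stop {
    Timeout,
    Hits,
    Files,
    Bytes,
}

/// Substring search that stops as soon as any cap would be exceeded.
/// Returns the `path:line` hits gathered so far, plus the cap that
/// truncated the run (None means the whole corpus was scanned).
pub fn bounded_search(
    corpus: &[(&str, &str)],
    needle: &str,
    limits: Limits,
) -> (Vec<String>, Option<Stop>) {
    let started = Instant::now();
    let mut out = Vec::new();
    let mut bytes = 0usize;
    for (walked, (path, contents)) in corpus.iter().enumerate() {
        // `walked` counts files already visited before this one.
        if walked >= limits.max_files {
            return (out, Some(Stop::Files));
        }
        for (idx, line) in contents.lines().enumerate() {
            if started.elapsed() >= limits.max_duration {
                return (out, Some(Stop::Timeout));
            }
            if !line.contains(needle) {
                continue;
            }
            if out.len() >= limits.max_hits {
                return (out, Some(Stop::Hits));
            }
            if bytes + line.len() > limits.max_bytes {
                return (out, Some(Stop::Bytes));
            }
            bytes += line.len();
            out.push(format!("{}:{}", path, idx + 1));
        }
    }
    (out, None)
}
```

Returning the partial results together with the stop reason lets a caller tell the agent that output was truncated, so it can narrow the query instead of assuming the list is complete.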
Installation
Add the library to your project:
Optional capabilities are gated behind feature flags. For example, to pull in the MCP server and evaluation harness:
To install the command-line binaries (dci, dci-mcp, dci-eval):
Quickstart
There are two main ways to use dci-tool with a Rig agent: using the pre-packed DciAgent wrapper, or bringing the raw CorpusTools into your own custom agent.
1. The "Batteries Included" Agent (DciAgent)
The easiest way to get started is by using DciAgent. It automatically configures the agent with a pre-written investigation preamble (teaching it the search -> narrow -> read -> cite loop) and attaches the tools.
use ;
use openai;
async
2. Manual Tool Integration (CorpusTools)
If you want complete control over your Rig agent's prompt, or if you just want to provide file system superpowers to an existing custom toolset, you can mount the CorpusTools manually onto an AgentBuilder:
use ;
use AgentBuilder;
use openai;
async
Feature Flags
All optional capabilities are off by default. Enable only what you need.
| Feature | Default | Enables |
|---|---|---|
| cli | No | The dci, dci-mcp, and dci-eval command-line binaries. |
| mcp | No | Exposes the agent as a stateful Model Context Protocol server (DciMcpService). |
| eval | No | Retrieval-quality benchmarking of lexical DCI vs. semantic vector retrieval via BEIR-style qrels. |
| telemetry | No | Latency, tool-success, and token-consumption observability through rig-tap. |
CLI Usage
The crate includes dci and dci-mcp binaries (via the cli and mcp features).
# Ask a question across your local directory:
# Serve as a stateful Model Context Protocol server over stdio:
MCP Client Integration
Because dci-mcp implements the Model Context Protocol, you can plug it directly into AI clients like Claude Desktop, Cursor, or VS Code, giving them a stateful dci_investigate capability out of the box.
Claude Desktop
Add this to your claude_desktop_config.json. Make sure to use an absolute path for your corpus directory:
Agent Instructions (.cursorrules / .github/copilot-instructions.md)
Although an MCP server advertises its tool schema automatically, you can add this snippet to your workspace rules to strongly nudge your IDE's agent to prefer dci-tool over its default search:
When investigating massive logs, unknown directories, or complex codebase issues, DO NOT use your default workspace search. Instead, invoke the `dci_investigate` MCP tool to recursively interrogate the corpus. Always pass a stable `session_id` to maintain context across multiple turns during the same investigation.
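The advice to pass a stable session_id only pays off if the server keeps state per session. The sketch below shows that idea in its simplest form and should not be read as how dci-mcp is implemented: SessionStore and its methods are hypothetical names, and the real server presumably stores richer state than a list of questions.

```rust
use std::collections::HashMap;

/// Hypothetical per-session memory keyed by `session_id`.
#[derive(Default)]
pub struct SessionStore {
    turns: HashMap<String, Vec<String>>,
}

impl SessionStore {
    /// Append a question to a session, creating the session on first
    /// use, and return how many turns that session now holds.
    pub fn record(&mut self, session_id: &str, question: &str) -> usize {
        let history = self.turns.entry(session_id.to_string()).or_default();
        history.push(question.to_string());
        history.len()
    }

    /// Earlier turns for a session; empty if the id has never been seen.
    pub fn history(&self, session_id: &str) -> &[String] {
        self.turns
            .get(session_id)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

With a store like this, a second call using the same id sees the first call's context, while a fresh id starts from nothing. That is why a changing session_id silently discards the investigation so far.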
Safety Model
- Every caller-supplied path is canonicalized and must remain under the initialized CorpusRoot (the canonical root must be a prefix of the canonical path).
- Symlink evaluation happens before boundary checks.
- Completely read-only operations. (No fs::write or fs::remove APIs used anywhere.)
- No external network execution from tools.
- All executions are strictly bounded by elapsed wall-time, total files processed, and maximum results gathered.
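The first two rules above (canonicalize, resolve symlinks, then check the prefix) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the crate's code: Root and resolve are hypothetical names, and the real CorpusRoot may handle missing files and error types differently.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a sandbox root.
pub struct Root {
    canonical: PathBuf,
}

impl Root {
    /// Canonicalize the root once so later comparisons are exact.
    pub fn new(dir: impl AsRef<Path>) -> io::Result<Self> {
        Ok(Self {
            canonical: dir.as_ref().canonicalize()?,
        })
    }

    /// Resolve a caller-supplied path and reject anything that lands
    /// outside the root. `canonicalize` follows symlinks and collapses
    /// `..`, so the prefix check runs on the real location.
    pub fn resolve(&self, requested: impl AsRef<Path>) -> io::Result<PathBuf> {
        // Joining an absolute path replaces the base; the check below
        // still catches it because it compares canonical paths.
        let resolved = self.canonical.join(requested).canonicalize()?;
        if resolved.starts_with(&self.canonical) {
            Ok(resolved)
        } else {
            Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                "path escapes corpus root",
            ))
        }
    }
}
```

Path::starts_with compares whole components, so a sibling directory such as corpus-backup does not pass as being inside corpus. Doing the check after canonicalization matters: a check on the raw string would accept a symlink inside the root that points elsewhere.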
Development
This repo uses just as a task runner. Run just to list recipes.
Releases are automated with release-plz, driven by Conventional Commits. Merging to master opens a release PR; merging that PR publishes to crates.io and tags the release.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.