# dci-tool
**Direct Corpus Interaction for Rig Agents**
`dci-tool` is a utility suite for [Rig](https://github.com/0xPlaygrounds/rig) that gives AI agents the ability to interact *directly* with raw textual corpora (codebases, logs, directory structures) using sandboxed commands like `search`, `find`, `read`, and `list`.
## Table of Contents
- [When should I use this?](#when-should-i-use-this)
- [Features](#features)
- [Installation](#installation)
- [Quickstart](#quickstart)
- [Feature Flags](#feature-flags)
- [CLI Usage](#cli-usage)
- [MCP Client Integration](#mcp-client-integration)
- [Safety Model](#safety-model)
- [Development](#development)
- [License](#license)
## When should I use this?
**Use this when vector embeddings are the wrong fit.**
Standard RAG embeds text into a vector database, which discards structural context (such as file organization) and struggles with precise lexical matches (such as error codes, IP addresses, or exact variable names).
Use `dci-tool` when you need an agent to:
- **Investigate massive codebases or logs** by iterating with exact-match regex and path globs.
- **Produce deterministic evidence citations** mapped exactly to `path:line` format.
- **Examine local directory structures** directly without requiring a pre-indexing step.
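To show why exact lexical matching suits tokens like error codes, and why `path:line` citations are deterministic, here is a std-only sketch. The `Citation` type and `exact_search` function are hypothetical illustrations, not part of the dci-tool API. They use plain substring matching, whereas dci-tool's `SearchTool` runs regex searches through ripgrep's `grep` crate.

```rust
use std::fmt;

/// Hypothetical `path:line` evidence citation (not a dci-tool type).
#[derive(Debug, PartialEq)]
struct Citation {
    path: String,
    line: usize, // 1-based, as editors and `grep -n` display it
}

impl fmt::Display for Citation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.path, self.line)
    }
}

/// Exact substring search over an in-memory corpus of (path, contents) pairs.
/// A token such as `E1042` either matches or it does not, and every hit maps
/// back to one precise location.
fn exact_search(corpus: &[(&str, &str)], needle: &str) -> Vec<Citation> {
    let mut hits = Vec::new();
    for (path, contents) in corpus {
        for (idx, line) in contents.lines().enumerate() {
            if line.contains(needle) {
                hits.push(Citation { path: path.to_string(), line: idx + 1 });
            }
        }
    }
    hits
}

fn main() {
    let corpus = [
        ("logs/app.log", "INFO boot\nERROR E1042 auth failed\nINFO retry"),
        ("src/auth.rs", "// returns E1042 on bad token\nfn check() {}"),
    ];
    // Prints one `path:line` citation per matching line.
    for hit in exact_search(&corpus, "E1042") {
        println!("{hit}");
    }
}
```

Running the same query over the same files always yields the same citations, which is what makes them usable as evidence.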
## Features
- **In-process Ripgrep Engine**: Built on ripgrep's `grep`, `ignore`, and `globset` crates. No subprocesses are spawned, so there is no shell-injection surface.
- **Sandboxed `CorpusRoot`**: Path traversal, symlink escapes, and any operation outside the corpus root are rejected. Every call is capped on duration, hits, files walked, and bytes returned.
- **Rig Tools**: Strongly typed `SearchTool`, `FindTool`, `ReadTool`, and `ListTool` implement `rig_core::tool::Tool`.
- **DCI Agent**: Brings everything together under `DciAgent`, using any Rig-compatible `CompletionModel`.
- **MCP Server** *(optional)*: Exposes the full agent over the Model Context Protocol as a stateful capability with session continuity.
- **Evaluation Harness** *(optional)*: Reuses `rig-retrieval-evals` to run deterministic head-to-head benchmarks of lexical DCI vs. semantic vector retrieval against BEIR-style qrels files.
- **Telemetry** *(optional)*: Automatic latency, tool success, and token consumption observability via `rig-tap`.
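The caps enforced by `CorpusRoot` (duration, hits, files walked, bytes returned) can be pictured as a budget that is charged once per file and stops the walk at the first exceeded limit. This is a std-only sketch under that assumption; `Caps`, `Budget`, and `Exhausted` are hypothetical names, not dci-tool's configuration types.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limits mirroring the caps listed above.
struct Caps {
    max_duration: Duration,
    max_hits: usize,
    max_files: usize,
    max_bytes: usize,
}

/// Which limit stopped the walk.
#[derive(Debug, PartialEq)]
enum Exhausted {
    Time,
    Hits,
    Files,
    Bytes,
}

/// Tracks consumption against `Caps` since the budget was created.
struct Budget {
    caps: Caps,
    started: Instant,
    hits: usize,
    files: usize,
    bytes: usize,
}

impl Budget {
    fn new(caps: Caps) -> Self {
        Budget { caps, started: Instant::now(), hits: 0, files: 0, bytes: 0 }
    }

    /// Charge one visited file plus the hits and bytes it produced,
    /// returning the first limit that is now exceeded, if any.
    fn charge(&mut self, hits: usize, bytes: usize) -> Result<(), Exhausted> {
        self.files += 1;
        self.hits += hits;
        self.bytes += bytes;
        if self.started.elapsed() > self.caps.max_duration {
            return Err(Exhausted::Time);
        }
        if self.files > self.caps.max_files {
            return Err(Exhausted::Files);
        }
        if self.hits > self.caps.max_hits {
            return Err(Exhausted::Hits);
        }
        if self.bytes > self.caps.max_bytes {
            return Err(Exhausted::Bytes);
        }
        Ok(())
    }
}

fn main() {
    let mut budget = Budget::new(Caps {
        max_duration: Duration::from_secs(5),
        max_hits: 3,
        max_files: 100,
        max_bytes: 64 * 1024,
    });
    // Simulate a walk where every file yields two hits totalling 200 bytes.
    for file in 1..=10 {
        if let Err(reason) = budget.charge(2, 200) {
            println!("stopped at file {file}: {reason:?}");
            break;
        }
    }
}
```

Returning the reason rather than a bare `false` lets a tool tell the model *why* results were truncated, so it can narrow its query.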
## Installation
Add the library to your project:
```bash
cargo add dci-tool
```
Optional capabilities are gated behind [feature flags](#feature-flags). For example, to pull in the MCP server and evaluation harness:
```bash
cargo add dci-tool --features mcp,eval
```
To install the command-line binaries (`dci`, `dci-mcp`, `dci-eval`):
```bash
cargo install dci-tool --features cli,mcp,eval
```
## Quickstart
There are two main ways to use `dci-tool` with a Rig agent: the prepackaged `DciAgent` wrapper, or mounting the raw `CorpusTools` on your own custom agent.
### 1. The "Batteries Included" Agent (`DciAgent`)
The easiest way to get started is by using `DciAgent`. It automatically configures the agent with a pre-written investigation preamble (teaching it the search -> narrow -> read -> cite loop) and attaches the tools.
```rust
use dci_tool::{CorpusRoot, DciAgent};
use rig_core::providers::openai;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = openai::Client::from_env()?;
    let model = client.completion_model("gpt-4o");

    // 1. Define the root boundary the agent is not allowed to escape
    let corpus = CorpusRoot::new("/path/to/my/project")?;

    // 2. Build the DCI agent
    let agent = DciAgent::builder(model, corpus)
        .max_turns(10) // tool-iteration budget
        .build();

    // 3. Ask it a question
    let answer = agent.investigate("Where is the authentication bug?").await?;
    println!("{answer}");
    Ok(())
}
```
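The preamble teaches a search -> narrow -> read -> cite loop. As a rough std-only illustration of the *read* step, the hypothetical `read_window` helper below returns a few numbered lines around a cited hit so the model can quote them with their line numbers. It is not the `ReadTool` API; its name and parameters are assumptions for illustration.

```rust
/// Hypothetical helper: return up to `context` lines on each side of the
/// 1-based `line`, each prefixed with its line number for citation.
fn read_window(contents: &str, line: usize, context: usize) -> Vec<String> {
    // Clamp the start so a hit near the top of the file does not underflow.
    let first = line.saturating_sub(context).max(1);
    let last = line + context;
    contents
        .lines()
        .enumerate()
        .map(|(idx, text)| (idx + 1, text))
        .filter(|(n, _)| *n >= first && *n <= last)
        .map(|(n, text)| format!("{n:>4} | {text}"))
        .collect()
}

fn main() {
    let file = "fn login() {\n    let token = read_token();\n    if token.is_empty() {\n        return Ok(()); // BUG: empty token accepted\n    }\n    verify(token)\n}";
    // Read one line of context on each side of line 4.
    for row in read_window(file, 4, 1) {
        println!("{row}");
    }
}
```

Reading a narrow window instead of the whole file keeps the returned bytes small, which matters when every tool call is capped.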
### 2. Manual Tool Integration (`CorpusTools`)
If you want full control over your agent's prompt, or just want to add file-system access to an existing toolset, mount the `CorpusTools` manually on an `AgentBuilder`:
```rust
use dci_tool::{CorpusRoot, tools::CorpusTools};
use rig_core::agent::AgentBuilder;
use rig_core::completion::Prompt; // brings `agent.prompt(...)` into scope
use rig_core::providers::openai;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = openai::Client::from_env()?;
    let model = client.completion_model("gpt-4o");
    let corpus = CorpusRoot::new("/path/to/my/project")?;

    // Instantiate the bundle of SearchTool, FindTool, ReadTool, and ListTool
    let tools = CorpusTools::new(corpus);

    // Build your standard Rig agent
    let agent = AgentBuilder::new(model)
        .preamble("You are an expert vulnerability scanner. Use the tools to inspect the local filesystem.")
        // Wire in the DCI tools manually
        .tool(tools.search)
        .tool(tools.find)
        .tool(tools.read)
        .tool(tools.list)
        .build();

    let response = agent
        .prompt("Find the auth bug")
        .max_turns(15) // give it enough turns to search and read
        .await?;
    println!("{response}");
    Ok(())
}
```
## Feature Flags
All optional capabilities are off by default. Enable only what you need.

| Feature | Default | Description |
|---|---|---|
| `cli` | No | The `dci`, `dci-mcp`, and `dci-eval` command-line binaries. |
| `mcp` | No | Exposes the agent as a stateful Model Context Protocol server (`DciMcpService`). |
| `eval` | No | Retrieval-quality benchmarking of lexical DCI vs. semantic vector retrieval via BEIR-style qrels. |
| `telemetry` | No | Latency, tool-success, and token-consumption observability through `rig-tap`. |
## CLI Usage
The crate ships the `dci`, `dci-mcp`, and `dci-eval` binaries behind the `cli` feature; install with `--features cli,mcp,eval` to get all three.
```bash
# Ask a question across your local directory:
dci --corpus ./src --provider openai "Where is the authentication bug?"
# Serve as a stateful Model Context Protocol server over stdio:
dci-mcp --corpus ./src --provider anthropic --model claude-3-7-sonnet
```
## MCP Client Integration
Because `dci-mcp` implements the **Model Context Protocol**, you can plug it directly into AI clients like Claude Desktop, Cursor, or VS Code, giving them a stateful `dci_investigate` capability out of the box.
### Claude Desktop
Add this to your `claude_desktop_config.json`. Make sure to use an **absolute path** for your corpus directory:
```json
{
  "mcpServers": {
    "dci": {
      "command": "dci-mcp",
      "args": [
        "--corpus", "/absolute/path/to/project",
        "--provider", "openai"
      ],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```
### Agent Instructions (`.cursorrules` / `.github/copilot-instructions.md`)
MCP servers advertise their tool schemas automatically, but you can add this snippet to your workspace rules to steer your IDE's agent toward `dci-tool` instead of its default search:
```md
When investigating massive logs, unknown directories, or complex codebase issues, DO NOT use your default workspace search. Instead, invoke the `dci_investigate` MCP tool to recursively interrogate the corpus. Always pass a stable `session_id` to maintain context across multiple turns during the same investigation.
```
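Why a stable `session_id` matters can be shown with a std-only sketch of per-session memory keyed by the caller-supplied id. `SessionStore` is a hypothetical illustration of the idea, not how `DciMcpService` actually stores state.

```rust
use std::collections::HashMap;

/// Hypothetical per-session memory keyed by `session_id`.
#[derive(Default)]
struct SessionStore {
    transcripts: HashMap<String, Vec<String>>,
}

impl SessionStore {
    /// Append a question to the session and return how many turns that
    /// session has seen so far, including this one.
    fn record(&mut self, session_id: &str, question: &str) -> usize {
        let history = self.transcripts.entry(session_id.to_string()).or_default();
        history.push(question.to_string());
        history.len()
    }
}

fn main() {
    let mut store = SessionStore::default();
    store.record("auth-bug-42", "Where is token validation?");
    let turns = store.record("auth-bug-42", "Which callers skip it?");
    // A new id starts with no prior context, so the follow-up loses its referent.
    let fresh = store.record("new-id", "Which callers skip it?");
    println!("same session: {turns} turns, new id: {fresh} turn");
}
```

Reusing the same id lets a follow-up like "Which callers skip it?" resolve against earlier turns; a fresh id discards that context.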
## Safety Model
1. Every caller-supplied path is canonicalized and must remain inside the canonicalized `CorpusRoot`, i.e. the root must be a prefix of the resolved path.
2. Symlinks are resolved before the boundary check, so a link that points outside the root is rejected.
3. All operations are read-only: no `fs::write` or `fs::remove` APIs are used anywhere.
4. Tools make no network calls.
5. Every execution is strictly bounded by elapsed wall-clock time, total files processed, and maximum results gathered.
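Rules 1 and 2 can be sketched with `std::fs::canonicalize`, which resolves `..` segments and follows symlinks, followed by a component-wise prefix test. `resolve_inside` is a hypothetical function, not the `CorpusRoot` source. Note that `canonicalize` requires the path to exist, and the demo creates a scratch directory under the OS temp dir purely for setup.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical boundary check: canonicalize first (following symlinks),
/// then require the resolved path to sit under the canonical root.
fn resolve_inside(root: &Path, requested: &str) -> io::Result<PathBuf> {
    let root = fs::canonicalize(root)?;
    let candidate = fs::canonicalize(root.join(requested))?;
    // `Path::starts_with` compares whole components, so `/srv/data2`
    // is not treated as being inside `/srv/data`.
    if candidate.starts_with(&root) {
        Ok(candidate)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes corpus root",
        ))
    }
}

fn main() -> io::Result<()> {
    // Demo setup only: create a tiny corpus in the temp directory.
    let root = std::env::temp_dir().join("dci_sandbox_demo");
    fs::create_dir_all(root.join("src"))?;
    fs::write(root.join("src").join("lib.rs"), "// demo")?;

    for request in ["src/lib.rs", "../", "src/../../"] {
        match resolve_inside(&root, request) {
            Ok(path) => println!("allowed: {}", path.display()),
            Err(err) => println!("denied:  {request} ({err})"),
        }
    }
    Ok(())
}
```

Checking the prefix only after canonicalization is what makes the order in rule 2 matter: a symlink inside the root that targets `/etc` resolves to `/etc/...` and fails the test.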
## Development
This repo uses [`just`](https://github.com/casey/just) as a task runner. Run `just` to list recipes.
```bash
just check # fmt + clippy + tests + msrv + doc + publish dry-run
just test # run the test suite across feature combinations
just doc # build rustdoc with strict warnings
```
Releases are automated with [release-plz](https://release-plz.dev/), driven by
[Conventional Commits](https://www.conventionalcommits.org/). Merging to `master`
opens a release PR; merging that PR publishes to crates.io and tags the release.
## License
Licensed under either of
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
- MIT License ([LICENSE-MIT](LICENSE-MIT) or <http://opensource.org/licenses/MIT>)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.