Dociium
Multi-language documentation and code discovery via the Model Context Protocol (MCP).
Dociium lets AI assistants (Claude Desktop, Cline, etc.) search and retrieve documentation and source code for Rust, Python, and Node.js packages, with semantic search, import resolution, and multi-layer caching.
✨ Features
- Rust: Crate search (crates.io), documentation (docs.rs), trait implementations, symbol search
- Python: Semantic search, source code extraction, class method introspection, universal package manager support (pip, uv, poetry, pdm, conda)
- Node.js: Source code extraction with ESM/CJS support
- Import Resolution: Map `use`/`import`/`from` statements to source locations (Rust/Python/Node)
- CLI + MCP Server: Use as an MCP server (stdio/HTTP) or invoke tools directly from the command line
- Smart Caching: Multi-layer (in-memory LRU + disk) with metrics and TTL
- Context-Aware: Working directory support for monorepos and multi-virtualenv projects
🚀 Quick Start
Install
# From crates.io
# From source
As MCP Server
Add to your MCP settings (e.g., Claude Desktop's `claude_desktop_config.json`):
Then ask your assistant:
- "Search for async http clients on crates.io"
- "Show documentation for tokio::sync::Mutex"
- "Find Python functions for parsing JSON in the requests library"
- "What are all the methods on the Flask class?"
As CLI Tool
# Rust crate search
# Python semantic search
# Get Python class methods
# Get implementation
# Cache statistics
🧰 Available Tools
Rust Documentation
| Tool | Description | Example |
|---|---|---|
| `search_crates` | Search crates.io | `dociium search-crates "async runtime"` |
| `crate_info` | Get crate metadata | `dociium crate-info tokio` |
| `get_item_doc` | Fetch item docs from docs.rs | `dociium get-item-doc tokio "sync::Mutex"` |
| `list_trait_impls` | List implementations of a trait | `dociium list-trait-impls serde "Serialize"` |
| `list_impls_for_type` | List traits for a type | `dociium list-impls-for-type std "Vec"` |
| `search_symbols` | Search symbols in a crate | `dociium search-symbols tokio "spawn"` |
| `source_snippet` | Get source code (placeholder) | `dociium source-snippet tokio "sync::Mutex"` |
Python & Node.js
| Tool | Description | Example |
|---|---|---|
| `semantic_search` | Natural language search | `dociium semantic-search requests "retry failed requests"` |
| `get_implementation` | Get source code | `dociium get-implementation -l python requests "api.py#get"` |
| `list_class_methods` | List all methods of a class | `dociium list-class-methods flask "app.py#Flask"` |
| `get_class_method` | Get specific method | `dociium get-class-method flask "app.py#Flask" route` |
| `search_package_code` | Regex code search | `dociium search-package-code -l python flask "async def"` |
Multi-Language
| Tool | Description | Example |
|---|---|---|
| `resolve_imports` | Resolve import statements | Via MCP JSON-RPC |
| `cache_stats` | Get cache metrics | `dociium cache-stats` |
| `clear_cache` | Clear cache | `dociium clear-cache` |
| `cleanup_cache` | Remove expired entries | `dociium cleanup-cache` |
🔎 Python Semantic Search
The standout feature for Python developers. Search packages using natural language:
How it works:
- TF-IDF scoring over function names, docstrings, signatures, and module paths
- Indexes public symbols (functions and classes) from installed packages
- First search builds index (0.5-3s), subsequent searches <10ms
- Results include relevance scores, signatures, docstring previews, and source locations
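To make the scoring idea concrete, here is a minimal TF-IDF ranking sketch using only the Rust standard library. It is an illustration, not Dociium's actual indexer: the `tokenize` and `rank` helpers are hypothetical names, each "document" stands in for a symbol's name, signature, and docstring joined into one string, and the scores are raw summed weights rather than values normalized to the 0-1 range.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercase a string and split it into alphanumeric tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Rank documents against a query by the summed TF-IDF weight of the query terms.
/// Returns (document index, score) pairs, best match first; non-matching documents are omitted.
fn rank(docs: &[&str], query: &str) -> Vec<(usize, f64)> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;

    // Document frequency: the number of documents each term appears in.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for tokens in &tokenized {
        let unique: HashSet<&str> = tokens.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }

    let query_terms = tokenize(query);
    let mut scored: Vec<(usize, f64)> = tokenized
        .iter()
        .enumerate()
        .map(|(i, tokens)| {
            let len = tokens.len().max(1) as f64;
            let score: f64 = query_terms
                .iter()
                .map(|q| {
                    // Term frequency, normalized by document length.
                    let tf = tokens.iter().filter(|t| *t == q).count() as f64 / len;
                    // Smoothed inverse document frequency; unseen terms contribute nothing.
                    let idf = match df.get(q.as_str()) {
                        Some(d) => (n / d).ln() + 1.0,
                        None => 0.0,
                    };
                    tf * idf
                })
                .sum();
            (i, score)
        })
        .filter(|(_, s)| *s > 0.0)
        .collect();

    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}
```

Calling `rank(&["parse a json string", "retry failed requests with backoff"], "retry failed requests")` returns only the second document. A production index would precompute the document frequencies once, which is consistent with the slower first search described above.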
MCP JSON-RPC:
Score interpretation:
- 0.9+: Excellent match
- 0.7-0.9: Good match
- 0.5-0.7: Moderate relevance
- <0.5: Weak match
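A client that consumes search results might bucket scores along these lines. This is a small sketch: `relevance_label` is a hypothetical helper, and treating each lower bound as inclusive is an assumption, since the ranges above do not say which bucket a boundary value belongs to.

```rust
/// Map a relevance score to the bands listed above.
/// ASSUMPTION: lower bounds are inclusive (0.9 counts as "excellent", 0.7 as "good").
fn relevance_label(score: f64) -> &'static str {
    if score >= 0.9 {
        "excellent"
    } else if score >= 0.7 {
        "good"
    } else if score >= 0.5 {
        "moderate"
    } else {
        "weak"
    }
}
```

For example, an assistant could drop every result labeled `"weak"` before showing matches to the user.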
🐍 Python Package Discovery
Works with any Python package manager that installs into standard site-packages; pip is not required.
Multi-level fallback strategy:
- Environment variables (highest priority): `DOC_PYTHON_PACKAGE_PATH_<PKG>` or `DOC_PYTHON_PACKAGE_PATH`
- Native introspection (pure Rust, no Python runtime needed)
  - Scans virtual environments: `.venv`, `venv`, `$VIRTUAL_ENV`
  - Checks user site-packages: `~/.local/lib/python*/site-packages`
  - Scans system locations: `/usr/local/lib`, `/opt/homebrew/lib`
- `pip show` (if pip is available)
- `uv pip show` (if uv is available)
- Direct filesystem scan (last resort)
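The first two levels of this fallback chain can be sketched as follows. This is an illustration under stated assumptions, not Dociium's real resolver: `override_var` and `locate_package` are hypothetical names, the rule for turning a package name into the `<PKG>` suffix (upper-case, non-alphanumerics replaced by `_`) is a guess, both variables are assumed to point directly at the package directory, and the `pip show` / `uv pip show` steps are left out.

```rust
use std::path::PathBuf;

/// Build the per-package override variable name.
/// ASSUMPTION: the package name is upper-cased and non-alphanumeric characters become '_'.
fn override_var(pkg: &str) -> String {
    let suffix: String = pkg
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
        .collect();
    format!("DOC_PYTHON_PACKAGE_PATH_{suffix}")
}

/// Try each source in priority order and return the first hit.
/// `env` looks up an environment variable and `exists` reports whether a directory exists;
/// both are injected so the logic can be exercised without touching the real system.
fn locate_package(
    pkg: &str,
    env: impl Fn(&str) -> Option<String>,
    exists: impl Fn(&PathBuf) -> bool,
    site_packages: &[PathBuf],
) -> Option<PathBuf> {
    // 1. Environment overrides win: per-package variable first, then the generic one.
    if let Some(p) = env(&override_var(pkg)).or_else(|| env("DOC_PYTHON_PACKAGE_PATH")) {
        return Some(PathBuf::from(p));
    }
    // 2. Otherwise scan the candidate site-packages directories in the order given.
    site_packages
        .iter()
        .map(|dir| dir.join(pkg))
        .find(|candidate| exists(candidate))
}
```

In real use the injected closures would be `|k| std::env::var(k).ok()` and `|p| p.is_dir()`, with `site_packages` built from the virtualenv, user, and system locations listed above.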
Supported package managers:
- pip
- uv
- poetry
- pdm
- conda
- pipenv
- Any tool that installs to standard site-packages
Context-aware resolution:
Use the `context_path` parameter to target a specific project's virtualenv:
🧩 Import Resolution
Resolve import/use statements to their source locations:
Rust:
use Mutex;
use HashMap;
Python:
Node.js:
import from 'express';
import * as utils from './utils.js';
Returns file paths and line numbers for each imported symbol.
Limitations:
- Best-effort heuristics (not a full compiler)
- Macro-expanded items (Rust) not resolved
- Dynamic imports (Python `__getattr__`) not detected
- Complex re-export chains may be incomplete
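To show what "best-effort heuristics" can mean in practice, here is a line-based sketch that only performs the first step of resolution: classifying a statement and extracting the module path it names. The `Import` enum and `parse_import` function are hypothetical and not part of Dociium's API; a real resolver would still need to map the extracted path to a file and line number.

```rust
/// The module path a statement refers to, tagged by language.
#[derive(Debug, PartialEq)]
enum Import {
    Rust(String),
    Python(String),
    Node(String),
}

/// Best-effort classification of a single import statement (no full parsing).
fn parse_import(line: &str) -> Option<Import> {
    let line = line.trim().trim_end_matches(';').trim();
    let unquote = |s: &str| s.trim().trim_matches(|c: char| c == '\'' || c == '"').to_string();

    if let Some(rest) = line.strip_prefix("import ") {
        // Node.js: `import <bindings> from 'module'`
        if let Some((_, spec)) = rest.rsplit_once(" from ") {
            return Some(Import::Node(unquote(spec)));
        }
        // Node.js side-effect import: `import 'module'`
        if rest.starts_with('\'') || rest.starts_with('"') {
            return Some(Import::Node(unquote(rest)));
        }
        // Python: `import package.module` (only the first module of a comma list)
        let module = rest.split_whitespace().next()?;
        return Some(Import::Python(module.trim_end_matches(',').to_string()));
    }
    // Python: `from package.module import name`
    if let Some(rest) = line.strip_prefix("from ") {
        let module = rest.split_whitespace().next()?;
        return Some(Import::Python(module.to_string()));
    }
    // Rust: `use path::to::Item`
    if let Some(rest) = line.strip_prefix("use ") {
        return Some(Import::Rust(rest.trim().to_string()));
    }
    None
}
```

Because the sketch works on one line of text at a time, it shares the limitations listed above: it cannot see macro-generated items, dynamic attribute lookups, or what a re-exporting module ultimately points to.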
🗄️ Caching
Multi-layer architecture:
| Layer | Storage | Purpose |
|---|---|---|
| Memory LRU | In-process | Hot crate docs, versions |
| Disk (items) | Gzipped files | Individual Rust item docs |
| Disk (indexes) | JSON | Parsed search-index.js |
| Import cache | In-process LRU+TTL | Import resolution (5min TTL) |
| Semantic index | In-process | Python package TF-IDF vectors |
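The "In-process LRU+TTL" layer combines two eviction rules: entries are dropped when the cache is full (least recently used first) and when they outlive their time-to-live. A minimal sketch of that combination follows. The `TtlLru` type is hypothetical, it assumes a capacity of at least 1, and it takes the current time as an argument so the behavior is deterministic; Dociium's actual cache implementation may differ.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// A small LRU cache whose entries also expire after a fixed TTL.
struct TtlLru<V> {
    capacity: usize,
    ttl: Duration,
    entries: HashMap<String, (V, Instant)>,
    // Keys ordered from least to most recently used.
    order: VecDeque<String>,
}

impl<V: Clone> TtlLru<V> {
    fn new(capacity: usize, ttl: Duration) -> Self {
        Self { capacity, ttl, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &str) {
        self.order.retain(|k| k != key);
        self.order.push_back(key.to_string());
    }

    /// Insert a value stamped with `now`, evicting the least recently used entry if full.
    fn insert(&mut self, key: &str, value: V, now: Instant) {
        if !self.entries.contains_key(key) && self.entries.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key.to_string(), (value, now));
        self.touch(key);
    }

    /// Return a live entry, or drop it and return None once its TTL has elapsed.
    fn get(&mut self, key: &str, now: Instant) -> Option<V> {
        let (value, stored_at) = self.entries.get(key)?.clone();
        if now.duration_since(stored_at) >= self.ttl {
            self.entries.remove(key);
            self.order.retain(|k| k != key);
            return None;
        }
        self.touch(key);
        Some(value)
    }
}
```

With `TtlLru::new(2, Duration::from_secs(300))`, matching the 5-minute TTL in the table, a third insert evicts whichever of the first two keys was read least recently, and any key read 300 seconds or more after insertion is treated as a miss. The `VecDeque` scan makes `touch` linear in the number of entries, which is acceptable for a sketch but not for a large cache.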
Cache metrics:
Returns hit rates, miss rates, evictions, total entries, and oldest entry age.
Cache management:
# Clear all caches
# Clear specific crate
# Remove expired entries
⚙️ Configuration
Cache Directory
Priority:
- CLI flag: `--cache-dir <path>`
- Env: `RDOCS_CACHE_DIR`
- Platform default: `$XDG_CACHE_HOME/dociium` (Linux), `~/Library/Caches/dociium` (macOS)
- Fallback: `./.dociium-cache`
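The priority order above amounts to "take the first source that is set". A sketch of that logic, assuming a hypothetical `resolve_cache_dir` helper whose inputs are gathered by the caller; treating an empty environment value as unset is also an assumption, not documented behavior:

```rust
use std::path::PathBuf;

/// Pick the cache directory from the first source that is set.
/// `platform_cache_root` is the OS cache root (e.g. the value of $XDG_CACHE_HOME
/// on Linux or ~/Library/Caches on macOS), if one could be determined.
fn resolve_cache_dir(
    cli_flag: Option<PathBuf>,
    env_value: Option<String>,
    platform_cache_root: Option<PathBuf>,
) -> PathBuf {
    cli_flag
        // ASSUMPTION: an empty RDOCS_CACHE_DIR is treated the same as an unset one.
        .or_else(|| env_value.filter(|v| !v.is_empty()).map(PathBuf::from))
        .or_else(|| platform_cache_root.map(|root| root.join("dociium")))
        .unwrap_or_else(|| PathBuf::from("./.dociium-cache"))
}
```

A caller would pass the parsed `--cache-dir` value, `std::env::var("RDOCS_CACHE_DIR").ok()`, and the platform cache root, then create the returned directory if it does not exist.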
Working Directory
Set programmatically when embedding:
use ;
use PathBuf;
let options = DocEngineOptions ;
let engine = new_with_options.await?;
Or use context_path in tool calls (resolved relative to working directory).
Environment Overrides
Force package locations:
HTTP Server Mode
Run as HTTP server instead of stdio:
Options:
- `--http-listen <addr:port>`: Bind address (required)
- `--http-path <path>`: Endpoint prefix (default `/mcp`)
- `--http-keep-alive-secs <n>`: SSE ping interval (default 30, 0=disabled)
- `--http-stateless`: Disable per-session state
🔐 Security
- Path sanitization: All file paths validated and normalized
- Input validation: Length and charset checks on crate names, versions, queries
- Timeout protection: All network calls have configurable timeouts
- No shell execution: All external commands use structured APIs (no `eval`)
- Safe parsing: Fallback-based parsing prevents crashes on malformed data
🧪 Testing
# All tests
# With network tests (requires internet)
ENABLE_NETWORK_TESTS=1
# Linting
# Format check
Test coverage:
- Integration tests for all MCP tools
- Unit tests for cache, search, import resolution
- Network tests (gated by env var)
- Cache metrics validation
📈 Roadmap
Near-term:
- Real Rust source snippet extraction (currently placeholder)
- Improved multi-hop import resolution
- Richer cache eviction policies
- Performance metrics export (Prometheus/OpenTelemetry)
Medium-term:
- Python `__all__` and re-export handling
- Node.js barrel file resolution
- Pluggable search backends (Tantivy integration)
- Persistent import cache
Long-term:
- Full rustdoc JSON ingestion
- Language server protocol (LSP) integration
- Multi-language semantic search
- Distributed cache sharing
🛠️ Development
# Clone and build
# Run as stdio MCP server
# Run as HTTP server
# Direct CLI usage
📜 License
Dual-licensed under MIT OR Apache-2.0.
See LICENSE-MIT and LICENSE-APACHE for details.
🙌 Contributing
Contributions welcome! Please:
- Open an issue describing the enhancement or fix
- Include tests (integration and/or unit)
- Maintain backward compatibility for MCP tool schemas
- Run `cargo clippy` and `cargo fmt` before submitting
Dociium - Keep your AI assistant grounded in real code and documentation.
Built by Labiium with ❤️