cqs ("seeks")
Code intelligence MCP server that saves AI agents tokens. Semantic search, call graph tracing, impact analysis — in single tool calls instead of dozens of file reads. Local ML, GPU-accelerated.
TL;DR: Semantic code search for Claude Code. Instead of grep, it understands what code does — so Claude finds relevant functions without knowing exact names. Tools like trace, impact, gather, and batch collapse 5-10 sequential tool calls into one, cutting context window usage by 50-80% on common code navigation tasks.
<claude>You're an AI assistant reading this. cqs gives you semantic code search + code intelligence via MCP.
Why you want this: Find functions by what they do, not their names. Tools like
`cqs_trace`, `cqs_impact`, and `cqs_gather` collapse 5-10 file reads into one call, saving context window for actual work.
Setup:
claude mcp add cqs -- cqs serve --project /path/to/project
</claude>
Install
Requires Rust 1.88+ (due to the `ort` ONNX Runtime dependency)
Upgrading? Schema changes require rebuilding the index by re-running `cqs index`.
Quick Start
# Download model and initialize
# Index your project
# Search
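The steps above can be sketched as follows. The `init` and `search` subcommand names are assumptions for illustration (verify with `cqs --help`); `cqs index` and `cqs serve` are referenced elsewhere in this README:

```shell
# Download model and initialize (subcommand name assumed)
cqs init

# Index your project
cqs index

# Search (subcommand name assumed)
cqs search "parse config file"
```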
Filters
# By language
# By path pattern
# By chunk type
# By structural pattern
# Patterns: builder, error_swallow, async, mutex, unsafe, recursion
# Combined
# Hybrid search tuning
# Show surrounding context
# Output options
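A hedged sketch of the filter flags described above. All flag names here are assumptions for illustration, as is the `search` subcommand; check `cqs search --help` for the exact spellings:

```shell
# By language
cqs search "retry logic" --lang rust

# By path pattern
cqs search "retry logic" --path "src/net/**"

# By chunk type
cqs search "retry logic" --type function

# By structural pattern (builder, error_swallow, async, mutex, unsafe, recursion)
cqs search "retry logic" --pattern async

# Hybrid search tuning, surrounding context, and output options
cqs search "retry logic" --name-boost 0.5 --context 3 --limit 5
```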
Configuration
Set default options via config files. CLI flags override config file values.
Config locations (later overrides earlier):
- `~/.config/cqs/config.toml` - user defaults
- `.cqs.toml` in project root - project overrides
Example .cqs.toml:
# Key names below are reconstructed for illustration; check the cqs docs for the exact names.

# Default result limit
limit = 10

# Minimum similarity threshold (0.0 - 1.0)
threshold = 0.4

# Name boost for hybrid search (0.0 = pure semantic, 1.0 = pure name)
name_boost = 0.2

# Note weight in search results (0.0-1.0, lower = notes rank below code)
note_weight = 1.0

# Output modes (key names illustrative)
compact = false
quiet = false
Watch Mode
Keep your index up to date automatically:
Watch mode respects .gitignore by default. Use --no-ignore to index ignored files.
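A minimal sketch of watch-mode usage; `cqs watch` and `--no-ignore` are both named in this README, while the exact invocation details may vary:

```shell
# Keep the index updated as files change (respects .gitignore by default)
cqs watch

# Also index files that .gitignore would exclude
cqs watch --no-ignore
```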
Call Graph
Find function call relationships:
Use cases:
- Impact analysis: What calls this function I'm about to change?
- Context expansion: Show related functions
- Entry point discovery: Find functions with no callers
Call graph is indexed across all files - callers are found regardless of which file they're in.
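As a hedged sketch of the call-graph queries described above (the subcommand names `callers` and `callees` are assumptions; check `cqs --help`):

```shell
# What calls this function I'm about to change?
cqs callers my_function

# What does this function call?
cqs callees my_function
```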
Discovery Tools
# Find functions similar to a given function (search by example)
# Function card: signature, callers, callees, similar functions
# Semantic diff between indexed snapshots
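A hedged sketch of the discovery tools listed above. All subcommand names here are assumptions for illustration:

```shell
# Find functions similar to a given function (search by example)
cqs similar my_function

# Function card: signature, callers, callees, similar functions
cqs card my_function

# Semantic diff between indexed snapshots
cqs diff
```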
Code Intelligence
# Follow a call chain between two functions (BFS shortest path)
# Impact analysis: what breaks if I change this function?
# Map functions to their tests
# Module overview: chunks, callers, callees, notes for a file
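The tool names `trace`, `impact`, and `context` appear elsewhere in this README; the argument shapes below are assumptions for illustration:

```shell
# Follow a call chain between two functions (BFS shortest path)
cqs trace main handle_request

# Impact analysis: what breaks if I change this function?
cqs impact parse_config

# Map functions to their tests (subcommand name assumed)
cqs tests parse_config

# Module overview: chunks, callers, callees, notes for a file
cqs context src/parser.rs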
Maintenance
# Find dead code (functions never called by indexed code)
# Garbage collection (remove stale index entries)
# Cross-project search
# Smart context assembly (gather related code)
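A hedged sketch of the maintenance commands. `gather` is named elsewhere in this README; the other subcommand names are assumptions:

```shell
# Find dead code (functions never called by indexed code)
cqs dead

# Garbage collection (remove stale index entries)
cqs gc

# Smart context assembly (gather related code)
cqs gather "connection pooling"
```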
Reference Indexes (Multi-Index Search)
Search across your project and external codebases simultaneously:
Once added, all searches automatically include reference results:
Reference results are ranked with a weight multiplier (default 0.8) so project results naturally appear first at equal similarity.
MCP integration: The cqs_search tool gains a sources parameter to filter which indexes to search:
- Omit `sources` to search all indexes
- `sources: ["project"]` — search only the primary project
- `sources: ["tokio"]` — search only the tokio reference
References are configured in .cqs.toml:
# Table and key names reconstructed for illustration; check the cqs docs for the exact names.
[[references]]
name = "tokio"
index = "/home/user/.local/share/cqs/refs/tokio"
source = "/home/user/code/tokio"
weight = 0.8
Claude Code Integration
Why use cqs?
Without cqs, Claude uses grep/glob to find code and reads entire files for context. With cqs:
- Fewer tool calls: `trace`, `impact`, `gather`, `context` each replace 5-10 sequential file reads with a single call
- Less context burn: Focused `cqs_read` returns a function + its type dependencies — not the whole file. `batch` runs 10 queries in one round-trip
- Find code by behavior: "function that retries with backoff" finds retry logic even if it's named `doWithAttempts`
- Navigate unfamiliar codebases: Semantic search finds relevant code without knowing project structure
Setup
Step 1: Add cqs as an MCP server:
Or manually in ~/.claude.json:
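A minimal sketch of the manual entry, following Claude Code's standard `mcpServers` format (adjust the project path):

```json
{
  "mcpServers": {
    "cqs": {
      "command": "cqs",
      "args": ["serve", "--project", "/path/to/project"]
    }
  }
}
```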
Note: The --project argument is required because MCP servers run from an unpredictable working directory.
GPU acceleration: Add --gpu for faster query embedding after warmup:
GPU: ~12ms warm queries. CPU (default): ~22ms. Server starts instantly with HNSW, upgrades to GPU in background.
Step 2: Add to your project's CLAUDE.md so Claude uses it automatically:
Use `cqs_search` for semantic code search instead of grep/glob when looking for code by behavior or concept rather than by exact identifier name.
Available tools include `cqs_search`, `cqs_trace`, `cqs_impact`, `cqs_gather`, and `cqs_read`.
Keep index fresh: run `cqs watch` in a background terminal, or `cqs index` after significant changes.
HTTP Transport
For web integrations, use the HTTP transport:
Endpoints:
- `POST /mcp` - JSON-RPC requests (MCP protocol messages)
- `GET /mcp` - SSE stream for server-to-client notifications
- `GET /health` - Health check (returns 200 OK when server is ready)
Authentication: For network-exposed servers, API key authentication is required:
# Via flag
# Via environment variable
# Via file (recommended - keeps secret out of process list)
Clients must include an `Authorization: Bearer SECRET` header.
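For example, with curl (the port is a placeholder; `tools/list` is a standard MCP method):

```shell
# Health check
curl http://127.0.0.1:8080/health

# JSON-RPC request with bearer auth
curl -X POST http://127.0.0.1:8080/mcp \
  -H "Authorization: Bearer SECRET" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```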
Network binding: By default, cqs binds to localhost only. To expose on a network:
# Requires both flags for safety
Implements MCP Streamable HTTP spec 2025-11-25 with Origin validation and protocol version headers.
Supported Languages
- Rust
- Python
- TypeScript
- JavaScript (JSDoc `@param`/`@returns` tags improve search quality)
- Go
- C
- Java
Indexing
By default, `cqs index` respects `.gitignore` rules.
How It Works
- Parses code with tree-sitter to extract:
- Functions and methods
- Classes and structs
- Enums, traits, interfaces
- Constants
- Generates embeddings with E5-base-v2 (runs locally)
- Includes doc comments for better semantic matching
- Stores in SQLite with vector search + FTS5 keyword index
- Hybrid search (RRF): Combines semantic similarity with keyword matching
- Semantic search finds conceptually related code
- Keyword search catches exact identifier matches (e.g., `parseConfig`)
- Reciprocal Rank Fusion merges both rankings for best results (each hit's fused score sums 1/(k + rank) over the lists it appears in)
- Uses GPU if available, falls back to CPU
HNSW Index Tuning
The HNSW (Hierarchical Navigable Small World) index provides fast approximate nearest neighbor search. Current parameters:
| Parameter | Value | Description |
|---|---|---|
| M (connections) | 24 | Max edges per node. Higher = better recall, more memory |
| ef_construction | 200 | Search width during build. Higher = better index, slower build |
| max_layers | 16 | Graph layers. ~log(N) is typical |
| ef_search | 100 | Search width at query time. Higher = better recall, slower search |
Trade-offs:
- Recall vs speed: Higher ef_search improves recall but slows queries
- Index size: ~4KB per vector with current settings
- Build time: O(N * M * ef_construction) complexity
For most codebases (<100k chunks), defaults work well. Large repos may benefit from tuning ef_search higher (200+) if recall matters more than latency.
Search Quality
Hybrid search (RRF) combines semantic understanding with keyword matching:
| Query | Top Match | Score |
|---|---|---|
| "cosine similarity" | `cosine_similarity` | 0.85 |
| "validate email regex" | `validateEmail` | 0.73 |
| "check if adult age 18" | `isAdult` | 0.71 |
| "pop from stack" | `Stack.Pop` | 0.70 |
| "generate random id" | `generateId` | 0.70 |
GPU Acceleration (Optional)
cqs works on CPU (~20ms per embedding). GPU provides 3x+ speedup:
| Mode | Single Query | Batch (50 docs) |
|---|---|---|
| CPU | ~20ms | ~15ms/doc |
| CUDA | ~6ms | ~0.3ms/doc |
For GPU acceleration:
Linux
# Add NVIDIA CUDA repo
# Install CUDA runtime and cuDNN 9
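The steps above can be sketched as follows for an Ubuntu-style system. Package and repo names vary by distro and CUDA version, so treat these as illustrative and follow NVIDIA's install guide for your platform:

```shell
# Add NVIDIA CUDA repo (Ubuntu 24.04 shown)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install CUDA runtime and cuDNN 9 (package names illustrative)
sudo apt-get install -y cuda-runtime-12-6 libcudnn9-cuda-12
```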
Set library path:
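Assuming the default CUDA install location (adjust to wherever your distro puts the libraries):

```shell
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```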
WSL2
Same as Linux, plus:
- Requires NVIDIA GPU driver on Windows host
- Add `/usr/lib/wsl/lib` to `LD_LIBRARY_PATH`
- Tested working with RTX A6000, CUDA 13.0 driver, cuDNN 9.18
Verify
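One hedged way to verify: confirm the driver sees the GPU, then start the server with `--gpu` (the flag documented in the MCP setup above) and check that warm queries are faster:

```shell
# Confirm the GPU is visible to the driver
nvidia-smi

# Start with GPU acceleration and watch query latency after warmup
cqs serve --project /path/to/project --gpu
```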
Contributing
Issues and PRs welcome at GitHub.
License
MIT