§cqs - Code Intelligence and RAG for AI Agents
Semantic search, call graph analysis, impact tracing, type dependencies, and smart context assembly — all in single tool calls. Local ML embeddings, GPU-accelerated.
§Features
- Semantic search: Hybrid RRF (keyword + vector) with configurable embedding models (BGE-large default, E5-base and v9-200k presets, custom ONNX). 90.9% Recall@1 on 296-query expanded eval.
- Call graphs: Callers, callees, transitive impact, shortest-path tracing between functions
- Impact analysis: What breaks if you change X? Callers + affected tests + risk scoring
- Type dependencies: Who uses this type? What types does this function use?
- Smart context assembly: `gather` (search + BFS expansion), `task` (scout + gather + impact + placement), `scout` (pre-investigation dashboard)
- Diff review & CI: Structured risk analysis, dead code detection in diffs, gating pipeline
- Batch & chat modes: Persistent session with pipeline syntax (`search "error" | callers | test-map`)
- Notes with sentiment: Unified memory system for AI collaborators
- Multi-language: 52 languages + L5X/L5K PLC exports, with multi-grammar injection (HTML→JS/CSS, Svelte, Vue, Razor, etc.)
- Type-aware embeddings: Full signatures appended to NL descriptions for richer type discrimination
- Doc comment generation: `--improve-docs` generates and writes doc comments to source files via LLM
- HyDE query predictions: `--hyde-queries` generates synthetic search queries per function for improved recall
- Training data generation: `train-data` command generates fine-tuning triplets from git history
- GPU acceleration: CUDA/TensorRT with CPU fallback
- Document conversion: PDF, HTML, CHM, Web Help → cleaned Markdown (optional `convert` feature)
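The hybrid RRF search listed under Features combines a keyword ranking and a vector ranking. As a rough illustration of Reciprocal Rank Fusion in general (not the crate's actual implementation), the sketch below fuses two ranked lists of identifiers. The function name `rrf_fuse`, the sample identifiers, and the constant `k = 60.0` are assumptions made for this example; the value cqs uses is not stated here.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists with Reciprocal Rank Fusion.
/// Hypothetical helper for illustration; not part of the cqs API.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            // `rank` is 0-based, so the top hit contributes 1 / (k + 1).
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Assumed sample rankings: one from keyword search, one from vector search.
    let keyword = vec!["parse_config", "load_file", "read_toml"];
    let vector = vec!["read_toml", "parse_config", "merge_settings"];
    for (id, score) in rrf_fuse(&[keyword, vector], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

An item that ranks well in both lists (here `parse_config`) ends up ahead of one that tops only a single list, which is the property that makes rank fusion useful when keyword and vector scores are not directly comparable.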
§Quick Start
use cqs::{Embedder, Parser, Store};
use cqs::embedder::ModelConfig;
use cqs::store::SearchFilter;

// Initialize components
let parser = Parser::new()?;
let embedder = Embedder::new(ModelConfig::resolve(None, None))?;
let store = Store::open(std::path::Path::new(".cqs/index.db"))?;

// Parse and embed a file
let chunks = parser.parse_file(std::path::Path::new("src/main.rs"))?;
let embeddings = embedder.embed_documents(
    &chunks.iter().map(|c| c.content.as_str()).collect::<Vec<_>>()
)?;

// Search for similar code (hybrid RRF search)
let query_embedding = embedder.embed_query("parse configuration file")?;
let filter = SearchFilter {
    enable_rrf: true,
    query_text: "parse configuration file".to_string(),
    ..Default::default()
};
let results = store.search_filtered(&query_embedding, &filter, 5, 0.3)?;

Re-exports§
pub use drift::detect_drift;
pub use drift::DriftEntry;
pub use drift::DriftResult;
pub use audit::parse_duration;
pub use embedder::Embedder;
pub use embedder::Embedding;
pub use hnsw::HnswIndex;
pub use index::IndexResult;
pub use index::VectorIndex;
pub use note::parse_notes;
pub use note::path_matches_mention;
pub use note::rewrite_notes_file;
pub use note::NoteEntry;
pub use note::NoteError;
pub use note::NoteFile;
pub use note::NOTES_HEADER;
pub use parser::Chunk;
pub use parser::Parser;
pub use reranker::Reranker;
pub use store::ModelInfo;
pub use store::SearchFilter;
pub use store::Store;
Modules§
- audit
- Audit mode for excluding notes from search/read
- ci
- CI pipeline analysis — composable diff review + dead code + gate logic.
- config
- Configuration file support for cqs
- convert
- Document-to-Markdown conversion pipeline.
- doc_writer
- Doc comment generation and source file rewriting.
- drift
- Drift detection — find functions that changed semantically between snapshots
- embedder
- Embedding generation with ort + tokenizers
- fts
- FTS normalization and identifier tokenization.
- health
- Health check — codebase quality snapshot
- hnsw
- HNSW (Hierarchical Navigable Small World) index for fast vector search
- index
- Vector index trait for nearest neighbor search
- language
- Language registry for code parsing
- llm
- Claude API client for LLM-generated function summaries (SQ-6).
- note
- Note parsing and types
- parser
- Code parsing with tree-sitter
- plan
- Task planning with template classification.
- reference
- Reference index support for multi-index search
- reranker
- Cross-encoder re-ranking for second-pass scoring
- store
- SQLite storage for chunks, embeddings, and call graph data.
- suggest
- Suggest — auto-detect note-worthy patterns in the codebase
- train_data
Structs§
- CallContext
- Call graph context for enriching NL descriptions.
- CallerDetail
- Direct caller with display-ready fields (call-site context + snippet). Named `CallerDetail` to distinguish from `store::CallerInfo`, which has only basic fields (name, file, line). This struct adds `call_line` and `snippet` for impact analysis display.
- ChangedFunction
- A function identified as changed by a diff
- CrossProjectResult
- Search result from a specific project
- DiffEntry
- A single diff entry
- DiffHunk
- A single hunk from a unified diff: one changed region in one file
- DiffImpactResult
- Aggregated impact result from a diff
- DiffImpactSummary
- Summary counts for diff impact
- DiffResult
- Result of a semantic diff
- DiffTestInfo
- A test affected by diff changes, tracking which changed function leads to it
- FileGroup
- A file group in the scout result
- FileSuggestion
- Suggestion for where to place new code
- FunctionHints
- Lightweight caller + test coverage hints for a function.
- FunctionRisk
- Per-function risk assessment from impact analysis.
- GatherOptions
- Options for gather operation
- GatherResult
- Result of a gather operation
- GatheredChunk
- A gathered code chunk with context
- ImpactOptions
- Options for impact analysis.
- ImpactResult
- Complete impact analysis result
- JsDocInfo
- JSDoc tag information extracted from documentation comments.
- LocalPatterns
- Local code patterns extracted from existing chunks in the target file/module. Uses String fields intentionally rather than an enum; this keeps the design flexible for arbitrary language-specific patterns without requiring type changes when adding new conventions. Adding a new naming convention or error handling style is a single function change in `detect_naming_convention()` or `extract_patterns()`.
- OnboardEntry
- A code entry in the reading list.
- OnboardResult
- Result of an onboard analysis: an ordered reading list for understanding a concept.
- OnboardSummary
- Summary statistics for the onboard result.
- PlacementOptions
- Options for customizing placement suggestion behavior.
- PlacementResult
- Result from placement analysis
- ProjectEntry
- A registered project
- ProjectRegistry
- Global registry of indexed cqs projects
- RelatedFunction
- A function related to the target with overlap count.
- RelatedResult
- Result of co-occurrence analysis for a target function.
- ResolvedTarget
- Result of resolving a target name to a concrete chunk. Contains the best-matching chunk and any alternative matches found during resolution (useful for disambiguation UIs).
- ReviewNoteEntry
- A note relevant to the review. Named `ReviewNoteEntry` to avoid collision with `note::NoteEntry` (parsed note from TOML), which is a different type.
- ReviewResult
- Result of a comprehensive diff review.
- RiskScore
- Risk assessment for a single function.
- ScoutChunk
- A chunk in the scout result with hints
- ScoutOptions
- Options for customizing scout behavior.
- ScoutResult
- Complete scout result
- ScoutSummary
- Summary counts
- TaskResult
- Complete task analysis result.
- TaskSummary
- Summary statistics for a task result.
- TestEntry
- Test that exercises the entry point.
- TestInfo
- Affected test with call depth
- TestMatch
- A test function that reaches the target through the call graph.
- TestSuggestion
- A suggested test for an untested caller
- TransitiveCaller
- Transitive caller at a given depth
- TypeImpacted
- A function impacted via shared type dependencies (one-hop type expansion).
- TypeInfo
- Type dependency of the entry point.
Enums§
- AnalysisError
- Unified error type for analysis operations (scout, where-to-add, etc.)
- ChunkRole
- Role classification for chunks in scout results
- GatherDirection
- Direction of call graph expansion
- NlTemplate
- Template variants for NL description generation.
- Pattern
- Known structural patterns
- ProjectError
- Typed error for project registry operations (EH-13).
- RiskLevel
- Risk level for a function based on caller count and test coverage.
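The risk level above depends on caller count and test coverage, and the `compute_risk_batch` entry in the Functions list documents the formula `score = caller_count * (1.0 - test_ratio)` with `test_ratio = min(test_count / max(caller_count, 1), 1.0)`. The sketch below works that arithmetic through for a few inputs. The function names `risk_score` and `is_likely_entry_point` are hypothetical, and the mapping from score to risk level is deliberately left out because the thresholds are not stated in this documentation.

```rust
/// Risk score following the formula documented for `compute_risk_batch`.
/// Hypothetical standalone helper; the crate's own signature may differ.
fn risk_score(caller_count: usize, test_count: usize) -> f64 {
    // test_ratio = min(test_count / max(caller_count, 1), 1.0)
    let test_ratio = (test_count as f64 / caller_count.max(1) as f64).min(1.0);
    // score = caller_count * (1.0 - test_ratio)
    caller_count as f64 * (1.0 - test_ratio)
}

/// The documented special case: no callers and no tests suggests an
/// untested entry point, which the docs say is reported as Medium risk.
fn is_likely_entry_point(caller_count: usize, test_count: usize) -> bool {
    caller_count == 0 && test_count == 0
}

fn main() {
    for (callers, tests) in [(10, 0), (10, 5), (4, 8), (0, 0)] {
        println!(
            "callers={callers} tests={tests} score={:.1} entry_point={}",
            risk_score(callers, tests),
            is_likely_entry_point(callers, tests)
        );
    }
}
```

Note that a function with zero callers scores 0.0 under the plain formula, which is presumably why the entry-point case is handled separately rather than being left to the score.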
Constants§
- DEFAULT_MAX_EXPANDED_NODES
- Default maximum nodes in BFS expansion to prevent blowup on hub functions.
- DEFAULT_MAX_TEST_SEARCH_DEPTH
- Default maximum depth for test search BFS. Exposed via `max_test_depth` parameters on analysis functions.
- DEFAULT_ONBOARD_DEPTH
- Default callee BFS expansion depth.
- DEFAULT_PLACEMENT_SEARCH_LIMIT
- Default search result limit for placement suggestions.
- DEFAULT_PLACEMENT_SEARCH_THRESHOLD
- Default minimum search score threshold for placement suggestions.
- DEFAULT_SCOUT_SEARCH_LIMIT
- Default number of search results for scout.
- DEFAULT_SCOUT_SEARCH_THRESHOLD
- Default minimum search score threshold for scout.
- EMBEDDING_DIM
- Default embedding dimension (1024, BGE-large-en-v1.5). The actual dimension is detected at runtime from the model output. Use `Embedder::embedding_dim()` for the runtime value. Derived from `ModelConfig::default_model().dim`.
- INDEX_DIR
- Name of the per-project index directory (created by `cqs init`).
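Several of the constants above cap a breadth-first traversal, either by node count (to avoid blowup on hub functions) or by depth. The sketch below shows one way such a bounded BFS might look over a plain adjacency map. The function name `bounded_bfs`, the graph representation, and the sample graph are assumptions for illustration only; the crate's own graph types and traversal are not shown in this documentation.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first expansion from `start`, bounded by depth and by total node count.
/// Returns each visited node with its depth, in visit order.
/// Hypothetical helper; not the crate's implementation.
fn bounded_bfs(
    edges: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_depth: usize,
    max_nodes: usize,
) -> Vec<(String, usize)> {
    let mut order = Vec::new();
    if max_nodes == 0 {
        return order;
    }
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((node, depth)) = queue.pop_front() {
        order.push((node.clone(), depth));
        if depth == max_depth {
            continue; // depth cap: do not expand further from this node
        }
        if let Some(neighbors) = edges.get(node.as_str()) {
            for &next in neighbors {
                if visited.len() >= max_nodes {
                    break; // node cap: stop enqueueing once the budget is spent
                }
                if visited.insert(next.to_string()) {
                    queue.push_back((next.to_string(), depth + 1));
                }
            }
        }
    }
    order
}

fn main() {
    let mut edges: HashMap<&str, Vec<&str>> = HashMap::new();
    edges.insert("main", vec!["parse", "run"]);
    edges.insert("parse", vec!["lex"]);
    edges.insert("run", vec!["lex", "log"]);
    edges.insert("lex", vec!["read"]);
    for (name, depth) in bounded_bfs(&edges, "main", 2, 100) {
        println!("{depth}: {name}");
    }
}
```

Counting nodes at enqueue time rather than at dequeue time keeps the queue itself from growing past the cap when a single hub has thousands of neighbors.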
Statics§
- COMMON_TYPES
- Standard library types to exclude from type-edge analysis.
Functions§
- analyze_diff_impact
- Run impact analysis across all changed functions from a diff. Fetches call graph and test chunks once, then analyzes each function. Results are deduplicated by name.
- analyze_diff_impact_with_graph
- Like `analyze_diff_impact` but accepts pre-loaded graph and test chunks. Paths in the returned result are relative to `root`. Use when the caller already has the graph/test_chunks (e.g., `review_diff`, which also needs them for risk scoring).
- analyze_impact
- Run impact analysis: find callers, affected tests, and transitive callers. Paths in the returned result are relative to `root`. When `opts.include_types` is true, also performs one-hop type expansion: finds other functions that share type dependencies with the target via `type_edges`.
- compute_hints
- Compute caller count and test count for a single function. Convenience wrapper that loads graph internally. Pass `prefetched_caller_count` to avoid re-querying callers when the caller already has them (e.g., `explain` fetches callers before this).
- compute_hints_batch
- Batch compute hints for multiple functions using forward BFS (PERF-20). Single `test_reachability` call replaces N independent `reverse_bfs` calls.
- compute_hints_with_graph
- Core implementation: accepts pre-loaded graph and test chunks. Use this when processing multiple functions to avoid loading the graph N times (e.g., scout, which processes 10+ functions).
- compute_risk_and_tests
- Compute risk scores and collect deduplicated tests in a single pass. Shares BFS results across risk scoring and test collection, avoiding the duplicate `reverse_bfs` that occurs when calling `compute_risk_batch` and `find_affected_tests_with_chunks` separately.
- compute_risk_batch
- Compute risk scores for a batch of function names. Uses pre-loaded call graph and test chunks to avoid repeated queries. Formula: `score = caller_count * (1.0 - test_ratio)` where `test_ratio = min(test_count / max(caller_count, 1), 1.0)`. Entry-point handling: functions with 0 callers and 0 tests get `Medium` risk (likely entry points that should have tests). PERF-24: Uses a single forward BFS from all test nodes to build a reachability map, instead of N independent `reverse_bfs` calls.
- diff_impact_to_json
- Serialize diff impact result to JSON.
- enumerate_files
- Enumerate files to index in a project directory.
- extract_body_keywords
- Extract meaningful keywords from function body, filtering language noise. Returns up to 10 unique keywords sorted by frequency (descending).
- extract_modify_targets
- Extract modify target names from scout results.
- find_hotspots
- Find the most-called functions in the codebase (hotspots). Returns `Hotspot` entries sorted by caller count descending.
- find_related
- Find functions related to `target_name` by co-occurrence. Three dimensions:
- find_test_matches
- Find test functions that can reach `target_name` through the call graph via reverse BFS, up to `max_depth` hops.
- format_test_suggestions
- Format test suggestions as JSON values.
- gather
- Gather relevant code chunks for a query.
- gather_cross_index
- Cross-index gather: seed from a reference index, bridge into project code, BFS expand.
- gather_cross_index_with_index
- Like `gather_cross_index` but accepts an optional HNSW index for O(log n) bridge searches instead of brute-force scans per reference seed.
- gather_with_graph
- Like `gather` but accepts a pre-loaded call graph.
- generate_nl_description
- Generate natural language description from chunk metadata.
- generate_nl_with_call_context
- Generate NL description enriched with call graph context.
- generate_nl_with_call_context_and_summary
- Generate NL with call context and optional LLM summary (SQ-6).
- generate_nl_with_template
- impact_to_json
- Serialize impact result to JSON.
- impact_to_mermaid
- Generate a mermaid diagram from impact result.
- index_notes
- Index notes into the database (store without embeddings)
- is_test_chunk
- Unified test-chunk detection heuristic.
- map_hunks_to_functions
- Map diff hunks to function names using the index. For each hunk, finds chunks whose line range overlaps the hunk’s range. Returns deduplicated function names.
- normalize_for_fts
- Normalize code text for FTS5 indexing. Splits identifiers on camelCase/snake_case boundaries and joins with spaces. Used to make code searchable with natural language queries. Output is capped at 16KB to prevent memory issues with pathological inputs.
- normalize_path
- Normalize a path to a string with forward slashes.
- normalize_slashes
- Normalize backslashes to forward slashes in a string path.
- onboard
- Produce a guided tour of a concept in the codebase.
- onboard_to_json
- Convert `OnboardResult` to JSON.
- parse_jsdoc_tags
- Parse JSDoc tags from a documentation comment. Extracts @param and @returns/@return tags from JSDoc-style comments.
- parse_target
- Parse a target string into (optional_file_filter, function_name). Supports formats:
- parse_unified_diff
- Parse unified diff output into hunks. Handles standard `git diff` output:
- rel_display
- Relativize a path against a root and normalize separators for display.
- resolve_index_dir
- Resolve the index directory for a project, migrating from `.cq/` to `.cqs/` if needed.
- resolve_target
- Resolve a target string to a `ResolvedTarget`. Uses `search_by_name` with optional file filtering. Returns the best-matching chunk and alternatives, or an error if none found.
- review_diff
- Analyze a unified diff and produce a comprehensive review. Steps:
- scout
- Run scout analysis for a task description.
- scout_to_json
- Serialize scout result to JSON.
- scout_with_options
- Run scout analysis with configurable search parameters.
- search_across_projects
- Search across all registered projects
- semantic_diff
- Run a semantic diff between two stores.
- serialize_path_normalized
- Serde serializer for `PathBuf` fields: forward-slash normalized.
- strip_markdown_noise
- Strip markdown formatting noise for cleaner embedding text. Removes heading prefixes, image syntax, simplifies links to just text, strips bold/italic markers, HTML tags, and collapses whitespace. Keeps inline code content (strips backticks but preserves text).
- suggest_placement
- Suggest where to place new code matching a description. Uses default search parameters. For custom parameters, use `suggest_placement_with_options`.
- suggest_placement_with_options
- Suggest where to place new code matching a description with configurable search parameters. If `opts.query_embedding` is set, reuses it (avoids redundant ONNX inference). Otherwise, computes the embedding from `description` using `embedder`.
- suggest_tests
- Suggest tests for untested callers in an impact result. Loads its own call graph and test chunks; only called when `--suggest-tests` is set, so the normal path pays zero overhead.
- task
- Produce complete implementation context for a task description.
- task_to_json
- Serialize task result to JSON.
- task_with_resources
- Like `task` but accepts pre-loaded call graph and test chunks.
- temp_suffix
- Generate an unpredictable u64 suffix for temporary file names.
- tokenize_identifier
- Split identifier on snake_case and camelCase boundaries. Note: This function splits on every uppercase letter, so acronyms like “XMLParser” become individual letters. This is intentional for search tokenization where “xml parser” is more useful than preserving “XML”.
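To make the splitting rule described for `tokenize_identifier` concrete, here is a minimal sketch of a tokenizer that breaks on underscores and starts a new token at every uppercase letter. It is an independent illustration named `tokenize_identifier_sketch`, not the crate's code; in particular, lowercasing the output tokens is an assumption, since the documentation above does not say whether case is preserved.

```rust
/// Split an identifier on snake_case and camelCase boundaries.
/// Every uppercase letter starts a new token, so acronyms fall apart
/// into single letters, matching the behavior described above.
/// Lowercasing the tokens is an assumption made for this sketch.
fn tokenize_identifier_sketch(ident: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in ident.chars() {
        if ch == '_' {
            // Underscore ends the current token and is dropped.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
        } else if ch.is_uppercase() {
            // Uppercase letter ends the current token and begins a new one.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            current.extend(ch.to_lowercase());
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    for ident in ["parse_config_file", "parseConfigFile", "XMLParser"] {
        println!("{ident} -> {:?}", tokenize_identifier_sketch(ident));
    }
}
```

With this rule, `XMLParser` yields the tokens `x`, `m`, `l`, `parser`, which shows the acronym trade-off the note describes.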