Crate cqs

§cqs - Code Intelligence and RAG for AI Agents

Semantic search, call graph analysis, impact tracing, type dependencies, and smart context assembly — all in single tool calls. Local ML embeddings, GPU-accelerated.

§Features

  • Semantic search: Hybrid RRF (keyword + vector) with configurable embedding models (BGE-large default, E5-base and v9-200k presets, custom ONNX). 90.9% Recall@1 on 296-query expanded eval.
  • Call graphs: Callers, callees, transitive impact, shortest-path tracing between functions
  • Impact analysis: What breaks if you change X? Callers + affected tests + risk scoring
  • Type dependencies: Who uses this type? What types does this function use?
  • Smart context assembly: gather (search + BFS expansion), task (scout + gather + impact + placement), scout (pre-investigation dashboard)
  • Diff review & CI: Structured risk analysis, dead code detection in diffs, gating pipeline
  • Batch & chat modes: Persistent session with pipeline syntax (search "error" | callers | test-map)
  • Notes with sentiment: Unified memory system for AI collaborators
  • Multi-language: 52 languages + L5X/L5K PLC exports, with multi-grammar injection (HTML→JS/CSS, Svelte, Vue, Razor, etc.)
  • Type-aware embeddings: Full signatures appended to NL descriptions for richer type discrimination
  • Doc comment generation: --improve-docs generates and writes doc comments to source files via LLM
  • HyDE query predictions: --hyde-queries generates synthetic search queries per function for improved recall
  • Training data generation: train-data command generates fine-tuning triplets from git history
  • GPU acceleration: CUDA/TensorRT with CPU fallback
  • Document conversion: PDF, HTML, CHM, Web Help → cleaned Markdown (optional convert feature)
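The hybrid search feature above fuses keyword and vector rankings with reciprocal rank fusion (RRF). As an illustration of the fusion step only — this is a minimal self-contained sketch, not cqs internals; the constant k = 60 and the alphabetical tie-break are assumptions:

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: score(d) = Σ over rankings of 1 / (k + rank(d)).
/// `k` dampens the weight of top ranks; 60 is a common default and an
/// assumption here, not necessarily what cqs uses.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // ranks are 1-based in the RRF formula
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // sort by fused score descending, then name for a stable order
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let keyword = vec!["parse_config", "load_file", "read_toml"];
    let vector = vec!["read_toml", "parse_config", "init_app"];
    // "parse_config" ranks high in both lists, so it wins the fused ranking.
    println!("{:?}", rrf_fuse(&[keyword, vector], 60.0));
}
```

A document that appears in both rankings accumulates score from each, which is why hybrid RRF surfaces results that neither keyword nor vector search would rank first on its own.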

§Quick Start

use cqs::{Embedder, Parser, Store};
use cqs::embedder::ModelConfig;
use cqs::store::SearchFilter;

// Initialize components
let parser = Parser::new()?;
let embedder = Embedder::new(ModelConfig::resolve(None, None))?;
let store = Store::open(std::path::Path::new(".cqs/index.db"))?;

// Parse and embed a file
let chunks = parser.parse_file(std::path::Path::new("src/main.rs"))?;
let embeddings = embedder.embed_documents(
    &chunks.iter().map(|c| c.content.as_str()).collect::<Vec<_>>()
)?;

// Search for similar code (hybrid RRF search)
let query_embedding = embedder.embed_query("parse configuration file")?;
let filter = SearchFilter {
    enable_rrf: true,
    query_text: "parse configuration file".to_string(),
    ..Default::default()
};
let results = store.search_filtered(&query_embedding, &filter, 5, 0.3)?;

Re-exports§

pub use drift::detect_drift;
pub use drift::DriftEntry;
pub use drift::DriftResult;
pub use audit::parse_duration;
pub use embedder::Embedder;
pub use embedder::Embedding;
pub use hnsw::HnswIndex;
pub use index::IndexResult;
pub use index::VectorIndex;
pub use note::parse_notes;
pub use note::path_matches_mention;
pub use note::rewrite_notes_file;
pub use note::NoteEntry;
pub use note::NoteError;
pub use note::NoteFile;
pub use note::NOTES_HEADER;
pub use parser::Chunk;
pub use parser::Parser;
pub use reranker::Reranker;
pub use store::ModelInfo;
pub use store::SearchFilter;
pub use store::Store;

Modules§

audit
Audit mode for excluding notes from search/read
ci
CI pipeline analysis — composable diff review + dead code + gate logic.
config
Configuration file support for cqs
convert
Document-to-Markdown conversion pipeline.
doc_writer
Doc comment generation and source file rewriting.
drift
Drift detection — find functions that changed semantically between snapshots
embedder
Embedding generation with ort + tokenizers
fts
FTS normalization and identifier tokenization.
health
Health check — codebase quality snapshot
hnsw
HNSW (Hierarchical Navigable Small World) index for fast vector search
index
Vector index trait for nearest neighbor search
language
Language registry for code parsing
llm
Claude API client for LLM-generated function summaries (SQ-6).
note
Note parsing and types
parser
Code parsing with tree-sitter
plan
Task planning with template classification.
reference
Reference index support for multi-index search
reranker
Cross-encoder re-ranking for second-pass scoring
store
SQLite storage for chunks, embeddings, and call graph data.
suggest
Suggest — auto-detect note-worthy patterns in the codebase
train_data
Training data generation — fine-tuning triplets from git history
Structs§

CallContext
Call graph context for enriching NL descriptions.
CallerDetail
Direct caller with display-ready fields (call-site context + snippet). Named CallerDetail to distinguish from store::CallerInfo which has only basic fields (name, file, line). This struct adds call_line and snippet for impact analysis display.
ChangedFunction
A function identified as changed by a diff
CrossProjectResult
Search result from a specific project
DiffEntry
A single diff entry
DiffHunk
A single hunk from a unified diff — one changed region in one file
DiffImpactResult
Aggregated impact result from a diff
DiffImpactSummary
Summary counts for diff impact
DiffResult
Result of a semantic diff
DiffTestInfo
A test affected by diff changes, tracking which changed function leads to it
FileGroup
A file group in the scout result
FileSuggestion
Suggestion for where to place new code
FunctionHints
Lightweight caller + test coverage hints for a function.
FunctionRisk
Per-function risk assessment from impact analysis.
GatherOptions
Options for gather operation
GatherResult
Result of a gather operation
GatheredChunk
A gathered code chunk with context
ImpactOptions
Options for impact analysis.
ImpactResult
Complete impact analysis result
JsDocInfo
JSDoc tag information extracted from documentation comments.
LocalPatterns
Local code patterns extracted from existing chunks in the target file/module. Uses String fields intentionally rather than an enum — this keeps the design flexible for arbitrary language-specific patterns without requiring type changes when adding new conventions. Adding a new naming convention or error handling style is a single function change in detect_naming_convention() or extract_patterns().
OnboardEntry
A code entry in the reading list.
OnboardResult
Result of an onboard analysis — ordered reading list for understanding a concept.
OnboardSummary
Summary statistics for the onboard result.
PlacementOptions
Options for customizing placement suggestion behavior.
PlacementResult
Result from placement analysis
ProjectEntry
A registered project
ProjectRegistry
Global registry of indexed cqs projects
RelatedFunction
A function related to the target with overlap count.
RelatedResult
Result of co-occurrence analysis for a target function.
ResolvedTarget
Result of resolving a target name to a concrete chunk. Contains the best-matching chunk and any alternative matches found during resolution (useful for disambiguation UIs).
ReviewNoteEntry
A note relevant to the review. Named ReviewNoteEntry to avoid collision with note::NoteEntry (parsed note from TOML) which is a different type.
ReviewResult
Result of a comprehensive diff review.
RiskScore
Risk assessment for a single function.
ScoutChunk
A chunk in the scout result with hints
ScoutOptions
Options for customizing scout behavior.
ScoutResult
Complete scout result
ScoutSummary
Summary counts
TaskResult
Complete task analysis result.
TaskSummary
Summary statistics for a task result.
TestEntry
Test that exercises the entry point.
TestInfo
Affected test with call depth
TestMatch
A test function that reaches the target through the call graph.
TestSuggestion
A suggested test for an untested caller
TransitiveCaller
Transitive caller at a given depth
TypeImpacted
A function impacted via shared type dependencies (one-hop type expansion).
TypeInfo
Type dependency of the entry point.

Enums§

AnalysisError
Unified error type for analysis operations (scout, where-to-add, etc.)
ChunkRole
Role classification for chunks in scout results
GatherDirection
Direction of call graph expansion
NlTemplate
Template variants for NL description generation.
Pattern
Known structural patterns
ProjectError
Typed error for project registry operations (EH-13).
RiskLevel
Risk level for a function based on caller count and test coverage.

Constants§

DEFAULT_MAX_EXPANDED_NODES
Default maximum nodes in BFS expansion to prevent blowup on hub functions.
DEFAULT_MAX_TEST_SEARCH_DEPTH
Default maximum depth for test search BFS. Exposed via max_test_depth parameters on analysis functions.
DEFAULT_ONBOARD_DEPTH
Default callee BFS expansion depth.
DEFAULT_PLACEMENT_SEARCH_LIMIT
Default search result limit for placement suggestions.
DEFAULT_PLACEMENT_SEARCH_THRESHOLD
Default minimum search score threshold for placement suggestions.
DEFAULT_SCOUT_SEARCH_LIMIT
Default number of search results for scout.
DEFAULT_SCOUT_SEARCH_THRESHOLD
Default minimum search score threshold for scout.
EMBEDDING_DIM
Default embedding dimension (1024, BGE-large-en-v1.5). The actual dimension is detected at runtime from the model output. Use Embedder::embedding_dim() for the runtime value. Derived from ModelConfig::default_model().dim.
INDEX_DIR
Name of the per-project index directory (created by cqs init).

Statics§

COMMON_TYPES
Standard library types to exclude from type-edge analysis.

Functions§

analyze_diff_impact
Run impact analysis across all changed functions from a diff. Fetches call graph and test chunks once, then analyzes each function. Results are deduplicated by name.
analyze_diff_impact_with_graph
Like analyze_diff_impact but accepts pre-loaded graph and test chunks. Paths in the returned result are relative to root. Use when the caller already has the graph/test_chunks (e.g., review_diff which also needs them for risk scoring).
analyze_impact
Run impact analysis: find callers, affected tests, and transitive callers. Paths in the returned result are relative to root. When opts.include_types is true, also performs one-hop type expansion: finds other functions that share type dependencies with the target via type_edges.
compute_hints
Compute caller count and test count for a single function. Convenience wrapper that loads graph internally. Pass prefetched_caller_count to avoid re-querying callers when the caller already has them (e.g., explain fetches callers before this).
compute_hints_batch
Batch compute hints for multiple functions using forward BFS (PERF-20). Single test_reachability call replaces N independent reverse_bfs calls.
compute_hints_with_graph
Core implementation — accepts pre-loaded graph and test chunks. Use this when processing multiple functions to avoid loading the graph N times (e.g., scout, which processes 10+ functions).
compute_risk_and_tests
Compute risk scores and collect deduplicated tests in a single pass. Shares BFS results across risk scoring and test collection, avoiding the duplicate reverse_bfs that occurs when calling compute_risk_batch and find_affected_tests_with_chunks separately.
compute_risk_batch
Compute risk scores for a batch of function names. Uses pre-loaded call graph and test chunks to avoid repeated queries. Formula: score = caller_count * (1.0 - test_ratio) where test_ratio = min(test_count / max(caller_count, 1), 1.0). Entry-point handling: functions with 0 callers and 0 tests get Medium risk (likely entry points that should have tests). PERF-24: Uses a single forward BFS from all test nodes to build a reachability map, instead of N independent reverse_bfs calls.
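The scoring formula and entry-point rule above can be sketched as standalone functions. Only the formula and the zero-caller/zero-test rule come from this documentation; the numeric cutoffs mapping a score to a risk level are illustrative assumptions:

```rust
#[derive(Debug, PartialEq)]
enum Risk { Low, Medium, High }

/// score = caller_count * (1.0 - test_ratio), where
/// test_ratio = min(test_count / max(caller_count, 1), 1.0)
fn risk_score(caller_count: usize, test_count: usize) -> f64 {
    let test_ratio = (test_count as f64 / caller_count.max(1) as f64).min(1.0);
    caller_count as f64 * (1.0 - test_ratio)
}

/// Entry-point rule from the docs: 0 callers and 0 tests => Medium
/// (likely an entry point that should have tests). The cutoffs below
/// are assumptions, not the crate's actual thresholds.
fn risk_level(caller_count: usize, test_count: usize) -> Risk {
    if caller_count == 0 && test_count == 0 {
        return Risk::Medium;
    }
    let score = risk_score(caller_count, test_count);
    if score >= 5.0 { Risk::High } else if score >= 1.0 { Risk::Medium } else { Risk::Low }
}

fn main() {
    // 10 callers, 2 tests: test_ratio = 0.2, score = 10 * 0.8 = 8.0
    println!("{:?}", risk_level(10, 2));
    // fully tested hub: test_ratio = 1.0, score = 0.0
    println!("{:?}", risk_level(10, 10));
}
```

The formula rewards test coverage proportionally: a widely-called function scores high only when its callers outnumber the tests that reach it.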
diff_impact_to_json
Serialize diff impact result to JSON.
enumerate_files
Enumerate files to index in a project directory.
extract_body_keywords
Extract meaningful keywords from function body, filtering language noise. Returns up to 10 unique keywords sorted by frequency (descending).
extract_modify_targets
Extract modify target names from scout results.
find_hotspots
Find the most-called functions in the codebase (hotspots). Returns [Hotspot] entries sorted by caller count descending.
find_related
Find functions related to target_name by co-occurrence across three dimensions.
find_test_matches
Find test functions that can reach target_name through the call graph via reverse BFS, up to max_depth hops.
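The reverse-BFS idea behind this function can be sketched against a toy call graph. The graph shape (a callee → callers map) and naming are illustrative assumptions, not the crate's actual data model:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Reverse BFS from `target` over caller edges, collecting test functions
/// reachable within `max_depth` hops. `callers` maps each callee to its
/// direct callers; returns (test name, depth) pairs.
fn find_test_matches(
    callers: &HashMap<&str, Vec<&str>>,
    tests: &HashSet<&str>,
    target: &str,
    max_depth: usize,
) -> Vec<(String, usize)> {
    let mut seen: HashSet<&str> = HashSet::from([target]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(target, 0)]);
    let mut matches = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // don't expand past the hop limit
        }
        for &caller in callers.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
            if seen.insert(caller) {
                if tests.contains(caller) {
                    matches.push((caller.to_string(), depth + 1));
                }
                queue.push_back((caller, depth + 1));
            }
        }
    }
    matches
}

fn main() {
    // test_parse -> parse_config -> read_toml
    let callers = HashMap::from([
        ("read_toml", vec!["parse_config"]),
        ("parse_config", vec!["test_parse"]),
    ]);
    let tests = HashSet::from(["test_parse"]);
    println!("{:?}", find_test_matches(&callers, &tests, "read_toml", 5));
}
```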
format_test_suggestions
Format test suggestions as JSON values.
gather
Gather relevant code chunks for a query.
gather_cross_index
Cross-index gather: seed from a reference index, bridge into project code, BFS expand.
gather_cross_index_with_index
Like gather_cross_index but accepts an optional HNSW index for O(log n) bridge searches instead of brute-force scans per reference seed.
gather_with_graph
Like gather but accepts a pre-loaded call graph.
generate_nl_description
Generate natural language description from chunk metadata.
generate_nl_with_call_context
Generate NL description enriched with call graph context.
generate_nl_with_call_context_and_summary
Generate NL with call context and optional LLM summary (SQ-6).
generate_nl_with_template
impact_to_json
Serialize impact result to JSON.
impact_to_mermaid
Generate a mermaid diagram from impact result.
index_notes
Index notes into the database (store without embeddings)
is_test_chunk
Unified test-chunk detection heuristic.
map_hunks_to_functions
Map diff hunks to function names using the index. For each hunk, finds chunks whose line range overlaps the hunk’s range. Returns deduplicated function names.
normalize_for_fts
Normalize code text for FTS5 indexing. Splits identifiers on camelCase/snake_case boundaries and joins with spaces. Used to make code searchable with natural language queries. Output is capped at 16KB to prevent memory issues with pathological inputs.
normalize_path
Normalize a path to a string with forward slashes.
normalize_slashes
Normalize backslashes to forward slashes in a string path.
onboard
Produce a guided tour of a concept in the codebase.
onboard_to_json
Convert OnboardResult to JSON.
parse_jsdoc_tags
Parse JSDoc tags from a documentation comment. Extracts @param and @returns/@return tags from JSDoc-style comments.
parse_target
Parse a target string into (optional_file_filter, function_name). Supports multiple target formats.
parse_unified_diff
Parse unified diff output into hunks. Handles standard git diff output.
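The core of hunk parsing is reading `@@ -a,b +c,d @@` headers. A minimal sketch of that header grammar — not cqs's actual parser — extracting the new-file range, with counts defaulting to 1 when omitted (as in `@@ -1 +1 @@`):

```rust
/// Parse a unified-diff hunk header like "@@ -10,3 +12,4 @@ fn main()"
/// into the new-file range (start, len). Returns None for non-header lines.
fn parse_hunk_header(line: &str) -> Option<(usize, usize)> {
    let rest = line.strip_prefix("@@ -")?;
    let plus = rest.find('+')?;
    // the new-file part runs from '+' to the next space: "12,4" or "1"
    let new_part = rest[plus + 1..].split(' ').next()?;
    let mut it = new_part.split(',');
    let start: usize = it.next()?.parse().ok()?;
    let len: usize = match it.next() {
        Some(n) => n.parse().ok()?,
        None => 1, // unified diff omits the count when it is 1
    };
    Some((start, len))
}

fn main() {
    println!("{:?}", parse_hunk_header("@@ -10,3 +12,4 @@ fn main()"));
    println!("{:?}", parse_hunk_header("@@ -1 +1 @@"));
    println!("{:?}", parse_hunk_header("+added line"));
}
```

Mapping hunks to functions then reduces to checking whether this (start, len) range overlaps each indexed chunk's line range.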
rel_display
Relativize a path against a root and normalize separators for display.
resolve_index_dir
Resolve the index directory for a project, migrating from .cq/ to .cqs/ if needed.
resolve_target
Resolve a target string to a ResolvedTarget. Uses search_by_name with optional file filtering. Returns the best-matching chunk and alternatives, or an error if none found.
review_diff
Analyze a unified diff and produce a comprehensive review.
scout
Run scout analysis for a task description.
scout_to_json
Serialize scout result to JSON.
scout_with_options
Run scout analysis with configurable search parameters.
search_across_projects
Search across all registered projects
semantic_diff
Run a semantic diff between two stores.
serialize_path_normalized
Serde serializer for PathBuf fields: forward-slash normalized.
strip_markdown_noise
Strip markdown formatting noise for cleaner embedding text. Removes heading prefixes, image syntax, simplifies links to just text, strips bold/italic markers, HTML tags, and collapses whitespace. Keeps inline code content (strips backticks but preserves text).
suggest_placement
Suggest where to place new code matching a description. Uses default search parameters. For custom parameters, use suggest_placement_with_options.
suggest_placement_with_options
Suggest where to place new code matching a description with configurable search parameters. If opts.query_embedding is set, reuses it (avoids redundant ONNX inference). Otherwise, computes the embedding from description using embedder.
suggest_tests
Suggest tests for untested callers in an impact result. Loads its own call graph and test chunks — only called when --suggest-tests is set, so the normal path pays zero overhead.
task
Produce complete implementation context for a task description.
task_to_json
Serialize task result to JSON.
task_with_resources
Like task but accepts pre-loaded call graph and test chunks.
temp_suffix
Generate an unpredictable u64 suffix for temporary file names.
tokenize_identifier
Split identifier on snake_case and camelCase boundaries. Note: This function splits on every uppercase letter, so acronyms like “XMLParser” become individual letters. This is intentional for search tokenization where “xml parser” is more useful than preserving “XML”.
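The documented splitting behavior — break on `_`, start a new token at every uppercase letter, so acronyms decompose into single letters — can be sketched as follows. This is a sketch of the described behavior, not the crate's exact implementation:

```rust
/// Split an identifier on '_' and before every uppercase letter,
/// lowercasing the pieces. Every uppercase letter starts a new token,
/// so "XMLParser" yields single-letter tokens for the acronym.
fn tokenize_identifier(ident: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in ident.chars() {
        if ch == '_' || ch.is_uppercase() {
            // boundary: flush the token built so far
            if !current.is_empty() {
                tokens.push(current.clone());
                current.clear();
            }
        }
        if ch != '_' {
            current.extend(ch.to_lowercase());
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize_identifier("parse_config")); // ["parse", "config"]
    println!("{:?}", tokenize_identifier("embedQuery"));   // ["embed", "query"]
    println!("{:?}", tokenize_identifier("XMLParser"));    // ["x", "m", "l", "parser"]
}
```

Lowercased single-letter tokens keep acronym-bearing identifiers reachable from natural-language queries, which is the trade-off the documentation calls out.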