Crate cqs

§cqs - Code Intelligence and RAG for AI Agents

Semantic search, call graph analysis, impact tracing, type dependencies, and smart context assembly — all in single tool calls. Local ML embeddings, GPU-accelerated.

§Features

  • Semantic search: Hybrid RRF (keyword + vector) with configurable embedding models (BGE-large default, E5-base and v9-200k presets, custom ONNX). 90.9% Recall@1 on 296-query expanded eval.
  • Call graphs: Callers, callees, transitive impact, shortest-path tracing between functions
  • Impact analysis: What breaks if you change X? Callers + affected tests + risk scoring
  • Type dependencies: Who uses this type? What types does this function use?
  • Smart context assembly: gather (search + BFS expansion), task (scout + gather + impact + placement), scout (pre-investigation dashboard)
  • Diff review & CI: Structured risk analysis, dead code detection in diffs, gating pipeline
  • Batch & chat modes: Persistent session with pipeline syntax (search "error" | callers | test-map)
  • Notes with sentiment: Unified memory system for AI collaborators
  • Multi-language: 52 languages + L5X/L5K PLC exports, with multi-grammar injection (HTML→JS/CSS, Svelte, Vue, Razor, etc.)
  • Type-aware embeddings: Full signatures appended to NL descriptions for richer type discrimination
  • Doc comment generation: --improve-docs generates and writes doc comments to source files via LLM
  • HyDE query predictions: --hyde-queries generates synthetic search queries per function for improved recall
  • Training data generation: train-data command generates fine-tuning triplets from git history
  • GPU acceleration: CUDA/TensorRT with CPU fallback
  • Document conversion: PDF, HTML, CHM, Web Help → cleaned Markdown (optional convert feature)
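The hybrid search feature above fuses keyword and vector rankings with reciprocal rank fusion (RRF). As an illustration of the fusion step only — this is a minimal self-contained sketch, not cqs internals; the constant k = 60 and the alphabetical tie-break are assumptions:

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: score(d) = Σ over rankings of 1 / (k + rank(d)).
/// `k` dampens the weight of top ranks; 60 is a common default and an
/// assumption here, not necessarily what cqs uses.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // ranks are 1-based in the RRF formula
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // sort by fused score descending, then name for a stable order
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let keyword = vec!["parse_config", "load_file", "read_toml"];
    let vector = vec!["read_toml", "parse_config", "init_app"];
    // "parse_config" ranks high in both lists, so it wins the fused ranking.
    println!("{:?}", rrf_fuse(&[keyword, vector], 60.0));
}
```

A document that appears in both rankings accumulates score from each, which is why hybrid RRF surfaces results that neither keyword nor vector search would rank first on its own.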

§Quick Start

use cqs::{Embedder, Parser, Store};
use cqs::embedder::ModelConfig;
use cqs::store::SearchFilter;

// Initialize components
let parser = Parser::new()?;
let embedder = Embedder::new(ModelConfig::resolve(None, None))?;
let store = Store::open(std::path::Path::new(".cqs/index.db"))?;

// Parse and embed a file
let chunks = parser.parse_file(std::path::Path::new("src/main.rs"))?;
let embeddings = embedder.embed_documents(
    &chunks.iter().map(|c| c.content.as_str()).collect::<Vec<_>>()
)?;

// Search for similar code (hybrid RRF search)
let query_embedding = embedder.embed_query("parse configuration file")?;
let filter = SearchFilter {
    enable_rrf: true,
    query_text: "parse configuration file".to_string(),
    ..Default::default()
};
let results = store.search_filtered(&query_embedding, &filter, 5, 0.3)?;

Re-exports§

pub use drift::detect_drift;
pub use drift::DriftEntry;
pub use drift::DriftResult;
pub use audit::parse_duration;
pub use embedder::Embedder;
pub use embedder::Embedding;
pub use hnsw::HnswIndex;
pub use index::IndexResult;
pub use index::VectorIndex;
pub use note::parse_notes;
pub use note::path_matches_mention;
pub use note::rewrite_notes_file;
pub use note::NoteEntry;
pub use note::NoteError;
pub use note::NoteFile;
pub use note::NOTES_HEADER;
pub use parser::Chunk;
pub use parser::Parser;
pub use reranker::Reranker;
pub use store::ModelInfo;
pub use store::SearchFilter;
pub use store::Store;

Modules§

audit
Audit mode for excluding notes from search/read
ci
CI pipeline analysis — composable diff review + dead code + gate logic.
config
Configuration file support for cqs
convert
Document-to-Markdown conversion pipeline.
doc_writer
Doc comment generation and source file rewriting.
drift
Drift detection — find functions that changed semantically between snapshots
embedder
Embedding generation with ort + tokenizers
fts
FTS normalization and identifier tokenization.
health
Health check — codebase quality snapshot
hnsw
HNSW (Hierarchical Navigable Small World) index for fast vector search
index
Vector index trait for nearest neighbor search
language
Language registry for code parsing
llm
Claude API client for LLM-generated function summaries (SQ-6).
note
Note parsing and types
parser
Code parsing with tree-sitter
plan
Task planning with template classification.
reference
Reference index support for multi-index search
reranker
Cross-encoder re-ranking for second-pass scoring
store
SQLite storage for chunks, embeddings, and call graph data.
suggest
Suggest — auto-detect note-worthy patterns in the codebase
train_data
Training data generation — fine-tuning triplets from git history
Structs§

CallContext
Call graph context for enriching NL descriptions.
CallerDetail
Direct caller with display-ready fields (call-site context + snippet). Named CallerDetail to distinguish from store::CallerInfo which has only basic fields (name, file, line). This struct adds call_line and snippet for impact analysis display.
ChangedFunction
A function identified as changed by a diff
CrossProjectResult
Search result from a specific project
DiffEntry
A single diff entry
DiffHunk
A single hunk from a unified diff — one changed region in one file
DiffImpactResult
Aggregated impact result from a diff
DiffImpactSummary
Summary counts for diff impact
DiffResult
Result of a semantic diff
DiffTestInfo
A test affected by diff changes, tracking which changed function leads to it
FileGroup
A file group in the scout result
FileSuggestion
Suggestion for where to place new code
FunctionHints
Lightweight caller + test coverage hints for a function.
FunctionRisk
Per-function risk assessment from impact analysis.
GatherOptions
Options for gather operation
GatherResult
Result of a gather operation
GatheredChunk
A gathered code chunk with context
ImpactOptions
Options for impact analysis.
ImpactResult
Complete impact analysis result
JsDocInfo
JSDoc tag information extracted from documentation comments.
LocalPatterns
Local code patterns extracted from existing chunks in the target file/module. Uses String fields intentionally rather than an enum — this keeps the design flexible for arbitrary language-specific patterns without requiring type changes when adding new conventions. Adding a new naming convention or error handling style is a single function change in detect_naming_convention() or extract_patterns().
OnboardEntry
A code entry in the reading list.
OnboardResult
Result of an onboard analysis — ordered reading list for understanding a concept.
OnboardSummary
Summary statistics for the onboard result.
PlacementOptions
Options for customizing placement suggestion behavior.
PlacementResult
Result from placement analysis
ProjectEntry
A registered project
ProjectRegistry
Global registry of indexed cqs projects
RelatedFunction
A function related to the target with overlap count.
RelatedResult
Result of co-occurrence analysis for a target function.
ResolvedTarget
Result of resolving a target name to a concrete chunk. Contains the best-matching chunk and any alternative matches found during resolution (useful for disambiguation UIs).
ReviewNoteEntry
A note relevant to the review. Named ReviewNoteEntry to avoid collision with note::NoteEntry (parsed note from TOML) which is a different type.
ReviewResult
Result of a comprehensive diff review.
RiskScore
Risk assessment for a single function.
ScoutChunk
A chunk in the scout result with hints
ScoutOptions
Options for customizing scout behavior.
ScoutResult
Complete scout result
ScoutSummary
Summary counts
TaskResult
Complete task analysis result.
TaskSummary
Summary statistics for a task result.
TestEntry
Test that exercises the entry point.
TestInfo
Affected test with call depth
TestMatch
A test function that reaches the target through the call graph.
TestSuggestion
A suggested test for an untested caller
TransitiveCaller
Transitive caller at a given depth
TypeImpacted
A function impacted via shared type dependencies (one-hop type expansion).
TypeInfo
Type dependency of the entry point.

Enums§

AnalysisError
Unified error type for analysis operations (scout, where-to-add, etc.)
ChunkRole
Role classification for chunks in scout results
GatherDirection
Direction of call graph expansion
NlTemplate
Template variants for NL description generation.
Pattern
Known structural patterns
ProjectError
Typed error for project registry operations (EH-13).
RiskLevel
Risk level for a function based on caller count and test coverage.

Constants§

DEFAULT_MAX_EXPANDED_NODES
Default maximum nodes in BFS expansion to prevent blowup on hub functions.
DEFAULT_MAX_TEST_SEARCH_DEPTH
Default maximum depth for test search BFS. Exposed via max_test_depth parameters on analysis functions.
DEFAULT_ONBOARD_DEPTH
Default callee BFS expansion depth.
DEFAULT_PLACEMENT_SEARCH_LIMIT
Default search result limit for placement suggestions.
DEFAULT_PLACEMENT_SEARCH_THRESHOLD
Default minimum search score threshold for placement suggestions.
DEFAULT_SCOUT_SEARCH_LIMIT
Default number of search results for scout.
DEFAULT_SCOUT_SEARCH_THRESHOLD
Default minimum search score threshold for scout.
EMBEDDING_DIM
Default embedding dimension (1024, BGE-large-en-v1.5). The actual dimension is detected at runtime from the model output. Use Embedder::embedding_dim() for the runtime value. Derived from ModelConfig::default_model().dim.
INDEX_DIR
Name of the per-project index directory (created by cqs init).

Statics§

COMMON_TYPES
Standard library types to exclude from type-edge analysis.

Functions§

analyze_diff_impact
Run impact analysis across all changed functions from a diff. Fetches call graph and test chunks once, then analyzes each function. Results are deduplicated by name.
analyze_diff_impact_with_graph
Like analyze_diff_impact but accepts pre-loaded graph and test chunks. Paths in the returned result are relative to root. Use when the caller already has the graph/test_chunks (e.g., review_diff which also needs them for risk scoring).
analyze_impact
Run impact analysis: find callers, affected tests, and transitive callers. Paths in the returned result are relative to root. When opts.include_types is true, also performs one-hop type expansion: finds other functions that share type dependencies with the target via type_edges.
compute_hints
Compute caller count and test count for a single function. Convenience wrapper that loads graph internally. Pass prefetched_caller_count to avoid re-querying callers when the caller already has them (e.g., explain fetches callers before this).
compute_hints_batch
Batch compute hints for multiple functions using forward BFS (PERF-20). Single test_reachability call replaces N independent reverse_bfs calls.
compute_hints_with_graph
Core implementation — accepts pre-loaded graph and test chunks. Use this when processing multiple functions to avoid loading the graph N times (e.g., scout, which processes 10+ functions).
compute_risk_and_tests
Compute risk scores and collect deduplicated tests in a single pass. Shares BFS results across risk scoring and test collection, avoiding the duplicate reverse_bfs that occurs when calling compute_risk_batch and find_affected_tests_with_chunks separately.
compute_risk_batch
Compute risk scores for a batch of function names. Uses pre-loaded call graph and test chunks to avoid repeated queries. Formula: score = caller_count * (1.0 - test_ratio) where test_ratio = min(test_count / max(caller_count, 1), 1.0). Entry-point handling: functions with 0 callers and 0 tests get Medium risk (likely entry points that should have tests). PERF-24: Uses a single forward BFS from all test nodes to build a reachability map, instead of N independent reverse_bfs calls.
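The scoring formula and entry-point rule above can be sketched as standalone functions. Only the formula and the zero-caller/zero-test rule come from this documentation; the numeric cutoffs mapping a score to a risk level are illustrative assumptions:

```rust
#[derive(Debug, PartialEq)]
enum Risk { Low, Medium, High }

/// score = caller_count * (1.0 - test_ratio), where
/// test_ratio = min(test_count / max(caller_count, 1), 1.0)
fn risk_score(caller_count: usize, test_count: usize) -> f64 {
    let test_ratio = (test_count as f64 / caller_count.max(1) as f64).min(1.0);
    caller_count as f64 * (1.0 - test_ratio)
}

/// Entry-point rule from the docs: 0 callers and 0 tests => Medium
/// (likely an entry point that should have tests). The cutoffs below
/// are assumptions, not the crate's actual thresholds.
fn risk_level(caller_count: usize, test_count: usize) -> Risk {
    if caller_count == 0 && test_count == 0 {
        return Risk::Medium;
    }
    let score = risk_score(caller_count, test_count);
    if score >= 5.0 { Risk::High } else if score >= 1.0 { Risk::Medium } else { Risk::Low }
}

fn main() {
    // 10 callers, 2 tests: test_ratio = 0.2, score = 10 * 0.8 = 8.0
    println!("{:?}", risk_level(10, 2));
    // fully tested hub: test_ratio = 1.0, score = 0.0
    println!("{:?}", risk_level(10, 10));
}
```

The formula rewards test coverage proportionally: a widely-called function scores high only when its callers outnumber the tests that reach it.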
diff_impact_to_json
Serialize diff impact result to JSON.
enumerate_files
Enumerate files to index in a project directory.
extract_body_keywords
Extract meaningful keywords from function body, filtering language noise. Returns up to 10 unique keywords sorted by frequency (descending).
extract_modify_targets
Extract modify target names from scout results.
find_hotspots
Find the most-called functions in the codebase (hotspots). Returns [Hotspot] entries sorted by caller count descending.
find_related
Find functions related to target_name by co-occurrence across three dimensions.
find_test_matches
Find test functions that can reach target_name through the call graph via reverse BFS, up to max_depth hops.
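The reverse-BFS idea behind this function can be sketched against a toy call graph. The graph shape (a callee → callers map) and naming are illustrative assumptions, not the crate's actual data model:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Reverse BFS from `target` over caller edges, collecting test functions
/// reachable within `max_depth` hops. `callers` maps each callee to its
/// direct callers; returns (test name, depth) pairs.
fn find_test_matches(
    callers: &HashMap<&str, Vec<&str>>,
    tests: &HashSet<&str>,
    target: &str,
    max_depth: usize,
) -> Vec<(String, usize)> {
    let mut seen: HashSet<&str> = HashSet::from([target]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(target, 0)]);
    let mut matches = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // don't expand past the hop limit
        }
        for &caller in callers.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
            if seen.insert(caller) {
                if tests.contains(caller) {
                    matches.push((caller.to_string(), depth + 1));
                }
                queue.push_back((caller, depth + 1));
            }
        }
    }
    matches
}

fn main() {
    // test_parse -> parse_config -> read_toml
    let callers = HashMap::from([
        ("read_toml", vec!["parse_config"]),
        ("parse_config", vec!["test_parse"]),
    ]);
    let tests = HashSet::from(["test_parse"]);
    println!("{:?}", find_test_matches(&callers, &tests, "read_toml", 5));
}
```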
format_test_suggestions
Format test suggestions as JSON values.
gather
Gather relevant code chunks for a query.
gather_cross_index
Cross-index gather: seed from a reference index, bridge into project code, BFS expand.
gather_cross_index_with_index
Like gather_cross_index but accepts an optional HNSW index for O(log n) bridge searches instead of brute-force scans per reference seed.
gather_with_graph
Like gather but accepts a pre-loaded call graph.
generate_nl_description
Generate natural language description from chunk metadata.
generate_nl_with_call_context
Generate NL description enriched with call graph context.
generate_nl_with_call_context_and_summary
Generate NL with call context and optional LLM summary (SQ-6).
generate_nl_with_template
impact_to_json
Serialize impact result to JSON.
impact_to_mermaid
Generate a mermaid diagram from impact result.
index_notes
Index notes into the database (store without embeddings)
is_test_chunk
Unified test-chunk detection heuristic.
map_hunks_to_functions
Map diff hunks to function names using the index. For each hunk, finds chunks whose line range overlaps the hunk’s range. Returns deduplicated function names.
normalize_for_fts
Normalize code text for FTS5 indexing. Splits identifiers on camelCase/snake_case boundaries and joins with spaces. Used to make code searchable with natural language queries. Output is capped at 16KB to prevent memory issues with pathological inputs.
normalize_path
Normalize a path to a string with forward slashes.
normalize_slashes
Normalize backslashes to forward slashes in a string path.
onboard
Produce a guided tour of a concept in the codebase.
onboard_to_json
Convert OnboardResult to JSON.
parse_jsdoc_tags
Parse JSDoc tags from a documentation comment. Extracts @param and @returns/@return tags from JSDoc-style comments.
parse_target
Parse a target string into (optional_file_filter, function_name). Supports multiple target formats.
parse_unified_diff
Parse unified diff output into hunks. Handles standard git diff output.
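The core of hunk parsing is reading `@@ -a,b +c,d @@` headers. A minimal sketch of that header grammar — not cqs's actual parser — extracting the new-file range, with counts defaulting to 1 when omitted (as in `@@ -1 +1 @@`):

```rust
/// Parse a unified-diff hunk header like "@@ -10,3 +12,4 @@ fn main()"
/// into the new-file range (start, len). Returns None for non-header lines.
fn parse_hunk_header(line: &str) -> Option<(usize, usize)> {
    let rest = line.strip_prefix("@@ -")?;
    let plus = rest.find('+')?;
    // the new-file part runs from '+' to the next space: "12,4" or "1"
    let new_part = rest[plus + 1..].split(' ').next()?;
    let mut it = new_part.split(',');
    let start: usize = it.next()?.parse().ok()?;
    let len: usize = match it.next() {
        Some(n) => n.parse().ok()?,
        None => 1, // unified diff omits the count when it is 1
    };
    Some((start, len))
}

fn main() {
    println!("{:?}", parse_hunk_header("@@ -10,3 +12,4 @@ fn main()"));
    println!("{:?}", parse_hunk_header("@@ -1 +1 @@"));
    println!("{:?}", parse_hunk_header("+added line"));
}
```

Mapping hunks to functions then reduces to checking whether this (start, len) range overlaps each indexed chunk's line range.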
rel_display
Relativize a path against a root and normalize separators for display.
resolve_index_dir
Resolve the index directory for a project, migrating from .cq/ to .cqs/ if needed.
resolve_target
Resolve a target string to a ResolvedTarget. Uses search_by_name with optional file filtering. Returns the best-matching chunk and alternatives, or an error if none found.
review_diff
Analyze a unified diff and produce a comprehensive review.
scout
Run scout analysis for a task description.
scout_to_json
Serialize scout result to JSON.
scout_with_options
Run scout analysis with configurable search parameters.
search_across_projects
Search across all registered projects
semantic_diff
Run a semantic diff between two stores.
serialize_path_normalized
Serde serializer for PathBuf fields: forward-slash normalized.
strip_markdown_noise
Strip markdown formatting noise for cleaner embedding text. Removes heading prefixes, image syntax, simplifies links to just text, strips bold/italic markers, HTML tags, and collapses whitespace. Keeps inline code content (strips backticks but preserves text).
suggest_placement
Suggest where to place new code matching a description. Uses default search parameters. For custom parameters, use suggest_placement_with_options.
suggest_placement_with_options
Suggest where to place new code matching a description with configurable search parameters. If opts.query_embedding is set, reuses it (avoids redundant ONNX inference). Otherwise, computes the embedding from description using embedder.
suggest_tests
Suggest tests for untested callers in an impact result. Loads its own call graph and test chunks — only called when --suggest-tests is set, so the normal path pays zero overhead.
task
Produce complete implementation context for a task description.
task_to_json
Serialize task result to JSON.
task_with_resources
Like task but accepts pre-loaded call graph and test chunks.
temp_suffix
Generate an unpredictable u64 suffix for temporary file names.
tokenize_identifier
Split identifier on snake_case and camelCase boundaries. Note: This function splits on every uppercase letter, so acronyms like “XMLParser” become individual letters. This is intentional for search tokenization where “xml parser” is more useful than preserving “XML”.
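The documented splitting behavior — break on `_`, start a new token at every uppercase letter, so acronyms decompose into single letters — can be sketched as follows. This is a sketch of the described behavior, not the crate's exact implementation:

```rust
/// Split an identifier on '_' and before every uppercase letter,
/// lowercasing the pieces. Every uppercase letter starts a new token,
/// so "XMLParser" yields single-letter tokens for the acronym.
fn tokenize_identifier(ident: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in ident.chars() {
        if ch == '_' || ch.is_uppercase() {
            // boundary: flush the token built so far
            if !current.is_empty() {
                tokens.push(current.clone());
                current.clear();
            }
        }
        if ch != '_' {
            current.extend(ch.to_lowercase());
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize_identifier("parse_config")); // ["parse", "config"]
    println!("{:?}", tokenize_identifier("embedQuery"));   // ["embed", "query"]
    println!("{:?}", tokenize_identifier("XMLParser"));    // ["x", "m", "l", "parser"]
}
```

Lowercased single-letter tokens keep acronym-bearing identifiers reachable from natural-language queries, which is the trade-off the documentation calls out.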