codelens-engine 1.9.39

Harness-native Rust MCP server for code intelligence — 107 tools, 25 languages, tree-sitter + hybrid semantic search, 6.1x fewer tokens than rg+cat on agent tasks
Documentation

codelens-engine

Core code intelligence engine for CodeLens MCP.

Pure Rust library that treats a source repository as an indexed, queryable dataset. Parses 25 languages with tree-sitter, builds import/call/type graphs, and optionally runs a local embedding index for semantic search — all behind synchronous library APIs that other crates consume directly.

Where it fits

┌────────────────────────────┐
│   codelens-mcp (binary)    │  ← MCP/JSON-RPC tool surface, mutation gates,
│   tool surface + transport │    analysis jobs, coordination, profiles
└──────────────┬─────────────┘
               ▼
┌────────────────────────────┐
│     codelens-engine        │  ← YOU ARE HERE. Pure library. No I/O policy,
│     analysis + storage     │    no MCP protocol, no transport.
└──────────────┬─────────────┘
               ▼
      tree-sitter · SQLite/FTS5 · sqlite-vec · ONNX runtime · optional LSP

codelens-engine is the data plane. codelens-mcp is the control plane. Keeping them separate lets harness/editor/TUI consumers import the engine without pulling in the MCP server, and lets the server evolve its tool contract without disturbing the analysis core.

What it provides

  • Symbol extraction — AST-based functions, classes, types, constants, parameters. Captures name_path for qualified lookups (e.g. Module::Class/method) and lines/columns for precise navigation.
  • Import graph — cross-file dependency tracking with blast-radius queries (what breaks if I delete this file?), dead code detection, and Tarjan SCC cycle detection.
  • Call graph — static caller/callee relationships per language.
  • Scope analysis — block-level resolution for rename safety and unused/shadowed identifier checks.
  • Type hierarchy — inheritance and trait implementation chains.
  • Hybrid retrieval — ranked context combining exact lexical hits, SQLite FTS5 results, and (opt-in) semantic cosine similarity.
  • Semantic index — opt-in ONNX-runtime embedding store backed by sqlite-vec. Ships a bundled CodeSearchNet INT8 model by default.
  • LSP bridge — opt-in type-aware references/diagnostics when an LSP server is installed, falls back to tree-sitter transparently.
  • Safe mutation primitives — rename, replace-symbol-body, extract/ inline/move refactors with dry-run preview.

Storage layout

Per-project state lives entirely under .codelens/ at the project root:

.codelens/
├── symbols.db            SQLite (+FTS5): symbols, references, imports, scope
├── vec.db                sqlite-vec index (opt-in, feature = "semantic")
├── graph.cache           pre-computed import/call graph
├── bridges.json          optional NL→code query bridges
├── memories/             operator notes, one file per topic
└── audit/                mutation and coordination audit log

Nothing here needs a daemon; the engine opens, queries, and closes the SQLite handles synchronously per request. The vector store uses the same SQLite connection via the sqlite-vec extension loaded at start.

Languages

Python, JavaScript, TypeScript, TSX, Go, Java, Kotlin, Rust, C, C++, PHP, Swift, Scala, Ruby, C#, Dart, Lua, Zig, Elixir, Haskell, OCaml, Erlang, R, Bash, Julia — plus Clojure/ClojureScript and HTML/CSS/TOML/ YAML config grammars.

Usage

[dependencies]
codelens-engine = "1.9"

# With semantic search (adds ~70 MB dependency footprint: fastembed + ort):
codelens-engine = { version = "1.9", features = ["semantic"] }
use codelens_engine::{ProjectRoot, SymbolIndex};

let project = ProjectRoot::discover(".")?;
let index = SymbolIndex::build(&project)?;
for sym in index.symbols_in_file("src/main.rs") {
    println!("{} {}@{}:{}", sym.kind, sym.name, sym.file, sym.line);
}

Feature flags

Feature Default Adds
semantic yes ONNX runtime + fastembed + sqlite-vec for embedding search
scip-backend no SCIP import + precise navigation overlay (enterprise)
model-bakeoff no Alternative embedding model harness (benchmark-only)

Design posture

  • Tree-sitter first. Every core flow works without LSP. LSP is opt-in for type-awareness.
  • Bounded outputs. Every query has a size cap; the library never returns unbounded file contents.
  • Incremental indexing. The file watcher re-parses only changed files; full rebuild is a last resort.
  • No surprise I/O. The engine touches only .codelens/ and files under the project root. No network calls by default.

License

Apache-2.0. See LICENSE.