omengrep 0.0.2

Semantic code search — multi-vector embeddings + BM25
docs.rs failed to build omengrep-0.0.2
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

omengrep (og)

Local semantic code search using embeddings and BM25.

cargo install --path .
og build ./src
og "authentication flow" ./src

What it does

omengrep extracts functions, classes, and methods from source files using tree-sitter, then indexes each block with both embeddings and BM25 keywords. Queries match against both indexes, so searching "error handling" finds errorHandler() and AppError — not just comments containing those words.

$ og build ./src
Found 69 files (0.0s)
Indexed 801 blocks from 69 files (10.8s)

$ og "error handling" ./src
src/cli/search.rs:42 function handle_search
  pub fn handle_search(args: &SearchArgs) -> Result<()> {

src/types.rs:15 enum SearchError
  pub enum SearchError {
      IndexNotFound,

2 results (0.27s)
Query grep finds omengrep finds
"error handling" Comments mentioning it errorHandler(), AppError
"authentication" Strings containing "auth" login(), verify_token()
"database" Config files, comments Connection, query(), Db

Use grep/ripgrep for exact strings. Use omengrep when you want implementations, not mentions.

Install

Requires Rust nightly toolchain.

git clone https://github.com/nijaru/omengrep && cd omengrep
cargo install --path .

The embedding model (~17MB) downloads automatically on first use.

Usage

og build [path]                # Build index (required first)
og "query" [path]              # Search
og file.rs#func_name           # Find code similar to a named block
og file.rs:42                  # Find code similar to a specific line
og status [path]               # Show index info
og list [path]                 # List all indexes under path
og clean [path]                # Delete index
og mcp                         # Start MCP server (JSON-RPC over stdio)

# Options
og -n 5 "error handling" .     # Limit to 5 results
og --json "auth" .             # JSON output
og -l "config" .               # List matching files only
og -t py,js "api" .            # Filter by file type
og --exclude "tests/*" "fn" .  # Exclude patterns
og --code-only "handler" .     # Skip docs (md, txt, rst)

Set OG_AUTO_BUILD=1 to build the index automatically on first search.

How it works

omengrep uses tree-sitter to parse source files into AST blocks (functions, classes, methods), then builds two indexes per block:

  1. Embedding index — per-token embeddings from a ColBERT-style model (LateOn-Code-edge, 17M params, INT8 ONNX). Stored as MuVERA compressed multi-vectors, searched with MaxSim reranking.
  2. BM25 index — keyword search with identifier-aware tokenization.

At search time, both indexes run in parallel and results merge by ID, keeping the higher score.

Runs locally on CPU. Search latency is 270-440ms.

Built on omendb.

Supported languages

Code (25 languages): Bash, C, C++, C#, CSS, Elixir, Go, HCL, HTML, Java, JavaScript, JSON, Kotlin, Lua, PHP, Python, Ruby, Rust, Swift, TOML, TypeScript, YAML, Zig

Text: Markdown, plain text (chunked by headers)

License

MIT