seekr-code 0.1.0

# seekr-code

A semantic code search engine, smarter than grep.

Supports **text regex** + **semantic vector** + **AST pattern** search — 100% local, no data leaves your machine.

[中文文档](README_CN.md)

## Features

- 🔍 **Text Search** — High-performance regex matching across code
- 🧠 **Semantic Search** — Local ONNX-based embedding + HNSW KNN search, find code by meaning
- 🌳 **AST Pattern Search** — Match function signatures, structs, classes via Tree-sitter (e.g., `fn(*) -> Result`)
- **Hybrid Mode** — Combine all three via Reciprocal Rank Fusion (RRF) for best results
- 📡 **MCP Server** — Model Context Protocol support for AI editor integration
- 🌐 **HTTP API** — REST API for integration with other tools
- 🔄 **Incremental Indexing** — Only re-process changed files
- 🗂️ **15 Languages** — Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Ruby, Bash, HTML, CSS, JSON, TOML, YAML

## Installation

### From crates.io

```bash
cargo install seekr-code
```

### From source

```bash
git clone https://github.com/lucientong/seekr.git
cd seekr
cargo install --path .
```

After installation, the `seekr-code` binary is placed in Cargo's bin directory (`~/.cargo/bin` by default). Make sure that directory is on your `$PATH`.

### Requirements

- Rust 1.85.0 or later
- A C/C++ compiler (for building tree-sitter grammars)

## Quick Start

### 1. Build an index

```bash
# Index the current project
seekr-code index

# Index a specific project path
seekr-code index /path/to/project

# Force a full rebuild (ignore incremental state)
seekr-code index --force
```
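
How seekr-code tracks incremental state is not documented here. As a rough illustration of the general idea only, the sketch below compares content digests between two runs to decide which files need re-processing. The `digest` and `plan_reindex` helpers and the sample paths are hypothetical and are not part of seekr-code.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def plan_reindex(previous: dict[str, str], current: dict[str, str]):
    """Compare two path -> digest maps.

    Returns (to_index, to_remove): files that are new or changed,
    and files that disappeared since the previous run.
    """
    to_index = sorted(p for p, h in current.items() if previous.get(p) != h)
    to_remove = sorted(p for p in previous if p not in current)
    return to_index, to_remove


# Hypothetical state from the previous run and from the current scan.
previous = {
    "src/main.rs": digest(b"fn main() {}"),
    "src/old.rs": digest(b"// old"),
}
current = {
    "src/main.rs": digest(b"fn main() {}"),   # unchanged, skipped
    "src/lib.rs": digest(b"pub fn f() {}"),   # new, must be indexed
}

to_index, to_remove = plan_reindex(previous, current)
print(to_index, to_remove)
```

With an empty `previous` map every file is scheduled, which is conceptually what a forced full rebuild amounts to.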

### 2. Search code

```bash
# Hybrid search (default — combines text + semantic + AST)
seekr-code search "authenticate user"

# Text regex search
seekr-code search "fn.*authenticate" --mode text

# Semantic search (search by meaning)
seekr-code search "user login validation" --mode semantic

# AST pattern search
seekr-code search "fn(*) -> Result" --mode ast
seekr-code search "struct *Config" --mode ast
seekr-code search "async fn(*)" --mode ast
```

### 3. Check index status

```bash
seekr-code status
```

### 4. JSON output

All commands support `--json` for machine-readable output:

```bash
seekr-code search "authenticate" --json
seekr-code index --json
seekr-code status --json
```
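
For scripting, the `--json` flag can be combined with a small wrapper. The following Python sketch shells out to the CLI and parses stdout. It assumes that stdout contains a single JSON document and makes no assumption about its schema, which is not specified in this README. The helper names `build_search_command` and `run_search` are illustrative.

```python
import json
import subprocess


def build_search_command(query, mode=None):
    """Assemble the argument list for `seekr-code search ... --json`."""
    cmd = ["seekr-code", "search", query]
    if mode is not None:
        cmd += ["--mode", mode]  # text, semantic, ast, or hybrid
    cmd.append("--json")
    return cmd


def run_search(query, mode=None):
    """Run the search and return the parsed JSON (schema left as-is)."""
    completed = subprocess.run(
        build_search_command(query, mode),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


# Only builds the command; call run_search() once seekr-code is installed.
print(build_search_command("fn.*authenticate", mode="text"))
```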

## Server Mode

### HTTP API

```bash
# Start the HTTP API server (default: 127.0.0.1:7720)
seekr-code serve

# Custom host and port (0.0.0.0 listens on all network interfaces)
seekr-code serve --host 0.0.0.0 --port 8080
```

**Endpoints:**

| Method | Path      | Description          |
|--------|-----------|----------------------|
| POST   | /search   | Search code          |
| POST   | /index    | Trigger index build  |
| GET    | /status   | Query index status   |
| GET    | /health   | Health check         |

**Example:**

```bash
curl -X POST http://127.0.0.1:7720/search \
  -H "Content-Type: application/json" \
  -d '{"query": "authenticate user", "mode": "hybrid", "top_k": 10}'
```
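
The same request can be issued from Python using only the standard library. This is a minimal sketch that mirrors the `curl` call above. The response schema is not documented here, so the parsed JSON is returned unchanged. `build_search_request` and `search` are illustrative helper names.

```python
import json
import urllib.request


def build_search_request(query, mode="hybrid", top_k=10,
                         base_url="http://127.0.0.1:7720"):
    """Build a POST /search request with a JSON body."""
    body = json.dumps({"query": query, "mode": mode, "top_k": top_k})
    return urllib.request.Request(
        f"{base_url}/search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, **kwargs):
    """Send the request to a running `seekr-code serve` instance."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; search() does.
request = build_search_request("authenticate user")
print(request.get_method(), request.full_url)
```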

### MCP Server (AI Editor Integration)

```bash
# Start as MCP server over stdio
seekr-code serve --mcp
```

**MCP Tools:**

- `seekr_search` — Search code (text, semantic, AST, hybrid modes)
- `seekr_index` — Build/rebuild the search index
- `seekr_status` — Get index status

**Example MCP configuration** (e.g., for Claude Desktop, CodeBuddy, etc.):

```json
{
  "mcpServers": {
    "seekr-code": {
      "command": "seekr-code",
      "args": ["serve", "--mcp"]
    }
  }
}
```
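
Editors normally handle the protocol for you, but it can help to see what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0, and the stdio transport sends one JSON message per line. The sketch below builds a `tools/call` message for `seekr_search`. The argument names (`query`, `mode`) are an assumption borrowed from the HTTP example and may not match the tool's actual input schema. A real client must also complete the MCP `initialize` handshake before calling tools.

```python
import json


def tools_call_message(request_id, tool, arguments):
    """Serialize a JSON-RPC 2.0 `tools/call` request as one line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so end with "\n".
    return json.dumps(message) + "\n"


# Argument names are assumed, not taken from the seekr_search schema.
line = tools_call_message(
    1, "seekr_search", {"query": "authenticate user", "mode": "hybrid"}
)
print(line, end="")
```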

## AST Pattern Syntax

```text
[async] [pub] fn [name]([param_types, ...]) [-> return_type]
class ClassName
struct StructName
enum EnumName
trait TraitName
```

**Examples:**

| Pattern                | Matches                                          |
|------------------------|--------------------------------------------------|
| `fn(string) -> number` | Functions that take a string and return a number |
| `fn(*) -> Result`      | Any function returning `Result`                  |
| `async fn(*)`          | Any async function                               |
| `fn authenticate(*)`   | Functions named `authenticate`                   |
| `struct *Config`       | Structs whose name ends with `Config`            |
| `class *Service`       | Classes whose name ends with `Service`           |
| `enum *Error`          | Enums whose name ends with `Error`               |
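
To build intuition for the `*` wildcard in name patterns, the Python sketch below uses `fnmatch` from the standard library as a stand-in. This is not how seekr-code matches patterns internally; in particular, `fnmatch` also gives special meaning to `?` and `[...]`, and whether seekr-code lets `*` match an empty prefix (so that `*Config` matches a struct named just `Config`) is an assumption here.

```python
from fnmatch import fnmatchcase


def filter_names(pattern, names):
    """Keep the names that match a case-sensitive glob pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical type names extracted from a codebase.
names = ["AppConfig", "Config", "ConfigLoader", "UserService", "ParseError"]

print(filter_names("*Config", names))   # names ending with "Config"
print(filter_names("*Service", names))  # names ending with "Service"
print(filter_names("*Error", names))    # names ending with "Error"
```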

## Configuration

Configuration file: `~/.seekr/config.toml`

```toml
# Index storage directory
index_dir = "~/.seekr/indexes"

# ONNX model directory
model_dir = "~/.seekr/models"

# Embedding model name
embed_model = "all-MiniLM-L6-v2"

# Maximum file size to index, in bytes (10485760 = 10 MiB)
max_file_size = 10485760

[server]
host = "127.0.0.1"
port = 7720

[search]
context_lines = 2
top_k = 20
rrf_k = 60

[embedding]
batch_size = 32
```

## How It Works

1. **Scanner** — Walks the project directory, respects `.gitignore`, filters by file type/size
2. **Parser** — Uses Tree-sitter to parse source files into semantic code chunks (functions, classes, structs, etc.)
3. **Embedder** — Generates vector embeddings using ONNX Runtime + all-MiniLM-L6-v2
4. **Index** — Builds inverted text index + HNSW vector index, persisted to disk
5. **Search** — Text regex, semantic KNN, AST pattern matching, fused via RRF
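
The fusion step can be illustrated with the standard Reciprocal Rank Fusion formula, where each result receives `1 / (k + rank)` from every ranking it appears in and the contributions are summed. The Python sketch below is a generic implementation of that formula, not seekr-code's own code; `k=60` matches the `rrf_k` default in the configuration above, and the result identifiers are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one, best result first.

    Each item scores 1 / (k + rank) per list it appears in (rank is
    1-based). Ties are broken alphabetically for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical per-mode results for one query.
text = ["auth.rs::login", "user.rs::validate", "db.rs::connect"]
semantic = ["user.rs::validate", "auth.rs::login", "session.rs::refresh"]
ast = ["user.rs::validate", "session.rs::refresh"]

fused = reciprocal_rank_fusion([text, semantic, ast])
print(fused)
```

A result that ranks reasonably well in several modes (`user.rs::validate` here) ends up ahead of one that tops only a single list. A larger `k` flattens the difference between adjacent ranks.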

## Environment Variables

| Variable    | Description                                       |
|-------------|---------------------------------------------------|
| `SEEKR_LOG` | Log level filter (e.g., `seekr_code=debug`)       |
| `RUST_LOG`  | Fallback log level if `SEEKR_LOG` is not set      |
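
The precedence described in the table can be written out as a small lookup. This Python sketch only models the documented fallback order; how seekr-code treats an empty value is not specified, so the sketch treats any value that is present as "set". `effective_log_filter` is an illustrative name.

```python
def effective_log_filter(env):
    """Return the log filter implied by an environment mapping.

    SEEKR_LOG wins when present; otherwise RUST_LOG is used.
    Returns None when neither variable is set.
    """
    if "SEEKR_LOG" in env:
        return env["SEEKR_LOG"]
    return env.get("RUST_LOG")


print(effective_log_filter({"SEEKR_LOG": "seekr_code=debug", "RUST_LOG": "info"}))
print(effective_log_filter({"RUST_LOG": "info"}))
print(effective_log_filter({}))
```

In practice you would pass `os.environ` as the mapping, or set `SEEKR_LOG` in the `env` argument of `subprocess.run` when launching the binary.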

## License

[Apache License 2.0](LICENSE)

## Author

[lucientong](https://github.com/lucientong)