colgrep 0.9.4

Semantic code search powered by ColBERT
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
<div align="center">
  <h1>ColGREP</h1>
</div>

Semantic code search powered by ColBERT multi-vector embeddings and the PLAID algorithm.

## Installation

### Pre-built Binaries (Recommended)

**macOS / Linux:**

```bash
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/lightonai/lategrep/releases/latest/download/colgrep-installer.sh | sh
```

**Windows (PowerShell):**

```powershell
powershell -c "irm https://github.com/lightonai/lategrep/releases/latest/download/colgrep-installer.ps1 | iex"
```

**Specific Version:**

```bash
# Replace 0.4.0 with desired version
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/lightonai/lategrep/releases/download/0.4.0/colgrep-installer.sh | sh
```

### Using Cargo (crates.io)

If you don't have Cargo installed, install Rust first:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

Then install colgrep:

```bash
cargo install colgrep
```

### Build from Source

```bash
git clone https://github.com/lightonai/lategrep.git
cd lategrep
cargo install --path colgrep
```

#### Build Features

| Feature      | Platform      | Description                            |
| ------------ | ------------- | -------------------------------------- |
| `accelerate` | macOS         | Apple Accelerate for vector operations |
| `coreml`     | macOS         | Apple CoreML for model inference       |
| `openblas`   | Linux         | OpenBLAS for vector operations         |
| `cuda`       | Linux/Windows | NVIDIA CUDA for model inference        |
| `tensorrt`   | Linux         | NVIDIA TensorRT for model inference    |
| `directml`   | Windows       | DirectML for model inference           |

**Examples:**

```bash
# macOS with Apple Accelerate + CoreML (recommended for M1/M2/M3/M4)
cargo install --path colgrep --features "accelerate,coreml"

# Linux with OpenBLAS
cargo install --path colgrep --features openblas

# Linux with CUDA GPU support
cargo install --path colgrep --features cuda

# Combine features
cargo install --path colgrep --features "openblas,cuda"
```

#### OpenBLAS Acceleration (Linux)

OpenBLAS provides optimized BLAS (Basic Linear Algebra Subprograms) for vector operations, significantly improving search performance on Linux.

**Install OpenBLAS:**

```bash
# Debian/Ubuntu
sudo apt install libopenblas-dev

# Fedora/RHEL/CentOS
sudo dnf install openblas-devel

# Arch Linux
sudo pacman -S openblas
```

Then build with the `openblas` feature:

```bash
cargo install --path colgrep --features openblas
```

#### Apple Accelerate + CoreML (macOS)

Apple Accelerate (vector operations) and CoreML (model inference) are built into macOS and require no additional installation. Just build with both features:

```bash
cargo install --path colgrep --features "accelerate,coreml"
```

This is recommended for M1/M2/M3/M4 Macs for optimal performance.

### ONNX Runtime (Automatic)

ONNX Runtime is automatically downloaded on first use. No manual installation required.

The CLI searches for ONNX Runtime in:

1. `ORT_DYLIB_PATH` environment variable
2. Python environments (pip/conda/venv)
3. System paths

If not found, it downloads to `~/.cache/onnxruntime/`.

---

## Quick Start

```bash
# Search with natural language (auto-indexes on first run)
colgrep "error handling in API"

# Search in specific directory
colgrep "database connection" /path/to/project

# Limit results
colgrep "authentication" -k 5
```

---

## Search Patterns

### Basic Search

```bash
colgrep "natural language query"
colgrep "function that parses JSON" ./src
colgrep "error handling" -k 10
```

### File Type Filtering (`--include`)

Filter by file extension or path pattern:

```bash
# Single extension
colgrep --include="*.py" "database query"
colgrep --include="*.rs" "error handling"

# Multiple extensions
colgrep --include="*.ts" --include="*.tsx" "React component"

# Path patterns
colgrep --include="src/**/*.rs" "config parsing"
colgrep --include="**/tests/**" "test helper"
colgrep --include="*_test.go" "mock"
```

**Pattern Syntax:**

| Pattern       | Matches                         |
| ------------- | ------------------------------- |
| `*.py`        | All Python files                |
| `*.{ts,tsx}`  | TypeScript and TSX files        |
| `src/**/*.rs` | Rust files under `src/`         |
| `**/tests/**` | Files in any `tests/` directory |
| `*_test.go`   | Go test files                   |
| `*.spec.ts`   | TypeScript spec files           |

### Hybrid Search (`-e` pattern + semantic)

First filter with grep, then rank semantically:

```bash
# Find files with "TODO", rank by semantic relevance to "error handling"
colgrep -e "TODO" "error handling"

# Find async functions, rank by "promise handling"
colgrep -e "async fn" "promise handling" --include="*.rs"

# Extended regex (-E)
colgrep -e "fn|struct" -E "rust definitions"
colgrep -e "error[0-9]+" -E "error codes"
colgrep -e "(get|set)Value" -E "accessor methods"

# Fixed string (-F) - no regex interpretation
colgrep -e "user.name" -F "user properties"

# Whole word (-w)
colgrep -e "error" -w "error handling"
```

### Output Modes

```bash
# List files only (like grep -l)
colgrep -l "database queries"

# Show full function content
colgrep -c "authentication handler"
colgrep --content "parse config" -k 5

# JSON output
colgrep --json "authentication" | jq '.[] | .unit.file'

# Control context lines (default: 6)
colgrep -n 10 "database"
```

### Exclusions

```bash
# Exclude file patterns
colgrep --exclude="*.test.ts" "component"
colgrep --exclude="*_mock.go" "service"

# Exclude directories (literal names)
colgrep --exclude-dir="vendor" "import"
colgrep --exclude-dir="node_modules" --exclude-dir="dist" "config"

# Exclude directories (glob patterns)
colgrep --exclude-dir="*/plugins" "query"
colgrep --exclude-dir="**/test_*" "implementation"
colgrep --exclude-dir=".claude/*" "code"
```

### Code-Only Mode

Skip documentation and config files:

```bash
colgrep --code-only "authentication logic"
```

Excludes: Markdown, text, YAML, TOML, JSON, Dockerfile, Makefile, shell scripts.

---

## Index Management

```bash
# Check index status
colgrep status

# Clear current project index
colgrep clear

# Clear all indexes
colgrep clear --all

# Show statistics
colgrep --stats
```

### Index Locations

| Platform | Location                                         |
| -------- | ------------------------------------------------ |
| Linux    | `~/.local/share/colgrep/indices/`                |
| macOS    | `~/Library/Application Support/colgrep/indices/` |
| Windows  | `%APPDATA%\colgrep\indices\`                     |

---

## Configuration

```bash
# Show current config
colgrep settings

# Set default results count
colgrep settings --k 20

# Use INT8 quantized model (default, faster)
colgrep settings --int8

# Use FP32 full precision (more accurate)
colgrep settings --fp32

# Reset to defaults (INT8, pool-factor 2)
colgrep settings --k 0 --n 0
```

### Change Model

```bash
# Temporary (single query)
colgrep "query" --model lightonai/another-model

# Permanent
colgrep set-model lightonai/another-model
```

Config stored in `~/.config/colgrep/config.json`.

---

## IDE Integrations

### Claude Code

```bash
colgrep --install-claude-code
```

**IMPORTANT: You must restart Claude Code after installation for the plugin to take effect.**

The integration includes intelligent hooks that:

- Check if your project needs indexing before injecting colgrep context
- Return empty if >3000 chunks need indexing (prevents slow operations on large projects)
- Return empty if index is desynced (needs repair)
- Only suggest colgrep when it's ready to use

This ensures Claude Code only uses colgrep when it will provide fast, reliable results.

To uninstall:

```bash
colgrep --uninstall-claude-code
```

### OpenCode

```bash
colgrep --install-opencode
colgrep --uninstall-opencode
```

### Codex

```bash
colgrep --install-codex
colgrep --uninstall-codex
```

### Complete Uninstall

Remove colgrep from all AI tools, clear all indexes, and delete all data:

```bash
colgrep --uninstall
```

---

## Supported Languages

**Code (18 languages):** Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, C#, Ruby, Kotlin, Swift, Scala, PHP, Lua, Elixir, Haskell, OCaml

**Config:** YAML, TOML, JSON, Dockerfile, Makefile

**Text:** Markdown, Plain text, AsciiDoc, Org

**Shell:** Bash, Zsh, PowerShell

---

## How It Works

1. **Parse**: Tree-sitter extracts functions, methods, classes
2. **Analyze**: 5-layer analysis (AST, call graph, control flow, data flow, dependencies)
3. **Embed**: ColBERT encodes each unit as multiple vectors
4. **Index**: PLAID algorithm compresses and indexes vectors
5. **Search**: Query encoded and matched using late interaction scoring

### Embedding Input Format

Each code unit is converted to a structured text representation before being encoded by the model. Here's an example of what the model receives for a Python function:

**Original code:**

```python
# src/utils/http_client.py
def fetch_with_retry(url: str, max_retries: int = 3) -> Response:
    """Fetches data from a URL with retry logic."""
    for i in range(max_retries):
        try:
            return client.get(url)
        except RequestError as e:
            if i == max_retries - 1:
                raise e
```

**Model input:**

```
Function: fetch_with_retry
Signature: def fetch_with_retry(url: str, max_retries: int = 3) -> Response
Description: Fetches data from a URL with retry logic.
Parameters: url, max_retries
Returns: Response
Calls: range, client.get
Variables: i, e
Uses: client, RequestError
Code:
def fetch_with_retry(url: str, max_retries: int = 3) -> Response:
    """Fetches data from a URL with retry logic."""
    for i in range(max_retries):
        try:
            return client.get(url)
        except RequestError as e:
            if i == max_retries - 1:
                raise e
File: src / utils / http client http_client.py
```

The file path is normalized to improve semantic matching: separators become spaced, underscores/hyphens become spaces, and CamelCase is split (e.g., `HttpClient` → `http client`).

---

## Environment Variables

| Variable          | Description                  |
| ----------------- | ---------------------------- |
| `ORT_DYLIB_PATH`  | Path to ONNX Runtime library |
| `XDG_DATA_HOME`   | Override data directory      |
| `XDG_CONFIG_HOME` | Override config directory    |

---

## License

Apache-2.0

## See Also

- [llm-tldr]https://github.com/parcadei/llm-tldr
- [mgrep]https://github.com/mixedbread-ai/mgrep
- [cgrep]https://github.com/awgn/cgrep