greppy-cli 1.3.0

Sub-millisecond semantic code search and trace with AI reranking (Claude/Gemini/Ollama)
# Greppy

```
 ██████╗ ██████╗ ███████╗██████╗ ██████╗ ██╗   ██╗
██╔════╝ ██╔══██╗██╔════╝██╔══██╗██╔══██╗╚██╗ ██╔╝
██║  ███╗██████╔╝█████╗  ██████╔╝██████╔╝ ╚████╔╝ 
██║   ██║██╔══██╗██╔══╝  ██╔═══╝ ██╔═══╝   ╚██╔╝  
╚██████╔╝██║  ██║███████╗██║     ██║        ██║   
 ╚═════╝ ╚═╝  ╚═╝╚══════╝╚═╝     ╚═╝        ╚═╝   
```

**Sub-millisecond semantic code search and invocation tracing with AI-powered reranking.**

No cloud indexing. Works with **Ollama (local)**, Claude, or Gemini. Just `greppy search "query"` or `greppy trace symbol`.

---

## What is Greppy?

Greppy is a local code search tool that combines:

- **BM25 full-text search** via [Tantivy](https://github.com/quickwit-oss/tantivy) for sub-millisecond queries
- **AI reranking** via Ollama (local), Claude, or Gemini to surface the most relevant results
- **Background daemon** with file watching for instant, always-up-to-date searches

### Why Greppy?

AI coding tools (Claude Code, Cursor, Aider, OpenCode) need fast code search. Existing solutions are either:

- **Too slow** - grep/ripgrep scan files on every query
- **Cloud-dependent** - Sourcegraph, GitHub search require network
- **Not semantic** - keyword matching misses context

Greppy gives you **<1ms search** from a warm daemon, with optional AI reranking for semantic ordering, and the index is built and stored on your machine.

---

## Installation

### macOS / Linux

```bash
curl -fsSL https://raw.githubusercontent.com/KBLCode/greppy/main/install.sh | bash
```

### Windows (PowerShell)

```powershell
irm https://raw.githubusercontent.com/KBLCode/greppy/main/install.ps1 | iex
```

### Cargo

```bash
cargo install greppy-cli
```

### From Source

```bash
git clone https://github.com/KBLCode/greppy
cd greppy
cargo install --path .
```

---

## Quick Start

```bash
# 1. Index your project (one-time setup)
cd your-project
greppy index

# 2. (Optional) Authenticate for AI-powered reranking
greppy login

# 3. Search!
greppy search "authentication middleware"
```

That's it! Greppy works immediately after indexing. Authentication is optional but recommended for better results.

---

## Search Modes

### Semantic Search (Default)

```bash
greppy search "error handling"
```

When configured with an AI provider, Greppy:
1. Runs a fast BM25 search to find candidate results
2. Sends candidates to AI (Ollama local, Claude, or Gemini) for reranking
3. Returns results ordered by semantic relevance

Without an AI provider configured, Greppy automatically falls back to direct BM25 mode.
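
The retrieve-then-rerank flow and its fallback can be sketched in a few lines. This is an illustration only, not Greppy's implementation: `search`, `bm25_search`, and `rerank` are hypothetical names, and the reranker is assumed to return candidate indices in its preferred order.

```python
from typing import Callable, Optional

Candidate = tuple[str, float]  # (chunk text, BM25 score)

def search(
    query: str,
    bm25_search: Callable[[str, int], list[Candidate]],
    rerank: Optional[Callable[[str, list[str]], list[int]]] = None,
    limit: int = 20,
) -> list[str]:
    """Fetch BM25 candidates, then optionally let a reranker reorder them."""
    candidates = bm25_search(query, limit)
    texts = [text for text, _score in candidates]
    if rerank is None:
        return texts  # no provider configured: keep BM25 order
    try:
        order = rerank(query, texts)
    except Exception:
        return texts  # provider failed: fall back to BM25 order
    # Drop out-of-range or repeated indices, then append anything omitted.
    kept: list[int] = []
    for i in order:
        if 0 <= i < len(texts) and i not in kept:
            kept.append(i)
    kept += [i for i in range(len(texts)) if i not in kept]
    return [texts[i] for i in kept]
```

Treating the reranker as an optional callable keeps the fast path independent of any network or model call, which is what makes the automatic fallback cheap.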

### Direct Search (BM25 Only)

```bash
greppy search -d "TODO"
greppy search --direct "FIXME"
```

Pure BM25 search with no AI call. It is faster, but results are ranked by keyword statistics (term frequency and rarity) rather than semantic relevance.
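
For intuition about what BM25 ranking rewards, here is a minimal scoring sketch. It uses the standard BM25 formula with the commonly used defaults `k1 = 1.2` and `b = 0.75` and naive whitespace tokenization; these are assumptions for illustration, and Greppy's actual scoring is delegated to Tantivy.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with the standard BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Long documents are penalized relative to the average length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["todo fix parser", "parser parser error handling", "unrelated text here"]
print(bm25_scores("parser", docs))
```

The third document scores zero because it never mentions the query term, which is exactly the kind of miss that reranking by meaning is meant to address.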

### Search Options

```
Usage: greppy search [OPTIONS] <QUERY>

Options:
  -d, --direct             Direct mode (BM25 only, no AI)
  -n, --limit <N>          Maximum results (default: 20)
      --json               JSON output for scripting
  -p, --project <PATH>     Project path (default: current directory)
```

### Examples

```bash
# Find authentication code
greppy search "user authentication"

# Find all TODOs (direct mode, faster)
greppy search -d "TODO" -n 50

# JSON output for scripting
greppy search "database" --json | jq '.results[0].path'

# Search a specific project
greppy search "config" -p ~/projects/myapp
```

---

## Trace (Invocation Mapping)

Greppy Trace provides complete codebase invocation mapping - like Sentry's stack traces, but for your entire codebase without running code.

### Basic Trace

```bash
# Find all invocation paths for a symbol
greppy trace validateUser
```

Output:
```
╔══════════════════════════════════════════════════════════════════════════════╗
║  TRACE: validateUser                                                         ║
║  Defined: utils/validation.ts:8                                              ║
║  Found: 47 invocation paths from 12 entry points                             ║
╚══════════════════════════════════════════════════════════════════════════════╝

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Path 1/47                                              POST /api/auth/login
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  routes.ts:15          →  POST /api/auth/login
  auth.controller.ts:8  →  loginController.handle(req, res)
  auth.service.ts:42    →  authService.login(credentials)
  validation.ts:8       →  validateUser(user)  ← TARGET
```
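
Conceptually, each path in this output is a walk through a call graph from an entry point to the target symbol. A minimal sketch of that enumeration follows, assuming the call graph is already available as a plain dictionary; `invocation_paths` and the `POST /api/users` entry point are hypothetical and only serve the example.

```python
def invocation_paths(
    calls: dict[str, list[str]], entry_points: list[str], target: str
) -> list[list[str]]:
    """Return every acyclic call path from an entry point to the target."""
    paths: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        if node == target:
            paths.append(path)
            return
        for callee in calls.get(node, []):
            if callee not in path:  # do not loop through recursive calls
                walk(callee, path + [callee])

    for entry in entry_points:
        walk(entry, [entry])
    return paths

calls = {
    "POST /api/auth/login": ["loginController.handle"],
    "loginController.handle": ["authService.login"],
    "authService.login": ["validateUser"],
    "POST /api/users": ["validateUser"],  # hypothetical second entry point
}
for path in invocation_paths(calls, ["POST /api/auth/login", "POST /api/users"], "validateUser"):
    print(" -> ".join(path))
```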

### Trace Commands

```bash
# Call graph trace (who calls this function)
greppy trace <symbol>

# Direct mode (no AI, sub-millisecond)
greppy trace <symbol> -d

# Reference tracing with code context
greppy trace --refs userId              # All references
greppy trace --refs userId -c 2         # With 2 lines of context
greppy trace --refs userId --in src/    # Limit to src/ directory
greppy trace --refs userId --count      # Just show count
greppy trace --reads userId             # Reads only
greppy trace --writes userId            # Writes only

# Call graph analysis
greppy trace --callers fetchData        # What calls this symbol
greppy trace --callees fetchData        # What this symbol calls

# Type tracing (where does this type flow)
greppy trace --type User

# Module tracing (import/export relationships)
greppy trace --module utils/auth
greppy trace --cycles                   # Find circular dependencies

# Pattern tracing (find any pattern with regex)
greppy trace --pattern "TODO:.*"
greppy trace --pattern "async function" -c 2

# Data flow analysis
greppy trace --flow password            # Track data from source to sink

# Impact analysis (what breaks if I change this)
greppy trace --impact validateUser

# Dead code detection
greppy trace --dead
greppy trace --dead --xref             # With potential callers

# Codebase statistics
greppy trace --stats

# Scope analysis
greppy trace --scope src/api.ts:42      # What's visible at location

# Output formats
greppy trace <symbol> --json            # JSON for tooling
greppy trace <symbol> --plain           # No colors (for pipes)
greppy trace <symbol> --csv             # CSV for spreadsheets
greppy trace <symbol> --dot             # DOT for graph visualization
greppy trace <symbol> --markdown        # Markdown for documentation
```
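
Of the operations above, impact analysis is the easiest to picture as a graph problem: everything that can reach the changed symbol through calls is potentially affected. The sketch below shows one plausible way to compute that set with a reverse breadth-first search; `impacted_by` and the sample graph are hypothetical and not taken from Greppy's source.

```python
from collections import deque

def impacted_by(calls: dict[str, list[str]], changed: str) -> set[str]:
    """Return every symbol that directly or transitively calls `changed`."""
    # Invert the graph: callee -> set of callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    impacted.discard(changed)  # a recursive symbol does not impact itself
    return impacted

calls = {
    "route": ["controller"],
    "controller": ["service"],
    "service": ["validateUser"],
    "cli": ["validateUser"],
    "helper": ["format"],
}
print(sorted(impacted_by(calls, "validateUser")))
```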

### Composable Operations

Run multiple analyses in a single command:

```bash
# Run dead code + stats + cycles together
greppy trace --dead --stats --cycles

# Filter all operations to a path
greppy trace --dead --stats --in src/auth

# Summary mode: one-line output per operation
greppy trace --dead --stats --cycles --summary

# Combined JSON output for tooling
greppy trace --dead --stats --json
```

**Summary mode output:**
```
DEAD CODE ANALYSIS
  Dead symbols: 61  (unknown=4, function=16, struct=41)

CODEBASE STATISTICS
  Files: 5  Symbols: 84  Refs: 1711  Edges: 1688

CIRCULAR DEPENDENCIES
  Circular deps: 2
```
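
A circular-dependency count like the one in this summary can be produced by a depth-first search over the import graph. The sketch below is an assumption about the general technique, not Greppy's algorithm: `find_cycles` is a hypothetical helper that records one cycle per back edge, so it reports each loop it encounters rather than every elementary cycle.

```python
def find_cycles(imports: dict[str, list[str]]) -> list[list[str]]:
    """Find import loops with a depth-first search that tracks the active path."""
    cycles: list[list[str]] = []
    seen_keys: set[frozenset] = set()
    state: dict[str, str] = {}  # module -> "visiting" or "done"
    stack: list[str] = []

    def visit(mod: str) -> None:
        state[mod] = "visiting"
        stack.append(mod)
        for dep in imports.get(mod, []):
            if state.get(dep) == "visiting":
                # Back edge: the slice of the stack from dep onward is a cycle.
                cycle = stack[stack.index(dep):]
                key = frozenset(cycle)
                if key not in seen_keys:
                    seen_keys.add(key)
                    cycles.append(cycle)
            elif dep not in state:
                visit(dep)
        stack.pop()
        state[mod] = "done"

    for mod in imports:
        if mod not in state:
            visit(mod)
    return cycles

imports = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["e"], "e": ["d"], "f": ["a"]}
print(find_cycles(imports))
```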

### Cross-Reference Dead Code

The `--xref` flag shows potential callers for dead symbols:

```bash
greppy trace --dead --xref -n 5
```

Output:
```
MessageRequest  src/ai/claude.rs:17  No references or calls found
    Potential callers:
      → new  src/ai/claude.rs:66  Same file - could call this
      → get_access_token  src/ai/claude.rs:75  Same file - could call this
      → MessageRequest  src/ai/claude.rs:143  Token match - name appears here
```

This helps you understand *why* code is dead - is it truly unused, or is there a missing call?
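
As a rough mental model, a symbol is dead when nothing references it, and the "same file" hint simply lists its neighbours. The sketch below illustrates that idea under simplifying assumptions: `Symbol`, `dead_symbols`, and `potential_callers` are hypothetical names, references are matched by name only, and the token-match heuristic shown in the output is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    name: str
    file: str

def dead_symbols(symbols, edges, entry_points=frozenset()):
    """Symbols that are never the target of a call or reference edge."""
    referenced = {callee for _caller, callee in edges}
    return [s for s in symbols if s.name not in referenced and s.name not in entry_points]

def potential_callers(dead, symbols):
    """Other symbols in the same file, which could plausibly call the dead one."""
    return [s for s in symbols if s.file == dead.file and s != dead]

symbols = [
    Symbol("MessageRequest", "src/ai/claude.rs"),
    Symbol("new", "src/ai/claude.rs"),
    Symbol("get_access_token", "src/ai/claude.rs"),
    Symbol("main", "src/main.rs"),
]
edges = [("main", "new"), ("new", "get_access_token")]  # (caller, callee)

for dead in dead_symbols(symbols, edges, entry_points={"main"}):
    print(dead.name, [s.name for s in potential_callers(dead, symbols)])
```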

### What grep/ripgrep CAN'T do (but greppy can)

| Feature | grep/ripgrep | greppy |
|---------|--------------|--------|
| Impact analysis | No | `--impact` shows callers & affected entry points |
| Dead code detection | No | `--dead` finds unused symbols |
| Dead code cross-reference | No | `--dead --xref` shows potential callers |
| Call chain visualization | No | Shows full invocation paths |
| Semantic reference filtering | No | `--reads` vs `--writes` vs `--kind call` |
| Codebase statistics | No | `--stats` shows symbols, call depth, etc. |
| Circular dependency detection | No | `--cycles` finds import loops |
| Composable operations | No | `--dead --stats --cycles` runs all at once |
| Summary mode | No | `--summary` for condensed output |

---

## Authentication

Greppy uses OAuth to authenticate with AI providers. **No API keys needed!**

### Login

```bash
greppy login
```

1. Select your provider using arrow keys:
   - **Claude (Anthropic)** - Uses your Claude.ai account
   - **Gemini (Google)** - Uses your Google account

2. Complete the OAuth flow in your browser

3. You're ready to use semantic search!

### Logout

```bash
greppy logout
```

Removes all stored credentials from your system keychain.

### How It Works

- Tokens are stored securely in your system keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
- Uses OAuth free tier - no API billing
- Without authentication, searches fall back to direct BM25 mode automatically

---

## Daemon

The background daemon provides sub-millisecond queries and automatic index updates.

### Commands

```bash
greppy start    # Start the daemon
greppy stop     # Stop the daemon
greppy status   # Check if daemon is running
```

### Features

- **In-memory indexes** - Queries return in <1ms
- **File watching** - Automatically updates indexes when files change
- **Query caching** - Repeated queries are instant

### Platform Support

| Platform | IPC Method |
|----------|------------|
| macOS    | Unix socket (`~/.greppy/daemon.sock`) |
| Linux    | Unix socket (`~/.greppy/daemon.sock`) |
| Windows  | TCP localhost (port in `~/.greppy/daemon.port`) |

---

## Indexing

### Basic Usage

```bash
# Index current directory
greppy index

# Index specific project
greppy index -p ~/projects/myapp

# Force full re-index
greppy index --force
```

### What Gets Indexed

Greppy automatically:
- Respects `.gitignore` patterns
- Chunks code into semantic units (functions, classes, methods)
- Extracts symbol names for boosted matching
- Skips binary files and common non-code directories
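
The skipping rules can be approximated with a small directory walk. This sketch is an assumption about how such a filter might look, not Greppy's code: it reuses the directory names and 1 MB size limit from the sample configuration, treats any file containing a NUL byte in its first 8 KB as binary, and does not parse `.gitignore`.

```python
import os

IGNORED_DIRS = {"node_modules", ".git", "dist", "build", "__pycache__"}
MAX_FILE_SIZE = 1_048_576  # 1 MB

def looks_binary(path: str, probe: int = 8192) -> bool:
    """Heuristic: text files rarely contain NUL bytes."""
    with open(path, "rb") as f:
        return b"\0" in f.read(probe)

def indexable_files(root: str) -> list[str]:
    """Walk root and return relative paths of files worth indexing."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIRS)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > MAX_FILE_SIZE or looks_binary(path):
                continue
            found.append(os.path.relpath(path, root))
    return found
```

Pruning `dirnames` in place matters for speed: directories such as `node_modules` are never traversed at all, rather than being walked and filtered afterwards.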

### Supported Languages

TypeScript, JavaScript, Python, Rust, Go, Java, Kotlin, Ruby, PHP, C, C++, C#, Swift, Elixir, Haskell, Lua, Shell, SQL, Vue, Svelte, HTML, CSS, JSON, YAML, Markdown, and more.

---

## Performance

### Search Performance

| Mode | Latency | Notes |
|------|---------|-------|
| Daemon (warm) | <1ms | Index in memory |
| Direct (warm) | 1-10ms | Index on disk |
| Direct (cold) | 50-100ms | First query loads index |
| Semantic (AI) | 500-2000ms | Includes AI reranking |

### Benchmark: greppy vs grep vs ripgrep

Tested on a 75k-file, 13.7M-line TypeScript codebase:

| Query: "userId" | Results | Time | Notes |
|-----------------|---------|------|-------|
| grep            | 2,648   | ~2.5s | Text matching (scans all files) |
| ripgrep         | 1,296   | ~0.04s | Text matching (parallel, faster) |
| **greppy**      | 990     | **~0.07s** | **Semantic refs** (knows symbol context) |

| Query: "useState" | Results | Time | Notes |
|-------------------|---------|------|-------|
| grep              | 1,449   | ~2.6s | Includes comments, strings |
| ripgrep           | 1,292   | ~0.04s | Includes comments, strings |
| **greppy**        | 1,258   | **~0.08s** | **Only actual symbol references** |

**Key difference:** grep/ripgrep find text matches. Greppy finds **semantic symbol references** - it knows when `userId` is a variable vs a string vs a comment.

### Trace Performance

| Query Type | Time | Notes |
|------------|------|-------|
| Symbol references | ~70ms | All usages of a symbol |
| Impact analysis | ~75ms | What breaks if you change this |
| Dead code detection | ~78ms | Find unused symbols |
| Codebase statistics | ~600ms | Full analysis |
| Call chain trace | <1ms | Pre-computed call graph |

### Token Usage: greppy vs AI Reading Files

When AI tools search code, they typically read entire files. Greppy returns only semantic references with targeted context, dramatically reducing token usage.

**Real test on a 75k-file codebase:**

| Query: "userId" (262 files contain it) | Tokens | Savings |
|----------------------------------------|--------|---------|
| AI reads 20 matching files | 43,493 | baseline |
| greppy --refs -c 2 (50 refs + context) | 3,100 | **93% less** |

| Query: "validateFounderAccess" | Tokens | Savings |
|--------------------------------|--------|---------|
| AI reads 4 matching files | 7,659 | baseline |
| greppy --refs -c 2 | 532 | **93% less** |
| greppy --impact | 170 | **98% less** |

**Cost savings at $3/1M tokens (Claude):**
- Reading 20 files: $0.13 per query
- Using greppy: $0.009 per query
- **14x cost reduction**
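
The figures above follow directly from the token counts in the `userId` table and the $3 per million tokens price. The short script below just redoes that arithmetic; `cost` is a throwaway helper.

```python
PRICE_PER_MILLION = 3.00  # USD per 1M tokens, as quoted above

def cost(tokens: int) -> float:
    """Cost in USD of sending this many tokens."""
    return tokens * PRICE_PER_MILLION / 1_000_000

baseline = cost(43_493)          # AI reads 20 matching files
with_greppy = cost(3_100)        # greppy --refs -c 2
savings = 1 - 3_100 / 43_493     # fraction of tokens avoided
ratio = baseline / with_greppy   # cost reduction factor

print(f"baseline ${baseline:.2f}, greppy ${with_greppy:.3f}")
print(f"{savings:.0%} fewer tokens, {ratio:.0f}x cheaper")
```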

### System Performance

**Indexing speed:** ~17,000 chunks/second

**Memory usage:** ~55MB during indexing

---

## Configuration

Optional config at `~/.greppy/config.toml`:

```toml
[general]
default_limit = 20

[ignore]
patterns = ["node_modules", ".git", "dist", "build", "__pycache__"]

[index]
max_file_size = 1048576  # 1MB
max_files = 100000

[cache]
query_ttl = 60
max_queries = 1000
```
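
To make the `[cache]` settings concrete, here is a sketch of a query cache with a time-to-live and a size cap. Several things here are assumptions: that `query_ttl` is measured in seconds, that the oldest unused entry is evicted first, and the `QueryCache` class itself, which is hypothetical. Invalidation on file changes is not modeled.

```python
import time
from collections import OrderedDict

class QueryCache:
    """Cache of query results with per-entry expiry and least-recently-used eviction."""

    def __init__(self, ttl_seconds=60.0, max_queries=1000, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_queries = max_queries
        self.clock = clock
        self._entries = OrderedDict()  # query -> (stored_at, results)

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[query]  # expired
            return None
        self._entries.move_to_end(query)  # mark as recently used
        return results

    def put(self, query, results):
        self._entries[query] = (self.clock(), results)
        self._entries.move_to_end(query)
        while len(self._entries) > self.max_queries:
            self._entries.popitem(last=False)  # evict least recently used
```

Passing the clock in as a parameter makes expiry easy to test without sleeping.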

---

## Environment Variables

| Variable | Description |
|----------|-------------|
| `GREPPY_HOME` | Override config/data directory (default: `~/.greppy`) |
| `GREPPY_LOG` | Log level: `debug`, `info`, `warn`, `error` |

---

## How It Works

1. **Indexing** - Greppy walks your project, respecting `.gitignore`, and chunks code into semantic units (functions, classes, methods)

2. **Storage** - Chunks are stored in a [Tantivy](https://github.com/quickwit-oss/tantivy) index with BM25 ranking

3. **Search** - Queries are parsed and matched against the index with symbol name boosting

4. **AI Reranking** - When an AI provider is configured, top BM25 results are sent to Ollama (local), Claude, or Gemini for semantic reranking

5. **Watching** - The daemon monitors file changes and incrementally updates indexes

---

## Integration with AI Tools

Greppy works great with AI coding assistants:

- **Claude Code** - Use as a code search tool
- **OpenCode** - Integrate via CLI
- **Cursor** - Call from terminal
- **Aider** - Use for codebase exploration
- **Custom MCP servers** - JSON output for easy parsing

### JSON Output

```bash
greppy search "auth" --json
```

```json
{
  "results": [
    {
      "path": "src/auth/login.rs",
      "content": "pub async fn login() -> Result<()> { ... }",
      "symbol_name": "login",
      "symbol_type": "method",
      "start_line": 1,
      "end_line": 50,
      "language": "rust",
      "score": 4.23
    }
  ],
  "query": "auth",
  "elapsed_ms": 0.8,
  "project": "/path/to/project"
}
```
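
A script or tool wrapper might consume this output as follows. The sketch relies only on the fields shown in the sample above; `summarize` is a hypothetical helper, and the sample is embedded as a string so the snippet runs without Greppy installed.

```python
import json

SAMPLE = """
{
  "results": [
    {"path": "src/auth/login.rs", "symbol_name": "login", "symbol_type": "method",
     "start_line": 1, "end_line": 50, "language": "rust", "score": 4.23}
  ],
  "query": "auth",
  "elapsed_ms": 0.8,
  "project": "/path/to/project"
}
"""

def summarize(raw_json: str, min_score: float = 0.0) -> list[str]:
    """Turn search JSON into 'path:start-end symbol' lines, best score first."""
    payload = json.loads(raw_json)
    hits = [r for r in payload.get("results", []) if r.get("score", 0.0) >= min_score]
    hits.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [f"{r['path']}:{r['start_line']}-{r['end_line']} {r['symbol_name']}" for r in hits]

print(summarize(SAMPLE))  # ['src/auth/login.rs:1-50 login']
```

In practice the JSON string would come from the standard output of `greppy search "auth" --json`, for example captured with `subprocess.run`.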

---

## Troubleshooting

### "Not logged in" message

This is informational, not an error. Without authentication, Greppy uses direct BM25 search, which still works well for most queries.

To enable AI reranking:
```bash
greppy login
```

### Daemon won't start

Check if another instance is running:
```bash
greppy status
greppy stop
greppy start
```

### Index seems outdated

Force a full re-index:
```bash
greppy index --force
```

Or start the daemon for automatic updates:
```bash
greppy start
```

### OAuth login fails

1. Make sure you have a browser available
2. Check your internet connection
3. Try logging out and back in:
   ```bash
   greppy logout
   greppy login
   ```

---

## Web UI

Greppy includes a visual web dashboard for codebase exploration.

### Launch

```bash
greppy web                    # Start on localhost:3000
greppy web --port 8080        # Custom port
greppy web --open             # Auto-open browser
```

### Features

- **Multiple Views** - Stats, Graph, List, Tree, Tables, Cycles, Timeline
- **Interactive Charts** - Matrix heatmap, Sankey flow, Force-directed graph
- **Live Updates** - Real-time sync when files change (via daemon)
- **Symbol Details** - Click any symbol to see callers, callees, refs
- **Dead Code Highlighting** - Instantly spot unused code
- **Cycle Detection** - Visualize circular dependencies

### Streamer Mode

For livestreamers and screen sharing, Greppy includes a **Streamer Mode** that hides sensitive paths:

1. Open Settings (gear icon)
2. Enable "Streamer Mode"
3. Configure hidden patterns (defaults: `.env*`, `*secret*`, `*credential*`, etc.)

When enabled:
- Sensitive file paths are replaced with `[HIDDEN]`
- Redaction happens server-side (not visible in network requests)
- Visual banner indicates streamer mode is active
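
The redaction step can be pictured as glob matching on each path before it is sent to the browser. The sketch below is a guess at the general approach, not the Web UI's code: `redact` is hypothetical, only the three default patterns listed above are included, and matching per path component and ignoring case are both assumptions.

```python
from fnmatch import fnmatchcase

HIDDEN_PATTERNS = [".env*", "*secret*", "*credential*"]

def redact(path: str, patterns=HIDDEN_PATTERNS) -> str:
    """Replace the whole path with [HIDDEN] if any component matches a pattern."""
    parts = path.replace("\\", "/").split("/")
    for part in parts:
        if any(fnmatchcase(part.lower(), pattern) for pattern in patterns):
            return "[HIDDEN]"
    return path

for p in ["config/.env.local", "src/secrets/keys.ts", "src/auth/login.rs"]:
    print(p, "->", redact(p))
```

Doing this on the server before responses are serialized is what keeps the original paths out of network requests, as described above.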

### Views

| View | Description |
|------|-------------|
| **Stats** | Overview dashboard with charts |
| **Graph** | Force-directed dependency graph |
| **List** | Sortable/filterable symbol table |
| **Tree** | File tree with symbol counts |
| **Tables** | Matrix heatmap of file dependencies |
| **Cycles** | Circular dependency visualization |
| **Timeline** | Index history and snapshots |

---

## License

MIT

---

## Links

- **Repository:** https://github.com/KBLCode/greppy
- **Issues:** https://github.com/KBLCode/greppy/issues
- **Releases:** https://github.com/KBLCode/greppy/releases