reflex-search 1.0.3

A local-first, structure-aware code search engine for AI agents
# Reflex

**Local-first code search engine with full-text search, symbol extraction, and dependency analysis for AI coding workflows**

Reflex is a code search engine designed for developers and AI coding assistants. It combines trigram indexing for full-text search with Tree-sitter parsing for symbol extraction and static analysis for dependency tracking. Unlike symbol-only tools, Reflex finds **every occurrence** of a pattern (function calls, variable usage, comments, and more) with deterministic, repeatable results.

[![Build Status](https://img.shields.io/badge/build-passing-brightgreen)]()
[![Tests](https://img.shields.io/badge/tests-347%20passing-brightgreen)]()
[![License](https://img.shields.io/badge/license-MIT-blue)]()

## ✨ Features

- **🔍 Complete Coverage**: Find every occurrence, not just symbol definitions
- **⚡ Fast Queries**: Trigram indexing with memory-mapped I/O for efficient search
- **🎯 Symbol-Aware**: Runtime tree-sitter parsing for precise symbol filtering
- **🖥️ Interactive Mode**: Live TUI for exploring code with instant search and filters
- **🔄 Incremental**: Only reindexes changed files (blake3 hashing)
- **🌐 Multi-Language**: Rust, TypeScript/JavaScript, Vue, Svelte, PHP, Python, Go, Java, C, C++, C#, Ruby, Kotlin, Zig
- **🤖 AI Query Assistant**: Natural language search with `rfx ask` (OpenAI, Anthropic, Groq)
- **📡 MCP Support**: Model Context Protocol server for AI assistants
- **📦 Local-First**: Fully offline, all data stays on your machine
- **🎨 Regex Support**: Trigram-optimized regex search
- **🌳 AST Queries**: Structure-aware search with Tree-sitter
- **🔒 Deterministic**: Same query → same results (no probabilistic ranking)

## 🚀 Quick Start

### Installation

```bash
# Via NPM
npm install -g reflex-search

# Or via cargo
cargo install reflex-search
```

### Basic Usage

```bash
# Index your codebase
rfx index

# Full-text search (finds all occurrences)
rfx query "extract_symbols"

# Symbol-only search (definitions only)
rfx query "extract_symbols" --symbols

# Filter by language and symbol kind
rfx query "parse" --lang rust --kind function --symbols

# Include dependency information (imports)
rfx query "MyStruct" --dependencies

# Regex search
rfx query "fn.*test" --regex

# Paths-only mode (for piping to other tools)
vim $(rfx query "TODO" --paths)

# Export as JSON for AI agents
rfx query "unwrap" --json --limit 10
```

## 🤖 AI Query Assistant

Don't want to remember search syntax? Use `rfx ask` to translate natural language questions into `rfx query` commands.

### Setup

First-time setup requires configuring an AI provider (OpenAI, Anthropic, or Groq):

```bash
# Interactive configuration wizard (recommended)
rfx ask --configure
```

This will guide you through:
- Selecting an AI provider
- Entering your API key
- Choosing a model (optional)

Configuration is saved to `~/.reflex/config.toml`:

```toml
[semantic]
provider = "openai"  # or anthropic, groq

[credentials]
openai_api_key = "sk-..."
openai_model = "gpt-4o-mini"  # optional
```

### Usage

There are two ways to use `rfx ask`: 

1) Interactive mode

Interactive chat mode with conversation history. This mode uses `--agentic` and `--answer` under the hood.

```bash
rfx ask
```

2) CLI-only mode

One-shot, non-conversational commands that return results directly via CLI.

```bash
# Ask a question (generates and executes rfx query commands, only returns query results)
rfx ask "Find all TODOs in Rust files"

# Use a specific provider
rfx ask "Show me error handling code" --provider groq

# Agentic mode (multi-step reasoning with automatic context gathering)
rfx ask "How does authentication work?" --agentic

# Get a conversational answer based on search results
rfx ask "What does the indexer module do?" --answer
```

**How it works:**
1. Your natural language question is sent to an LLM
2. The LLM generates one or more `rfx query` commands
3. You review and confirm (or use `--execute` to auto-run)
4. Results are displayed as normal search output

**Agentic mode** (`--agentic`) enables multi-step reasoning where the LLM can:
- Gather context by running multiple searches
- Refine queries based on initial results
- Iteratively explore the codebase
- Generate comprehensive answers with `--answer`

## 📋 Command Reference

### `rfx index`

Build or update the search index.

```bash
rfx index [OPTIONS]

Options:
  --force              Force full reindex (ignore incremental)
  --languages <LANGS>  Limit to specific languages (comma-separated)

Subcommands:
  status               Show background symbol indexing status
  compact              Compact cache (remove deleted files, reclaim space)
```

### `rfx query`

Search the codebase with CLI or interactive TUI mode.

**Interactive Mode (TUI):**
```bash
# Launch interactive mode (no pattern required)
rfx query

# Features:
# - Live search with instant results
# - Toggle filters: symbols-only, regex, language
# - Navigate results with keyboard (j/k, arrows)
# - Open files in $EDITOR (press 'o')
# - Query history with Ctrl+P/Ctrl+N
# - Press '?' for help, 'q' to quit
```

**CLI Mode:**

Run `rfx query --help` for full options.

**Key Options:**
- `--symbols, -s` - Symbol-only search (definitions, not usage)
- `--regex, -r` - Treat pattern as regex
- `--lang <LANG>` - Filter by language
- `--kind <KIND>` - Filter by symbol kind (function, class, struct, etc.)
- `--dependencies` - Include dependency information (supports: Rust, TypeScript, JavaScript, Python, Go, Java, C, C++, C#, PHP, Ruby, Kotlin)
- `--paths, -p` - Return only file paths (no content)
- `--json` - Output as JSON
- `--limit <N>` - Limit number of results
- `--timeout <SECS>` - Query timeout (default: 30s)

**Examples:**
```bash
# Find function definitions named "parse"
rfx query "parse" --symbols --kind function

# Find test functions using regex
rfx query "fn test_\w+" --regex

# Search Rust files only
rfx query "unwrap" --lang rust

# Get paths of files with TODOs
rfx query "TODO" --paths

# Include import information
rfx query "Config" --symbols --dependencies
```

### `rfx mcp`

Start as an MCP (Model Context Protocol) server for AI coding assistants.

```json
{
  "mcpServers": {
    "reflex": {
      "command": "rfx",
      "args": ["mcp"],
      "env": {},
      "disabled": false
    }
  }
}
```

**Error Handling:**

If any MCP tool returns an error about a missing or stale index (e.g., "Index not found. Run 'rfx index' to build the cache first."), the AI agent should:

1. Call `index_project` to rebuild the index
2. Wait for indexing to complete
3. Retry the previously failed operation

This pattern ensures that queries always run against an up-to-date index.
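The recovery steps above can be sketched as a small retry wrapper. This is only an illustration of the pattern: `IndexMissingError`, `call_with_reindex`, `fake_search`, and `fake_reindex` are hypothetical names, and a real agent would invoke `index_project` and the failing tool through its own MCP client.

```python
class IndexMissingError(Exception):
    """Hypothetical error for a tool reporting a missing or stale index."""


def call_with_reindex(tool, reindex, max_retries=1):
    """Run tool(); on an index error, rebuild the index and retry."""
    for attempt in range(max_retries + 1):
        try:
            return tool()
        except IndexMissingError:
            if attempt == max_retries:
                raise  # give up instead of looping forever
            # Stands in for calling index_project; assumed to block until done.
            reindex()


# Stand-ins for real MCP tool calls (assumptions for the demo only).
state = {"indexed": False, "calls": 0}


def fake_search():
    state["calls"] += 1
    if not state["indexed"]:
        raise IndexMissingError(
            "Index not found. Run 'rfx index' to build the cache first."
        )
    return ["src/main.rs:12"]


def fake_reindex():
    state["indexed"] = True


result = call_with_reindex(fake_search, fake_reindex)
print(result, state["calls"])
```

Capping the number of retries keeps an agent from reindexing in a loop when the failure has a different cause.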

**Available MCP Tools:**
1. **`list_locations`** - Fast location discovery (file + line only, minimal tokens)
2. **`count_occurrences`** - Quick statistics (total count + file count)
3. **`search_code`** - Full-text or symbol search with detailed results
4. **`search_regex`** - Regex pattern matching
5. **`search_ast`** - AST pattern matching (structure-aware, slow)
6. **`index_project`** - Trigger reindexing
7. **`get_dependencies`** - Get all dependencies of a specific file
8. **`get_dependents`** - Get all files that depend on a file (reverse lookup)
9. **`get_transitive_deps`** - Get transitive dependencies up to a specified depth
10. **`find_hotspots`** - Find most-imported files (with pagination)
11. **`find_circular`** - Detect circular dependencies (with pagination)
12. **`find_unused`** - Find files with no incoming dependencies (with pagination)
13. **`find_islands`** - Find disconnected components (with pagination)
14. **`analyze_summary`** - Get dependency analysis summary (counts only)

### `rfx analyze`

Analyze codebase structure and dependencies. By default shows a summary; use specific flags for detailed results.

**Analysis Flags:**
- `--circular` - Detect circular dependencies (A → B → C → A)
- `--hotspots` - Find most-imported files
- `--unused` - Find files with no incoming dependencies
- `--islands` - Find disconnected components
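As a rough illustration of what three of these analyses compute, the sketch below runs them over a tiny hand-written import graph. It is not Reflex's implementation: the function names, the graph shape (`file -> list of imported files`), and the default `min_dependents=2` are assumptions made for the example.

```python
from collections import Counter


def hotspots(graph, min_dependents=2):
    """Files imported by at least min_dependents files, most-imported first."""
    counts = Counter(dep for deps in graph.values() for dep in set(deps))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(path, n) for path, n in ranked if n >= min_dependents]


def unused(graph):
    """Files that no other file imports."""
    imported = {dep for deps in graph.values() for dep in deps}
    return sorted((set(graph) | imported) - imported)


def find_cycle(graph):
    """Return one import cycle as a list of files, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color, stack = {}, []

    def visit(node):
        color[node] = GRAY          # node is on the current DFS path
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:       # back edge: dep is already on the path
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK         # fully explored, no cycle through node
        return None

    for node in sorted(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


graph = {
    "a.rs": ["b.rs", "util.rs"],
    "b.rs": ["c.rs", "util.rs"],
    "c.rs": ["a.rs"],
    "main.rs": ["a.rs", "util.rs"],
}
print(hotspots(graph))
print(unused(graph))
print(find_cycle(graph))
```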

**Pagination (default: 200 results per page):**
- Use `--limit N` to specify results per page
- Use `--offset N` to skip first N results
- Use `--all` to return unlimited results

**Examples:**
```bash
# Show summary of all analyses
rfx analyze

# Find circular dependencies
rfx analyze --circular

# Find hotspots (most-imported files)
rfx analyze --hotspots --min-dependents 5

# Find unused files
rfx analyze --unused

# Find disconnected components (islands)
rfx analyze --islands --min-island-size 3

# Get JSON summary of all analyses
rfx analyze --json

# Get pretty-printed JSON summary
rfx analyze --json --pretty

# Paginate results
rfx analyze --hotspots --limit 50 --offset 0  # First 50
rfx analyze --hotspots --limit 50 --offset 50 # Next 50

# Export as JSON with pagination metadata
rfx analyze --circular --json
```

**JSON Output Format (specific analyses with pagination):**
```json
{
  "pagination": {
    "total": 347,
    "count": 200,
    "offset": 0,
    "limit": 200,
    "has_more": true
  },
  "results": [...]
}
```
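A client can walk through all pages by following `has_more`. The sketch below simulates this in memory. The meaning of each field is inferred from its name (`count` as the number of results on the current page, `has_more` as "more results exist past this page"), and `paginate` and `fetch_all` are hypothetical helpers, not part of Reflex.

```python
def paginate(results, offset=0, limit=200):
    """Return one page plus metadata shaped like the JSON shown above."""
    page = results[offset:offset + limit]
    return {
        "pagination": {
            "total": len(results),
            "count": len(page),
            "offset": offset,
            "limit": limit,
            "has_more": offset + len(page) < len(results),
        },
        "results": page,
    }


def fetch_all(fetch_page, limit=200):
    """Request pages until has_more is false and return every result."""
    collected, offset = [], 0
    while True:
        response = fetch_page(offset, limit)
        collected.extend(response["results"])
        meta = response["pagination"]
        if not meta["has_more"]:
            return collected
        offset = meta["offset"] + meta["count"]


# 347 fake results, matching the "total" in the example above.
data = [f"cycle-{i}" for i in range(347)]
first = paginate(data)
everything = fetch_all(lambda offset, limit: paginate(data, offset, limit))
print(first["pagination"], len(everything))
```

In practice `fetch_page` would run something like `rfx analyze --circular --json --limit N --offset M` and parse its output.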

**Summary JSON Output Format (bare `rfx analyze --json`):**
```json
{
  "circular_dependencies": 17,
  "hotspots": 10,
  "unused_files": 82,
  "islands": 81,
  "min_dependents": 2
}
```

### `rfx deps`

Analyze dependencies for a specific file. Shows what a file imports (dependencies) or what imports it (dependents).

**Key Options:**
- `--reverse` - Show files that depend on this file (reverse lookup)
- `--depth N` - Traverse N levels deep for transitive dependencies (default: 1)
- `--format` - Output format: tree, table, json (default: tree)
- `--json` - Output as JSON
- `--pretty` - Pretty-print JSON output

**Examples:**
```bash
# Show direct dependencies
rfx deps src/main.rs

# Show files that import this file (reverse lookup)
rfx deps src/config.rs --reverse

# Show transitive dependencies (depth 3)
rfx deps src/api.rs --depth 3

# JSON output
rfx deps src/main.rs --json

# Pretty-printed JSON
rfx deps src/main.rs --json --pretty

# Table format
rfx deps src/main.rs --format table
```

**Supported Languages:** Rust, TypeScript, JavaScript, Python, Go, Java, C, C++, C#, PHP, Ruby, Kotlin

**Note:** Only static imports (string literals) are tracked. Dynamic imports are filtered out by design.
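To make `--depth` and `--reverse` concrete, here is a minimal sketch of a depth-limited traversal over an import graph. The graph, the file names, and the helpers `transitive_deps` and `reverse_graph` are made up for illustration. Reflex's own traversal may differ.

```python
from collections import deque


def transitive_deps(graph, start, depth=1):
    """Return {file: level} for files reachable from start within depth hops."""
    levels = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if level == depth:
            continue  # do not expand past the requested depth
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                levels[dep] = level + 1
                queue.append((dep, level + 1))
    return levels


def reverse_graph(graph):
    """Invert the edges so lookups answer 'which files import this one?'."""
    reverse = {}
    for src, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(src)
    return reverse


graph = {
    "src/api.rs": ["src/config.rs", "src/db.rs"],
    "src/db.rs": ["src/config.rs", "src/pool.rs"],
    "src/pool.rs": ["src/log.rs"],
}
direct = transitive_deps(graph, "src/api.rs", depth=1)
deep = transitive_deps(graph, "src/api.rs", depth=3)
dependents = transitive_deps(reverse_graph(graph), "src/config.rs", depth=1)
print(direct, deep, dependents)
```

Breadth-first order means each file is reported at the shallowest level at which it is reachable.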

### `rfx context`

Generate codebase context for AI prompts. Useful with `rfx ask --additional-context`.

**Key Options:**
- `--structure` - Show directory structure
- `--file-types` - Show file type distribution
- `--project-type` - Detect project type (CLI/library/webapp/monorepo)
- `--framework` - Detect frameworks and conventions
- `--entry-points` - Show entry point files
- `--test-layout` - Show test organization pattern
- `--config-files` - List important configuration files
- `--path <PATH>` - Focus on specific directory
- `--depth <N>` - Tree depth for structure (default: 1)

By default (no flags), all context types are shown. Use individual flags to show specific types only.

**Examples:**
```bash
# Full context (all types - default behavior)
rfx context

# Full context for monorepo subdirectory
rfx context --path services/backend

# Specific context types only
rfx context --framework --entry-points

# Use with semantic queries
rfx ask "find auth code" --additional-context "$(rfx context --framework)"
```

### Other Commands

- `rfx stats` - Display index statistics
- `rfx clear` - Clear the search index
- `rfx list-files` - List all indexed files
- `rfx watch` - Watch for file changes and auto-reindex

Run `rfx <command> --help` for detailed options.

## 🌳 AST Pattern Matching

Reflex supports **structure-aware code search** using Tree-sitter AST queries.

**⚠️ WARNING:** AST queries are **SLOW** and scan the entire codebase. **Use `--symbols` instead for 95% of cases** (much faster).

**When to use AST queries:**
- You need to match code structure, not just text
- `--symbols` search is insufficient for your use case
- You have a very specific structural pattern

**Basic usage:**
```bash
rfx query <PATTERN> --ast <AST_PATTERN> --lang <LANGUAGE>

# Example: Find all Rust functions
rfx query "fn" --ast "(function_item) @fn" --lang rust

# Example: Find all TypeScript classes
rfx query "class" --ast "(class_declaration) @class" --lang typescript
```

**Supported languages:** Rust, TypeScript, JavaScript, Python, Go, Java, C, C++, C#, PHP, Ruby, Kotlin, Zig

For detailed AST query syntax and examples, see the [Tree-sitter documentation](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries).

## 🌐 Supported Languages/Dialects

| Language | Extensions | Symbol Extraction |
|----------|------------|-------------------|
| **Rust** | `.rs` | Functions, structs, enums, traits, impls, modules, methods |
| **TypeScript** | `.ts`, `.tsx`, `.mts`, `.cts` | Functions, classes, interfaces, types, enums, React components |
| **JavaScript** | `.js`, `.jsx`, `.mjs`, `.cjs` | Functions, classes, constants, methods, React components |
| **Vue** | `.vue` | Functions, constants, methods from `<script>` blocks |
| **Svelte** | `.svelte` | Functions, variables, reactive declarations |
| **PHP** | `.php` | Functions, classes, interfaces, traits, methods, namespaces, enums |
| **Python** | `.py` | Functions, classes, methods, decorators, lambdas |
| **Go** | `.go` | Functions, types, interfaces, methods, constants |
| **Java** | `.java` | Classes, interfaces, enums, methods, fields, constructors |
| **C** | `.c`, `.h` | Functions, structs, enums, unions, typedefs |
| **C++** | `.cpp`, `.hpp`, `.cxx` | Functions, classes, namespaces, templates, methods |
| **C#** | `.cs` | Classes, interfaces, structs, enums, methods, properties |
| **Ruby** | `.rb`, `.rake`, `.gemspec` | Classes, modules, methods, constants, variables |
| **Kotlin** | `.kt`, `.kts` | Classes, functions, interfaces, objects, properties |
| **Zig** | `.zig` | Functions, structs, enums, constants, variables |

**Note:** Full-text search works on **all file types** regardless of parser support. Symbol filtering requires a language parser.

## 🏗️ Architecture

Reflex uses a **trigram-based inverted index** combined with **runtime symbol detection**:

### Indexing Phase
1. Extract trigrams (3-character substrings) from all files
2. Build inverted index: `trigram → [file_id, line_no]`
3. Store full file contents in memory-mapped `content.bin`
4. Start background symbol indexing (caches symbols for faster queries)

### Query Phase
1. **Full-text queries**: Intersect trigram posting lists → verify matches
2. **Symbol queries**: Trigrams narrow to ~10-100 candidates → parse with tree-sitter → filter symbols
3. Memory-mapped I/O for instant cache access
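The indexing and full-text query phases can be sketched in a few lines. This is a simplified in-memory model, not Reflex's on-disk format: the functions `trigrams`, `build_index`, and `search` are hypothetical, file names stand in for file IDs, and the real index is memory-mapped rather than held in Python sets.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of (file_id, line_no) locations."""
    index = defaultdict(set)
    for file_id, content in files.items():
        for line_no, line in enumerate(content.splitlines(), start=1):
            for gram in trigrams(line):
                index[gram].add((file_id, line_no))
    return index


def search(index, files, pattern):
    """Intersect posting lists for the pattern, then verify each candidate."""
    grams = trigrams(pattern)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Patterns shorter than 3 characters have no trigrams: scan all lines.
        candidates = {
            (file_id, line_no)
            for file_id, content in files.items()
            for line_no, _ in enumerate(content.splitlines(), start=1)
        }
    hits = []
    for file_id, line_no in sorted(candidates):
        line = files[file_id].splitlines()[line_no - 1]
        if pattern in line:  # verification removes false positives
            hits.append((file_id, line_no, line.strip()))
    return hits


files = {
    "a.rs": "fn extract_symbols() {}\nfn main() {}",
    "b.rs": "// calls extract_symbols\nlet x = 1;",
}
index = build_index(files)
results = search(index, files, "extract_symbols")
print(results)
```

The verification step matters because a line can contain every trigram of the pattern without containing the pattern itself. Sorting the candidates keeps the output order stable across runs.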

### Cache Structure (`.reflex/`)
```
.reflex/
  meta.db          # SQLite: file metadata, stats, config, hashes
  trigrams.bin     # Inverted index (memory-mapped)
  content.bin      # Full file contents (memory-mapped)
  config.toml      # Index settings
  indexing.status  # Background symbol indexer status
```

## ⚡ Performance

Reflex is designed for speed at every level:

**Query Performance:**
- **Full-text & Regex**: Efficient queries via trigram indexing
- **Symbol queries**: Slower due to runtime tree-sitter parsing, but still efficient
- **Cached queries**: Repeated searches benefit from memory-mapped cache
- Scales well from small projects to large codebases (10k+ files)

**Indexing Performance:**
- **Initial indexing**: Parallel processing using 80% of CPU cores
- **Incremental updates**: Only reindexes changed files via blake3 hashing
- **Memory-mapped I/O**: Zero-copy access for cache reads
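Incremental updates come down to comparing a stored content hash with a fresh one. The sketch below shows the idea under two assumptions: it uses BLAKE2b from Python's standard library as a stand-in for blake3, and `content_hash` and `plan_reindex` are hypothetical helpers rather than Reflex functions.

```python
import hashlib


def content_hash(data: bytes) -> str:
    # Stand-in: BLAKE2b from the standard library. Reflex itself uses blake3.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def plan_reindex(stored_hashes, current_files):
    """Compare stored hashes with current contents and report what changed."""
    changed, unchanged = [], []
    for path, data in current_files.items():
        if stored_hashes.get(path) == content_hash(data):
            unchanged.append(path)
        else:
            changed.append(path)  # new file or modified contents
    deleted = [path for path in stored_hashes if path not in current_files]
    return sorted(changed), sorted(unchanged), sorted(deleted)


old = {
    "src/main.rs": b"fn main() {}",
    "src/lib.rs": b"pub mod a;",
    "src/old.rs": b"",
}
stored = {path: content_hash(data) for path, data in old.items()}

new = {
    "src/main.rs": b"fn main() { run(); }",
    "src/lib.rs": b"pub mod a;",
    "src/new.rs": b"// new",
}
changed, unchanged, deleted = plan_reindex(stored, new)
print(changed, unchanged, deleted)
```

Only the files in `changed` need to be reindexed, and entries in `deleted` can be dropped from the cache.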

## 🔧 Configuration

Reflex respects `.gitignore` files automatically. Additional configuration via `.reflex/config.toml`:

```toml
[index]
languages = []  # Empty = all supported languages
max_file_size = 10485760  # 10 MB
follow_symlinks = false

[search]
default_limit = 100

[performance]
parallel_threads = 0  # 0 = auto (80% of available cores)
```

## 🤖 AI Integration

Reflex provides clean JSON output for AI coding assistants and automation:

```bash
rfx query "parse_tree" --json --symbols
```

Output includes file paths, line numbers, symbol types, and code previews with pagination metadata.

## 🔍 Use Cases

- **Code Navigation**: Find all usages of functions, classes, and variables
- **Refactoring**: Identify all call sites before making changes
- **AI Assistants**: Retrieve relevant code snippets and context for LLMs
- **Debugging**: Locate where variables and functions are used
- **Documentation**: Find examples of API usage across the codebase
- **Security**: Search for potential vulnerabilities or anti-patterns

## 🧪 Testing

Reflex has comprehensive test coverage including core modules, real-world code samples across all supported languages, and end-to-end workflows.

```bash
cargo test                    # Run all tests
cargo test -- --nocapture     # Run with output
cargo test indexer::tests     # Run specific module
```

## 🤝 Contributing

Contributions welcome! Reflex is built to be:
- **Fast**: Efficient search using trigram indexing and memory-mapped I/O
- **Accurate**: Complete coverage with deterministic results
- **Extensible**: Easy to add new language parsers

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

## 🙏 Acknowledgments

Built with:
- [tree-sitter](https://tree-sitter.github.io/tree-sitter/) - Incremental parsing
- [rkyv](https://rkyv.org/) - Zero-copy deserialization
- [memmap2](https://github.com/RazrFalcon/memmap2-rs) - Memory-mapped I/O
- [rusqlite](https://github.com/rusqlite/rusqlite) - SQLite bindings
- [blake3](https://github.com/BLAKE3-team/BLAKE3) - Fast hashing
- [ignore](https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore) - gitignore support

Inspired by:
- [Zoekt](https://github.com/sourcegraph/zoekt) - Trigram-based code search
- [Sourcegraph](https://sourcegraph.com/) - Code search for teams
- [ripgrep](https://github.com/BurntSushi/ripgrep) - Fast text search

---

**Made with ❤️ for developers and AI coding assistants**