codesearch 0.1.9

A fast, intelligent CLI tool with multiple search modes (regex, fuzzy, semantic), code analysis, and dead code detection for popular programming languages
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
# Graph Analysis Implementation

## Overview

Comprehensive implementation of 6 graph analysis types for code understanding and optimization.

## Implemented Graph Types

### 1. Abstract Syntax Tree (AST)
**Module**: `src/ast.rs` (420 LOC)

**What it represents**: Hierarchical structure of source code parsed into a tree where each node represents a construct (expressions, statements, functions, variables).

**Capabilities**:
- Parse Rust, Python, JavaScript using tree-sitter
- Extract functions with parameters, return types, visibility
- Extract classes with methods and fields
- Extract imports and variables
- Detect async functions and public/private visibility
- Find function call sites

**CLI Command**:
```bash
codesearch ast <file> --format json
codesearch ast . -e rs,py,js
```

**Use Cases**:
- Syntax checking and analysis
- Refactoring tools
- Code similarity analysis
- Static analysis for vulnerabilities

**Tests**: 2 unit tests

---

### 2. Control Flow Graph (CFG)
**Module**: `src/cfg.rs` (350 LOC)

**What it represents**: Flow of control within a program. Each node is a basic block (sequence of instructions with no jumps), edges represent control flow paths.

**Capabilities**:
- Build CFG from source code
- Identify basic blocks (entry, normal, branch, loop, exit, return)
- Detect unreachable code
- Find loops and loop bodies
- Calculate cyclomatic complexity
- Export to DOT format for visualization

**CLI Command**:
```bash
codesearch cfg <file> --format dot --export cfg.dot
codesearch cfg <file> --format json
```

**Use Cases**:
- Analyzing loops and conditions
- Compiler optimization
- Detecting unreachable code or infinite loops
- Security analysis for vulnerabilities

**Tests**: 4 unit tests

---

### 3. Data Flow Graph (DFG)
**Module**: `src/dfg.rs` (380 LOC)

**What it represents**: Flow of data between operations or variables. Nodes represent operations/variables, edges show data flow.

**Capabilities**:
- Track variable definitions and uses
- Identify data dependencies
- Calculate variable lifetimes
- Find unused variables
- Detect redundant computations
- Parameter and return value tracking

**CLI Command**:
```bash
codesearch dfg <file> --format dot --export dfg.dot
codesearch dfg <file> --format json
```

**Use Cases**:
- Analyzing variable lifetimes
- Optimizing memory usage
- Parallelizing computations
- Identifying redundant computations

**Tests**: 4 unit tests

---

### 4. Call Graph
**Module**: `src/callgraph.rs` (420 LOC)

**What it represents**: Calling relationships between functions/methods. Nodes represent functions, edges represent function calls.

**Capabilities**:
- Build call graph across entire codebase
- Identify recursive functions
- Find dead (unused) functions
- Calculate call depth
- Find call chains between functions
- Track direct and indirect calls

**CLI Command**:
```bash
codesearch callgraph . -e rs,py,js --format dot
codesearch callgraph . --recursive-only
codesearch callgraph . --dead-only
```

**Use Cases**:
- Analyzing function dependencies
- Identifying unused or dead functions
- Understanding large codebases
- Detecting recursive or deeply nested calls

**Tests**: 4 unit tests

---

### 5. Dependency Graph
**Module**: `src/depgraph.rs` (380 LOC) - Enhanced

**What it represents**: Dependencies between modules, libraries, or variables.

**Capabilities**:
- Build module dependency graph
- Detect circular dependencies
- Identify root and leaf modules
- Calculate dependency depth
- Export to DOT format
- JSON export for programmatic analysis

**CLI Command**:
```bash
codesearch depgraph . -e rs --format dot
codesearch depgraph . --circular-only
```

**Use Cases**:
- Managing and minimizing dependencies
- Detecting circular dependencies
- Impact analysis (which modules affected by change)
- Architecture visualization

**Tests**: 4 unit tests

---

### 6. Program Dependency Graph (PDG)
**Module**: `src/pdg.rs` (420 LOC)

**What it represents**: Combines control dependencies (CFG) and data dependencies (DFG) into a single graph.

**Capabilities**:
- Merge CFG and DFG into unified view
- Backward program slicing (what affects a variable)
- Forward program slicing (what is affected by a variable)
- Find parallelization opportunities
- Taint analysis (security)
- Identify independent operations

**CLI Command**:
```bash
codesearch pdg <file> --format dot --export pdg.dot
codesearch pdg <file> --parallel
```

**Use Cases**:
- Advanced static code analysis
- Parallelization and optimization
- Security analysis (taint analysis)
- Program slicing (isolating code affecting specific output)

**Tests**: 4 unit tests

---

## Unified Graph Analysis

**Module**: `src/graphs.rs` (350 LOC)

Provides unified interface for all graph types with common analysis and export capabilities.

**CLI Command**:
```bash
codesearch graph-all <file> --export-dir ./graphs
```

Analyzes all applicable graph types for a file and exports results.

---

## Architecture

### Module Structure
```
src/
├── ast.rs          (420 LOC) - AST parsing with tree-sitter
├── cfg.rs          (350 LOC) - Control flow analysis
├── dfg.rs          (380 LOC) - Data flow analysis
├── callgraph.rs    (420 LOC) - Call graph construction
├── depgraph.rs     (380 LOC) - Dependency analysis
├── pdg.rs          (420 LOC) - Program dependency graph
└── graphs.rs       (350 LOC) - Unified interface
```

**Total**: ~2,720 lines of new code

### Data Flow
```
Source Code
AST Parser (tree-sitter)
    ├─→ CFG Builder → Control Flow Graph
    ├─→ DFG Builder → Data Flow Graph
    ├─→ Call Graph Builder → Call Graph
    ├─→ Dependency Builder → Dependency Graph
    └─→ PDG Builder (CFG + DFG) → Program Dependency Graph
```

---

## CLI Commands Summary

| Command | Purpose | Output Formats |
|---------|---------|----------------|
| `codesearch ast` | Parse AST | text, json |
| `codesearch cfg` | Control flow | text, json, dot |
| `codesearch dfg` | Data flow | text, json, dot |
| `codesearch callgraph` | Function calls | text, json, dot |
| `codesearch depgraph` | Module dependencies | text, json, dot |
| `codesearch pdg` | Program dependencies | text, json, dot |
| `codesearch graph-all` | All graphs | text, json |

---

## Export Formats

### DOT Format
All graph types (except AST) support DOT format export for visualization with Graphviz:

```bash
codesearch cfg file.rs --format dot --export cfg.dot
dot -Tpng cfg.dot -o cfg.png
```

### JSON Format
All graph types support JSON export for programmatic analysis:

```bash
codesearch dfg file.rs --format json > dfg.json
```

---

## Usage Examples

### 1. Find Unreachable Code
```bash
codesearch cfg src/main.rs
# Output shows unreachable blocks
```

### 2. Detect Unused Variables
```bash
codesearch dfg src/lib.rs
# Output shows unused variables
```

### 3. Find Dead Functions
```bash
codesearch callgraph . -e rs --dead-only
# Lists functions never called
```

### 4. Detect Circular Dependencies
```bash
codesearch depgraph . -e rs --circular-only
# Shows circular dependency cycles
```

### 5. Find Parallelization Opportunities
```bash
codesearch pdg src/compute.rs --parallel
# Shows independent operations that can run in parallel
```

### 6. Comprehensive Analysis
```bash
codesearch graph-all src/main.rs --export-dir ./analysis
# Generates all graph types and exports to directory
```

---

## Visualization

### Graphviz Integration
All graph types export to DOT format for visualization:

```bash
# Generate PNG visualization
codesearch cfg file.rs --format dot | dot -Tpng > cfg.png

# Generate SVG visualization
codesearch callgraph . --format dot | dot -Tsvg > callgraph.svg

# Interactive visualization
codesearch pdg file.rs --format dot | xdot -
```

### Color Coding

**CFG**:
- Green: Entry blocks
- Red: Exit/Return blocks
- Yellow: Branch blocks
- Blue: Loop blocks
- White: Normal blocks

**DFG**:
- Blue: Variables
- Green: Parameters
- Yellow: Constants
- Red: Operations
- Pink: Function calls

**Call Graph**:
- Green: Root functions (entry points)
- Red: Recursive functions
- Blue: Normal functions

**PDG**:
- Red edges: Control dependencies
- Blue edges: Data dependencies
- Purple edges: Both

---

## Performance Characteristics

### CFG
- **Complexity**: O(n) where n = lines of code
- **Memory**: O(b) where b = number of basic blocks
- **Best for**: Functions up to 1000 lines

### DFG
- **Complexity**: O(n*v) where n = lines, v = variables
- **Memory**: O(v + e) where e = data flow edges
- **Best for**: Functions with <100 variables

### Call Graph
- **Complexity**: O(f*c) where f = functions, c = calls per function
- **Memory**: O(f + c) where c = total calls
- **Best for**: Codebases up to 10,000 functions

### PDG
- **Complexity**: O(CFG + DFG)
- **Memory**: O(nodes + edges from both graphs)
- **Best for**: Individual function analysis

---

## Limitations

1. **Language Support**: Currently Rust, Python, JavaScript for AST-based analysis
2. **Interprocedural Analysis**: Call graph is intraprocedural (within same codebase)
3. **Dynamic Calls**: Cannot track function pointers or dynamic dispatch
4. **Aliasing**: Limited alias analysis in DFG
5. **Precision**: Regex-based fallback for unsupported languages

---

## Future Enhancements

1. **More Languages**: Add C/C++, Java, Go grammars
2. **Interprocedural PDG**: Cross-function dependency analysis
3. **Pointer Analysis**: Better alias tracking
4. **Interactive Visualization**: Web-based graph explorer
5. **Incremental Analysis**: Update graphs on code changes
6. **Machine Learning**: Pattern recognition in graphs

---

## Testing

**Total Tests**: 22 unit tests across 6 modules

Each module includes tests for:
- Graph construction
- Node/edge operations
- Analysis algorithms (unreachable code, cycles, etc.)
- Export functionality

Run tests:
```bash
cargo test cfg
cargo test dfg
cargo test callgraph
cargo test pdg
```

---

## Integration with Existing Features

### MCP Server
All graph types can be exposed via MCP for AI agent integration.

### Incremental Indexing
Graph results can be cached in the code index for faster repeated analysis.

### Export System
Graphs integrate with existing CSV/Markdown export functionality.

---

## Conclusion

Comprehensive graph analysis suite providing 6 different perspectives on code structure:

✅ **AST** - Syntax structure  
✅ **CFG** - Control flow  
✅ **DFG** - Data flow  
✅ **Call Graph** - Function relationships  
✅ **Dependency Graph** - Module dependencies  
✅ **PDG** - Combined control + data dependencies  

All with:
- Clean, maintainable code
- Comprehensive test coverage
- Multiple export formats
- CLI integration
- Visualization support