phptaint 0.2.0

Security-focused PHP lexer, parser, AST, and configurable taint analysis engine
# DEEP AUDIT: phptaint

**Crate:** phptaint v0.1.1  
**Lines of Code:** ~7,007  
**Audit Date:** 2026-03-26  
**Auditor:** Kimi Code CLI (TOKIO-LEVEL Deep Audit)

---

## Executive Summary

phptaint is a security-focused PHP lexer, parser, AST builder, and configurable taint analysis engine written in Rust. It's designed for analyzing PHP code to detect security vulnerabilities by tracking data flow from untrusted sources (superglobals like `$_GET`, `$_POST`) to dangerous sinks (`eval`, `exec`, SQL queries, etc.).

**Overall Assessment:** This is a **functional but limited** taint analysis engine. It successfully handles common PHP security patterns but has significant gaps in modern PHP 8.x support, interprocedural analysis depth, and real-world framework coverage.

---

## 1. Parser: Modern PHP (8.x) Support

### ✅ What IS Supported

| Feature | Status | Notes |
|---------|--------|-------|
| Basic PHP tags (`<?php`, `?>`) | ✅ Full | Well-tested |
| Variables, superglobals | ✅ Full | All 9 superglobals recognized |
| String literals (single/double quoted) | ✅ Full | With escape sequence handling |
| String interpolation | ✅ Full | Both `$var` and `{$var}` syntax |
| Numbers (int, float) | ✅ Full | Decimal only |
| Function calls | ✅ Full | Named and dynamic (`$fn()`) |
| Method calls (`->`) | ✅ Full | Chained calls supported |
| Static calls (`::`) | ✅ Full | Including `::class` |
| Nullsafe operator (`?->`) | ✅ Full | PHP 8.0 - correctly treated as regular method call for taint |
| Ternary operator (`?:`, `? :`) | ✅ Full | Elvis operator supported |
| Null coalesce (`??`) | ✅ Full | PHP 7.0+ |
| Concatenation (`.`) | ✅ Full | Left-associative chains |
| Array literals (`[]`) | ✅ Full | Short syntax only |
| Arrow functions (`fn() =>`) | ✅ Partial | Parsed as closures |
| Match expressions | ✅ Partial | Parsed but arms not deeply analyzed |
| Enums | ✅ Partial | Parsed as class-like structures |
| `readonly` classes/properties | ✅ Partial | Keyword recognized |
| Namespaces | ✅ Partial | Basic support, use aliases work |
| Classes, interfaces, traits | ✅ Partial | Abstract, extends, implements |
| Control flow (if/else, foreach, while, for, do-while) | ✅ Partial | `for` loops simplified to `while` |
| Try/catch/finally | ✅ Partial | Catch clauses merged |
| Switch/case/default | ✅ Partial | Basic support |
| Closures with `use` | ✅ Partial | Captured variables tracked |
| Comments (`//`, `/* */`) | ✅ Full | Skipped correctly |

### ❌ What is NOT Supported (Critical Gaps)

| Feature | Impact | Notes |
|---------|--------|------------|
| **Named Arguments** (`foo(name: $value)`) | 🔴 High | Colon in arg list causes parse errors |
| **PHP 8 Attributes** (`#[Attribute]`) | 🔴 High | Not recognized, causes errors |
| **Union Types** (`int\|string`) | 🟡 Medium | `\|` confuses parser |
| **Intersection Types** (`Iterator&Countable`) | 🟡 Medium | `&` in type position fails |
| **Heredoc/Nowdoc** (`<<<EOT`) | 🟡 Medium | Not recognized |
| **Named Arguments in Constructors** | 🔴 High | Common in modern PHP |
| **Variadic functions** (`...$args`) | 🟡 Medium | Spread operator not handled |
| **First-class callables** (`strlen(...)`) | 🟡 Medium | PHP 8.1 feature |
| **List destructuring** (`[$a, $b] = $arr`) | 🟡 Medium | Common pattern |
| **Variable variables** (`$$var`) | 🟢 Low | Edge case |
| **Goto statements** | 🟢 Low | Rare in modern code |
| **Declare statements** (`declare(strict_types=1)`) | 🟡 Medium | Strict typing common |
| **Constructor property promotion** | 🟡 Medium | PHP 8.0 feature, parse errors |
| **Static arrow functions** (`static fn()`) | 🟢 Low | Edge case |
| **Mixed HTML/PHP** (alternative syntax) | 🟡 Medium | Template files |
| **Fibers** (`Fiber` class) | 🟢 Low | Rare |
| **Yield/generators** | 🟢 Low | Partial support |
| **Anonymous classes** (`new class {}`) | 🟢 Low | Partial support |
| **Reference parameters** (`&$var`) | 🟢 Low | Partial support |

### Parser Verdict

**Grade: C+**

The parser handles "enough" PHP for basic security analysis of older codebases (PHP 5.x-7.x) but will **fail significantly on modern PHP 8.x codebases** that use:
- Named arguments (extremely common in Laravel, Symfony, modern libraries)
- Attributes (Doctrine, Symfony, Laravel all use these heavily)
- Union types (PHP 8.0+, now standard practice)

The parser recovers from errors by resynchronizing and continuing, which helps, but code skipped during recovery is never seen by the taint engine, so repeated parse errors can hide security-critical code paths.

---

## 2. Taint Engine: Function Call Tracking

### How It Works

The taint engine uses a **two-pass analysis**:

1. **Pass 1**: Collect function summaries (which parameters flow to which sinks)
2. **Pass 2**: Use summaries for interprocedural analysis
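
The two passes can be sketched over a toy IR as follows. This is a minimal illustration, not phptaint's implementation: `Stmt`, `Function`, `Summary`, `collect_summaries`, and `sinks_reached` are hypothetical names, and the real analyzer works on the crate's AST rather than this simplified statement list.

```rust
use std::collections::HashMap;

/// Toy statement forms (hypothetical, not phptaint's AST).
enum Stmt {
    /// Parameter number `param` is passed to a dangerous sink such as `eval`.
    Sink { param: usize, sink: &'static str },
    /// Any statement that is irrelevant to taint.
    Other,
}

/// A function declaration in the toy IR.
struct Function {
    name: &'static str,
    body: Vec<Stmt>,
}

/// Maps a parameter index to the sink names reachable from it.
type Summary = HashMap<usize, Vec<&'static str>>;

/// Pass 1: record which parameters of each function reach which sinks.
fn collect_summaries(funcs: &[Function]) -> HashMap<&'static str, Summary> {
    let mut out: HashMap<&'static str, Summary> = HashMap::new();
    for f in funcs {
        let summary = out.entry(f.name).or_default();
        for stmt in &f.body {
            if let Stmt::Sink { param, sink } = stmt {
                summary.entry(*param).or_default().push(*sink);
            }
        }
    }
    out
}

/// Pass 2: at a call site with a tainted argument, consult the callee's
/// summary to find the sinks that the tainted value reaches.
fn sinks_reached(
    summaries: &HashMap<&'static str, Summary>,
    callee: &str,
    tainted_arg: usize,
) -> Vec<&'static str> {
    summaries
        .get(callee)
        .and_then(|s| s.get(&tainted_arg))
        .cloned()
        .unwrap_or_default()
}
```

With a single function `f` whose body contains `Stmt::Sink { param: 0, sink: "eval" }`, a call that passes a tainted value as argument 0 resolves to `eval`, which mirrors the `function f($p) { eval($p); } f($_GET['x']);` case in the table below.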

### ✅ What's Tracked Well

| Pattern | Tracking | Example |
|---------|----------|---------|
| Direct superglobal → sink | ✅ Yes | `eval($_GET['cmd'])` |
| Variable assignment | ✅ Yes | `$x = $_GET['x']; eval($x);` |
| Through function parameters | ✅ Yes | `function f($p) { eval($p); } f($_GET['x']);` |
| Through concatenation | ✅ Yes | `$sql = "SELECT " . $_GET['x'];` |
| Through arrays | ⚠️ Basic | `$arr = [$_GET['x']]; eval($arr[0]);` |
| Through closures | ✅ Yes | `add_action('hook', function() { eval($_GET['x']); });` |
| Through ternary | ✅ Yes | `$x = $cond ? $_GET['x'] : 'safe';` |
| Through null coalesce | ✅ Yes | `$x = $_GET['x'] ?? 'safe';` |

### ⚠️ Limitations in Function Tracking

| Issue | Impact | Details |
|-------|--------|---------|
| **No return value tracking** | 🔴 High | Functions that return tainted data not tracked: `function getInput() { return $_GET['x']; }` |
| **No object property tracking** | 🔴 High | `$this->property` taint not tracked across methods |
| **No array element tracking** | 🟡 Medium | `$data['key']` where key is variable not tracked well |
| **Limited static analysis** | 🟡 Medium | No constant propagation |
| **No type-based filtering** | 🟡 Medium | `is_string()` doesn't clear taint |
| **Loop unrolling** | 🟢 Low | Loops analyzed once |

### Function Summary System

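The function summary system is **reliable for the flows it models but incomplete** (it is not sound in the formal sense, since the flows listed under Weaknesses are missed):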

```rust
// From summary.rs
pub struct FunctionSummary {
    /// Map from parameter index to the sinks reachable from that parameter.
    pub sink_params: HashMap<usize, Vec<SinkSummary>>,
}
```

**Strengths:**
- Cross-file analysis works via `analyze_multi()`
- Correctly identifies which function parameters reach sinks
- Framework hook analysis (WordPress `add_action`) with unauthenticated hook detection

**Weaknesses:**
- Only tracks parameter-to-sink flow; taint that leaves a function through its return value is not summarized
- No summary for built-in PHP functions
- No alias/variable-function tracking through assignments
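
One way the return-value gap might be closed is to extend each summary with return facts. The sketch below is an assumption about a possible design, not existing phptaint code: `ReturnSummary`, `returns_source`, `params_to_return`, and `call_result_tainted` are all hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical return-value facts for one function.
/// Neither field exists in phptaint; both are assumptions for illustration.
#[derive(Debug, Default)]
struct ReturnSummary {
    /// The function returns source data regardless of its arguments,
    /// e.g. `function getInput() { return $_GET['x']; }`.
    returns_source: bool,
    /// Parameter indices whose taint propagates into the return value,
    /// e.g. `function wrap($s) { return "<b>" . $s . "</b>"; }`.
    params_to_return: HashSet<usize>,
}

impl ReturnSummary {
    /// Decide whether the result of a call is tainted, given the argument
    /// positions that are tainted at the call site.
    fn call_result_tainted(&self, tainted_args: &[usize]) -> bool {
        self.returns_source
            || tainted_args
                .iter()
                .any(|i| self.params_to_return.contains(i))
    }
}
```

A second-pass analyzer would then mark the left-hand side of `$x = getInput();` as tainted whenever `call_result_tainted` returns `true` for the callee.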

### Taint Engine Verdict

**Grade: B-**

Good for direct data flow analysis. **Will miss** vulnerabilities that flow:
- Through return values
- Through object properties
- Through complex array manipulations
- Through callbacks stored in variables

---

## 3. Sink Coverage

### PHP Core Sinks (TaintRegistry::php_core())

| Category | Functions | Coverage |
|----------|-----------|----------|
| **RCE** | `eval`, `system`, `exec`, `passthru`, `shell_exec`, `popen`, `assert`, `proc_open`, `call_user_func`, `call_user_func_array`, `create_function` | ✅ Complete |
| **XSS** | `echo`, `print` | ⚠️ Basic |
| **SQLi** | PDO `$db->query()`, `$db->exec()` | ⚠️ Basic |
| **LFI/RFI** | `include`, `include_once`, `require`, `require_once` | ✅ Complete |
| **File Operations** | `file_put_contents`, `fwrite`, `file_get_contents`, `fopen`, `readfile`, `file`, `unlink`, `rename`, `copy`, `mkdir`, `rmdir` | ✅ Complete |
| **SSRF** | `curl_exec`, `file_get_contents` | ✅ Complete |
| **Deserialization** | `unserialize` | ✅ Complete |
| **Variable Injection** | `extract` | ✅ Complete |
| **Header Injection** | `header` | ✅ Complete |
| **Dynamic Function** | `$fn()` | ✅ Complete |

### WordPress Sinks (TaintRegistry::wordpress())

| Category | Functions | Coverage |
|----------|-----------|----------|
| **SQLi** | `$wpdb->query`, `$wpdb->get_results`, `$wpdb->get_var`, `$wpdb->get_row`, `$wpdb->get_col` | ✅ Complete |
| **Open Redirect** | `wp_redirect` | ✅ Complete |
| **Option Injection** | `update_option`, `add_option` | ✅ Complete |
| **User Meta Injection** | `update_user_meta`, `add_user_meta` | ✅ Complete |
| **Email Injection** | `wp_mail` | ⚠️ Only arg 3 |
| **Auth Bypass** | `wp_set_auth_cookie`, `wp_set_current_user` | ✅ Complete |
| **Object Injection** | `maybe_unserialize` | ✅ Complete |
| **SSRF** | `wp_remote_get`, `wp_remote_post`, `wp_remote_request` | ✅ Complete |
| **XSS** | `echo` + WP sanitizers tracked | ⚠️ Basic |

### Laravel Sinks (TaintRegistry::laravel())

| Category | Methods | Coverage |
|----------|---------|----------|
| **SQLi** | `DB::select`, `DB::statement`, `DB::unprepared` | ⚠️ Basic |
| **XSS** | Blade auto-escaping noted but not analyzed | ❌ Not implemented |

**MAJOR GAP:** Laravel's `DB::raw()`, Eloquent `whereRaw()`, `orderByRaw()` are **NOT in the default Laravel registry**. This is a critical omission for a Laravel-focused analysis.

### Sanitizers

| Context | Functions |
|---------|-----------|
| **Universal** (clears all taint) | `intval`, `floatval`, `boolval`, `absint`, `basename`, `is_numeric`, `ctype_*` |
| **XSS-specific** | `htmlspecialchars`, `htmlentities`, `strip_tags`, `esc_html`, `esc_attr`, `esc_url`, `wp_kses*` |
| **SQL-specific** | `prepare` method |

**Gap:** No regex-based sanitization tracking (e.g., `preg_replace` with whitelist patterns).
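
The distinction between universal and context-specific sanitizers in the table above can be modeled by treating taint as a set of vulnerability contexts. The following is a minimal sketch under that assumption; `Taint`, `without`, `reaches`, and `cleared_by` are hypothetical names and do not describe the crate's internal representation. Only a subset of the listed sanitizers is included.

```rust
/// Set of vulnerability contexts in which a value is still dangerous.
/// Plain bit flags keep the sketch dependency-free.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Taint(u8);

impl Taint {
    const NONE: Taint = Taint(0);
    const XSS: Taint = Taint(1 << 0);
    const SQLI: Taint = Taint(1 << 1);
    const RCE: Taint = Taint(1 << 2);
    const ALL: Taint = Taint(0b111);

    /// Remove the given contexts from this taint set.
    fn without(self, cleared: Taint) -> Taint {
        Taint(self.0 & !cleared.0)
    }

    /// True if the value is still dangerous for a sink of kind `sink`.
    fn reaches(self, sink: Taint) -> bool {
        (self.0 & sink.0) != 0
    }
}

/// Contexts cleared by a sanitizer call, following the table above.
fn cleared_by(function: &str) -> Taint {
    match function {
        "intval" | "floatval" | "boolval" | "absint" => Taint::ALL,
        "htmlspecialchars" | "htmlentities" | "esc_html" | "esc_attr" => Taint::XSS,
        _ => Taint::NONE,
    }
}
```

Under this model, a value passed through `htmlspecialchars` no longer reaches an XSS sink but is still reported at a SQL or command-execution sink.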

### Sink Coverage Verdict

**Grade: B**

Good coverage of PHP core and WordPress. **Laravel support is dangerously incomplete** - missing critical ORM raw query methods that are the primary source of SQL injection in Laravel apps.

---

## 4. What's MISSING vs SonarQube / Psalm

### Feature Comparison

| Feature | phptaint | SonarQube | Psalm | Notes |
|---------|----------|-----------|-------|-------|
| **Type System** | ❌ None | ✅ Full | ✅ Full | No types = missed context |
| **Control Flow Analysis** | ⚠️ Basic | ✅ Advanced | ✅ Advanced | No branch sensitivity |
| **Interprocedural** | ⚠️ Summaries | ✅ Full | ✅ Full | Only params→sinks, not returns |
| **Object Sensitivity** | ❌ None | ✅ Yes | ✅ Yes | `$this->prop` not tracked |
| **Array Sensitivity** | ⚠️ Basic | ✅ Full | ✅ Full | `$arr[$key]` poorly tracked |
| **Taint Configuration** | ✅ TOML | ✅ GUI/API | ✅ Annotations | phptaint's TOML is good |
| **Framework Support** | ⚠️ WP/Laravel basics | ✅ Extensive | ✅ Extensive | Missing Symfony, Doctrine, etc |
| **False Positive Control** | ⚠️ Basic | ✅ Advanced | ✅ Advanced | No path constraints |
| **Performance** | ✅ Fast | ⚠️ Heavy | ✅ Fast | phptaint is lightweight |
| **IDE Integration** | ❌ None | ✅ Yes | ✅ Yes | CLI only |
| **CI/CD Integration** | ✅ Easy | ✅ Yes | ✅ Yes | All work well |
| **Modern PHP 8** | ⚠️ Partial | ✅ Full | ✅ Full | Attributes, unions missing |

### Specific Missing Features

#### vs SonarQube Security

1. **No CWE Mapping** - SonarQube maps all findings to CWE IDs
2. **No Security Hotspots** - Sonar distinguishes hotspots from vulnerabilities
3. **No Dataflow Path Visualization** - Sonar shows full paths in UI
4. **No Rule Customization at Analysis Time** - phptaint requires code changes
5. **No Exclusion Annotations** - Can't mark false positives in code
6. **No PR/MR Decoration** - No inline comments on pull requests

#### vs Psalm Taint Analysis

1. **No Type-Based Taint** - Psalm knows `htmlspecialchars` returns `string` that is HTML-safe
2. **No Annotations** - Psalm supports `@psalm-taint-escape`, `@psalm-taint-source`, `@psalm-taint-sink`
3. **No Template/Generic Support** - Can't model `Collection<T>` taint
4. **No Assertions** - `assert(is_string($x))` doesn't clear taint
5. **No String Literal Analysis** - `"safe_$userInput"` is treated as fully tainted
6. **No Include/Require Analysis** - Psalm follows includes to resolve cross-file flow; phptaint does not, so files are linked only when they are analyzed together via `analyze_multi()`

### Critical Gaps for Enterprise Use

1. **No Fingerprinting** - Can't suppress findings by hash
2. **No Baseline Support** - Can't ignore existing issues
3. **No Severity Override** - Registry defines severity, can't adjust per-finding
4. **No Custom Source Definitions** - Sources are limited to superglobals (can't register a framework accessor such as `$request->input('custom')` as a source)
5. **No Sink-Specific Sanitizers** - `htmlspecialchars` clears taint for all contexts, not just XSS
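
Items 1 and 2 could be addressed by a tool built on top of the crate. The sketch below shows one possible approach and makes several assumptions: `Finding` and its fields are hypothetical and not phptaint's finding type, and FNV-1a is used only because its output is stable across runs and compiler versions (the standard library's `DefaultHasher` makes no such guarantee).

```rust
use std::collections::HashSet;

/// Minimal finding record; the field names are hypothetical.
struct Finding<'a> {
    file: &'a str,
    sink: &'a str,
    source: &'a str,
}

/// FNV-1a 64-bit hash over a byte slice.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Fingerprint that deliberately ignores line numbers, so unrelated edits
/// elsewhere in the file do not invalidate a baseline entry.
fn fingerprint(f: &Finding<'_>) -> u64 {
    fnv1a(format!("{}\0{}\0{}", f.file, f.sink, f.source).as_bytes())
}

/// Keep only the findings whose fingerprint is absent from the baseline.
fn new_findings<'a>(all: Vec<Finding<'a>>, baseline: &HashSet<u64>) -> Vec<Finding<'a>> {
    all.into_iter()
        .filter(|f| !baseline.contains(&fingerprint(f)))
        .collect()
}
```

A CI job could store the fingerprints from a first run as the baseline and fail only when `new_findings` returns a non-empty list. Ignoring line numbers has a cost: two findings with the same file, sink, and source collapse into one fingerprint.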

### Verdict

**phptaint is a lightweight alternative, NOT a replacement for SonarQube/Psalm.**

Use phptaint when:
- You need fast, lightweight scanning
- You're analyzing WordPress plugins
- You want embeddable Rust-based analysis
- You're building a custom security tool

Use SonarQube/Psalm when:
- You need comprehensive type-aware analysis
- You're analyzing modern PHP 8.x codebases
- You need enterprise reporting/tracking
- You need low false positive rates

---

## 5. Laravel SQL Injection Detection

### Test Case: Would phptaint find this?

```php
<?php
// routes/web.php or Controller
Route::get('/users', function (Request $request) {
    $name = $request->input('name');
    
    // VULNERABLE - raw SQL concatenation
    $users = DB::select("SELECT * FROM users WHERE name = '$name'");
    
    // Also vulnerable - using query builder unsafely  
    $users = User::whereRaw("name = '$name'")->get();
    
    // Also vulnerable - orderByRaw
    $users = User::orderByRaw($name)->get();
    
    return view('users', compact('users'));
});
```

### Analysis

**Test 1: `DB::select()` with tainted data**
```php
DB::select("SELECT * FROM users WHERE name = '$name'");
```
- `$name` comes from `$request->input('name')`
- **PROBLEM:** Laravel's `Request::input()` is **NOT recognized as a taint source**
- The registry only treats superglobals (`$_GET`, `$_POST`, and so on) as sources
- **VERDICT: MISS** ❌

**Test 2: Direct superglobal access**
```php
$name = $_GET['name'];
$users = DB::select("SELECT * FROM users WHERE name = '$name'");
```
- `$_GET` IS a recognized source
- `DB::select` IS in the Laravel registry
- **VERDICT: DETECT** ✅

**Test 3: `whereRaw()` and `orderByRaw()`**
```php
User::whereRaw("name = '$name'")->get();
User::orderByRaw($name)->get();
```
- These methods are **NOT in TaintRegistry::laravel()**
- Only `select`, `statement`, `unprepared` are registered
- **VERDICT: MISS** ❌

**Test 4: `DB::raw()`**
```php
User::where('name', DB::raw("'$name'"))->get();
```
- `DB::raw` is **NOT in the registry**
- **VERDICT: MISS** ❌

### Laravel Detection Summary

| Pattern | Detected? | Notes |
|---------|-----------|-------|
| `$_GET` → `DB::select()` | ✅ Yes | Basic case works |
| `$request->input()` → `DB::select()` | ❌ No | Request methods not sources |
| `$_GET` → `whereRaw()` | ❌ No | Method not in registry |
| `$_GET` → `orderByRaw()` | ❌ No | Method not in registry |
| `$_GET` → `DB::raw()` | ❌ No | Method not in registry |
| `$_GET` → Eloquent `where()` | ❌ No | Query builder methods not sinks |

### Required Registry Additions for Laravel

```rust
// Missing from TaintRegistry::laravel()

// Request sources (Illuminate\Http\Request methods)
// These would need new source tracking infrastructure

// Additional method sinks
r.method_sinks.entry("whereraw".into())...     // SQLi
r.method_sinks.entry("orderbyraw".into())...   // SQLi  
r.method_sinks.entry("havingraw".into())...    // SQLi
r.method_sinks.entry("selectraw".into())...    // SQLi
r.method_sinks.entry("raw".into())...          // SQLi (DB::raw)

// Eloquent static methods
r.sinks.entry("update".into())...              // Mass assignment
r.sinks.entry("insert".into())...              // Mass assignment
```
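
Because the calls above are elided, the following self-contained sketch shows what such registrations might look like against a stand-in registry. Everything here is hypothetical (`VulnClass`, `SinkDef`, `MethodSinks`, `laravel_raw_sinks`) and is not phptaint's actual `TaintRegistry` API. Names are lowercased on insert and lookup because PHP method names are case-insensitive, which matches the lowercased keys in the snippet above. Argument 0 is marked dangerous because the raw SQL string is the first parameter of each of these Laravel methods.

```rust
use std::collections::HashMap;

/// Vulnerability class attached to a sink (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VulnClass {
    SqlInjection,
}

/// Hypothetical sink definition: which argument positions are dangerous.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SinkDef {
    class: VulnClass,
    dangerous_args: Vec<usize>,
}

/// Stand-in registry keyed by lowercased method name.
#[derive(Default)]
struct MethodSinks {
    by_name: HashMap<String, SinkDef>,
}

impl MethodSinks {
    /// Register a method sink under its lowercased name.
    fn register(&mut self, method: &str, def: SinkDef) {
        self.by_name.insert(method.to_ascii_lowercase(), def);
    }

    /// Look up a call such as `->whereRaw(...)` regardless of its casing.
    fn lookup(&self, method: &str) -> Option<&SinkDef> {
        self.by_name.get(&method.to_ascii_lowercase())
    }
}

/// Register the raw query-builder methods listed above as SQLi sinks.
fn laravel_raw_sinks() -> MethodSinks {
    let mut sinks = MethodSinks::default();
    for method in ["whereRaw", "orderByRaw", "havingRaw", "selectRaw", "raw"] {
        sinks.register(
            method,
            SinkDef { class: VulnClass::SqlInjection, dangerous_args: vec![0] },
        );
    }
    sinks
}
```

Matching on the bare method name `raw` without knowing the receiver type is likely to produce false positives on unrelated classes, which is a trade-off of any registry that has no type information.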

### Laravel Verdict

**Grade: D+ for Laravel**

phptaint will **miss most real Laravel SQL injections** because:
1. Laravel uses `Request` objects, not direct superglobals
2. The ORM's raw methods aren't registered as sinks
3. Mass assignment, a separate vulnerability class common in Laravel, is not covered either

It will only catch:
- Direct `$_GET`/`$_POST` access (rare in modern Laravel)
- `DB::select()`/`statement()` with superglobal input

---

## 6. Code Quality Assessment

### Strengths

1. **Good Architecture** - Clean separation: lexer → parser → AST → taint analyzer
2. **Error Recovery** - Parser synchronizes after errors, continues analysis
3. **Configurable** - TOML-based registry is well-designed
4. **No Unsafe Code** - `#![forbid(unsafe_code)]` throughout
5. **Good Documentation** - Comprehensive rustdocs
6. **Test Coverage** - ~1,800 lines of tests across parser, taint, adversarial cases
7. **Framework Aware** - Hook registration tracking for WordPress

### Weaknesses

1. **Parser Complexity** - Single-file recursive descent without formal grammar
2. **No Fuzzing** - No `cargo-fuzz` integration for parser robustness
3. **Hardcoded Sources** - Can't easily add new taint sources
4. **No Incremental Analysis** - Full reparse every time
5. **Stringly-Typed AST** - Variables are `String`, not symbol IDs
6. **Limited Scope Resolution** - Namespaces work but complex use statements don't

### Security of the Analyzer Itself

1. **Budget Mechanisms** - Parser has `iteration_budget` (3x token count) to prevent infinite loops
2. **Expression Depth Limit** - Default max depth 128 for nested expressions
3. **Panic Safety** - Adversarial tests use `catch_unwind`
4. **No External I/O** - Pure analysis, no network/file operations during analysis
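
The first two mechanisms can be illustrated with a small guard object. This is a sketch of the general technique under the numbers stated above (three iterations per token, depth limit passed in by the caller); `Budget`, `BudgetError`, and the method names are hypothetical and are not taken from the crate.

```rust
/// Hypothetical guard combining an iteration budget with a depth limit.
struct Budget {
    remaining: usize,
    depth: usize,
    max_depth: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum BudgetError {
    IterationsExhausted,
    TooDeep,
}

impl Budget {
    /// Allow three loop iterations per input token and cap expression nesting.
    fn new(token_count: usize, max_depth: usize) -> Self {
        Budget {
            remaining: token_count.saturating_mul(3),
            depth: 0,
            max_depth,
        }
    }

    /// Call once per parser loop iteration; fails once the budget is spent.
    fn tick(&mut self) -> Result<(), BudgetError> {
        if self.remaining == 0 {
            return Err(BudgetError::IterationsExhausted);
        }
        self.remaining -= 1;
        Ok(())
    }

    /// Call when descending into a nested expression.
    fn enter(&mut self) -> Result<(), BudgetError> {
        if self.depth >= self.max_depth {
            return Err(BudgetError::TooDeep);
        }
        self.depth += 1;
        Ok(())
    }

    /// Call when leaving a nested expression.
    fn leave(&mut self) {
        self.depth = self.depth.saturating_sub(1);
    }
}
```

A recursive-descent parser would call `tick()` at the top of every loop and wrap each recursive expression call in `enter()` and `leave()`, turning pathological input into an error instead of a hang or stack overflow.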

---

## 7. Recommendations

### For Users

1. **WordPress**: Use `TaintRegistry::wordpress()` - best supported use case
2. **Laravel**: Build custom registry with additional method sinks (see section 5)
3. **Modern PHP 8.x**: Expect parse errors - use as first-pass filter only
4. **CI Integration**: Good for fast pre-commit checks, not for final security gate

### For Maintainers

**Priority 1 (Critical):**
- [ ] Add Laravel `Request` input methods as taint sources
- [ ] Register `whereRaw`, `orderByRaw`, `DB::raw` as SQLi sinks
- [ ] Support PHP 8 named arguments in parser

**Priority 2 (High):**
- [ ] Add PHP 8 attribute parsing (skip with warning)
- [ ] Support union types in parser (skip type hints)
- [ ] Add return value tracking for function summaries

**Priority 3 (Medium):**
- [ ] Add Psalm-style annotations for false positive suppression
- [ ] Implement basic array element tracking
- [ ] Add Symfony/Doctrine registry presets

**Priority 4 (Low):**
- [ ] Fuzz testing for parser
- [ ] SARIF output format
- [ ] Incremental analysis support

---

## 8. Conclusion

phptaint is a **functional, lightweight taint analysis engine** best suited for:
- WordPress plugin security scanning
- Quick security assessments of legacy PHP
- Embedded use in larger Rust-based security tools

It is **not suitable** as a standalone enterprise security scanner for:
- Modern PHP 8.x codebases using attributes, named args, union types
- Laravel applications (without significant registry customization)
- Codebases requiring low false positive rates

**Overall Grade: B-** (Good for specific use cases, limited for general PHP security)

**Recommended Use:** Combine phptaint with Psalm taint analysis for comprehensive coverage. Use phptaint for WordPress/fast scanning, Psalm for deep type-aware analysis.

---

## Appendix: Code Metrics

```
File                              Lines   Purpose
─────────────────────────────────────────────────────────────
src/lexer.rs                      984     Tokenizer
src/parser/mod.rs                 354     Parser infrastructure
src/parser/expressions.rs         413     Expression parsing
src/parser/statements.rs          751     Statement parsing
src/parser/tests.rs               768     Parser tests
src/parser/adversarial_tests.rs   777     Edge case tests
src/ast.rs                        558     AST definitions
src/taint/mod.rs                  1110    Main taint analyzer
src/taint/registry.rs             471     Sink/sanitizer registry
src/taint/summary.rs              190     Function summaries
src/config.rs                     340     TOML configuration
src/severity.rs                   65      Severity enum
src/lib.rs                        60      Public API
─────────────────────────────────────────────────────────────
Total                             ~7,007
```

**Dependencies:** `serde`, `thiserror`, `toml` - all mature, well-maintained crates.

**MSRV:** Rust 1.80 (released July 2024) - relatively recent.