# DEEP AUDIT: phptaint
**Crate:** phptaint v0.1.1
**Lines of Code:** ~7,007
**Audit Date:** 2026-03-26
**Auditor:** Kimi Code CLI (TOKIO-LEVEL Deep Audit)
---
## Executive Summary
phptaint is a security-focused PHP lexer, parser, AST builder, and configurable taint analysis engine written in Rust. It's designed for analyzing PHP code to detect security vulnerabilities by tracking data flow from untrusted sources (superglobals like `$_GET`, `$_POST`) to dangerous sinks (`eval`, `exec`, SQL queries, etc.).
**Overall Assessment:** This is a **functional but limited** taint analysis engine. It successfully handles common PHP security patterns but has significant gaps in modern PHP 8.x support, interprocedural analysis depth, and real-world framework coverage.
---
## 1. Parser: Modern PHP (8.x) Support
### ✅ What IS Supported
| Feature | Status | Notes |
|---------|--------|-------|
| Basic PHP tags (`<?php`, `?>`) | ✅ Full | Well-tested |
| Variables, superglobals | ✅ Full | All 9 superglobals recognized |
| String literals (single/double quoted) | ✅ Full | With escape sequence handling |
| String interpolation | ✅ Full | Both `$var` and `{$var}` syntax |
| Numbers (int, float) | ✅ Full | Decimal only |
| Function calls | ✅ Full | Named and dynamic (`$fn()`) |
| Method calls (`->`) | ✅ Full | Chained calls supported |
| Static calls (`::`) | ✅ Full | Including `::class` |
| Nullsafe operator (`?->`) | ✅ Full | PHP 8.0 - correctly treated as regular method call for taint |
| Ternary operator (`?:`, `? :`) | ✅ Full | Elvis operator supported |
| Null coalesce (`??`) | ✅ Full | PHP 7.0+ |
| Concatenation (`.`) | ✅ Full | Left-associative chains |
| Array literals (`[]`) | ✅ Full | Short syntax only |
| Arrow functions (`fn() =>`) | ✅ Partial | Parsed as closures |
| Match expressions | ✅ Partial | Parsed but arms not deeply analyzed |
| Enums | ✅ Partial | Parsed as class-like structures |
| `readonly` classes/properties | ✅ Partial | Keyword recognized |
| Namespaces | ✅ Partial | Basic support, use aliases work |
| Classes, interfaces, traits | ✅ Partial | Abstract, extends, implements |
| Control flow (if/else, foreach, while, for, do-while) | ✅ Partial | `for` loops simplified to `while` |
| Try/catch/finally | ✅ Partial | Catch clauses merged |
| Switch/case/default | ✅ Partial | Basic support |
| Closures with `use` | ✅ Partial | Captured variables tracked |
| Comments (`//`, `/* */`) | ✅ Full | Skipped correctly |
### ❌ What is NOT Supported (Critical Gaps)
| Feature | Severity | Notes |
|---------|----------|-------|
| **Named Arguments** (`foo(name: $value)`) | 🔴 High | Colon in arg list causes parse errors |
| **PHP 8 Attributes** (`#[Attribute]`) | 🔴 High | Not recognized, causes errors |
| **Union Types** (`int\|string`) | 🟡 Medium | `\|` confuses parser |
| **Intersection Types** (`Iterator&Countable`) | 🟡 Medium | `&` in type position fails |
| **Heredoc/Nowdoc** (`<<<EOT`) | 🟡 Medium | Not recognized |
| **Named Arguments in Constructors** | 🔴 High | Common in modern PHP |
| **Variadic functions** (`...$args`) | 🟡 Medium | Spread operator not handled |
| **First-class callables** (`strlen(...)`) | 🟡 Medium | PHP 8.1 feature |
| **List destructuring** (`[$a, $b] = $arr`) | 🟡 Medium | Common pattern |
| **Variable variables** (`$$var`) | 🟢 Low | Edge case |
| **Goto statements** | 🟢 Low | Rare in modern code |
| **Declare statements** (`declare(strict_types=1)`) | 🟡 Medium | Strict typing common |
| **Constructor property promotion** | 🟡 Medium | PHP 8.0 feature, parse errors |
| **Static arrow functions** (`static fn()`) | 🟢 Low | Edge case |
| **Mixed HTML/PHP** (alternative syntax) | 🟡 Medium | Template files |
| **Fibers** (`Fiber` class) | 🟢 Low | Rare |
| **Yield/generators** | 🟢 Low | Partial support |
| **Anonymous classes** (`new class {}`) | 🟢 Low | Partial support |
| **Reference parameters** (`&$var`) | 🟢 Low | Partial support |
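As an illustration of how the named-argument gap could be closed, a recursive-descent parser can use a two-token lookahead: an identifier immediately followed by a single colon at the start of an argument marks a named argument. The sketch below is a minimal model under stated assumptions: `Tok`, `Arg`, and `parse_args` are hypothetical names, not phptaint's lexer or parser types, and argument values are restricted to plain variables.

```rust
/// Hypothetical token type; phptaint's real lexer tokens will differ.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Var(String),
    Colon,
    Comma,
    LParen,
    RParen,
}

/// One parsed call argument: an optional name plus the variable passed.
#[derive(Debug, PartialEq)]
struct Arg {
    name: Option<String>,
    value: String,
}

/// Parses `( [name:] $var, ... )`, starting at the opening parenthesis.
/// Returns `None` on malformed input instead of panicking.
fn parse_args(toks: &[Tok]) -> Option<Vec<Arg>> {
    let mut i = 0;
    if toks.get(i)? != &Tok::LParen {
        return None;
    }
    i += 1;
    let mut args = Vec::new();
    loop {
        // Also accepts a trailing comma before `)`.
        if toks.get(i)? == &Tok::RParen {
            return Some(args);
        }
        // Two-token lookahead: `ident :` marks a named argument.
        let name = match (toks.get(i), toks.get(i + 1)) {
            (Some(Tok::Ident(n)), Some(Tok::Colon)) => {
                i += 2;
                Some(n.clone())
            }
            _ => None,
        };
        let value = match toks.get(i)? {
            Tok::Var(v) => v.clone(),
            _ => return None,
        };
        i += 1;
        args.push(Arg { name, value });
        match toks.get(i)? {
            Tok::Comma => i += 1,
            Tok::RParen => return Some(args),
            _ => return None,
        }
    }
}
```

For taint purposes the name can simply be carried alongside the value; resolving it to a parameter index would require the callee's signature, which this sketch does not model.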
### Parser Verdict
**Grade: C+**
The parser handles "enough" PHP for basic security analysis of older codebases (PHP 5.x-7.x) but will **fail significantly on modern PHP 8.x codebases** that use:
- Named arguments (extremely common in Laravel, Symfony, modern libraries)
- Attributes (Doctrine, Symfony, Laravel all use these heavily)
- Union types (PHP 8.0+, now standard practice)
The parser uses error recovery (synchronization after errors), which helps, but every recovery discards the code it skips over, so repeated parse errors cause the analyzer to miss security-critical code paths.
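To make the cost of error recovery concrete, here is a minimal sketch of statement-level synchronization: after an unparseable token, everything up to the next `;` is dropped. The `Tok` type and `parse_with_recovery` function are hypothetical and far simpler than phptaint's parser; the point is only that a sink sharing a statement with an unsupported construct disappears from the analysis.

```rust
/// Hypothetical, heavily simplified token set.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Semi,
    /// Stands in for any construct the parser cannot handle.
    Bad,
}

/// Splits a token stream into statements, recovering at `;` after an error.
/// Returns the successfully parsed statements and the number skipped.
fn parse_with_recovery(toks: &[Tok]) -> (Vec<Vec<String>>, usize) {
    let mut stmts = Vec::new();
    let mut skipped = 0;
    let mut i = 0;
    while i < toks.len() {
        let mut words = Vec::new();
        let mut ok = true;
        // Consume one statement up to and including its `;`.
        while i < toks.len() {
            let t = &toks[i];
            i += 1;
            match t {
                Tok::Semi => break,
                Tok::Word(w) if ok => words.push(w.clone()),
                Tok::Word(_) => {} // already recovering: discard
                Tok::Bad => ok = false, // enter recovery until the next `;`
            }
        }
        if !ok {
            skipped += 1; // the whole statement is lost, sinks included
        } else if !words.is_empty() {
            stmts.push(words);
        }
    }
    (stmts, skipped)
}
```

In this model, a statement such as `eval <bad> x;` is dropped entirely, while the following `echo y;` is still parsed.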
---
## 2. Taint Engine: Function Call Tracking
### How It Works
The taint engine uses a **two-pass analysis**:
1. **Pass 1**: Collect function summaries (which parameters flow to which sinks)
2. **Pass 2**: Use summaries for interprocedural analysis
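The two passes can be sketched on a toy intermediate representation. This is an illustrative model only: `Function`, `Call`, `collect_summaries`, and `find_flows` are hypothetical names, and each function body is reduced to a list of (parameter index, sink) pairs rather than a real AST.

```rust
use std::collections::HashMap;

/// Hypothetical mini-IR: a function body reduced to the sinks applied
/// to its parameters, as (parameter index, sink name) pairs.
struct Function {
    name: &'static str,
    sinks_on_params: Vec<(usize, &'static str)>,
}

/// A call site: the callee and the argument positions carrying taint.
struct Call {
    callee: &'static str,
    tainted_args: Vec<usize>,
}

type Summaries = HashMap<&'static str, HashMap<usize, Vec<&'static str>>>;

/// Pass 1: build a parameter-index -> sinks summary for each function.
fn collect_summaries(funcs: &[Function]) -> Summaries {
    let mut out = Summaries::new();
    for f in funcs {
        let mut per_param: HashMap<usize, Vec<&'static str>> = HashMap::new();
        for &(idx, sink) in &f.sinks_on_params {
            per_param.entry(idx).or_default().push(sink);
        }
        out.insert(f.name, per_param);
    }
    out
}

/// Pass 2: report (callee, sink) whenever a tainted argument lines up
/// with a parameter that the summary says reaches a sink.
fn find_flows(calls: &[Call], summaries: &Summaries) -> Vec<(&'static str, &'static str)> {
    let mut findings = Vec::new();
    for call in calls {
        // Calls to functions without a summary produce no finding here.
        let Some(summary) = summaries.get(call.callee) else { continue };
        for idx in &call.tainted_args {
            if let Some(sinks) = summary.get(idx) {
                findings.extend(sinks.iter().map(|s| (call.callee, *s)));
            }
        }
    }
    findings
}
```

Modeling `function f($p) { eval($p); } f($_GET['x']);` as one `Function` with `(0, "eval")` and one `Call` with tainted argument `0` yields the single finding `("f", "eval")`.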
### ✅ What's Tracked Well
| Flow Pattern | Tracked? | Example |
|--------------|----------|---------|
| Direct superglobal → sink | ✅ Yes | `eval($_GET['cmd'])` |
| Variable assignment | ✅ Yes | `$x = $_GET['x']; eval($x);` |
| Through function parameters | ✅ Yes | `function f($p) { eval($p); } f($_GET['x']);` |
| Through concatenation | ✅ Yes | `$sql = "SELECT " . $_GET['x'];` |
| Through arrays | ⚠️ Basic | `$arr = [$_GET['x']]; eval($arr[0]);` |
| Through closures | ✅ Yes | `add_action('hook', function() { eval($_GET['x']); });` |
| Through ternary | ✅ Yes | `$x = $cond ? $_GET['x'] : 'safe';` |
| Through null coalesce | ✅ Yes | `$x = $_GET['x'] ?? 'safe';` |
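The propagation rules in this table (concatenation, ternary, null coalesce) amount to taking the union of operand taint. A minimal sketch over a hypothetical expression type follows; `Expr` and `is_tainted` are illustrative names and do not reflect the definitions in phptaint's `ast.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical expression AST covering only the cases in the table.
enum Expr {
    Literal(&'static str),
    Superglobal(&'static str), // e.g. "_GET"
    Var(&'static str),
    Concat(Box<Expr>, Box<Expr>),
    Ternary(Box<Expr>, Box<Expr>, Box<Expr>),
    NullCoalesce(Box<Expr>, Box<Expr>),
}

/// Returns true if the value of `e` may carry attacker-controlled data.
/// `env` maps variable names to their current taint state.
fn is_tainted(e: &Expr, env: &HashMap<&'static str, bool>) -> bool {
    match e {
        Expr::Literal(_) => false,
        Expr::Superglobal(_) => true,
        // Unknown variables are treated as clean in this sketch.
        Expr::Var(name) => env.get(*name).copied().unwrap_or(false),
        // Either operand can end up in the result.
        Expr::Concat(a, b) | Expr::NullCoalesce(a, b) => {
            is_tainted(a, env) || is_tainted(b, env)
        }
        // The condition only selects a branch; taint comes from the branches.
        Expr::Ternary(_, then_e, else_e) => {
            is_tainted(then_e, env) || is_tainted(else_e, env)
        }
    }
}
```

Because both ternary branches are merged, `$cond ? $_GET['x'] : 'safe'` is tainted regardless of the condition, which is the conservative choice for an analysis without branch sensitivity.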
### ⚠️ Limitations in Function Tracking
| Limitation | Severity | Details |
|------------|----------|---------|
| **No return value tracking** | 🔴 High | Functions that return tainted data not tracked: `function getInput() { return $_GET['x']; }` |
| **No object property tracking** | 🔴 High | `$this->property` taint not tracked across methods |
| **No array element tracking** | 🟡 Medium | `$data['key']` where key is variable not tracked well |
| **Limited static analysis** | 🟡 Medium | No constant propagation |
| **No type-based filtering** | 🟡 Medium | Type guards such as `is_int()` don't clear taint on the guarded branch |
| **Loop unrolling** | 🟢 Low | Loops analyzed once |
### Function Summary System
The function summary system is **reliable for the flows it models, but incomplete**:
```rust
// From summary.rs
pub struct FunctionSummary {
/// Map from parameter index to the sinks reachable from that parameter.
pub sink_params: HashMap<usize, Vec<SinkSummary>>,
}
```
**Strengths:**
- Cross-file analysis works via `analyze_multi()`
- Correctly identifies which function parameters reach sinks
- Framework hook analysis (WordPress `add_action`) with unauthenticated hook detection
**Weaknesses:**
- Only summarizes parameter → sink flow; taint that leaves a function through its return value is not summarized
- No summary for built-in PHP functions
- No alias/variable-function tracking through assignments
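One possible shape for the missing return-value facts is sketched below. `ReturnSummary` and `call_result_tainted` are hypothetical names introduced for illustration, not part of phptaint's `summary.rs`; the sketch records whether a function returns source data directly and which parameters flow to its return value.

```rust
use std::collections::HashMap;

/// Hypothetical extension of a function summary with return-taint facts.
#[derive(Default)]
struct ReturnSummary {
    /// The function returns data read from a taint source,
    /// e.g. `function getInput() { return $_GET['x']; }`.
    returns_source: bool,
    /// Parameter indexes whose taint flows to the return value.
    returns_params: Vec<usize>,
}

/// Decides whether the result of a call is tainted, given the taint
/// state of each argument at the call site.
fn call_result_tainted(
    summaries: &HashMap<String, ReturnSummary>,
    callee: &str,
    arg_taint: &[bool],
) -> bool {
    match summaries.get(callee) {
        Some(s) => {
            s.returns_source
                || s
                    .returns_params
                    .iter()
                    .any(|&i| arg_taint.get(i).copied().unwrap_or(false))
        }
        // Unknown callee: conservatively propagate any argument taint.
        None => arg_taint.iter().any(|&t| t),
    }
}
```

The fallback for unknown callees is a design choice: propagating argument taint favors fewer missed flows at the cost of more false positives, and the opposite default would be equally defensible.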
### Taint Engine Verdict
**Grade: B-**
Good for direct data flow analysis. **Will miss** vulnerabilities that flow:
- Through return values
- Through object properties
- Through complex array manipulations
- Through callbacks stored in variables
---
## 3. Sink Coverage
### PHP Core Sinks (TaintRegistry::php_core())
| Category | Sinks | Coverage |
|----------|-------|----------|
| **RCE** | `eval`, `system`, `exec`, `passthru`, `shell_exec`, `popen`, `assert`, `proc_open`, `call_user_func`, `call_user_func_array`, `create_function` | ✅ Complete |
| **XSS** | `echo`, `print` | ⚠️ Basic |
| **SQLi** | PDO `$db->query()`, `$db->exec()` | ⚠️ Basic |
| **LFI/RFI** | `include`, `include_once`, `require`, `require_once` | ✅ Complete |
| **File Operations** | `file_put_contents`, `fwrite`, `file_get_contents`, `fopen`, `readfile`, `file`, `unlink`, `rename`, `copy`, `mkdir`, `rmdir` | ✅ Complete |
| **SSRF** | `curl_exec`, `file_get_contents` | ✅ Complete |
| **Deserialization** | `unserialize` | ✅ Complete |
| **Variable Injection** | `extract` | ✅ Complete |
| **Header Injection** | `header` | ✅ Complete |
| **Dynamic Function** | `$fn()` | ✅ Complete |
### WordPress Sinks (TaintRegistry::wordpress())
| Category | Sinks | Coverage |
|----------|-------|----------|
| **SQLi** | `$wpdb->query`, `$wpdb->get_results`, `$wpdb->get_var`, `$wpdb->get_row`, `$wpdb->get_col` | ✅ Complete |
| **Open Redirect** | `wp_redirect` | ✅ Complete |
| **Option Injection** | `update_option`, `add_option` | ✅ Complete |
| **User Meta Injection** | `update_user_meta`, `add_user_meta` | ✅ Complete |
| **Email Injection** | `wp_mail` | ⚠️ Only arg 3 |
| **Auth Bypass** | `wp_set_auth_cookie`, `wp_set_current_user` | ✅ Complete |
| **Object Injection** | `maybe_unserialize` | ✅ Complete |
| **SSRF** | `wp_remote_get`, `wp_remote_post`, `wp_remote_request` | ✅ Complete |
| **XSS** | `echo` + WP sanitizers tracked | ⚠️ Basic |
### Laravel Sinks (TaintRegistry::laravel())
| Category | Sinks | Coverage |
|----------|-------|----------|
| **SQLi** | `DB::select`, `DB::statement`, `DB::unprepared` | ⚠️ Basic |
| **XSS** | Blade auto-escaping noted but not analyzed | ❌ Not implemented |
**MAJOR GAP:** Laravel's `DB::raw()`, Eloquent `whereRaw()`, `orderByRaw()` are **NOT in the default Laravel registry**. This is a critical omission for a Laravel-focused analysis.
### Sanitizers
| Scope | Functions |
|-------|-----------|
| **Universal** (clears all taint) | `intval`, `floatval`, `boolval`, `absint`, `basename`, `is_numeric`, `ctype_*` |
| **XSS-specific** | `htmlspecialchars`, `htmlentities`, `strip_tags`, `esc_html`, `esc_attr`, `esc_url`, `wp_kses*` |
| **SQL-specific** | `prepare` method |
**Gap:** No regex-based sanitization tracking (e.g., `preg_replace` with whitelist patterns).
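The split between universal and context-specific sanitizers can be modeled by tracking taint as a set of vulnerability contexts and letting each sanitizer clear only the contexts it covers. The sketch below is one way to express that idea with bit flags; the constants and function names are hypothetical and say nothing about how phptaint's registry is actually implemented.

```rust
/// Vulnerability contexts as bit flags (hypothetical encoding).
const XSS: u8 = 1 << 0;
const SQLI: u8 = 1 << 1;
const RCE: u8 = 1 << 2;
const ALL: u8 = XSS | SQLI | RCE;

/// Returns the contexts a function makes safe, or 0 if it is not a
/// sanitizer. Only a few names from the table above are listed.
fn cleared_by(func: &str) -> u8 {
    match func {
        "intval" | "floatval" | "boolval" | "absint" => ALL,
        "htmlspecialchars" | "htmlentities" | "esc_html" | "esc_attr" => XSS,
        "prepare" => SQLI,
        _ => 0,
    }
}

/// Taint of a call's result, given the taint of its argument.
fn apply_call(taint: u8, func: &str) -> u8 {
    taint & !cleared_by(func)
}

/// A value is dangerous at a sink if it is still tainted for that context.
fn is_dangerous(taint: u8, sink_context: u8) -> bool {
    (taint & sink_context) != 0
}
```

Under this model, a value passed through `htmlspecialchars` is safe to `echo` but still flagged when it reaches a SQL sink.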
### Sink Coverage Verdict
**Grade: B**
Good coverage of PHP core and WordPress. **Laravel support is dangerously incomplete** - missing critical ORM raw query methods that are the primary source of SQL injection in Laravel apps.
---
## 4. What's MISSING vs SonarQube / Psalm
### Feature Comparison
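| Feature | phptaint | SonarQube | Psalm | Notes |
|---------|----------|-----------|-------|-------|
| **Type System** | ❌ None | ✅ Full | ✅ Full | No types = missed context |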
| **Control Flow Analysis** | ⚠️ Basic | ✅ Advanced | ✅ Advanced | No branch sensitivity |
| **Interprocedural** | ⚠️ Summaries | ✅ Full | ✅ Full | Only params→sinks, not returns |
| **Object Sensitivity** | ❌ None | ✅ Yes | ✅ Yes | `$this->prop` not tracked |
| **Array Sensitivity** | ⚠️ Basic | ✅ Full | ✅ Full | `$arr[$key]` poorly tracked |
| **Taint Configuration** | ✅ TOML | ✅ GUI/API | ✅ Annotations | phptaint's TOML is good |
| **Framework Support** | ⚠️ WP/Laravel basics | ✅ Extensive | ✅ Extensive | Missing Symfony, Doctrine, etc |
| **False Positive Control** | ⚠️ Basic | ✅ Advanced | ✅ Advanced | No path constraints |
| **Performance** | ✅ Fast | ⚠️ Heavy | ✅ Fast | phptaint is lightweight |
| **IDE Integration** | ❌ None | ✅ Yes | ✅ Yes | CLI only |
| **CI/CD Integration** | ✅ Easy | ✅ Yes | ✅ Yes | All work well |
| **Modern PHP 8** | ⚠️ Partial | ✅ Full | ✅ Full | Attributes, unions missing |
### Specific Missing Features
#### vs SonarQube Security
1. **No CWE Mapping** - SonarQube maps all findings to CWE IDs
2. **No Security Hotspots** - Sonar distinguishes hotspots from vulnerabilities
3. **No Dataflow Path Visualization** - Sonar shows full paths in UI
4. **No Rule Customization at Analysis Time** - phptaint requires code changes
5. **No Exclusion Annotations** - Can't mark false positives in code
6. **No PR/MR Decoration** - No inline comments on pull requests
#### vs Psalm Taint Analysis
1. **No Type-Based Taint** - Psalm knows `htmlspecialchars` returns `string` that is HTML-safe
2. **No Annotations** - Psalm supports `@psalm-taint-escape`, `@psalm-taint-source`, `@psalm-taint-sink`
3. **No Template/Generic Support** - Can't model `Collection<T>` taint
4. **No Assertions** - `assert(is_int($x))` doesn't clear taint
5. **No String Literal Analysis** - `"safe_$userInput"` is treated as fully tainted
6. **No Include/Require Analysis** - Psalm tracks includes; phptaint does not follow `include`/`require` targets, so related files must be passed together to `analyze_multi()`
### Critical Gaps for Enterprise Use
1. **No Fingerprinting** - Can't suppress findings by hash
2. **No Baseline Support** - Can't ignore existing issues
3. **No Severity Override** - Registry defines severity, can't adjust per-finding
4. **No Custom Source Definitions** - Limited to superglobals (can't register a non-superglobal source such as `$request->input()`)
5. **No Sink-Specific Sanitizers** - `htmlspecialchars` clears taint for all contexts, not just XSS
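Fingerprinting and baseline support, the first two gaps above, are closely related: a stable hash per finding lets a tool record known issues and report only new ones. The following is a minimal sketch under stated assumptions. The `Finding` struct is hypothetical (phptaint's real finding type may carry different fields), and FNV-1a is used only because its output does not change between Rust versions or platforms.

```rust
use std::collections::HashSet;

/// Hypothetical finding shape used for illustration.
struct Finding {
    file: String,
    sink: String,
    source: String,
    line: u32,
}

/// 64-bit FNV-1a, continuing from an existing hash state.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Fingerprint over file, sink, and source. The line number is left out
/// so that unrelated edits above a finding do not invalidate the baseline.
fn fingerprint(f: &Finding) -> u64 {
    let mut h = 0xcbf2_9ce4_8422_2325_u64; // FNV offset basis
    for part in [&f.file, &f.sink, &f.source] {
        h = fnv1a(part.as_bytes(), h);
        h = fnv1a(&[0], h); // separator so ("ab", "c") differs from ("a", "bc")
    }
    h
}

/// Returns only the findings whose fingerprint is not in the baseline.
fn new_findings<'a>(all: &'a [Finding], baseline: &HashSet<u64>) -> Vec<&'a Finding> {
    all.iter()
        .filter(|f| !baseline.contains(&fingerprint(f)))
        .collect()
}
```

Excluding the line number has a trade-off: two identical source-to-sink flows in the same file share one fingerprint, so suppressing one suppresses both.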
### Verdict
**phptaint is a lightweight alternative, NOT a replacement for SonarQube/Psalm.**
Use phptaint when:
- You need fast, lightweight scanning
- You're analyzing WordPress plugins
- You want embeddable Rust-based analysis
- You're building a custom security tool
Use SonarQube/Psalm when:
- You need comprehensive type-aware analysis
- You're analyzing modern PHP 8.x codebases
- You need enterprise reporting/tracking
- You need low false positive rates
---
## 5. Laravel SQL Injection Detection
### Test Case: Would phptaint find this?
```php
<?php
// routes/web.php or Controller
Route::get('/users', function (Request $request) {
$name = $request->input('name');
// VULNERABLE - raw SQL concatenation
$users = DB::select("SELECT * FROM users WHERE name = '$name'");
// Also vulnerable - using query builder unsafely
$users = User::whereRaw("name = '$name'")->get();
// Also vulnerable - orderByRaw
$users = User::orderByRaw($name)->get();
return view('users', compact('users'));
});
```
### Analysis
**Test 1: `DB::select()` with tainted data**
```php
DB::select("SELECT * FROM users WHERE name = '$name'");
```
- `$name` comes from `$request->input('name')`
- **PROBLEM:** Laravel's `Request::input()` is **NOT recognized as a taint source**
- The registry only tracks `$_GET`, `$_POST`, superglobals
- **VERDICT: MISS** ❌
**Test 2: Direct superglobal access**
```php
$name = $_GET['name'];
$users = DB::select("SELECT * FROM users WHERE name = '$name'");
```
- `$_GET` IS a recognized source
- `DB::select` IS in the Laravel registry
- **VERDICT: DETECT** ✅
**Test 3: `whereRaw()` and `orderByRaw()`**
```php
User::whereRaw("name = '$name'")->get();
User::orderByRaw($name)->get();
```
- These methods are **NOT in TaintRegistry::laravel()**
- Only `select`, `statement`, `unprepared` are registered
- **VERDICT: MISS** ❌
**Test 4: `DB::raw()`**
```php
User::where('name', DB::raw("'$name'"))->get();
```
- `DB::raw` is **NOT in the registry**
- **VERDICT: MISS** ❌
### Laravel Detection Summary
| Pattern | Detected? | Notes |
|---------|-----------|-------|
| `$_GET` → `DB::select()` | ✅ Yes | Basic case works |
| `$request->input()` → `DB::select()` | ❌ No | Request methods not sources |
| `$_GET` → `whereRaw()` | ❌ No | Method not in registry |
| `$_GET` → `orderByRaw()` | ❌ No | Method not in registry |
| `$_GET` → `DB::raw()` | ❌ No | Method not in registry |
| `$_GET` → Eloquent `where()` | ❌ No | Query builder methods not sinks (values are bound as parameters, so mainly a tainted column name is exploitable) |
### Required Registry Additions for Laravel
```rust
// Missing from TaintRegistry::laravel()
// Request sources (Illuminate\Http\Request methods)
// These would need new source tracking infrastructure
// Additional method sinks
r.method_sinks.entry("whereraw".into())... // SQLi
r.method_sinks.entry("orderbyraw".into())... // SQLi
r.method_sinks.entry("havingraw".into())... // SQLi
r.method_sinks.entry("selectraw".into())... // SQLi
r.method_sinks.entry("raw".into())... // SQLi (DB::raw)
// Eloquent static methods
r.sinks.entry("update".into())... // Mass assignment
r.sinks.entry("insert".into())... // Mass assignment
```
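Since the snippet above elides the registration details, here is a self-contained sketch of what such a registry extension could look like. `Registry`, `VulnClass`, `add_method_sink`, and `lookup` are hypothetical stand-ins; only the idea of keying `method_sinks` by lower-cased method name is taken from the snippet. Treating argument 0 as the dangerous one is an assumption based on the raw SQL being the first parameter of these Laravel methods.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum VulnClass {
    SqlInjection,
}

/// Hypothetical stand-in for the taint registry.
#[derive(Default)]
struct Registry {
    /// Lower-cased method name -> (vulnerability class, argument index).
    method_sinks: HashMap<String, Vec<(VulnClass, usize)>>,
}

impl Registry {
    /// Registers `method` as a sink whose argument `arg` must not be tainted.
    fn add_method_sink(&mut self, method: &str, class: VulnClass, arg: usize) {
        // PHP method names are case-insensitive, so keys are normalized.
        self.method_sinks
            .entry(method.to_ascii_lowercase())
            .or_default()
            .push((class, arg));
    }

    /// Looks up sink entries for a method call, ignoring case.
    fn lookup(&self, method: &str) -> &[(VulnClass, usize)] {
        self.method_sinks
            .get(&method.to_ascii_lowercase())
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

/// Builds a registry containing the raw-query sinks listed above.
fn laravel_raw_sinks() -> Registry {
    let mut r = Registry::default();
    for m in ["whereRaw", "orderByRaw", "havingRaw", "selectRaw", "raw"] {
        r.add_method_sink(m, VulnClass::SqlInjection, 0);
    }
    r
}
```

Matching on the bare method name means any `->raw()` call on any object is flagged, which trades precision for coverage in an analyzer that has no type information.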
### Laravel Verdict
**Grade: D+ for Laravel**
phptaint will **miss most real Laravel SQL injections** because:
1. Laravel uses `Request` objects, not direct superglobals
2. The ORM's raw methods aren't registered as sinks
3. Mass assignment vulnerabilities not covered
It will only catch:
- Direct `$_GET`/`$_POST` access (rare in modern Laravel)
- `DB::select()`/`DB::statement()`/`DB::unprepared()` with superglobal input
---
## 6. Code Quality Assessment
### Strengths
1. **Good Architecture** - Clean separation: lexer → parser → AST → taint analyzer
2. **Error Recovery** - Parser synchronizes after errors, continues analysis
3. **Configurable** - TOML-based registry is well-designed
4. **No Unsafe Code** - `#![forbid(unsafe_code)]` throughout
5. **Good Documentation** - Comprehensive rustdocs
6. **Test Coverage** - ~1,800 lines of tests across parser, taint, adversarial cases
7. **Framework Aware** - Hook registration tracking for WordPress
### Weaknesses
1. **Parser Complexity** - Single-file recursive descent without formal grammar
2. **No Fuzzing** - No `cargo-fuzz` integration for parser robustness
3. **Hardcoded Sources** - Can't easily add new taint sources
4. **No Incremental Analysis** - Full reparse every time
5. **Stringly-Typed AST** - Variables are `String`, not symbol IDs
6. **Limited Scope Resolution** - Namespaces work but complex use statements don't
### Security of the Analyzer Itself
1. **Budget Mechanisms** - Parser has `iteration_budget` (3x token count) to prevent infinite loops
2. **Expression Depth Limit** - Default max depth 128 for nested expressions
3. **Panic Safety** - Adversarial tests use `catch_unwind`
4. **No External I/O** - Pure analysis, no network/file operations during analysis
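The first two mechanisms, an iteration budget and a depth limit, can be sketched together. The code below is illustrative only: the field name `iteration_budget`, the factor of 3, and the default depth of 128 echo the descriptions above, while `Guard`, `Abort`, and `parse_nested` are hypothetical and the "grammar" is just nested parentheses.

```rust
/// Hypothetical guard state for a recursive-descent parser.
struct Guard {
    iteration_budget: usize,
    depth: usize,
    max_depth: usize,
}

#[derive(Debug, PartialEq)]
enum Abort {
    BudgetExhausted,
    TooDeep,
}

impl Guard {
    fn new(token_count: usize) -> Self {
        Guard {
            iteration_budget: token_count.saturating_mul(3),
            depth: 0,
            max_depth: 128,
        }
    }

    /// Called once per parsing step; fails when the budget is spent.
    fn tick(&mut self) -> Result<(), Abort> {
        if self.iteration_budget == 0 {
            return Err(Abort::BudgetExhausted);
        }
        self.iteration_budget -= 1;
        Ok(())
    }
}

/// Returns the nesting depth of `(`...`)` at `pos`, refusing to recurse
/// past `max_depth` so hostile input cannot overflow the stack.
fn parse_nested(input: &[u8], pos: &mut usize, g: &mut Guard) -> Result<usize, Abort> {
    g.tick()?;
    if input.get(*pos) != Some(&b'(') {
        return Ok(0);
    }
    if g.depth >= g.max_depth {
        return Err(Abort::TooDeep);
    }
    g.depth += 1;
    *pos += 1;
    let inner = parse_nested(input, pos, g)?;
    g.depth -= 1;
    if input.get(*pos) == Some(&b')') {
        *pos += 1;
    }
    Ok(inner + 1)
}
```

Returning an error instead of panicking lets the caller report a partial result for the file and move on to the next one.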
---
## 7. Recommendations
### For Users
1. **WordPress**: Use `TaintRegistry::wordpress()` - best supported use case
2. **Laravel**: Build custom registry with additional method sinks (see section 5)
3. **Modern PHP 8.x**: Expect parse errors - use as first-pass filter only
4. **CI Integration**: Good for fast pre-commit checks, not for final security gate
### For Maintainers
**Priority 1 (Critical):**
- [ ] Add Laravel `Request` input methods as taint sources
- [ ] Register `whereRaw`, `orderByRaw`, `DB::raw` as SQLi sinks
- [ ] Support PHP 8 named arguments in parser
**Priority 2 (High):**
- [ ] Add PHP 8 attribute parsing (skip with warning)
- [ ] Support union types in parser (skip type hints)
- [ ] Add return value tracking for function summaries
**Priority 3 (Medium):**
- [ ] Add Psalm-style annotations for false positive suppression
- [ ] Implement basic array element tracking
- [ ] Add Symfony/Doctrine registry presets
**Priority 4 (Low):**
- [ ] Fuzz testing for parser
- [ ] SARIF output format
- [ ] Incremental analysis support
---
## 8. Conclusion
phptaint is a **functional, lightweight taint analysis engine** best suited for:
- WordPress plugin security scanning
- Quick security assessments of legacy PHP
- Embedded use in larger Rust-based security tools
It is **not suitable** as a standalone enterprise security scanner for:
- Modern PHP 8.x codebases using attributes, named args, union types
- Laravel applications (without significant registry customization)
- Codebases requiring low false positive rates
**Overall Grade: B-** (Good for specific use cases, limited for general PHP security)
**Recommended Use:** Combine phptaint with Psalm taint analysis for comprehensive coverage. Use phptaint for WordPress/fast scanning, Psalm for deep type-aware analysis.
---
## Appendix: Code Metrics
```
File                              Lines  Purpose
────────────────────────────────────────────────────────
src/lexer.rs                        984  Tokenizer
src/parser/mod.rs                   354  Parser infrastructure
src/parser/expressions.rs           413  Expression parsing
src/parser/statements.rs            751  Statement parsing
src/parser/tests.rs                 768  Parser tests
src/parser/adversarial_tests.rs     777  Edge case tests
src/ast.rs                          558  AST definitions
src/taint/mod.rs                   1110  Main taint analyzer
src/taint/registry.rs               471  Sink/sanitizer registry
src/taint/summary.rs                190  Function summaries
src/config.rs                       340  TOML configuration
src/severity.rs                      65  Severity enum
src/lib.rs                           60  Public API
────────────────────────────────────────────────────────
Total                            ~7,007
```
**Dependencies:** `serde`, `thiserror`, `toml` - all mature, well-maintained crates.
**MSRV:** Rust 1.80 (July 2024) - relatively recent.