# 🚀 Turbo Security Analyzer
Ultra-fast security scanning that is 5-100x faster than traditional approaches, depending on scan mode, with advanced false positive reduction.
## Overview
The Turbo Security Analyzer is a high-performance security scanner written in Rust. It achieves its speedups through:
- **Smart File Selection**: Eliminates 80-90% of work upfront using gitignore-aware discovery
- **Multi-Pattern Matching**: Aho-Corasick algorithm for simultaneous pattern search
- **Memory-Mapped I/O**: Zero-copy file reading for large files
- **Parallel Processing**: Work-stealing thread pool with early termination
- **Intelligent Caching**: Concurrent caching with LRU eviction
- **Specialized Scanners**: Optimized for common file types
- **Advanced False Positive Reduction**: Context-aware filtering and confidence scoring
## Key Features
### 🎯 Smart File Discovery
- Git-aware file discovery using `git ls-files`
- Automatically skips ignored files
- Prioritizes critical files (.env, configs, secrets)
- **Enhanced filtering**: Comprehensive exclusion of assets, binaries, and generated files
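
As a rough illustration of the discovery and prioritization steps above, the sketch below parses `git ls-files` output (one path per line) and sorts critical files to the front. The function names and the priority rules are assumptions made for this example, not the analyzer's actual implementation.

```rust
/// Parse the stdout of `git ls-files` (one path per line) into a list of paths.
fn parse_git_ls_files(stdout: &str) -> Vec<String> {
    stdout
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Hypothetical priority rule: a lower number means the file is scanned earlier.
fn priority(path: &str) -> u8 {
    let name = path.rsplit('/').next().unwrap_or(path).to_ascii_lowercase();
    if name == ".env" || name.starts_with(".env.") {
        0 // environment files first
    } else if name.contains("secret") || name.contains("credential") {
        1
    } else if name.contains("config") || name.ends_with(".yaml") || name.ends_with(".toml") {
        2
    } else {
        3 // everything else
    }
}

/// Order paths so that critical files come first (stable within each priority).
fn prioritize(mut paths: Vec<String>) -> Vec<String> {
    paths.sort_by_key(|p| priority(p));
    paths
}
```

In a real run the input string would come from invoking `git ls-files` in the project root, for example via `std::process::Command`.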
### ⚡ High-Performance Scanning
- Aho-Corasick multi-pattern matching
- Memory-mapped I/O for large files
- Work-stealing parallelism across CPU cores
- Early termination on critical findings
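
The sketch below shows one way early termination can work in a parallel scan, using only the standard library. It is a simplification: workers pull indices from a shared atomic counter rather than using true work-stealing deques, and `parallel_scan` and its parameters are hypothetical names for this example.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Scan `inputs` in parallel and return the indices that contain `critical_marker`.
/// All workers stop claiming new work once any worker reports a hit.
fn parallel_scan(inputs: &[&str], critical_marker: &str, workers: usize) -> Vec<usize> {
    let next = AtomicUsize::new(0); // next unclaimed input index
    let stop = AtomicBool::new(false); // set on the first critical finding
    let hits = Mutex::new(Vec::new());

    thread::scope(|scope| {
        for _ in 0..workers.max(1) {
            scope.spawn(|| {
                while !stop.load(Ordering::Relaxed) {
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= inputs.len() {
                        break; // no work left
                    }
                    if inputs[i].contains(critical_marker) {
                        hits.lock().unwrap().push(i);
                        stop.store(true, Ordering::Relaxed);
                    }
                }
            });
        }
    });

    let mut found = hits.into_inner().unwrap();
    found.sort_unstable();
    found
}
```

Because workers check the stop flag only between items, inputs already in flight are still finished, so more than one hit can be reported when several inputs are critical.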
### 🧠 Intelligent Detection
- **Advanced false positive reduction**: Context-aware confidence scoring
- **Content analysis**: Detects and skips minified, generated, and binary content
- **Pattern precision**: More specific regex patterns for common secrets
- `.gitignore` risk assessment
- Template/example file exclusion
### 🛡️ False Positive Reduction
The analyzer employs multiple layers of false positive reduction:
#### File-Level Filtering
- **Binary files**: Comprehensive detection of executables, images, media files
- **Asset files**: Automatic exclusion of images, fonts, videos in asset directories
- **Lock files**: Enhanced detection including `bun.lockb`, `pnpm-lock.yaml`, etc.
- **Minified files**: Detection based on filename patterns and content analysis
- **Generated files**: Recognition of auto-generated and compiled files
- **SVG files**: Special handling for SVG files that often contain base64 data
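
A minimal sketch of file-level filtering by name, assuming abbreviated lists. `should_skip_file` and the exact list contents are illustrative; the analyzer's real lists are longer.

```rust
/// Hypothetical file-level filter: returns true when a path should not be scanned.
fn should_skip_file(path: &str) -> bool {
    // Abbreviated lists for illustration only.
    const SKIP_EXTENSIONS: &[&str] = &[
        "png", "jpg", "gif", "ico", "webp", "svg", "woff", "woff2", "ttf", "exe", "dll", "map",
    ];
    const LOCK_FILES: &[&str] = &["package-lock.json", "yarn.lock", "pnpm-lock.yaml", "bun.lockb"];
    const MINIFIED_SUFFIXES: &[&str] = &[".min.js", ".min.css", ".bundle.js", ".chunk.js"];

    let lower = path.to_ascii_lowercase();
    let name = lower.rsplit('/').next().unwrap_or(lower.as_str());

    if LOCK_FILES.contains(&name) {
        return true;
    }
    if MINIFIED_SUFFIXES.iter().any(|suffix| name.ends_with(*suffix)) {
        return true;
    }
    // Fall back to the extension after the last dot, if any.
    match name.rsplit_once('.') {
        Some((_, ext)) => SKIP_EXTENSIONS.contains(&ext),
        None => false,
    }
}
```

Matching is case-insensitive here so that `logo.PNG` and `logo.png` are treated alike; whether the analyzer does the same is an assumption.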
#### Content-Level Analysis
- **Base64 detection**: Skips files with high base64 content ratio (except JWT tokens)
- **Data URLs**: Filters out `data:image/`, `data:font/`, `data:application/` content
- **Minified code**: Detects and skips minified JavaScript/CSS
- **Binary content**: Identifies binary data that passed initial filtering
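
The content-level checks can be approximated with a few cheap heuristics. The sketch below is illustrative only: the function names, the 500-character average line length, and the 8 KiB sniffing window are assumptions, not values taken from the analyzer.

```rust
/// True when the content embeds one of the data URL prefixes listed above.
fn contains_data_url(content: &str) -> bool {
    ["data:image/", "data:font/", "data:application/"]
        .iter()
        .any(|prefix| content.contains(*prefix))
}

/// Hypothetical minification heuristic: very long average line length.
fn looks_minified(content: &str) -> bool {
    let lines: Vec<&str> = content.lines().filter(|l| !l.trim().is_empty()).collect();
    if lines.is_empty() {
        return false;
    }
    let total: usize = lines.iter().map(|l| l.len()).sum();
    total / lines.len() > 500 // assumed threshold
}

/// Hypothetical binary sniffing: a NUL byte within the first 8 KiB.
fn looks_binary(bytes: &[u8]) -> bool {
    bytes.iter().take(8192).any(|&b| b == 0)
}
```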
#### Pattern-Level Improvements
- **Context-aware patterns**: Require proper assignment context (e.g., `api_key = "..."`)
- **Length requirements**: Minimum lengths for API keys to reduce false matches
- **Enhanced confidence scoring**: Multi-factor scoring based on context and content
- **False positive keywords**: Extensive list of example/placeholder indicators
- **Template literal detection**: Automatically excludes JavaScript/TypeScript template literals (`${...}`)
- **React/JSX awareness**: Special handling for React components and JSX files
- **Code generation detection**: Identifies files that generate example code snippets
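
To make the template literal and placeholder checks concrete, here is a small sketch that operates on a candidate value after a pattern has matched. The helper names and the keyword list are hypothetical; the analyzer's own list is described as more extensive.

```rust
/// True when the value contains a `${...}` interpolation, as in JS/TS template literals.
fn is_template_literal(value: &str) -> bool {
    match value.find("${") {
        Some(start) => value[start..].contains('}'),
        None => false,
    }
}

/// True when the value contains a typical example/placeholder keyword.
fn looks_like_placeholder(value: &str) -> bool {
    // Abbreviated, assumed keyword list.
    const KEYWORDS: &[&str] = &["example", "placeholder", "your_api_key", "changeme", "xxxx"];
    let lower = value.to_ascii_lowercase();
    KEYWORDS.iter().any(|keyword| lower.contains(*keyword))
}

/// A candidate is kept only if it passes every pattern-level check.
fn is_plausible_secret(value: &str, min_len: usize) -> bool {
    value.len() >= min_len && !is_template_literal(value) && !looks_like_placeholder(value)
}
```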
#### Example Exclusions
The analyzer skips files and matches such as:
```
❌ package-lock.json, yarn.lock, bun.lockb (dependency hashes)
❌ image.svg (base64 encoded graphics)
❌ bundle.min.js (minified code)
❌ font.woff2, icon.png (asset files)
❌ // TODO: replace with your API key (comments/examples)
❌ data:image/png;base64,iVBOR... (data URLs)
❌ Generated by webpack (auto-generated files)
❌ APICodeDialog.jsx (React components that generate example code)
❌ Authorization: "Bearer ${selectedApiKey?.apiKey}" (template literals)
❌ const code = `API_KEY = "${apiKey}"` (code generation functions)
```
## Usage
### Integration with CLI
The turbo analyzer is integrated into the main security command:
```bash
# Fast security scan with false positive reduction
sync-ctl security /path/to/project
# Include low severity findings (thorough mode)
sync-ctl security --include-low /path/to/project
# Skip secret detection (lightning mode)
sync-ctl security --no-secrets /path/to/project
```
### Scan Modes
The analyzer automatically chooses the best mode based on your flags:
- **Lightning**: Critical files only (.env, configs), basic patterns
- **Fast**: Smart sampling, priority patterns, skip large files
- **Balanced**: Good coverage with performance optimizations (default)
- **Thorough**: Full scan with all patterns (still optimized)
- **Paranoid**: Everything including low-severity findings
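
Flag-to-mode selection might look like the sketch below. It covers only the mappings implied by the CLI comments above (`--no-secrets` for lightning, `--include-low` for thorough, balanced by default). The `ScanMode` type, the `select_mode` function, and the precedence when both flags are set are assumptions.

```rust
/// The five scan modes listed above.
#[allow(dead_code)] // Fast and Paranoid are not reachable from the two flags modeled here
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanMode {
    Lightning,
    Fast,
    Balanced,
    Thorough,
    Paranoid,
}

/// Hypothetical mapping from CLI flags to a scan mode.
fn select_mode(include_low: bool, no_secrets: bool) -> ScanMode {
    if no_secrets {
        ScanMode::Lightning // assumed to take precedence over --include-low
    } else if include_low {
        ScanMode::Thorough
    } else {
        ScanMode::Balanced // documented default
    }
}
```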
## Architecture
### Core Components
```
┌─────────────────────┐
│  File Discovery     │ ← Git-aware, comprehensive filtering
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Content Analysis   │ ← Binary/minified/generated detection
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Priority Scoring   │ ← Critical files first
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Pattern Engine     │ ← Context-aware Aho-Corasick matching
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Confidence Scorer  │ ← Multi-factor false positive reduction
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Parallel Scanner   │ ← Work-stealing threads
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Result Cache       │ ← Concurrent caching
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Report Generator   │ ← Aggregation & scoring
└─────────────────────┘
```
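
For the Result Cache stage, the following is a minimal, single-threaded sketch of LRU eviction using only the standard library. The `LruCache` type, its string keys, and its `Vec<String>` findings are hypothetical; a concurrent version would need at least a `Mutex` around it or a concurrent map, and the analyzer's actual cache design is not shown here.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical scan-result cache with least-recently-used eviction.
struct LruCache {
    capacity: usize,
    entries: HashMap<String, Vec<String>>, // file key -> findings
    order: VecDeque<String>,               // front = least recently used
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        LruCache { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used position (O(n), fine for a sketch).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }

    fn get(&mut self, key: &str) -> Option<&Vec<String>> {
        if self.entries.contains_key(key) {
            self.touch(key);
        }
        self.entries.get(key)
    }

    fn put(&mut self, key: String, findings: Vec<String>) {
        if self.entries.insert(key.clone(), findings).is_some() {
            self.touch(&key); // existing entry updated
            return;
        }
        self.order.push_back(key);
        if self.entries.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest); // evict least recently used
            }
        }
    }
}
```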
### Excluded File Types
The analyzer automatically excludes these file types to reduce false positives:
**Binary & Media Files:**
- Images: `.jpg`, `.png`, `.gif`, `.svg`, `.ico`, `.webp`, etc.
- Media: `.mp3`, `.mp4`, `.avi`, `.mov`, `.wav`, etc.
- Fonts: `.ttf`, `.otf`, `.woff`, `.woff2`, `.eot`
- Documents: `.pdf`, `.doc`, `.xls`, `.ppt`, etc.
**Build & Dependencies:**
- Lock files: `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `bun.lockb`, `Cargo.lock`, etc.
- Build outputs: `dist/`, `build/`, `.next/`, `.nuxt/`, `.output/`
- Minified files: `*.min.js`, `*.bundle.js`, `*.chunk.js`
**Development & Tooling:**
- IDE files: `.vscode/`, `.idea/`, `.vs/`
- OS files: `.DS_Store`, `Thumbs.db`
- Generated files: `*.generated.*`, `*.d.ts`
- Source maps: `*.map`
### Pattern Categories
- **Secrets**: API keys, passwords, tokens (with context requirements)
- **Environment Variables**: Sensitive config values
- **Cryptographic Material**: Private keys, certificates
- **Cloud Credentials**: AWS, GCP, Azure keys (with proper formatting)
- **Database Connections**: Connection strings with credentials
## Performance
Typical performance improvements over traditional scanning:
- **Lightning Mode**: 50-100x faster (critical files only)
- **Fast Mode**: 20-50x faster (smart sampling)
- **Balanced Mode**: 10-25x faster (default, good coverage)
- **Thorough Mode**: 5-10x faster (comprehensive scan)
**False Positive Reduction**: 80-95% reduction in false positives compared to basic pattern matching.
## Implementation Details
### Enhanced File Discovery
```rust
// Comprehensive file type detection (lists abbreviated)
const BINARY_EXTENSIONS: &[&str] = &["exe", "dll", "jpg", "png", "gif", "svg", "mp4", "pdf" /* ... */];
const ASSET_EXTENSIONS: &[&str] = &["jpg", "png", "svg", "ttf", "woff", "mp3" /* ... */];
const LOCK_FILES: &[&str] = &["package-lock.json", "yarn.lock", "bun.lockb" /* ... */];
// Content-based filtering
// ...
```
### Context-Aware Pattern Matching
```rust
// Require assignment context for API keys
let pattern = r#"(?i)api[_-]?key\s*[:=]\s*["']([A-Za-z0-9_-]{32,})["']"#;
// Multi-factor confidence scoring
let confidence = base_confidence
    + context_boost           // Assignment, environment variables
    + pattern_boost           // Specific keywords
    - false_positive_penalty; // Examples, comments
```
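
The scoring formula above can be sketched as a small runnable function. The `Signals` struct, the boost and penalty weights, and the clamping to `[0.0, 1.0]` are assumptions chosen for illustration; the analyzer's real weights are not documented here.

```rust
/// Hypothetical inputs to the confidence score for one match.
struct Signals {
    base: f32,                // base confidence of the pattern
    in_assignment: bool,      // e.g. `api_key = "..."` or an env variable
    specific_keyword: bool,   // a provider-specific keyword is nearby
    looks_like_example: bool, // placeholder, comment, or example text
}

/// Combine the signals into a score clamped to [0.0, 1.0].
fn confidence(s: &Signals) -> f32 {
    let mut score = s.base;
    if s.in_assignment {
        score += 0.2; // context boost (assumed weight)
    }
    if s.specific_keyword {
        score += 0.1; // pattern boost (assumed weight)
    }
    if s.looks_like_example {
        score -= 0.4; // false positive penalty (assumed weight)
    }
    score.clamp(0.0, 1.0)
}
```

A reporting layer could then drop or downgrade matches whose score falls below a chosen cutoff.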
### Advanced Content Analysis
```rust
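// Skip high base64 content (except JWT, whose segments start with "eyJ")
// Cast before dividing: with integer counts, `/` would truncate the ratio to 0
let base64_ratio = base64_chars as f64 / total_chars as f64;
if base64_ratio > 0.7 && !content.contains("eyJ") {
    skip_content();
}
// Detect generated files
// ...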
```
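
A self-contained version of the base64 check might look like the sketch below. How the analyzer counts base64 characters is not specified, so this example makes an assumption: only characters inside unbroken base64-alphabet runs of at least `min_run` characters are counted, which keeps ordinary identifiers from inflating the ratio. The function names and the run length of 32 are hypothetical.

```rust
fn is_base64_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '+' | '/' | '=')
}

/// Fraction of characters that sit inside base64-alphabet runs of at least `min_run` chars.
fn base64_ratio(content: &str, min_run: usize) -> f64 {
    let total = content.chars().count();
    if total == 0 {
        return 0.0;
    }
    let mut in_runs = 0usize;
    let mut run = 0usize;
    for c in content.chars() {
        if is_base64_char(c) {
            run += 1;
        } else {
            if run >= min_run {
                in_runs += run;
            }
            run = 0;
        }
    }
    if run >= min_run {
        in_runs += run; // account for a run that reaches the end of the content
    }
    in_runs as f64 / total as f64
}

/// Skip content that is mostly base64, unless it may contain a JWT ("eyJ" prefix).
fn should_skip_content(content: &str) -> bool {
    base64_ratio(content, 32) > 0.7 && !content.contains("eyJ")
}
```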
## Contributing
The turbo analyzer is designed for extensibility:
- Add new pattern sets in `pattern_engine.rs`
- Extend file discovery logic in `file_discovery.rs`
- Implement additional scanners in `scanner.rs`
- Improve false positive detection in confidence scoring
## Recent Improvements
### v2.1 - Enhanced False Positive Reduction
- **Comprehensive file filtering**: Added support for 50+ file types and patterns
- **Content analysis**: Advanced detection of minified, generated, and binary content
- **Pattern precision**: More specific regex patterns requiring proper context
- **Confidence scoring**: Multi-factor scoring system with context awareness
- **React/JSX support**: Special handling for template literals and code generation components
- **Performance**: 3-5x faster scanning with 80-95% fewer false positives
### React/JSX-Specific Improvements
- **Template literal detection**: Automatically recognizes `${variable}` patterns as non-secrets
- **Code generation files**: Identifies React components that generate API examples
- **Component patterns**: Understands `props.apiKey`, `state.token`, and similar patterns
- **Example code filtering**: Skips files with names like `APICodeDialog`, `CodeSnippet`, etc.
- **Context awareness**: Reduces confidence for patterns in React components and JSX files
### Lock File Support
- `bun.lockb` (Bun binary lock files)
- `pnpm-lock.yaml` (PNPM lock files)
- `pdm.lock` (Python PDM)
- `swift.resolved` (Swift Package Manager)
- `flake.lock` (Nix flakes)
## License
Same as the parent project.