syncable-cli 0.37.1

A Rust-based CLI that analyzes code repositories and generates Infrastructure as Code configurations
# 🚀 Turbo Security Analyzer

Ultra-fast security scanning, typically 5-100x faster than traditional approaches depending on the scan mode, with advanced false positive reduction.

## Overview

The Turbo Security Analyzer is a high-performance security scanner written in Rust. It achieves its speedups through:

- **Smart File Selection**: Eliminates 80-90% of work upfront using gitignore-aware discovery
- **Multi-Pattern Matching**: Aho-Corasick algorithm for simultaneous pattern search
- **Memory-Mapped I/O**: Zero-copy file reading for large files
- **Parallel Processing**: Work-stealing thread pool with early termination
- **Intelligent Caching**: Concurrent caching with LRU eviction
- **Specialized Scanners**: Optimized for common file types
- **Advanced False Positive Reduction**: Context-aware filtering and confidence scoring
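
The parallel processing with early termination listed above can be sketched with the standard library alone. This is a simplified stand-in, not the analyzer's implementation: it uses scoped threads that claim work from a shared atomic index rather than a true work-stealing pool, and `scan_parallel` and `is_critical` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical check: treat any text containing a private key header as critical.
fn is_critical(text: &str) -> bool {
    text.contains("-----BEGIN PRIVATE KEY-----")
}

/// Scans `inputs` on `workers` threads and returns the sorted indices of
/// critical inputs. When `stop_on_first` is set, workers stop claiming new
/// inputs as soon as any thread reports a finding.
fn scan_parallel(inputs: &[&str], workers: usize, stop_on_first: bool) -> Vec<usize> {
    let next = AtomicUsize::new(0); // index of the next unclaimed input
    let stop = AtomicBool::new(false); // early-termination flag
    let findings = Mutex::new(Vec::new());

    thread::scope(|scope| {
        for _ in 0..workers.max(1) {
            scope.spawn(|| loop {
                if stop.load(Ordering::Relaxed) {
                    break;
                }
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= inputs.len() {
                    break;
                }
                if is_critical(inputs[i]) {
                    findings.lock().unwrap().push(i);
                    if stop_on_first {
                        stop.store(true, Ordering::Relaxed);
                    }
                }
            });
        }
    });

    let mut found = findings.into_inner().unwrap();
    found.sort_unstable();
    found
}
```

With `stop_on_first` enabled, inputs that were not yet claimed when the flag was set are never scanned, which is the trade-off early termination makes.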

## Key Features

### 🎯 Smart File Discovery
- Git-aware file discovery using `git ls-files`
- Automatically skips ignored files
- Prioritizes critical files (.env, configs, secrets)
- **Enhanced filtering**: Comprehensive exclusion of assets, binaries, and generated files
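
As a rough illustration of how critical files might be prioritized, the sketch below assigns each path a score and sorts by it. The `priority` and `order_by_priority` functions, the score values, and the list of configuration extensions are assumptions made for this example, not the analyzer's actual rules.

```rust
use std::path::Path;

/// Hypothetical priority: higher values are scanned first.
fn priority(path: &Path) -> u8 {
    let name = path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();

    if name == ".env" || name.starts_with(".env.") || name.contains("secret") {
        3 // critical: environment files and anything named like a secret store
    } else if matches!(ext.as_str(), "yaml" | "yml" | "toml" | "json" | "ini" | "conf") {
        2 // configuration files
    } else {
        1 // everything else
    }
}

/// Orders paths so that the highest-priority files come first.
fn order_by_priority(mut paths: Vec<&Path>) -> Vec<&Path> {
    // sort_by_key is stable, so ties keep their discovery order.
    paths.sort_by_key(|p| std::cmp::Reverse(priority(p)));
    paths
}
```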

### ⚡ High-Performance Scanning
- Aho-Corasick multi-pattern matching
- Memory-mapped I/O for large files
- Work-stealing parallelism across CPU cores
- Early termination on critical findings
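
To show what multi-pattern matching buys, here is a minimal, standard-library-only sketch of the Aho-Corasick idea: build a trie of all patterns, add failure links, then find every pattern in a single pass over the input. The `Automaton` type is hypothetical and deliberately simple (byte-wise, case-sensitive); a production scanner would more likely rely on an existing, optimized crate.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal Aho-Corasick automaton over bytes (illustrative only).
pub struct Automaton {
    trans: Vec<HashMap<u8, usize>>, // trie transitions per state
    fail: Vec<usize>,               // failure link per state
    out: Vec<Vec<usize>>,           // pattern indices that end at each state
}

impl Automaton {
    pub fn new(patterns: &[&str]) -> Self {
        let mut trans: Vec<HashMap<u8, usize>> = vec![HashMap::new()];
        let mut out: Vec<Vec<usize>> = vec![Vec::new()];

        // 1. Insert every pattern into the trie.
        for (idx, pat) in patterns.iter().enumerate() {
            let mut state = 0;
            for &b in pat.as_bytes() {
                let existing = trans[state].get(&b).copied();
                state = match existing {
                    Some(next) => next,
                    None => {
                        trans.push(HashMap::new());
                        out.push(Vec::new());
                        let next = trans.len() - 1;
                        trans[state].insert(b, next);
                        next
                    }
                };
            }
            out[state].push(idx);
        }

        // 2. Breadth-first pass to compute failure links and inherited outputs.
        let mut fail = vec![0; trans.len()];
        let mut queue: VecDeque<usize> = trans[0].values().copied().collect();
        while let Some(state) = queue.pop_front() {
            for (&b, &next) in &trans[state] {
                let mut f = fail[state];
                while f != 0 && !trans[f].contains_key(&b) {
                    f = fail[f];
                }
                fail[next] = trans[f].get(&b).copied().unwrap_or(0);
                let inherited = out[fail[next]].clone();
                out[next].extend(inherited);
                queue.push_back(next);
            }
        }

        Automaton { trans, fail, out }
    }

    /// Returns `(pattern_index, end_offset)` for every match in `haystack`.
    pub fn find_all(&self, haystack: &str) -> Vec<(usize, usize)> {
        let mut hits = Vec::new();
        let mut state = 0;
        for (pos, &b) in haystack.as_bytes().iter().enumerate() {
            while state != 0 && !self.trans[state].contains_key(&b) {
                state = self.fail[state];
            }
            state = self.trans[state].get(&b).copied().unwrap_or(0);
            for &idx in &self.out[state] {
                hits.push((idx, pos + 1));
            }
        }
        hits
    }
}
```

The input is read once regardless of how many patterns are loaded, which is why this approach scales better than running each pattern separately.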

### 🧠 Intelligent Detection
- **Advanced false positive reduction**: Context-aware confidence scoring
- **Content analysis**: Detects and skips minified, generated, and binary content
- **Pattern precision**: More specific regex patterns for common secrets
- `.gitignore` risk assessment
- Template/example file exclusion

### 🛡️ False Positive Reduction

The analyzer employs multiple layers of false positive reduction:

#### File-Level Filtering
- **Binary files**: Comprehensive detection of executables, images, media files
- **Asset files**: Automatic exclusion of images, fonts, videos in asset directories
- **Lock files**: Enhanced detection including `bun.lockb`, `pnpm-lock.yaml`, etc.
- **Minified files**: Detection based on filename patterns and content analysis
- **Generated files**: Recognition of auto-generated and compiled files
- **SVG files**: Special handling for SVG files that often contain base64 data

#### Content-Level Analysis
- **Base64 detection**: Skips files with a high ratio of base64 content (except JWTs)
- **Data URLs**: Filters out `data:image/`, `data:font/`, `data:application/` content
- **Minified code**: Detects and skips minified JavaScript/CSS
- **Binary content**: Identifies binary data that passed initial filtering
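
The content-level checks above could look roughly like the following. This is a sketch under stated assumptions: the function names and the thresholds (average line length above 500 bytes, less than 5% whitespace, NUL-byte probe over the first 8 KiB) are illustrative guesses, not values taken from the analyzer.

```rust
/// Returns true for the data URL prefixes listed above.
fn is_data_url(value: &str) -> bool {
    ["data:image/", "data:font/", "data:application/"]
        .iter()
        .any(|prefix| value.starts_with(prefix))
}

/// Heuristic with assumed thresholds: minified code tends to have very long
/// lines and very little whitespace.
fn looks_minified(content: &str) -> bool {
    let line_count = content.lines().count().max(1);
    let avg_line_len = content.len() / line_count;
    let whitespace = content.chars().filter(|c| c.is_whitespace()).count();
    let whitespace_ratio = whitespace as f64 / content.len().max(1) as f64;
    avg_line_len > 500 && whitespace_ratio < 0.05
}

/// Heuristic: NUL bytes almost never appear in text files, so finding one in
/// the first 8 KiB suggests binary data.
fn looks_binary(bytes: &[u8]) -> bool {
    bytes.iter().take(8192).any(|&b| b == 0)
}
```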

#### Pattern-Level Improvements
- **Context-aware patterns**: Require proper assignment context (e.g., `api_key = "..."`)
- **Length requirements**: Minimum lengths for API keys to reduce false matches
- **Enhanced confidence scoring**: Multi-factor scoring based on context and content
- **False positive keywords**: Extensive list of example/placeholder indicators
- **Template literal detection**: Automatically excludes JavaScript/TypeScript template literals (`${...}`)
- **React/JSX awareness**: Special handling for React components and JSX files
- **Code generation detection**: Identifies files that generate example code snippets
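
Two of the checks above, template literal detection and placeholder keywords, are simple enough to sketch directly. The function names and the short keyword list are hypothetical; the analyzer's real list is described as extensive and is not reproduced here.

```rust
/// Placeholder indicators (an illustrative subset, not the analyzer's list).
const PLACEHOLDER_WORDS: [&str; 5] =
    ["example", "placeholder", "your_api_key", "changeme", "xxxx"];

/// True when the value contains a `${...}` interpolation, meaning it is
/// assembled at runtime rather than hardcoded.
fn has_template_interpolation(value: &str) -> bool {
    match value.find("${") {
        // "${" is two ASCII bytes, so slicing at start + 2 is always valid.
        Some(start) => value[start + 2..].contains('}'),
        None => false,
    }
}

/// True when the value contains a known placeholder keyword.
fn looks_like_placeholder(value: &str) -> bool {
    let lower = value.to_ascii_lowercase();
    PLACEHOLDER_WORDS.iter().any(|word| lower.contains(word))
}

/// A candidate match is ignored if either check fires.
fn should_ignore_match(value: &str) -> bool {
    has_template_interpolation(value) || looks_like_placeholder(value)
}
```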

#### Example Exclusions
The analyzer now intelligently skips:
```
❌ package-lock.json, yarn.lock, bun.lockb (dependency hashes)
❌ image.svg (base64 encoded graphics)
❌ bundle.min.js (minified code)
❌ font.woff2, icon.png (asset files)
❌ // TODO: replace with your API key (comments/examples)
❌ data:image/png;base64,iVBOR... (data URLs)
❌ Generated by webpack (auto-generated files)
❌ APICodeDialog.jsx (React components that generate example code)
❌ Authorization: "Bearer ${selectedApiKey?.apiKey}" (template literals)
❌ const code = `API_KEY = "${apiKey}"` (code generation functions)
```

## Usage

### Integration with CLI

The turbo analyzer is integrated into the main security command:

```bash
# Fast security scan with false positive reduction
sync-ctl security /path/to/project

# Include low severity findings (thorough mode)
sync-ctl security --include-low /path/to/project

# Skip secret detection (lightning mode)
sync-ctl security --no-secrets /path/to/project
```

### Scan Modes

The analyzer automatically chooses the best mode based on your flags:

- **Lightning**: Critical files only (.env, configs), basic patterns
- **Fast**: Smart sampling, priority patterns, skip large files
- **Balanced**: Good coverage with performance optimizations (default)
- **Thorough**: Full scan with all patterns (still optimized)
- **Paranoid**: Everything including low-severity findings
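
One way to picture the flag-to-mode selection is a small enum and a match. The mapping below is an assumption inferred from the comments on the CLI examples (`--include-low` labelled thorough, `--no-secrets` labelled lightning); the actual selection logic, including which flag wins when both are set, may differ. `ScanMode` and `select_mode` are hypothetical names.

```rust
/// The five scan modes described above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanMode {
    Lightning,
    Fast,
    Balanced,
    Thorough,
    Paranoid,
}

/// Hypothetical mapping from CLI flags to a scan mode.
fn select_mode(include_low: bool, no_secrets: bool) -> ScanMode {
    match (include_low, no_secrets) {
        // Assumption: --no-secrets takes precedence when both flags are set.
        (_, true) => ScanMode::Lightning,
        (true, false) => ScanMode::Thorough,
        (false, false) => ScanMode::Balanced,
    }
}
```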

## Architecture

### Core Components

```
┌─────────────────────┐
│  File Discovery     │ ← Git-aware, comprehensive filtering
└──────────┬──────────┘
┌──────────▼──────────┐
│ Content Analysis    │ ← Binary/minified/generated detection
└──────────┬──────────┘
┌──────────▼──────────┐
│ Priority Scoring    │ ← Critical files first
└──────────┬──────────┘
┌──────────▼──────────┐
│  Pattern Engine     │ ← Context-aware Aho-Corasick matching
└──────────┬──────────┘
┌──────────▼──────────┐
│ Confidence Scorer   │ ← Multi-factor false positive reduction
└──────────┬──────────┘
┌──────────▼──────────┐
│  Parallel Scanner   │ ← Work-stealing threads
└──────────┬──────────┘
┌──────────▼──────────┐
│   Result Cache      │ ← Concurrent caching
└──────────┬──────────┘
┌──────────▼──────────┐
│  Report Generator   │ ← Aggregation & scoring
└─────────────────────┘
```

### Excluded File Types

The analyzer automatically excludes these file types to reduce false positives:

**Binary & Media Files:**
- Images: `.jpg`, `.png`, `.gif`, `.svg`, `.ico`, `.webp`, etc.
- Media: `.mp3`, `.mp4`, `.avi`, `.mov`, `.wav`, etc.
- Fonts: `.ttf`, `.otf`, `.woff`, `.woff2`, `.eot`
- Documents: `.pdf`, `.doc`, `.xls`, `.ppt`, etc.

**Build & Dependencies:**
- Lock files: `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `bun.lockb`, `Cargo.lock`, etc.
- Build outputs: `dist/`, `build/`, `.next/`, `.nuxt/`, `.output/`
- Minified files: `*.min.js`, `*.bundle.js`, `*.chunk.js`

**Development & Tooling:**
- IDE files: `.vscode/`, `.idea/`, `.vs/`
- OS files: `.DS_Store`, `Thumbs.db`
- Generated files: `*.generated.*`, `*.d.ts`
- Source maps: `*.map`
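
The exclusion rules above translate naturally into a path predicate. The sketch below covers only the entries named in this section (the "etc." items are left out), compares names case-insensitively as an assumption, and uses a hypothetical `is_excluded` function.

```rust
use std::path::{Component, Path};

const EXCLUDED_EXTENSIONS: &[&str] = &[
    "jpg", "png", "gif", "svg", "ico", "webp", "mp3", "mp4", "avi", "mov", "wav",
    "ttf", "otf", "woff", "woff2", "eot", "pdf", "doc", "xls", "ppt", "map",
];
// File names are stored lowercase because the lookup lowercases its input.
const EXCLUDED_FILES: &[&str] = &[
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "bun.lockb", "cargo.lock",
    ".ds_store", "thumbs.db",
];
const EXCLUDED_DIRS: &[&str] = &[
    "dist", "build", ".next", ".nuxt", ".output", ".vscode", ".idea", ".vs",
];
const EXCLUDED_SUFFIXES: &[&str] = &[".min.js", ".bundle.js", ".chunk.js", ".d.ts"];

/// Returns true when the path matches one of the exclusion rules.
fn is_excluded(path: &Path) -> bool {
    let name = path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();

    // Only parent directories are checked, so a file that happens to be
    // named "build" is not excluded by the directory rule.
    let in_excluded_dir = path.parent().map_or(false, |dir| {
        dir.components().any(|c| match c {
            Component::Normal(part) => {
                part.to_str().map_or(false, |p| EXCLUDED_DIRS.contains(&p))
            }
            _ => false,
        })
    });

    in_excluded_dir
        || EXCLUDED_FILES.contains(&name.as_str())
        || EXCLUDED_EXTENSIONS.contains(&ext.as_str())
        || EXCLUDED_SUFFIXES.iter().any(|s| name.ends_with(*s))
        || name.contains(".generated.")
}
```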

### Pattern Categories

- **Secrets**: API keys, passwords, tokens (with context requirements)
- **Environment Variables**: Sensitive config values
- **Cryptographic Material**: Private keys, certificates
- **Cloud Credentials**: AWS, GCP, Azure keys (with proper formatting)
- **Database Connections**: Connection strings with credentials

## Performance

Typical performance improvements over traditional scanning:

- **Lightning Mode**: 50-100x faster (critical files only)
- **Fast Mode**: 20-50x faster (smart sampling)
- **Balanced Mode**: 10-25x faster (default, good coverage)
- **Thorough Mode**: 5-10x faster (comprehensive scan)

**False Positive Reduction**: 80-95% reduction in false positives compared to basic pattern matching.

## Implementation Details

### Enhanced File Discovery

```rust
// Comprehensive file type detection (lists abbreviated)
let binary_extensions = ["exe", "dll", "jpg", "png", "gif", "svg", "mp4", "pdf" /* ... */];
let asset_extensions = ["jpg", "png", "svg", "ttf", "woff", "mp3" /* ... */];
let lock_files = ["package-lock.json", "yarn.lock", "bun.lockb" /* ... */];

// Content-based filtering: skip files that look minified or generated
if is_minified_content(content) || is_generated_content(content) {
    skip_file();
}
```

### Context-Aware Pattern Matching

```rust
// Require assignment context for API keys: a key name, `:` or `=`,
// then a quoted value of at least 32 characters
let pattern = r#"(?i)api[_-]?key\s*[:=]\s*["']([A-Za-z0-9_-]{32,})["']"#;

// Multi-factor confidence scoring
let confidence = base_confidence
    + context_boost            // Assignment, environment variables
    + pattern_boost            // Specific keywords
    - false_positive_penalty;  // Examples, comments

### Advanced Content Analysis

```rust
// Skip high base64 content (except JWT)
// Cast before dividing: integer division would truncate the ratio to 0
let base64_ratio = base64_chars as f64 / total_chars as f64;
if base64_ratio > 0.7 && !content.contains("eyJ") {
    skip_content();
}

// Detect generated files
if content.contains("auto-generated") || content.contains("do not edit") {
    skip_file();
}
```
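
The snippet above leaves open how `base64_chars` is counted. One possible interpretation, sketched below, counts only characters inside long unbroken base64 runs, so that ordinary code and prose (which also consist mostly of letters and digits) do not score high. The 40-character run length, the case-insensitive marker check, and the function names are assumptions.

```rust
/// Assumed minimum length for a run of characters to count as base64 data.
const MIN_RUN: usize = 40;

fn is_base64_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '+' | '/' | '=')
}

/// Fraction of the content that sits inside base64 runs of at least MIN_RUN.
fn base64_ratio(content: &str) -> f64 {
    let total = content.chars().count();
    if total == 0 {
        return 0.0;
    }
    let in_long_runs: usize = content
        .split(|c: char| !is_base64_char(c))
        .map(|run| run.len()) // runs are ASCII, so bytes equal characters
        .filter(|&len| len >= MIN_RUN)
        .sum();
    in_long_runs as f64 / total as f64
}

/// Skips base64-heavy content unless it contains a JWT-style "eyJ" prefix.
fn should_skip_as_base64(content: &str) -> bool {
    base64_ratio(content) > 0.7 && !content.contains("eyJ")
}

/// Case-insensitive check for common generated-file markers.
fn is_generated(content: &str) -> bool {
    let lower = content.to_lowercase();
    lower.contains("auto-generated") || lower.contains("do not edit")
}
```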

## Contributing

The turbo analyzer is designed for extensibility:

- Add new pattern sets in `pattern_engine.rs`
- Extend file discovery logic in `file_discovery.rs`
- Implement additional scanners in `scanner.rs`
- Improve false positive detection in confidence scoring

## Recent Improvements

### v2.1 - Enhanced False Positive Reduction
- **Comprehensive file filtering**: Added support for 50+ file types and patterns
- **Content analysis**: Advanced detection of minified, generated, and binary content
- **Pattern precision**: More specific regex patterns requiring proper context
- **Confidence scoring**: Multi-factor scoring system with context awareness
- **React/JSX support**: Special handling for template literals and code generation components
- **Performance**: 3-5x faster scanning with 80-95% fewer false positives

### React/JSX Specific Improvements
- **Template literal detection**: Automatically recognizes `${variable}` patterns as non-secrets
- **Code generation files**: Identifies React components that generate API examples
- **Component patterns**: Understands `props.apiKey`, `state.token`, and similar patterns
- **Example code filtering**: Skips files with names like `APICodeDialog`, `CodeSnippet`, etc.
- **Context awareness**: Reduces confidence for patterns in React components and JSX files

### Lock File Support
- `bun.lockb` (Bun binary lock files)
- `pnpm-lock.yaml` (pnpm lock files)
- `pdm.lock` (Python PDM)
- `swift.resolved` (Swift Package Manager)
- `flake.lock` (Nix flakes)

## License

Same as the parent project.