vsec 0.0.1

Detect secrets in Rust codebases
# Vsec

A research-grade static analysis tool for detecting hardcoded secrets in Rust codebases with intelligent false-positive filtering.

## Features

- **Two-Pass Architecture**: Indexes constants across files, then analyzes usage patterns
- **Four-Layer Filter Pipeline**: Reduces false positives through:
  - Layer 1: Name analysis (benign vs suspicious terms)
  - Layer 2: Scope isolation (test/example/benchmark detection)
  - Layer 3: Consequence analysis (what happens if comparison succeeds)
  - Layer 4: RHS analysis (command routing vs authentication)
- **Score-Based Detection**: 0-100+ scoring with configurable thresholds
- **Multiple Output Formats**: Text, JSON, SARIF, Markdown, GitHub Actions
- **Parallel Processing**: Uses rayon for fast scanning of large codebases
- **Configurable Sensitivity**: From paranoid (catch everything) to minimal (high confidence only)
- **Git History Scanning**: Find deleted secrets in commit history with `--scan-history`

## Installation

```bash
cargo install --path .
```

Or build a release binary without installing it:

```bash
cargo build --release
```

## Usage

### Scan a Directory

```bash
# Scan directory
vsec scan --scan-history --config .vsec.toml ../rustfs

# Scan with lower threshold (more findings)
vsec scan --scan-history --config .vsec.toml --threshold 50

# Include test and example files
vsec scan --scan-history --config .vsec.toml --include-tests --include-examples

# Output as JSON
vsec scan --scan-history --config .vsec.toml -f json -o results.json

# Output as SARIF (for CI integration)
vsec scan --scan-history --config .vsec.toml -f sarif -o results.sarif

# Fail CI if findings exist
vsec scan --scan-history --config .vsec.toml --fail-on-findings
```

### Initialize Configuration

```bash
# Create default config file
vsec init

# Create minimal config
vsec init --minimal

# Specify output path
vsec init -o my-config.toml
```

### Explain a Finding

```bash
# Get details about a specific finding
vsec explain SEC-abc123 --results results.json
```

### Debug Mode

```bash
# Show parsing details for a file
vsec debug src/auth.rs

# Include AST dump
vsec debug src/auth.rs --show-ast
```

### Git History Scanning

Scan git history to find secrets that were committed and later deleted:

```bash
# Scan git history (requires --scan-history flag)
vsec scan --scan-history

# Limit to last 100 commits
vsec scan --scan-history --max-commits 100

# Only scan commits after a specific date
vsec scan --scan-history --since 2024-01-01

# Combine with other options
vsec scan --scan-history -f sarif -o history-results.sarif
```

> **Note**: Git history scanning is opt-in because it can be slow on large repositories with many commits. Use it for security audits, not daily CI.

## Configuration

Create a `.vsec.toml` file in your project root:

```toml
[general]
# Sensitivity preset: "paranoid", "high", "normal", "low", or a number
sensitivity = "normal"

[scan]
# Patterns to ignore
ignore_paths = ["target", "vendor", ".git"]

# Include test files in scan
include_tests = false

# Include example files in scan
include_examples = false

[output]
# Show score breakdown for each finding
show_scores = false

# Show remediation suggestions
show_remediation = true
```

## Sensitivity Presets

| Preset | Threshold | Use Case |
|--------|-----------|----------|
| `paranoid` | 40 | Security audits, pre-release reviews |
| `high` | 55 | CI/CD with manual review |
| `normal` | 70 | Regular development (default) |
| `low` | 85 | Large codebases, automated blocking |
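
The preset table can be read as a simple lookup from preset name to threshold. The sketch below is illustrative only: the `Sensitivity` type, its methods, and the assumption that a finding is reported when its score is greater than or equal to the threshold are not taken from the Vsec source.

```rust
/// Sensitivity presets from the table above, plus a numeric override.
/// The type and method names are hypothetical, not Vsec's own API.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Sensitivity {
    Paranoid,
    High,
    Normal,
    Low,
    Custom(i32),
}

impl Sensitivity {
    /// Parse a config value such as "normal" or "65".
    pub fn parse(value: &str) -> Option<Self> {
        match value.trim().to_ascii_lowercase().as_str() {
            "paranoid" => Some(Self::Paranoid),
            "high" => Some(Self::High),
            "normal" => Some(Self::Normal),
            "low" => Some(Self::Low),
            other => other.parse::<i32>().ok().map(Self::Custom),
        }
    }

    /// Threshold values as listed in the preset table.
    pub fn threshold(self) -> i32 {
        match self {
            Self::Paranoid => 40,
            Self::High => 55,
            Self::Normal => 70,
            Self::Low => 85,
            Self::Custom(n) => n,
        }
    }

    /// Assumption: a finding is reported when score >= threshold.
    pub fn reports(self, score: i32) -> bool {
        score >= self.threshold()
    }
}
```

Under these assumptions, a finding scored 60 would be reported with `paranoid` or `high` but suppressed with `normal` or `low`.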

## How It Works

### Detection Strategy

Vsec focuses on **equality comparisons** involving string constants, not just constant definitions. This approach catches actual authentication bypass vulnerabilities:

```rust
// This pattern is detected:
const AUTH_TOKEN: &str = "sk_live_12345";

fn authenticate(token: &str) -> bool {
    if token == AUTH_TOKEN {  // <- Comparison triggers detection
        grant_access();
        return true;
    }
    false
}
```

### False Positive Filtering

The four-layer filter pipeline eliminates common false positives:

1. **Name Analysis**: `VERSION`, `COLOR`, `HELP_TEXT` are benign
2. **Scope Isolation**: Test files, examples, and benchmarks are filtered
3. **Consequence Analysis**: Logging-only blocks are deprioritized
4. **RHS Analysis**: Command routing (`if cmd == "help"`) is filtered
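
The four layers above can be sketched as a chain of small functions, each returning a verdict. This is a simplified model, not Vsec's implementation: the `Candidate` fields, the `Verdict` type, and the term lists inside each layer are assumptions chosen to mirror the examples in the list.

```rust
/// One comparison site under review. Every field name here is illustrative.
pub struct Candidate<'a> {
    pub const_name: Option<&'a str>,   // constant on one side, if any
    pub other_side: &'a str,           // identifier on the other side
    pub literal: &'a str,              // string value being compared
    pub file_path: &'a str,
    pub calls_on_match: &'a [&'a str], // functions called when the comparison succeeds
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Keep,
    Deprioritize,
    Drop(&'static str),
}

// Layer 1: benign constant names are dropped.
fn name_layer(c: &Candidate<'_>) -> Verdict {
    let name = c.const_name.unwrap_or("").to_ascii_lowercase();
    if ["version", "color", "help_text"].iter().any(|t| name.contains(*t)) {
        Verdict::Drop("benign name")
    } else {
        Verdict::Keep
    }
}

// Layer 2: anything under a tests/, examples/, or benches/ directory is dropped.
fn scope_layer(c: &Candidate<'_>) -> Verdict {
    let scoped = c.file_path.split('/').any(|p| matches!(p, "tests" | "examples" | "benches"));
    if scoped { Verdict::Drop("test scope") } else { Verdict::Keep }
}

// Layer 3: blocks that only log are deprioritized rather than dropped.
fn consequence_layer(c: &Candidate<'_>) -> Verdict {
    let logging = ["println", "debug", "info", "warn", "trace"];
    let only_logs = !c.calls_on_match.is_empty()
        && c.calls_on_match.iter().all(|f| logging.contains(f));
    if only_logs { Verdict::Deprioritize } else { Verdict::Keep }
}

// Layer 4: a short lowercase word compared against a command-like variable is routing.
fn rhs_layer(c: &Candidate<'_>) -> Verdict {
    let routing_var = matches!(c.other_side, "cmd" | "command" | "subcommand" | "action");
    let word_like = !c.literal.is_empty()
        && c.literal.len() < 16
        && c.literal.chars().all(|ch| ch.is_ascii_lowercase() || ch == '-');
    if routing_var && word_like { Verdict::Drop("command routing") } else { Verdict::Keep }
}

/// Run the layers in order; the first `Drop` short-circuits the pipeline.
pub fn run_pipeline(c: &Candidate<'_>) -> Verdict {
    let layers: [fn(&Candidate<'_>) -> Verdict; 4] =
        [name_layer, scope_layer, consequence_layer, rhs_layer];
    let mut result = Verdict::Keep;
    for layer in layers {
        match layer(c) {
            Verdict::Drop(reason) => return Verdict::Drop(reason),
            Verdict::Deprioritize => result = Verdict::Deprioritize,
            Verdict::Keep => {}
        }
    }
    result
}
```

In this model the `token == AUTH_TOKEN` comparison from the detection example passes all four layers, while `cmd == "help"` is dropped at Layer 4.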

### Scoring Factors

Findings are scored based on multiple factors:

| Factor | Impact |
|--------|--------|
| Suspicious name (password, secret, key) | +20 to +40 |
| High entropy value | +20 |
| Auth consequence (grant_access, authorize) | +25 |
| Long value (32+ chars) | +15 |
| Benign name (version, color, help) | -100 (killed) |
| Test context | -100 (killed) |
| Short value (<8 chars) | -20 |
| Placeholder pattern | -50 |
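
As a worked illustration of how these factors might combine, the sketch below sums them into a single score. Several details are assumptions rather than documented behavior: the `Finding` struct, the split of the "+20 to +40" range (here +40 for `password`/`secret`, +20 for `key`), the 3.5 bits-per-character entropy cutoff, and the placeholder word list.

```rust
/// Inputs to the scoring sketch. Field names are hypothetical.
pub struct Finding<'a> {
    pub name: &'a str,
    pub value: &'a str,
    pub in_test: bool,
    pub auth_consequence: bool,
}

/// Shannon entropy of the value in bits per byte.
pub fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for b in s.bytes() {
        counts[b as usize] += 1;
    }
    let len = s.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

fn has_any(haystack: &str, terms: &[&str]) -> bool {
    terms.iter().any(|t| haystack.contains(*t))
}

/// Sum the factors from the table. Cutoffs marked "assumed" are not from the source.
pub fn score(f: &Finding<'_>) -> i32 {
    let name = f.name.to_ascii_lowercase();
    let value = f.value.to_ascii_lowercase();
    let mut total = 0;

    // Suspicious name: +20 to +40 (the split between terms is assumed).
    if has_any(&name, &["password", "secret"]) {
        total += 40;
    } else if has_any(&name, &["key"]) {
        total += 20;
    }
    // High entropy: +20 (the 3.5 bits/char cutoff is assumed).
    if shannon_entropy(f.value) > 3.5 {
        total += 20;
    }
    if f.auth_consequence {
        total += 25;
    }
    if f.value.len() >= 32 {
        total += 15;
    }
    // "Killed" factors: large penalties that push the score below any threshold.
    if has_any(&name, &["version", "color", "help"]) {
        total -= 100;
    }
    if f.in_test {
        total -= 100;
    }
    if f.value.len() < 8 {
        total -= 20;
    }
    // Placeholder patterns: -50 (the word list is assumed).
    if has_any(&value, &["changeme", "placeholder", "example", "xxx"]) {
        total -= 50;
    }
    total
}
```

Under these assumptions, a constant named `DB_PASSWORD` holding a 32-character random value that guards an auth call scores 40 + 20 + 25 + 15 = 100, while `API_KEY = "changeme"` scores 20 - 50 = -30.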

## Output Formats

### Text (default)
Human-readable output with colors and formatting.

### JSON
Machine-readable format for tooling integration.

### SARIF
Static Analysis Results Interchange Format for CI/CD integration (GitHub, Azure DevOps, etc.).

### Markdown
Report format suitable for documentation or PR comments.

### GitHub
GitHub Actions annotation format for inline PR comments.
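
For readers unfamiliar with the annotation format, GitHub Actions turns specially formatted lines on stdout (workflow commands) into inline annotations. The sketch below formats one finding as such a line. The function name, the choice of `warning` severity, and the use of the finding ID as the title are assumptions; Vsec's actual output may differ.

```rust
/// Escape message text for a workflow command.
fn escape_data(s: &str) -> String {
    s.replace('%', "%25").replace('\r', "%0D").replace('\n', "%0A")
}

/// Escape a property value; properties also need ':' and ',' encoded.
fn escape_property(s: &str) -> String {
    escape_data(s).replace(':', "%3A").replace(',', "%2C")
}

/// Format one finding as a GitHub Actions `::warning` workflow command.
/// Hypothetical helper, not part of Vsec's public API.
pub fn github_annotation(file: &str, line: u32, id: &str, message: &str) -> String {
    format!(
        "::warning file={},line={},title={}::{}",
        escape_property(file),
        line,
        escape_property(id),
        escape_data(message)
    )
}
```

Printing the returned string from a workflow step would attach the message to the given file and line in the pull request view.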

## CI Integration

### GitHub Actions

```yaml
- name: Scan for secrets
  run: |
    cargo install vsec
    vsec scan -f sarif -o results.sarif --fail-on-findings

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```

### Pre-commit Hook

```bash
#!/bin/bash
vsec scan --fail-on-findings -t 70
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        Scanner                               │
├─────────────────────────────────────────────────────────────┤
│  Phase 1: Indexer                                           │
│  - Parallel file parsing (rayon + syn)                      │
│  - Build SuspectRegistry (DashMap)                          │
│  - Index public constants                                   │
├─────────────────────────────────────────────────────────────┤
│  Phase 2: Analyzer                                          │
│  - Find equality comparisons                                │
│  - Resolve constant references                              │
│  - Run filter pipeline                                      │
│  - Score findings                                           │
├─────────────────────────────────────────────────────────────┤
│  Filter Pipeline                                            │
│  ┌─────────┐ ┌─────────┐ ┌─────────────┐ ┌────────────┐     │
│  │ Layer 1 │→│ Layer 2 │→│   Layer 3   │→│  Layer 4   │     │
│  │  Names  │ │  Scope  │ │ Consequence │ │    RHS     │     │
│  └─────────┘ └─────────┘ └─────────────┘ └────────────┘     │
├─────────────────────────────────────────────────────────────┤
│  Scoring Engine                                             │
│  - Entropy analysis                                         │
│  - Pattern matching                                         │
│  - Factor aggregation                                       │
│  - Threshold application                                    │
└─────────────────────────────────────────────────────────────┘
```
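
The two phases in the diagram can be illustrated with a deliberately simplified, single-threaded sketch. It swaps in naive line-based string matching and a standard `HashMap` for the `syn` parsing, `rayon` parallelism, and `DashMap` registry named above, so it shows only the shape of the data flow. The function names and the `Hit` struct are hypothetical.

```rust
use std::collections::HashMap;

/// Extract the first double-quoted substring, if any.
fn quoted(s: &str) -> Option<&str> {
    let start = s.find('"')? + 1;
    let end = start + s[start..].find('"')?;
    Some(&s[start..end])
}

/// Phase 1: index `const NAME: &str = "value";` definitions across all files.
/// `files` is a list of (path, source) pairs.
pub fn index_constants(files: &[(&str, &str)]) -> HashMap<String, String> {
    let mut registry = HashMap::new();
    for (_path, source) in files {
        for line in source.lines() {
            let line = line.trim().trim_start_matches("pub ");
            if let Some(rest) = line.strip_prefix("const ") {
                if let (Some((name, _)), Some(value)) = (rest.split_once(':'), quoted(rest)) {
                    registry.insert(name.trim().to_string(), value.to_string());
                }
            }
        }
    }
    registry
}

/// A comparison that references an indexed constant.
pub struct Hit {
    pub file: String,
    pub line: usize, // 1-based line number
    pub constant: String,
}

/// Phase 2: find `==` comparisons that mention an indexed constant,
/// including constants defined in a different file.
pub fn find_comparisons(files: &[(&str, &str)], registry: &HashMap<String, String>) -> Vec<Hit> {
    let mut hits = Vec::new();
    for (path, source) in files {
        for (idx, line) in source.lines().enumerate() {
            if !line.contains("==") {
                continue;
            }
            // Split the line into identifier-like tokens and look each one up.
            for ident in line.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
                if registry.contains_key(ident) {
                    hits.push(Hit {
                        file: path.to_string(),
                        line: idx + 1,
                        constant: ident.to_string(),
                    });
                }
            }
        }
    }
    hits
}
```

Indexing first is what lets the second phase resolve a constant defined in one file and compared in another; each hit would then go through the filter pipeline and scoring engine.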

## License

Vsec is dual-licensed under either:

- **MIT License** ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
- **Apache License, Version 2.0** ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)

at your option.