$ keyhog scan --path .
v1.0.0 · Secret Scanner · 886 detectors
by SanthSecurity
critical 82% ██████░░ GitHub Classic PAT
ghp_...7890 src/config.py:42
critical 78% █████░░░ Stripe Secret Key
sk_l...ab12 .env:7
critical 78% █████░░░ GitHub PAT (decoded from base64)
ghp_...7890 k8s/secret.yaml:12
3 secrets found · 2 unique credentials · 0 false positives
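The sample output shortens each credential to its first and last four characters (`ghp_...7890`, `sk_l...ab12`). Below is a minimal Rust sketch of that display format. The `redact` name and the all-asterisks fallback for short values are hypothetical choices inferred from the sample. They are not KeyHog's code.

```rust
/// Shorten a credential for display: first 4 chars, "...", last 4 chars.
/// Hypothetical helper modeled on the sample output, not KeyHog's implementation.
fn redact(secret: &str) -> String {
    let chars: Vec<char> = secret.chars().collect();
    // Showing both ends of a very short value would reveal most of it.
    if chars.len() <= 8 {
        return "*".repeat(chars.len());
    }
    let head: String = chars[..4].iter().collect();
    let tail: String = chars[chars.len() - 4..].iter().collect();
    format!("{head}...{tail}")
}
```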
Why KeyHog
Most secret scanners run regex against plaintext. They miss anything encoded, embedded, or obfuscated. KeyHog doesn't.
Decode-through scanning recursively unwraps base64, hex, URL encoding, quoted-printable, and Unicode escapes before pattern matching. This lets it catch secrets buried in Kubernetes manifests, CI configs, Docker layers, and compiled artifacts that plaintext-only scanners miss.
ML confidence scoring uses a 3,969-parameter neural network trained on 200K real credentials to separate secrets from hashes, test fixtures, and documentation strings. Every finding comes with a 0–100% score. Zero false positives at the default 70% threshold.
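The following is a minimal sketch of confidence filtering that shows how a threshold acts on scored findings. The `Finding` struct and the `filter_by_confidence` function are hypothetical and are not part of the keyhog-core API. The sketch also assumes the cutoff is inclusive.

```rust
/// Hypothetical finding shape; not the keyhog-core type.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    detector: String,
    confidence: f64, // 0.0..=1.0, displayed as 0-100%
}

/// Keep findings scoring at least `threshold` (assumed inclusive),
/// ordered from most to least confident.
fn filter_by_confidence(mut findings: Vec<Finding>, threshold: f64) -> Vec<Finding> {
    findings.retain(|f| f.confidence >= threshold);
    // total_cmp gives a total order on f64, so sorting never panics on NaN.
    findings.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    findings
}
```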
Live verification hits real APIs (AWS, GitHub, Stripe, Slack, OpenAI, and more) to confirm whether a leaked credential is actually active.
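Verification can be modeled as a fan-out with a deadline. Each distinct credential is checked once and concurrently, and any check still pending at the timeout is reported as unverified. The sketch below uses std threads and a channel, whereas KeyHog's verifier is async on tokio. The `check` closure is a hypothetical stand-in for a provider API call, and all names are illustrative.

```rust
use std::collections::HashSet;
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of verifying one credential (hypothetical type).
#[derive(Debug, PartialEq)]
enum Status {
    Live,
    Dead,
    TimedOut,
}

/// Verify each distinct credential concurrently, giving up after `timeout`.
fn verify_all<F>(creds: &[&str], check: F, timeout: Duration) -> Vec<(String, Status)>
where
    F: Fn(&str) -> bool + Send + Sync + 'static,
{
    // Deduplicate first so each secret is sent to its provider only once.
    let unique: Vec<String> = creds
        .iter()
        .copied()
        .collect::<HashSet<_>>()
        .into_iter()
        .map(String::from)
        .collect();
    let check = Arc::new(check);
    let (tx, rx) = mpsc::channel();
    for (i, cred) in unique.iter().cloned().enumerate() {
        let (tx, check) = (tx.clone(), Arc::clone(&check));
        thread::spawn(move || {
            let _ = tx.send((i, (*check)(cred.as_str())));
        });
    }
    drop(tx); // the channel disconnects once every worker has reported

    let deadline = Instant::now() + timeout;
    let mut results: Vec<Status> = unique.iter().map(|_| Status::TimedOut).collect();
    while let Some(left) = deadline.checked_duration_since(Instant::now()) {
        match rx.recv_timeout(left) {
            Ok((i, live)) => results[i] = if live { Status::Live } else { Status::Dead },
            Err(_) => break, // deadline reached, or all workers finished
        }
    }
    unique.into_iter().zip(results).collect()
}
```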
Feature Comparison
| | KeyHog | TruffleHog | Gitleaks | Semgrep |
|---|---|---|---|---|
| Detectors | 886+ | 800+ | 150+ | Rules |
| Recall (blind test) | 98% | 32% | ~30% | ~40% |
| False positives | Zero | Moderate | Low | High |
| Base64 decode | ✓ | ✗ | ✗ | ✗ |
| Hex decode | ✓ | ✗ | ✗ | ✗ |
| ML scoring | ✓ (99.5% accuracy) | Partial | ✗ | ✗ |
| Live verify | ✓ | ✓ | ✗ | ✗ |
| Throughput | ~50 MB/s | ~10–30 MB/s | ~5–15 MB/s | ~20 MB/s |
| License | MIT | AGPL | MIT | LGPL |
KeyHog finds 74 credentials that TruffleHog misses. TruffleHog finds 0 that KeyHog misses.
Choosing Between Alternatives
- Use KeyHog when you need high recall on encoded secrets, embeddable Rust crates, and optional live verification.
- Use TruffleHog when you prioritize its existing verification workflows over a lightweight Rust-native integration story.
- Use Gitleaks when plaintext regex scanning is enough and you want a simpler rule engine.
- Use Semgrep when your main goal is broad static analysis rather than secret-specific recall.
Quick Start
# Install
# Scan a directory
keyhog scan --path .
# Scan with verification
# Scan a git repo's full history
# CI mode: only changed files, SARIF output
Install
# Install the published CLI
# Or build from source
Standalone Crates
[dependencies]
keyhog-core = "0.1.0"
keyhog-scanner = "0.1.0"
keyhog-sources = "0.1.0"
keyhog-verifier = "0.1.0"
- `keyhog-core` provides detector specs, findings, reporting, and allowlists.
- `keyhog-scanner` compiles detectors and scans `Chunk` values.
- `keyhog-sources` provides filesystem, stdin, git, Docker, S3, and binary inputs.
- `keyhog-verifier` verifies deduplicated findings asynchronously.
- `keyhog` is the end-user binary package.
Library Quick Start
use ;
use CompiledScanner;
let scanner = compile?;
let findings = scanner.scan;
assert_eq!;
# Ok::
Docker
GitHub Actions
- uses: keyhog/keyhog-action@v1
  with:
    path: .
    min-confidence: 0.7
    format: sarif
Pre-commit
repos:
  - repo: https://github.com/santhsecurity/keyhog
    rev: v0.1.0
    hooks:
      - id: keyhog-secret-scan
Usage
# Scan directory
keyhog scan --path .
# JSON output
# Only high-severity findings
# Scan last 5 commits
# Staged files only (for pre-commit)
# Custom confidence threshold
# Fail CI on any finding
Output Formats
| Format | Flag | Use for |
|---|---|---|
| Text | `--format text` | Human reading (default) |
| JSON | `--format json` | Programmatic use |
| JSONL | `--format jsonl` | Streaming / log ingestion |
| SARIF | `--format sarif` | GitHub code scanning |
Architecture
KeyHog uses a two-phase architecture built on Aho-Corasick automata:
Input        Phase 1: Prefilter        Phase 2: Confirm         Score & Verify
─────        ──────────────────        ────────────────         ──────────────
             ┌───────────────────┐     ┌──────────────────┐     ┌────────────────┐
file         │ Decode-Through    │     │ Regex Confirm    │     │ ML Classifier  │
stdin ────▶  │ Aho-Corasick      │────▶│ Match regions    │────▶│ 3,969 params   │
git          │ O(n) single-pass  │     │ per candidate    │     │ 99.5% acc      │
             └───────────────────┘     └──────────────────┘     └───────┬────────┘
                                                                        │
                                                                        ▼
                                                                ┌────────────────┐
                                                                │ Live Verify    │
                                                                │ (optional)     │
                                                                │ async tokio    │
                                                                └────────────────┘
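The following is a minimal, std-only sketch of the prefilter/confirm split. Phase 1 looks for each detector's literal keyword, and Phase 2 inspects only the bytes right after a hit. Two simplifications here are assumptions, not KeyHog's implementation. First, `str::match_indices` run once per keyword stands in for the single-pass Aho-Corasick automaton. Second, a fixed-length alphanumeric check stands in for the confirming regex. The `Detector` struct and `scan` function are hypothetical.

```rust
/// Hypothetical detector: a literal keyword (used by the prefilter) plus the
/// exact length of the alphanumeric body that must follow it (the confirm rule).
struct Detector {
    id: &'static str,
    prefix: &'static str,
    body_len: usize,
}

/// Two-phase scan returning (detector id, matched secret) pairs.
fn scan(text: &str, detectors: &[Detector]) -> Vec<(&'static str, String)> {
    let mut hits = Vec::new();
    for d in detectors {
        // Phase 1 (prefilter): candidate offsets where the keyword occurs.
        // Searching once per keyword costs O(n * k); an Aho-Corasick automaton
        // finds every keyword in a single O(n) pass instead.
        for (start, _) in text.match_indices(d.prefix) {
            // Phase 2 (confirm): inspect only the region after the candidate.
            let body: String = text[start + d.prefix.len()..]
                .chars()
                .take_while(|c| c.is_ascii_alphanumeric())
                .collect();
            // Require exactly `body_len` alphanumerics followed by a boundary.
            if body.len() == d.body_len {
                hits.push((d.id, format!("{}{}", d.prefix, body)));
            }
        }
    }
    hits
}
```

The split pays off because the more expensive confirm step runs only on the few regions that the prefilter flags.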
Decode-Through Scanning
Before pattern matching, KeyHog recursively decodes:
- Base64 (standard + URL-safe)
- Hexadecimal
- URL encoding
- Quoted-printable
- Unicode escapes
# KeyHog catches this. Other scanners don't.
= # base64(ghp_...)
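The recursion can be sketched for the base64 case alone. This std-only illustration rests on several assumptions:

- Tokens are split on whitespace, quotes, colons, and commas.
- Only tokens of at least 8 characters are tried.
- A decode counts only if the result is valid UTF-8.
- `max_depth` bounds the nesting.

The function names are hypothetical. The hex, URL, quoted-printable, and Unicode-escape decoders are omitted.

```rust
/// Decode standard or URL-safe base64; trailing `=` padding is optional.
/// Returns None for characters outside both alphabets or an impossible length.
fn b64_decode(input: &str) -> Option<Vec<u8>> {
    fn val(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' | b'-' => Some(62), // standard | URL-safe
            b'/' | b'_' => Some(63), // standard | URL-safe
            _ => None,
        }
    }
    let bytes = input.trim_end_matches('=').as_bytes();
    if bytes.len() % 4 == 1 {
        return None;
    }
    let mut out = Vec::with_capacity(bytes.len() * 3 / 4);
    let (mut buf, mut bits) = (0u32, 0u32);
    for &c in bytes {
        buf = (buf << 6) | val(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Collect the input plus every plaintext layer reachable by repeatedly
/// base64-decoding its tokens, so detectors can run on each layer.
fn decode_layers(text: &str, max_depth: usize) -> Vec<String> {
    let mut layers = vec![text.to_string()];
    if max_depth == 0 {
        return layers;
    }
    let is_sep = |c: char| c.is_whitespace() || matches!(c, '"' | '\'' | ':' | ',');
    for token in text.split(is_sep) {
        if token.len() < 8 {
            continue; // too short to be worth decoding
        }
        if let Some(bytes) = b64_decode(token) {
            if let Ok(decoded) = String::from_utf8(bytes) {
                layers.extend(decode_layers(&decoded, max_depth - 1));
            }
        }
    }
    layers
}
```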
Structural Context
Same credential, different context, different confidence:
# 82% — production config
=
# 25% — test fixture (auto-detected via AST context)
=
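The following is a rough sketch of context-dependent scoring. KeyHog detects test fixtures from AST context, but this sketch substitutes a much cruder path-name heuristic. The `0.3` penalty is an arbitrary illustrative value, not KeyHog's calibration, and all names are hypothetical.

```rust
/// Coarse location class for a candidate (hypothetical).
#[derive(Debug, PartialEq)]
enum Context {
    Production,
    TestFixture,
}

/// Path-name heuristic standing in for KeyHog's AST-based context detection:
/// any path segment starting with a marker word counts as test context.
fn classify_path(path: &str) -> Context {
    const MARKERS: [&str; 4] = ["test", "fixture", "mock", "example"];
    let lower = path.to_ascii_lowercase();
    let is_test = lower
        .split(|c: char| matches!(c, '/' | '\\' | '.' | '_' | '-'))
        .any(|seg| MARKERS.iter().any(|m| seg.starts_with(*m)));
    if is_test { Context::TestFixture } else { Context::Production }
}

/// Scale a base confidence (0.0..=1.0) by context; 0.3 is an arbitrary example penalty.
fn contextual_confidence(base: f64, path: &str) -> f64 {
    match classify_path(path) {
        Context::Production => base,
        Context::TestFixture => base * 0.3,
    }
}
```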
Adding Detectors
Detectors are TOML — no code changes needed:
# detectors/my-service.toml
[]
= "my-service-api-key"
= "My Service API Key"
= "critical"
= ["ms_live_", "ms_test_"]
[[]]
= 'ms_(live|test)_[a-zA-Z0-9]{32}'
[]
= "GET"
= "https://api.myservice.com/v1/status"
[]
= "bearer"
= "match"
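One reading of the verification section is that KeyHog sends a `GET` to the listed URL with the matched key as a bearer token. The sketch below only builds that raw HTTP/1.1 request and does not send it. The bearer-header interpretation, the https-only URL handling, and the `build_verify_request` name are all assumptions.

```rust
/// Build (but do not send) the HTTP/1.1 request a GET + bearer verify spec implies.
/// Assumes "bearer" means an `Authorization: Bearer <matched key>` header.
fn build_verify_request(method: &str, url: &str, credential: &str) -> Option<String> {
    // Minimal URL handling for the sketch: https only, no port or query parsing.
    let rest = url.strip_prefix("https://")?;
    let (host, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    Some(format!(
        "{method} {path} HTTP/1.1\r\nHost: {host}\r\nAuthorization: Bearer {credential}\r\nConnection: close\r\n\r\n"
    ))
}
```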
Configuration
.keyhog.toml
= "detectors" # Path to detector TOML files
= "medium" # Minimum: info | low | medium | high | critical
= "text" # Output: text | json | jsonl | sarif
= 0.7 # ML confidence threshold (0.0–1.0)
= 8 # Parallel scan threads
= "credential" # Dedup: credential | file | none
= true # Enable decode-through + entropy + multiline
= 10 # Verification timeout (seconds)
= false # Redact credentials in output
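The dedup setting accounts for the demo's summary line of 3 findings and 2 unique credentials. The same GitHub PAT appears in `src/config.py` and again, base64-encoded, in `k8s/secret.yaml`. Below is a minimal sketch of counting findings under each mode. It assumes that `file` collapses repeats of one secret within a single file, and the types are hypothetical.

```rust
use std::collections::HashSet;

/// The three dedup modes named in `.keyhog.toml`.
#[derive(Clone, Copy, Debug)]
enum Dedup {
    Credential,
    File,
    None,
}

/// Hypothetical raw finding: the matched secret and where it was found.
struct RawFinding<'a> {
    credential: &'a str,
    path: &'a str,
}

/// Number of findings reported under each mode (assumed semantics).
fn reported_count(findings: &[RawFinding<'_>], mode: Dedup) -> usize {
    match mode {
        // Every hit is reported.
        Dedup::None => findings.len(),
        // One entry per distinct secret, wherever it appears.
        Dedup::Credential => findings.iter().map(|f| f.credential).collect::<HashSet<_>>().len(),
        // One entry per distinct (secret, file) pair.
        Dedup::File => findings
            .iter()
            .map(|f| (f.credential, f.path))
            .collect::<HashSet<_>>()
            .len(),
    }
}
```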
.keyhogignore
# Paths
path:tests/**
path:**/*.md
# Detectors
detector:entropy
detector:generic-api-key
# Specific findings by hash
hash:abc123def456
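The following sketch shows how rules like these could be parsed and applied. The `prefix:value` parsing follows the examples above. The glob semantics are an assumption modeled on common gitignore-style globs: `*` stays within one path segment, `**` spans directories, and `**/` may match zero directories. All names are hypothetical.

```rust
/// One rule from a `.keyhogignore`-style file (hypothetical type).
#[derive(Debug, PartialEq)]
enum Rule {
    Path(String),
    Detector(String),
    Hash(String),
}

fn parse_ignore(src: &str) -> Vec<Rule> {
    src.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| {
            let (kind, value) = l.split_once(':')?;
            let value = value.trim().to_string();
            match kind {
                "path" => Some(Rule::Path(value)),
                "detector" => Some(Rule::Detector(value)),
                "hash" => Some(Rule::Hash(value)),
                _ => None, // unknown prefixes are skipped in this sketch
            }
        })
        .collect()
}

/// Minimal glob matcher over bytes.
fn glob_match(pattern: &str, path: &str) -> bool {
    fn go(p: &[u8], s: &[u8]) -> bool {
        match p {
            [] => s.is_empty(),
            [b'*', b'*', rest @ ..] => match rest.strip_prefix(b"/") {
                // `**/x`: x may start at the beginning of any path segment.
                Some(after) => (0..=s.len())
                    .filter(|&i| i == 0 || s[i - 1] == b'/')
                    .any(|i| go(after, &s[i..])),
                // Bare `**`: consume any suffix, including `/`.
                None => (0..=s.len()).any(|i| go(rest, &s[i..])),
            },
            // `*`: consume characters up to, but not across, a `/`.
            [b'*', rest @ ..] => (0..=s.len())
                .take_while(|&i| i == 0 || s[i - 1] != b'/')
                .any(|i| go(rest, &s[i..])),
            [c, rest @ ..] => s.first() == Some(c) && go(rest, &s[1..]),
        }
    }
    go(pattern.as_bytes(), path.as_bytes())
}

fn is_ignored(rules: &[Rule], path: &str, detector: &str, hash: &str) -> bool {
    rules.iter().any(|r| match r {
        Rule::Path(g) => glob_match(g, path),
        Rule::Detector(d) => d.as_str() == detector,
        Rule::Hash(h) => h.as_str() == hash,
    })
}
```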
Inline suppression
# keyhog:ignore
=
# keyhog:ignore detector=github-token
=
# keyhog:ignore reason="public CI token"
=
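The following sketch shows suppression checking under three assumptions taken from the examples:

- A `keyhog:ignore` comment applies to the line immediately after it.
- `detector=` narrows the suppression to one detector.
- `reason=` is informational only.

The parser and its names are hypothetical. If a directive's options are malformed, this sketch ignores the directive.

```rust
/// Options parsed from one `keyhog:ignore` comment (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct Suppression {
    detector: Option<String>,
    reason: Option<String>,
}

const DIRECTIVE: &str = "keyhog:ignore";

/// Parse `keyhog:ignore [detector=ID] [reason="text"]` from a comment line.
/// Returns None if the line has no directive or its options are malformed.
fn parse_directive(line: &str) -> Option<Suppression> {
    let start = line.find(DIRECTIVE)? + DIRECTIVE.len();
    let mut rest = line[start..].trim();
    let mut s = Suppression::default();
    while !rest.is_empty() {
        let (key, after_eq) = rest.split_once('=')?;
        // Values are either "quoted text" or run up to the next space.
        let (value, remaining) = match after_eq.strip_prefix('"') {
            Some(quoted) => {
                let end = quoted.find('"')?;
                (&quoted[..end], &quoted[end + 1..])
            }
            None => after_eq.split_once(' ').unwrap_or((after_eq, "")),
        };
        match key.trim() {
            "detector" => s.detector = Some(value.to_string()),
            "reason" => s.reason = Some(value.to_string()),
            _ => {} // unknown options are ignored in this sketch
        }
        rest = remaining.trim();
    }
    Some(s)
}

/// Is a finding from `detector` on `lines[idx]` suppressed by the line above it?
fn is_suppressed(lines: &[&str], idx: usize, detector: &str) -> bool {
    if idx == 0 || idx > lines.len() {
        return false;
    }
    match parse_directive(lines[idx - 1]) {
        Some(Suppression { detector: Some(d), .. }) => d == detector,
        Some(_) => true,
        None => false,
    }
}
```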
Modular Builds
# Full build (default)
# Fast mode: regex-only, no ML/decode/multiline — for pre-commit hooks
# With live verification
Performance
All benchmarks: AMD Ryzen 9 5900X, 32 GB RAM, NVMe SSD.
Throughput
| Detectors | 1 MB input | 10 MB input | 100 MB input |
|---|---|---|---|
| 100 | 55 MB/s | 58 MB/s | 62 MB/s |
| 500 | 48 MB/s | 52 MB/s | 56 MB/s |
| 886 | 42 MB/s | 46 MB/s | 50 MB/s |
Real-World Repos
| Repository | Size | KeyHog | TruffleHog | Gitleaks |
|---|---|---|---|---|
| facebook/react | 350 MB | 8s | 25s | 45s |
| denoland/deno | 900 MB | 18s | 55s | 95s |
| rust-lang/rust | 2.1 GB | 42s | 120s | 200s |
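As a rough consistency check, dividing each repository's size by KeyHog's scan time gives its effective throughput. This assumes decimal units (1 GB = 1000 MB) and that the full repository size is what gets scanned:

```math
\frac{350\ \text{MB}}{8\ \text{s}} \approx 43.8\ \text{MB/s}, \qquad
\frac{900\ \text{MB}}{18\ \text{s}} = 50\ \text{MB/s}, \qquad
\frac{2100\ \text{MB}}{42\ \text{s}} = 50\ \text{MB/s}
```

These figures sit close to the ~50 MB/s in the throughput table. Dividing the TruffleHog times by KeyHog's gives ratios of 25/8 ≈ 3.1, 55/18 ≈ 3.1, and 120/42 ≈ 2.9.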
Verification Latency
| Service | Status | Latency |
|---|---|---|
| AWS | ✓ | ~200ms |
| GitHub | ✓ | ~150ms |
| Slack | ✓ | ~180ms |
| Stripe | ✓ | ~220ms |
| OpenAI | ✓ | ~250ms |
License
MIT — see LICENSE.