VELKA
A fast, privacy-first secret and PII scanner for codebases.
Detects leaked credentials (AWS, GCP, Azure, GitHub, Stripe, 52+ providers), PII (CPF, CNPJ, SSN, NIF, DNI, IBAN) and sensitive tokens — with ML-powered confidence scoring to cut false positives.
Why Velka?
- Zero telemetry — nothing leaves your machine, secrets redacted by default
- Fast — memory-mapped I/O, parallel scanning, compiled regex
- Low noise — structural validation + ML ensemble keeps false positives under 0.1%
Features
- 52+ Detection Rules: AWS, GCP, Azure, GitHub, Stripe, SendGrid, Twilio, Datadog, Cloudflare, Supabase, Vercel, and more
- PII Compliance: CPF, CNPJ (including 2026 alphanumeric), NIF, DNI, SSN, IBAN — all with check-digit validation
- Privacy First: Zero telemetry, no network calls, secrets redacted by default
- High Performance: Memory-mapped I/O, parallel scanning, compiled regex
- CI/CD Ready: JUnit, SARIF, CSV, Markdown, HTML output formats
- Incremental Scanning: --diff and --staged for fast pre-commit checks
- Git Forensics: --deep-scan finds secrets buried in commit history
- God Mode: --god-mode enables semantic analysis, bloom dedup, and full ML scoring
- Library API: Use as a Rust crate (velka::scan_str, velka::scan)
- LSP Server: Real-time secret detection in your editor
- Interactive TUI: Terminal dashboard for triaging findings
- ML Classifier: Ensemble scoring (entropy + char frequency + structural + length)
- K8s Admission Controller: Block Pods with secrets in manifests
- Runtime Log Scanner: Monitor container stdout for secret leaks
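To illustrate the check-digit validation mentioned under PII Compliance, here is a minimal sketch of the publicly documented CPF check-digit algorithm. The function name `is_valid_cpf` is hypothetical and this is not taken from Velka's source; it only shows why a random 11-digit number rarely passes as a CPF.

```rust
/// Returns true when `input` holds an 11-digit CPF whose two check digits match.
/// Non-digit characters (such as '.' and '-') are ignored.
fn is_valid_cpf(input: &str) -> bool {
    let digits: Vec<u32> = input.chars().filter_map(|c| c.to_digit(10)).collect();
    if digits.len() != 11 {
        return false;
    }
    // Sequences such as 111.111.111-11 pass the arithmetic but are not valid CPFs.
    if digits.iter().all(|&d| d == digits[0]) {
        return false;
    }
    // Check digit over the first `n` digits, using weights n+1 down to 2.
    let check = |n: usize| -> u32 {
        let sum: u32 = digits[..n]
            .iter()
            .enumerate()
            .map(|(i, &d)| d * (n as u32 + 1 - i as u32))
            .sum();
        // A remainder of 10 maps to check digit 0.
        (sum * 10) % 11 % 10
    };
    check(9) == digits[9] && check(10) == digits[10]
}
```

A scanner that applies a check like this after the regex match can discard most digit runs that merely look like a CPF.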
Privacy & Security
Velka is Local-First, No-Telemetry, and Air-Gapped by Default. No data ever leaves your machine unless you explicitly opt in with --verify.
See PRIVACY.md for the full privacy policy and independent verification steps.
Installation
Pre-built Binaries (Recommended)
Download the latest release for your platform from GitHub Releases.
# Linux / macOS (shell installer)
# Windows (PowerShell installer)
Cargo (from crates.io)
Cargo (from GitHub)
From Source (local checkout)
Docker
As Library
# Cargo.toml
[dependencies]
velka = "1.4"
use ;
v1.4.0 — The Precision Update
AST-Powered Analysis
Velka 1.4.0 introduces scope-aware analysis that understands code structure — not just text patterns.
- Test detection: findings inside test functions, #[cfg(test)] blocks, and test files (*_test.go, test_*.py, *.spec.ts, etc.) are automatically down-scored
- Docstring awareness: example credentials in documentation and JSDoc blocks are filtered
- 40% fewer false positives on real-world codebases without touching entropy thresholds
- Multi-language: Rust, Python, Go, TypeScript, JavaScript, Java, Ruby, PHP, C/C++
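The test-file part of this filtering can be sketched with plain file-name checks. This is a simplified illustration, not Velka's AST logic: the function names and the 0.5 down-scoring factor are assumptions, and only the three patterns named above are covered.

```rust
/// Returns true when the file name matches one of the test-file patterns
/// named in the list above: `*_test.go`, `test_*.py`, or `*.spec.ts`.
fn is_test_file(path: &str) -> bool {
    // Only the final path component matters for these patterns.
    let name = path
        .rsplit(|c: char| c == '/' || c == '\\')
        .next()
        .unwrap_or(path);
    name.ends_with("_test.go")
        || (name.starts_with("test_") && name.ends_with(".py"))
        || name.ends_with(".spec.ts")
}

/// Hypothetical down-scoring step: halve the confidence of findings that
/// come from test files. The factor is an illustrative assumption.
fn adjust_confidence(path: &str, confidence: f64) -> f64 {
    if is_test_file(path) {
        confidence * 0.5
    } else {
        confidence
    }
}
```

Down-scoring rather than dropping keeps such findings visible to anyone who lowers the reporting threshold.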
# AST filtering is on by default — no flags needed
# See filtering decisions in JSON output
Permission-Aware Verification
--verify now extracts the actual permissions attached to a live secret and classifies its blast radius.
[MORTAL] AWS_ACCESS_KEY src/config.rs:14
Value : AKIA****MPLE
Status : ACTIVE
Risk : Critical
Perms : s3:*, iam:*, ec2:* (Admin-equivalent)
Detail : Key belongs to IAM user "deploy-bot" (account 123456789012)
[MORTAL] GITHUB_TOKEN .env:3
Value : ghp_****Xk9
Status : ACTIVE
Risk : High
Perms : repo, workflow, write:packages
Detail : Token owned by "wesllen-lima", expires never
Risk levels: Critical · High · Medium · Low · Info
Infrastructure Security
Dedicated IaC scanner for Terraform, Kubernetes, and Docker — same rule engine, purpose-built rules.
| Category | Rules |
|---|---|
| Terraform | Hardcoded credentials in provider {}, public S3 buckets, open security groups (0.0.0.0/0), unencrypted RDS/EBS |
| Kubernetes | privileged: true, hostNetwork/hostPID, missing resource limits, secrets in env vars, latest image tags |
| Docker | USER root, :latest tag, secrets in ENV/ARG, --privileged, curl \| bash patterns |
# Scan IaC files explicitly (also detected automatically during velka scan)
Drift Detection (Baseline)
Track your secret posture over time. Save a baseline and alert only on new findings.
# Save current findings as baseline
# Later: show only new findings since baseline
# Inspect saved baseline
Example output of velka baseline diff:
Baseline: 2026-02-10T14:32:00Z (12 findings)
Current : 2026-02-17T09:15:00Z (14 findings)
NEW (2):
[+] MORTAL AWS_ACCESS_KEY src/infra/deploy.tf:8
[+] VENIAL HARDCODED_IP src/service/client.rs:42
RESOLVED (0):
(none)
Baseline is stored in ~/.velka/baseline.json (per-project, keyed by repo root).
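Conceptually, a baseline diff is a pair of set differences. The sketch below assumes a finding is identified by rule name, file path, and line number; that key, the `FindingKey` alias, and the function name are illustrative choices, not the format of baseline.json.

```rust
use std::collections::BTreeSet;

/// Identity of a finding for comparison purposes (assumed: rule, file, line).
type FindingKey = (String, String, u32);

/// Returns (new, resolved): findings present only in `current`, and
/// findings present only in `baseline`.
fn diff_baseline(
    baseline: &BTreeSet<FindingKey>,
    current: &BTreeSet<FindingKey>,
) -> (Vec<FindingKey>, Vec<FindingKey>) {
    let new = current.difference(baseline).cloned().collect();
    let resolved = baseline.difference(current).cloned().collect();
    (new, resolved)
}
```

Keying on the line number is the simplest option but it is fragile: unrelated edits that shift a line would make an old finding look new. A real tool would likely need a more stable fingerprint.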
Usage
# Basic scan
# Show progress bar
# Only changed files (fast pre-commit)
# Only staged files
# Git history forensics
# Only critical issues
# Different output formats
# Use configuration profile
# Show full secrets (debugging only)
# Verify secrets via API (opt-in; makes network calls for GitHub token, etc.)
# Migrate secrets to .env and update source (opt-in; requires .env in .gitignore)
# God Mode: full deep analysis (semantic decoding, bloom dedup, ML scoring)
# Scan from stdin (e.g. pipe from git diff)
# Install pre-commit hook
Exit codes
- 0: no Mortal sins found
- 1: at least one Mortal sin found
LSP Server (Editor Integration)
Velka includes a built-in Language Server Protocol server that provides real-time secret detection as you type.
Setup
# Start the LSP server (stdio transport)
VS Code
Add to your settings.json:
Or use the VS Code extension in vscode-extension/.
Neovim (nvim-lspconfig)
require..
Features
- Diagnostics on save: warnings/errors for detected secrets
- Works with any editor supporting LSP (VS Code, Neovim, Helix, Zed, Emacs)
- Uses the same rule engine and ML classifier as the CLI
- Hot-reloads dynamic rules from ~/.velka/rules.d/
Interactive TUI
A full terminal dashboard for triaging and managing secret findings.
# Launch TUI on current directory
# Include git history findings
Controls
| Key | Action |
|---|---|
| j/k or arrows | Navigate findings |
| Enter | View finding details with syntax highlighting |
| e | Open entropy visualizer |
| q | Quit |
| ? | Help |
Features
- File explorer with syntax-highlighted code preview
- Entropy density visualization (bar charts)
- ML confidence scores per finding
- Keyboard-driven workflow for security triage
ML Classifier
Velka uses an ensemble scoring system to achieve <0.1% false positive rate. No external ML runtime required.
How it works
- Pattern match (regex) establishes base confidence
- Shannon entropy filters low-entropy false positives
- Context scoring analyzes surrounding code (assignments, comments, tests)
- ML features: character class distribution, bigram frequency, structural analysis
- Final confidence = weighted blend of all factors
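The steps above can be sketched as an entropy function plus a weighted blend. This is a rough illustration only: the `Factors` struct, the function names, and the weights are assumptions, and the real weighting is described in docs/architecture.md rather than here.

```rust
/// Shannon entropy of the byte distribution, in bits per byte.
fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in s.as_bytes() {
        counts[b as usize] += 1;
    }
    let len = s.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Per-factor scores, each expected in the range 0.0..=1.0.
struct Factors {
    pattern: f64,
    entropy: f64,
    context: f64,
    ml: f64,
}

/// Weighted blend of the factors. The weights are illustrative assumptions.
fn blend(f: &Factors) -> f64 {
    const W_PATTERN: f64 = 0.4;
    const W_ENTROPY: f64 = 0.2;
    const W_CONTEXT: f64 = 0.2;
    const W_ML: f64 = 0.2;
    (W_PATTERN * f.pattern + W_ENTROPY * f.entropy + W_CONTEXT * f.context + W_ML * f.ml)
        .clamp(0.0, 1.0)
}
```

A string of one repeated character has zero entropy, while a typical access key scores several bits per byte, which is what lets the entropy step filter placeholder values.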
# Verify output includes confidence scores
See docs/architecture.md for the full technical explanation.
God Mode (Deep Analysis)
The --god-mode flag activates all analysis engines simultaneously:
- Semantic decoding: Detects base64-encoded, hex-encoded, and ROT13 obfuscated secrets
- Variable name analysis: Flags suspicious assignments like password = "..." even without regex match
- String concatenation detection: Finds secrets split across multiple lines
- Bloom filter dedup: Eliminates duplicate snippets across files (zero false negatives)
- ML ensemble scoring: All findings enriched with confidence scores
Without --god-mode, Velka runs only pattern matching and ML scoring for maximum speed. God mode trades throughput for depth.
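For readers unfamiliar with Bloom filter de-duplication, here is a small sketch using only the standard library. The `Bloom` type, its sizing, and the seeded-hash scheme are assumptions made for illustration and do not describe Velka's internal filter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A small fixed-size Bloom filter for de-duplicating snippets.
struct Bloom {
    bits: Vec<bool>,
    hashes: u64,
}

impl Bloom {
    fn new(size: usize, hashes: u64) -> Self {
        Bloom {
            bits: vec![false; size.max(1)],
            hashes: hashes.max(1),
        }
    }

    /// Derives one bit index by hashing the item together with a seed.
    fn index(&self, item: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        item.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    /// Records `item` and reports whether it was possibly seen before.
    /// `false` is definitive; `true` may occasionally be a collision.
    fn check_and_insert(&mut self, item: &str) -> bool {
        let mut seen = true;
        for seed in 0..self.hashes {
            let i = self.index(item, seed);
            if !self.bits[i] {
                seen = false;
                self.bits[i] = true;
            }
        }
        seen
    }
}
```

A Bloom filter never misses an item that was inserted, so a repeated snippet is always recognised. The trade-off is that a rare hash collision can make a new snippet look like a duplicate, which is why the filter size matters.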
Kubernetes Integration
Admission Controller (Webhook)
Block Pods and Deployments that contain secrets in their manifests before they reach the cluster.
# Start admission webhook (plain HTTP for development)
# With TLS (production)
Register with Kubernetes:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: velka-secret-scanner
webhooks:
  - name: velka.security.io
    clientConfig:
      service:
        name: velka-webhook
        namespace: velka-system
        path: /validate
    rules:
      - apiGroups:
        resources:
        apiVersions:
        operations:
    failurePolicy: Ignore
    sideEffects: None
    admissionReviewVersions:
Manifest Scanning
Scan local YAML files without running the webhook server:
Runtime Log Scanner
Monitor container logs in real-time for accidentally leaked secrets.
# Scan from stdin (pipe from docker/kubectl)
# Scan log files
# Follow mode (tail -f behavior)
Exits with code 1 if mortal secrets are detected. Useful as a sidecar container or log monitoring daemon.
Shell Completions
Generate autocompletion scripts for your shell:
# Bash
# Zsh
# Fish
# PowerShell
Configuration
Create velka.toml in your project root:
[]
= ["vendor/**", "tests/fixtures/**"]
= 4.6
= ["localhost", "example.com", "test@example.com"]
[]
= true
[]
= true
= "both" # "project", "user", or "both"
[]
= ["HARDCODED_IP"]
[[]]
= "INTERNAL_API"
= "MYCOMPANY_[A-Z0-9]{32}"
= "Mortal"
= "Internal API key detected"
[]
= false
= true
[]
= 5.0
= false
Inline ignores: Add a velka:ignore comment to any line to skip it.
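A simple way to picture inline ignores is a line filter that runs before the rules. The sketch below assumes a plain substring match on the marker; how Velka actually parses the comment is not documented here, and `scannable_lines` is a hypothetical helper.

```rust
/// Marker text from the sentence above; the substring match is an assumption.
const IGNORE_MARKER: &str = "velka:ignore";

/// Returns the 1-based numbers of the lines that remain eligible for scanning.
fn scannable_lines(source: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.contains(IGNORE_MARKER))
        .map(|(i, _)| i + 1)
        .collect()
}
```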
Quick Init
Detection Rules
Mortal Sins (Critical)
| Rule | Description |
|---|---|
| AWS_ACCESS_KEY | AWS Access Key ID |
| AWS_SECRET_KEY | AWS Secret Access Key |
| GOOGLE_API_KEY | Google API Key |
| GITHUB_TOKEN | GitHub Personal Access Token |
| STRIPE_SECRET | Stripe Secret Key |
| PRIVATE_KEY | SSH/PGP Private Keys |
| SLACK_WEBHOOK | Slack Webhook URL |
| SENDGRID_API | SendGrid API Key |
| TWILIO_API | Twilio API Key |
| NPM_TOKEN | NPM Auth Token |
| PYPI_TOKEN | PyPI API Token |
| DISCORD_TOKEN | Discord Bot Token |
| TELEGRAM_BOT | Telegram Bot Token |
| DB_CONNECTION_STRING | Database Connection String |
| HARDCODED_PASSWORD | Hardcoded Password |
| AZURE_STORAGE_KEY | Azure Storage Account Key |
| GCP_SERVICE_ACCOUNT | GCP Service Account Key |
| HEROKU_API_KEY | Heroku API Key |
| MAILGUN_API_KEY | Mailgun API Key |
| SQUARE_ACCESS_TOKEN | Square Access Token |
| SQUARE_OAUTH_SECRET | Square OAuth Secret |
| CREDIT_CARD | Credit Card (Luhn validated) |
| HIGH_ENTROPY | High Entropy Strings |
| K8S_PRIVILEGED | Kubernetes Privileged Pod |
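The CREDIT_CARD rule is described as Luhn validated. For reference, a minimal sketch of the standard Luhn checksum follows; the function name and the handling of separators are illustrative assumptions, not Velka's code.

```rust
/// Returns true when the digits in `input` satisfy the Luhn checksum.
/// Spaces and dashes are skipped; any other non-digit rejects the input.
fn luhn_valid(input: &str) -> bool {
    let mut digits = Vec::new();
    for c in input.chars() {
        match c {
            ' ' | '-' => continue,
            _ => match c.to_digit(10) {
                Some(d) => digits.push(d),
                None => return false,
            },
        }
    }
    if digits.len() < 2 {
        return false;
    }
    // Starting from the rightmost digit, double every second digit and
    // subtract 9 from any result above 9, then sum everything.
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

Running this after a 13 to 19 digit pattern match rejects roughly nine out of ten random digit strings.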
Venial Sins (Warnings)
| Rule | Description |
|---|---|
| JWT_TOKEN | JWT Token |
| HARDCODED_IP | Hardcoded IP Address |
| EVAL_CALL | eval() Call |
| DOCKER_ROOT | Dockerfile Root User |
| DOCKER_LATEST | Dockerfile :latest Tag |
| K8S_HOST_NETWORK | Kubernetes Host Network |
| K8S_HOST_PID | Kubernetes Host PID |
| GENERIC_API_KEY | Generic API Key Pattern |
| GENERIC_SECRET | Generic Secret Pattern |
CI/CD Integration
GitHub Actions (Official Action)
- uses: actions/checkout@v4
- uses: wesllen-lima/velka/.github/actions/velka-scan@main
  with:
    path: .
    format: terminal
    mortal-only: 'true'
    fail-on-secrets: 'true'
    # diff-only: 'true'   # PR mode: only scan changed files
    # deep-scan: 'true'   # Also scan git history
    # since: 'main'       # Incremental: changes since branch
GitHub Actions (Manual + SARIF)
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- run: cargo install velka --locked
- run: velka scan . --format sarif > results.sarif
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
GitLab CI
velka-scan:
  script:
    - velka scan . --format junit > velka-report.xml
  artifacts:
    reports:
      junit: velka-report.xml
Pre-commit Hook
Option 1 - pre-commit framework (add to .pre-commit-config.yaml):
repos:
  - repo: https://github.com/wesllen-lima/velka
    rev: v1.4.0
    hooks:
      - id: velka
Requires velka on PATH (cargo install velka). Then run pre-commit run velka.
Option 2 - Git hook only:
Honeytokens
Generate and inject canary tokens to detect unauthorized access:
# Generate and inject to .env.example
# Also inject to README.md
Velka automatically detects its own honeytokens during scans and flags them separately.
Secret Rotation
Get step-by-step rotation guides for detected secrets:
# Show rotation guidance
# Filter by rule
# Show executable CLI commands
# Mark as remediated
Security
- Zero Telemetry: No data ever leaves your machine
- Redaction by Default: Secrets are masked in output (AKIA****MPLE)
- Secure Cache: Only stores file hashes, never secret content
- Path Validation: System paths (/proc, /sys, /dev) cannot be scanned
- Secure Errors: Error messages don't leak sensitive paths
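The masked form shown in the list above (AKIA****MPLE) can be produced by keeping a few characters at each end. This is a sketch under assumptions: the `redact` name, the `keep` parameter, and the rule for short values are hypothetical, and Velka's own masking may keep a different number of characters.

```rust
/// Masks the middle of a secret, keeping `keep` characters at each end.
/// Values too short to mask safely are replaced entirely.
fn redact(secret: &str, keep: usize) -> String {
    let chars: Vec<char> = secret.chars().collect();
    if chars.len() <= keep * 2 {
        return "****".to_string();
    }
    let head: String = chars[..keep].iter().collect();
    let tail: String = chars[chars.len() - keep..].iter().collect();
    format!("{}****{}", head, tail)
}
```

Using a fixed-width mask rather than one asterisk per hidden character avoids revealing the length of the secret.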
Performance
- Parallel Scanning: Uses the ignore crate's parallel walker
- Memory-Mapped I/O: Files >1MB use mmap for efficiency
- Compiled Regex: All patterns compiled once via std::sync::LazyLock
- Lock-free Channels: crossbeam-channel for zero-contention
- Smart Skipping: Binary detection via magic bytes, minified code skipped
- Batch Cache Writes: Cache misses are buffered and flushed once per run to reduce RwLock contention
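As an illustration of the Smart Skipping item, binary detection by magic bytes amounts to comparing the start of a file against known signatures. The signature table below is a small hand-picked sample, and the NUL-byte fallback is a common heuristic added here as an assumption; neither is taken from Velka's source.

```rust
/// A few well-known file signatures (magic bytes).
const MAGIC: &[&[u8]] = &[
    b"\x7fELF",           // ELF executables
    b"\x89PNG\r\n\x1a\n", // PNG images
    b"%PDF-",             // PDF documents
    b"PK\x03\x04",        // ZIP archives (also JAR, DOCX)
    b"\x1f\x8b",          // gzip streams
];

/// Returns true when the buffer starts with a known signature or contains
/// a NUL byte within its first 512 bytes.
fn looks_binary(head: &[u8]) -> bool {
    if MAGIC.iter().any(|sig| head.starts_with(sig)) {
        return true;
    }
    head.iter().take(512).any(|&b| b == 0)
}
```

Only the first bytes of a file need to be read for this check, so skipped files cost almost nothing.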
Benchmarks
Run cargo bench to reproduce. Benchmarks live in benches/scan_bench.rs.
Throughput (cache disabled):
| Files | Benchmark name | Typical median |
|---|---|---|
| 100 | scan_100_files | ~2 ms |
| 1,000 | scan_1000_files | ~4.5 ms |
| 5,000 | scan_5000_files | ~12 ms |
| 10,000 | scan_10000_files | ~21 ms |
Cache impact (1,000 files, cache enabled):
| Benchmark name | Description |
|---|---|
| scan_1000_files_cache_cold | First run: full scan, cache populated |
| scan_1000_files_cache_hit | Second run: cache hit, no re-scan |
Run only cache benchmarks: cargo bench scan_1000_files_cache. Run a single bench: cargo bench scan_1000_files.
Velka is designed to be significantly faster than alternatives (e.g. TruffleHog, detect-secrets) due to Rust's zero-cost abstractions, parallel file walking, and memory-mapped I/O. Run both on your codebase to compare.
Architecture
For a deep dive into the Ensemble Scoring engine, rule plugin system, and module map, see docs/architecture.md.
Documentation
- Architecture - Engine internals and scoring system
- Privacy Policy - Local-first, no-telemetry guarantee
- Contributing - How to contribute
- Changelog - Version history
- Security Policy - Vulnerability reporting
License
Licensed under MIT OR Apache-2.0.
See LICENSE, LICENSE-MIT, and LICENSE-APACHE.