threatdeflect-core
High-performance secret detection, confidence scoring, and IOC extraction engine written in Rust.
What it does
threatdeflect-core scans source code and text files looking for:
- Leaked secrets — AWS keys, GitHub tokens, API keys, database connection strings, private keys, and 30+ other credential patterns
- Suspicious commands — reverse shells, crypto miners, encoded payload execution, unsafe deserialization
- Indicators of Compromise (IOCs) — URLs, IPs, and domains extracted from code, including base64-encoded hidden IOCs
- Paste service C2 — URLs pointing to npoint.io, pastebin.com, and other paste services commonly used as C2 staging
Unlike simple regex scanners, it uses confidence scoring to reduce false positives:
| Signal | Effect |
|---|---|
| Shannon entropy > 5.5 | Confidence +10% (likely real secret) |
| Shannon entropy < 3.5 | Confidence -20% (likely placeholder) |
| Placeholder detected (changeme, xxx, TODO) | Confidence forced to 5% |
| Assignment context (key = "...") | Confidence +10% |
| Test file (test_*.py, *_test.go) | Confidence x0.3 |
| Example/template file | Confidence x0.15-0.2 |
| Production file | Confidence x1.0 (no penalty) |
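The adjustments in the table can be sketched as a single scoring function. This is a minimal illustration, not the crate's implementation: the names `ScoreContext` and `adjust_confidence` are hypothetical, the "+10%" and "-20%" rows are assumed to be additive percentage points, the order of operations is assumed, and the example-file multiplier is fixed at 0.2 from the stated 0.15-0.2 range.

```rust
/// Hypothetical file context used only for this sketch; the crate's own
/// `FileContext` type may have different variants.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ScoreContext {
    Production,
    Test,
    Example,
}

/// Applies the table's adjustments to a base confidence in 0.0..=1.0.
/// Assumption: additive entropy/assignment bonuses first, then the
/// file-context multiplier, then clamping.
pub fn adjust_confidence(
    base: f64,
    entropy: f64,
    is_placeholder: bool,
    in_assignment: bool,
    context: ScoreContext,
) -> f64 {
    // A detected placeholder overrides every other signal.
    if is_placeholder {
        return 0.05;
    }

    let mut score = base;

    // Entropy signal: high entropy suggests a real secret, low a placeholder.
    if entropy > 5.5 {
        score += 0.10;
    } else if entropy < 3.5 {
        score -= 0.20;
    }

    // Assignment context such as `key = "..."`.
    if in_assignment {
        score += 0.10;
    }

    // File-context multiplier.
    let multiplier = match context {
        ScoreContext::Production => 1.0,
        ScoreContext::Test => 0.3,
        ScoreContext::Example => 0.2, // chosen from the 0.15-0.2 range
    };

    (score * multiplier).clamp(0.0, 1.0)
}
```

Under these assumptions, a base score of 0.75 with entropy 6.0 in an assignment inside a production file comes out near 0.95, while any placeholder value collapses to 0.05 regardless of the other signals.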
Quick start
Add to your Cargo.toml:
[dependencies]
threatdeflect-core = "0.1"
Minimal example
use SecretAnalyzer;
Output:
[85%] AWS Key in src/config.py (Production)
IOC: http://evil.com/steal (from src/config.py)
Scanning multiple files
use ;
Filtering by confidence
// High confidence findings only (likely real secrets)
let real_secrets: = result.findings.iter
.filter
.collect;
// Low confidence findings (send to manual review or AI validation)
let needs_review: = result.findings.iter
.filter
.collect;
// Auto-discarded (almost certainly false positives)
let discarded: = result.findings.iter
.filter
.collect;
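The three buckets above can be sketched with a stand-in type. Everything named here is an assumption for illustration: the `Finding` struct is a simplified substitute for the crate's type, the `confidence` field is assumed to be a float in 0.0..=1.0, and the thresholds are passed in by the caller because the source does not state the cut-off values.

```rust
/// Hypothetical, simplified stand-in for the crate's `Finding` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub description: String,
    /// Assumed to be in 0.0..=1.0.
    pub confidence: f64,
}

/// Splits findings into (likely real, needs review, discarded).
/// `high` and `low` are caller-chosen thresholds, not values from the crate.
pub fn triage(
    findings: &[Finding],
    high: f64,
    low: f64,
) -> (Vec<&Finding>, Vec<&Finding>, Vec<&Finding>) {
    // At or above `high`: treat as a likely real secret.
    let real = findings.iter().filter(|f| f.confidence >= high).collect();

    // Between the thresholds: send to manual review or AI validation.
    let review = findings
        .iter()
        .filter(|f| f.confidence >= low && f.confidence < high)
        .collect();

    // Below `low`: almost certainly a false positive.
    let discarded = findings.iter().filter(|f| f.confidence < low).collect();

    (real, review, discarded)
}
```

Calling `triage(&findings, 0.7, 0.2)` would, with these example thresholds, keep a 0.85 finding, queue a 0.4 finding for review, and discard a 0.05 finding.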
Serialization with serde
All types implement Serialize and Deserialize:
let result = analyzer.analyze_content;
// JSON output
let json = serde_json::to_string_pretty(&result)?;
println!("{}", json);
// Or use with any serde-compatible format (YAML, TOML, MessagePack, etc.)
File context classification
The engine automatically classifies files to adjust confidence:
use classify_file_context;
use FileContext;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
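To illustrate how path-based classification of this kind might work, here is a minimal sketch. It is not the crate's `classify_file_context`: the `FileKind` enum and `classify` function are hypothetical, the test-file rules follow the `test_*.py` and `*_test.go` patterns mentioned earlier, and the example/template rules are guesses. Matching is case-sensitive and works on string slices only, so it allocates nothing.

```rust
/// Hypothetical classification result for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FileKind {
    Production,
    Test,
    Example,
}

/// Classifies a path using simple, case-sensitive name checks.
/// The rule set is an assumption made for illustration.
pub fn classify(path: &str) -> FileKind {
    // The file name is whatever follows the last '/' (or the whole path).
    let name = path.rsplit('/').next().unwrap_or(path);

    // Test files: test_*.py and *_test.go.
    let is_test = (name.starts_with("test_") && name.ends_with(".py"))
        || name.ends_with("_test.go");
    if is_test {
        return FileKind::Test;
    }

    // Example or template files (assumed naming conventions).
    let is_example = name.contains("example")
        || name.contains("template")
        || path.contains("/examples/");
    if is_example {
        return FileKind::Example;
    }

    // Everything else is treated as production code.
    FileKind::Production
}
```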
Entropy calculation
Use the entropy function directly for custom analysis:
use calculate_entropy;
let high = calculate_entropy; // ~4.7
let low = calculate_entropy; // ~0.0
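For readers unfamiliar with the measure, the following is a minimal sketch of Shannon entropy over the characters of a string, in bits per character. The function name `shannon_entropy` is hypothetical, and the crate's `calculate_entropy` may differ in detail (for example, by counting bytes rather than characters).

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per character:
/// H = -sum(p_i * log2(p_i)) over the distinct characters of `s`.
pub fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }

    // Count how often each character occurs.
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }

    // Sum the per-character contributions.
    let total = total as f64;
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total;
            -p * p.log2()
        })
        .sum()
}
```

A string made of one repeated character scores 0 bits, and a string of four distinct characters scores 2 bits. Random-looking secrets drawn from a large alphabet score much higher, which is what the confidence thresholds rely on.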
Architecture
threatdeflect-core/
  analyzer.rs     SecretAnalyzer: orchestrates all detection passes
  confidence.rs   Shannon entropy, base confidence, context adjustments
  context.rs      File classification, comment detection, IOC validation
  types.rs        Finding, Ioc, AnalysisResult, FileContext
  error.rs        AnalyzerError with thiserror
  lib.rs          Public re-exports
Detection pipeline (per file):
Input (content, path, filename)
|
|-- 1. Secret patterns         regex match + confidence scoring        -> Finding
|-- 2. Suspicious patterns     regex match (skip safe contexts)        -> Finding
|-- 3. High entropy strings    entropy > 5.2 in code files             -> Finding
|-- 4. Base64 IOC extraction   decode base64 -> extract hidden URLs    -> Ioc
|-- 5. JS keyword detection    eval, innerHTML, unescape               -> Finding
|
v
AnalysisResult { findings, iocs }
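Step 4 of the pipeline (decode base64, then look for hidden URLs) can be sketched with the standard library alone. This is an illustrative approximation, not the crate's code: `decode_base64` and `hidden_urls` are hypothetical names, the decoder handles only the standard alphabet, and the minimum token length of 16 is an arbitrary assumption to skip short identifiers.

```rust
/// Decodes standard-alphabet base64; returns None on any invalid character.
pub fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let trimmed = input.trim_end_matches('=');
    let mut out = Vec::with_capacity(trimmed.len() * 3 / 4);
    let mut buffer: u32 = 0; // holds bits not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buffer`

    for byte in trimmed.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return None,
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    Some(out)
}

/// Finds base64-looking tokens in `line` and returns the decoded text of
/// those that turn out to be http(s) URLs.
pub fn hidden_urls(line: &str) -> Vec<String> {
    line.split(|c: char| !(c.is_ascii_alphanumeric() || c == '+' || c == '/' || c == '='))
        .filter(|token| token.len() >= 16) // assumed minimum length
        .filter_map(decode_base64)
        .filter_map(|bytes| String::from_utf8(bytes).ok())
        .filter(|text| text.starts_with("http://") || text.starts_with("https://"))
        .collect()
}
```

A production implementation would more likely use an established base64 crate and a stricter candidate pattern; the hand-written decoder here only keeps the sketch dependency-free.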
Detection capabilities
Secret patterns (30+)
AWS keys, GitHub/GitLab tokens, Slack/Discord tokens, Stripe keys, Google Cloud API keys, Firebase server keys, Azure storage keys, DigitalOcean tokens, Telegram/Discord bot tokens, NPM/PyPI tokens, database connection strings, Supabase keys, SSH private keys, JWTs, and more.
Suspicious commands
Reverse shells, crypto mining, JNDI injection (Log4Shell), encoded payload execution, Docker socket mounts, SSH key injection, crontab injection, unsafe deserialization, remote code loading, paste service C2 URLs.
IOC extraction
- Direct URLs from source code (filtered: localhost, internal, CDN, package registries)
- Base64-encoded URLs (decoded and extracted automatically)
- Paste service URLs flagged as potential C2
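The direct-URL behavior in the list above (extract, filter benign hosts, flag paste services) might look roughly like the sketch below. All names are hypothetical, the ignore list contains only a few example hosts rather than the crate's actual filters, and the paste-service list is limited to the two services named in this README.

```rust
/// Hypothetical result type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExtractedUrl {
    pub url: String,
    /// True when the host is a known paste service (potential C2 staging).
    pub paste_service: bool,
}

/// Returns the host of an http(s) URL without port, or None otherwise.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    Some(authority.split(':').next().unwrap_or(authority))
}

/// Extracts URLs from one line, dropping ignored hosts and flagging
/// paste services. Both host lists are illustrative assumptions.
pub fn extract_urls(line: &str) -> Vec<ExtractedUrl> {
    const IGNORED_HOSTS: [&str; 3] = ["localhost", "127.0.0.1", "registry.npmjs.org"];
    const PASTE_HOSTS: [&str; 2] = ["npoint.io", "pastebin.com"];

    line.split(|c: char| c.is_whitespace() || c == '"' || c == '\'' || c == '(' || c == ')')
        .filter_map(|token| {
            let host = host_of(token)?;
            if IGNORED_HOSTS.contains(&host) {
                return None;
            }
            // Match the service itself or any of its subdomains.
            let paste_service = PASTE_HOSTS
                .iter()
                .any(|p| host == *p || host.ends_with(&format!(".{p}")));
            Some(ExtractedUrl {
                url: token.to_string(),
                paste_service,
            })
        })
        .collect()
}
```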
Performance
The engine is designed for scanning thousands of files in repositories:
- Zero-copy regex matching with the regex crate
- Single-pass line scanning (all detection in one iteration)
- No heap allocation for file context classification
- No I/O: accepts &str content, caller controls file reading
Typical throughput: ~50k lines/second on a single core (depends on rule count).
Python bindings
This crate powers the Python package ThreatDeflect via PyO3 + maturin. The Python wrapper adds:
- GitHub/GitLab repository cloning and traversal
- API integrations (VirusTotal, AbuseIPDB, Shodan)
- AI-powered finding validation
- PDF/Excel report generation
- Finding correlation (eval + external URL = severity boost)
License
GPL-3.0 — see LICENSE for details.