threatdeflect-core

High-performance secret detection, confidence scoring, and IOC extraction engine written in Rust.

What it does

threatdeflect-core scans source code and other text content for:

Leaked secrets — AWS keys, GitHub tokens, API keys, database connection strings, private keys, and 30+ other credential patterns
Suspicious commands — reverse shells, crypto miners, encoded payload execution, unsafe deserialization
Indicators of Compromise (IOCs) — URLs, IPs, and domains extracted from code, including base64-encoded hidden IOCs
Paste service C2 — URLs pointing to npoint.io, pastebin.com, and other paste services commonly used as C2 staging

Unlike simple regex scanners, it uses confidence scoring to reduce false positives:

Signal	Effect
Shannon entropy > 5.5	Confidence +10% (likely real secret)
Shannon entropy < 3.5	Confidence -20% (likely placeholder)
Placeholder detected (`changeme`, `xxx`, `TODO`)	Confidence forced to 5%
Assignment context (`key = "..."`)	Confidence +10%
Test file (`test_*.py`, `*_test.go`)	Confidence x0.3
Example/template file	Confidence x0.15-0.2
Production file	Confidence x1.0 (no penalty)
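The adjustments in the table can be sketched as a single function. This is an illustrative sketch, not the crate's implementation: `Signals`, `looks_like_placeholder`, and `adjust_confidence` are hypothetical names, and it assumes that "+10%" and "-20%" mean percentage points, that the placeholder override is applied first, that placeholder matching is case-insensitive, and that the result is clamped to the range 0.0 to 1.0.

```rust
/// Signals gathered for one candidate match (hypothetical type, for illustration).
pub struct Signals<'a> {
    /// The text the rule matched.
    pub matched: &'a str,
    /// Shannon entropy of `matched`, in bits per character.
    pub entropy: f64,
    /// True when the match sits in an assignment such as `key = "..."`.
    pub in_assignment: bool,
    /// File-context multiplier: 1.0 production, 0.3 test, 0.15-0.2 example/template.
    pub file_multiplier: f64,
}

/// Case-insensitive search for the placeholder markers named in the table.
pub fn looks_like_placeholder(value: &str) -> bool {
    let lower = value.to_lowercase();
    ["changeme", "xxx", "todo"]
        .iter()
        .any(|marker| lower.contains(*marker))
}

/// Applies the table's adjustments in one plausible order (an assumption):
/// placeholder override, additive entropy and assignment terms, file multiplier.
pub fn adjust_confidence(base: f64, signals: &Signals) -> f64 {
    if looks_like_placeholder(signals.matched) {
        return 0.05; // forced to 5%
    }
    let mut confidence = base;
    if signals.entropy > 5.5 {
        confidence += 0.10; // likely a real secret
    } else if signals.entropy < 3.5 {
        confidence -= 0.20; // likely a placeholder
    }
    if signals.in_assignment {
        confidence += 0.10;
    }
    (confidence * signals.file_multiplier).clamp(0.0, 1.0)
}
```

Under these assumptions, a match with base confidence 0.75, mid-range entropy, and an assignment context in a production file comes out at about 0.85, while the same match in a test file drops to about 0.255.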

Quick start

Add to your Cargo.toml:

[dependencies]
threatdeflect-core = "0.1"

Minimal example

use threatdeflect_core::SecretAnalyzer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define detection rules: (rule_id, regex_pattern)
    let secret_rules = vec![
        ("AWS Key".to_string(), r"AKIA[0-9A-Z]{16}".to_string()),
        ("GitHub Token".to_string(), r"ghp_[a-zA-Z0-9]{36}".to_string()),
    ];

    // Suspicious-command rules (matches inside safe contexts such as imports are skipped)
    let suspicious_rules = vec![
        ("Reverse Shell".to_string(), r"bash\s+-i\s+>&\s+/dev/tcp".to_string()),
    ];

    let analyzer = SecretAnalyzer::new(secret_rules, suspicious_rules)?;

    let source_code = r#"
        aws_key = "AKIAIOSFODNN7EXAMPLE1"
        callback = "http://evil.com/steal"
    "#;

    let result = analyzer.analyze_content(source_code, "src/config.py", "config.py");

    for finding in &result.findings {
        println!(
            "[{:.0}%] {} in {} ({})",
            finding.confidence * 100.0,
            finding.finding_type,
            finding.file,
            finding.file_context.as_str(),
        );
    }

    for ioc in &result.iocs {
        println!("IOC: {} (from {})", ioc.ioc, ioc.source_file);
    }

    Ok(())
}

Output:

[85%] AWS Key in src/config.py (Production)
IOC: http://evil.com/steal (from src/config.py)

Scanning multiple files

// Requires the `walkdir` crate as an additional dependency.
use threatdeflect_core::{SecretAnalyzer, AnalysisResult};

fn scan_directory(
    analyzer: &SecretAnalyzer,
    root: &std::path::Path,
) -> AnalysisResult {
    let mut combined = AnalysisResult::new();

    for entry in walkdir::WalkDir::new(root)
        .into_iter()
        .filter_map(|e| e.ok())
        .filter(|e| e.file_type().is_file())
    {
        let path = entry.path();
        let rel_path = path.strip_prefix(root).unwrap_or(path);

        // Skip files that cannot be read as UTF-8 text (binary or unreadable files).
        let content = match std::fs::read_to_string(path) {
            Ok(c) => c,
            Err(_) => continue,
        };

        let file_name = path
            .file_name()
            .and_then(|n| n.to_str())
            .unwrap_or("");

        let result = analyzer.analyze_content(
            &content,
            &rel_path.to_string_lossy(),
            file_name,
        );

        combined.merge(result);
    }

    combined
}

Filtering by confidence

// High-confidence findings only (likely real secrets)
let real_secrets: Vec<_> = result.findings.iter()
    .filter(|f| f.confidence >= 0.70)
    .collect();

// Lower-confidence findings (send to manual review or AI validation)
let needs_review: Vec<_> = result.findings.iter()
    .filter(|f| f.confidence >= 0.15 && f.confidence < 0.70)
    .collect();

// Auto-discarded (almost certainly false positives)
let discarded: Vec<_> = result.findings.iter()
    .filter(|f| f.confidence < 0.15)
    .collect();
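The three filters above can also be expressed as one classification step, so each finding lands in exactly one bucket. The sketch below uses a stand-in `Finding` struct with only the fields it needs (the real type has more, and its `confidence` field may use a different float type); `Triage`, `triage`, and `split_by_triage` are hypothetical names.

```rust
/// Stand-in for the crate's finding type; only the fields used here are modelled.
#[derive(Debug)]
pub struct Finding {
    pub finding_type: String,
    pub confidence: f64,
}

/// The three buckets implied by the thresholds 0.70 and 0.15.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Triage {
    Report,
    Review,
    Discard,
}

/// Maps a confidence value to its bucket.
pub fn triage(confidence: f64) -> Triage {
    if confidence >= 0.70 {
        Triage::Report
    } else if confidence >= 0.15 {
        Triage::Review
    } else {
        Triage::Discard
    }
}

/// Splits findings into (report, review, discard) in a single pass.
pub fn split_by_triage(findings: &[Finding]) -> (Vec<&Finding>, Vec<&Finding>, Vec<&Finding>) {
    let (mut report, mut review, mut discard) = (Vec::new(), Vec::new(), Vec::new());
    for finding in findings {
        match triage(finding.confidence) {
            Triage::Report => report.push(finding),
            Triage::Review => review.push(finding),
            Triage::Discard => discard.push(finding),
        }
    }
    (report, review, discard)
}
```

Keeping the thresholds in one function avoids the buckets drifting apart if a cutoff is later changed.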

Serialization with serde

All types implement Serialize and Deserialize:

// `code` is the &str content to scan; serde_json must be added as a dependency.
let result = analyzer.analyze_content(code, "app.py", "app.py");

// JSON output
let json = serde_json::to_string_pretty(&result)?;
println!("{}", json);

// Or use with any serde-compatible format (YAML, TOML, MessagePack, etc.)

File context classification

The engine automatically classifies files to adjust confidence:

use threatdeflect_core::context::classify_file_context;
use threatdeflect_core::FileContext;

assert_eq!(classify_file_context("src/config.py"), FileContext::Production);
assert_eq!(classify_file_context("tests/test_auth.py"), FileContext::Test);
assert_eq!(classify_file_context("examples/demo.rs"), FileContext::Example);
assert_eq!(classify_file_context("docs/setup.md"), FileContext::Documentation);
assert_eq!(classify_file_context("templates/base.html"), FileContext::Template);
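To give a feel for how a path-based classifier of this kind could work, here is a minimal sketch that reproduces the five results above. It is not the crate's implementation: the `Context` enum, the `classify` function, the directory names, and the order of the checks are all assumptions. It borrows from the input path throughout and does not allocate.

```rust
/// Local stand-in for the crate's file-context enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Context {
    Production,
    Test,
    Example,
    Documentation,
    Template,
}

fn is_sep(c: char) -> bool {
    c == '/' || c == '\\'
}

/// True if any directory component (not the file name) equals one of `names`.
fn in_dir(path: &str, names: &[&str]) -> bool {
    let mut parts = path.split(is_sep);
    parts.next_back(); // drop the file name
    parts.any(|part| names.iter().any(|name| part.eq_ignore_ascii_case(name)))
}

/// ASCII case-insensitive prefix check without allocating.
fn starts_with_ci(s: &str, prefix: &str) -> bool {
    s.get(..prefix.len())
        .map_or(false, |head| head.eq_ignore_ascii_case(prefix))
}

/// ASCII case-insensitive suffix check without allocating.
fn ends_with_ci(s: &str, suffix: &str) -> bool {
    s.len() >= suffix.len()
        && s.get(s.len() - suffix.len()..)
            .map_or(false, |tail| tail.eq_ignore_ascii_case(suffix))
}

/// Classifies a path using directory names and file-name conventions (assumed rules).
pub fn classify(path: &str) -> Context {
    let file = path.rsplit(is_sep).next().unwrap_or("");
    let (stem, ext) = file.rsplit_once('.').unwrap_or((file, ""));

    if in_dir(path, &["test", "tests"])
        || starts_with_ci(file, "test_")
        || ends_with_ci(stem, "_test")
    {
        Context::Test
    } else if in_dir(path, &["example", "examples"]) {
        Context::Example
    } else if in_dir(path, &["template", "templates"]) {
        Context::Template
    } else if in_dir(path, &["doc", "docs"]) || ext.eq_ignore_ascii_case("md") {
        Context::Documentation
    } else {
        Context::Production
    }
}
```

With these rules, `classify("pkg/auth_test.go")` returns `Context::Test` and a top-level `README.md` is treated as documentation.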

Entropy calculation

Use the entropy function directly for custom analysis:

use threatdeflect_core::confidence::calculate_entropy;

let high = calculate_entropy("a8f2k9x1m3n7p4q6r8s2t5u0v3w1y9z");  // ~4.6 bits per character
let low = calculate_entropy("aaaaaaaaaa");                           // 0.0 (a single repeated character)
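For reference, Shannon entropy over the characters of a string can be computed as below. This is a standalone sketch of the standard formula, H = -Σ p·log2(p), and not the crate's source: `shannon_entropy` is a hypothetical name, and whether `calculate_entropy` counts bytes or Unicode characters is not stated here (the two agree for ASCII input).

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per character; 0.0 for an empty string.
pub fn shannon_entropy(s: &str) -> f64 {
    // Count how often each character occurs.
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for ch in s.chars() {
        *counts.entry(ch).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    // Sum -p * log2(p) over the observed character frequencies.
    let n = total as f64;
    counts.values().fold(0.0, |acc, &count| {
        let p = count as f64 / n;
        acc - p * p.log2()
    })
}
```

Note that a string of length n can never exceed log2(n) bits per character, so short strings have a hard ceiling on the entropy they can reach.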

Architecture

threatdeflect-core/
  analyzer.rs     SecretAnalyzer: orchestrates all detection passes
  confidence.rs   Shannon entropy, base confidence, context adjustments
  context.rs      File classification, comment detection, IOC validation
  types.rs        Finding, Ioc, AnalysisResult, FileContext
  error.rs        AnalyzerError with thiserror
  lib.rs          Public re-exports

Detection pipeline (per file):

Input (content, path, filename)
  |
  |-- 1. Secret patterns         regex match + confidence scoring -> Finding
  |-- 2. Suspicious patterns     regex match (skip safe contexts) -> Finding
  |-- 3. High entropy strings    entropy > 5.2 in code files -> Finding
  |-- 4. Base64 IOC extraction   decode base64 -> extract hidden URLs -> Ioc
  |-- 5. JS keyword detection    eval, innerHTML, unescape -> Finding
  |
  v
AnalysisResult { findings, iocs }
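Step 4 of the pipeline, base64 IOC extraction, can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: the function names `decode_base64` and `hidden_urls`, the minimum token length of 16, and the hand-written decoder (a real implementation would more likely use a base64 crate and a URL regex).

```rust
/// Decodes standard-alphabet base64, stopping at the first `=` padding character.
/// Returns None if the input contains a character outside the alphabet.
pub fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut buffer: u32 = 0; // pending bits not yet emitted as a byte
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for byte in input.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1u32 << bits) - 1;
        }
    }
    Some(out)
}

/// Finds base64-looking tokens in `text`, decodes them, and returns any
/// http:// or https:// URLs found in the decoded text.
pub fn hidden_urls(text: &str) -> Vec<String> {
    let is_b64 = |c: char| c.is_ascii_alphanumeric() || matches!(c, '+' | '/' | '=');
    let mut urls = Vec::new();
    for token in text.split(|c: char| !is_b64(c)).filter(|t| t.len() >= 16) {
        // Skip tokens that do not decode to valid UTF-8 text.
        let decoded = match decode_base64(token).and_then(|b| String::from_utf8(b).ok()) {
            Some(s) => s,
            None => continue,
        };
        let mut rest = decoded.as_str();
        while let Some(pos) = rest.find("http") {
            let candidate = &rest[pos..];
            if candidate.starts_with("http://") || candidate.starts_with("https://") {
                // A URL ends at the first whitespace or quote character.
                let end = candidate
                    .find(|c: char| c.is_whitespace() || c == '"' || c == '\'')
                    .unwrap_or(candidate.len());
                urls.push(candidate[..end].to_string());
                rest = &candidate[end..];
            } else {
                rest = &candidate[4..];
            }
        }
    }
    urls
}
```

For example, the string literal `"aHR0cDovL2V2aWwuY29tL3N0ZWFs"` decodes to `http://evil.com/steal`, which this sketch returns as a hidden URL.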

Detection capabilities

Secret patterns (30+)

AWS keys, GitHub/GitLab tokens, Slack/Discord tokens, Stripe keys, Google Cloud API keys, Firebase server keys, Azure storage keys, DigitalOcean tokens, Telegram/Discord bot tokens, NPM/PyPI tokens, database connection strings, Supabase keys, SSH private keys, JWTs, and more.

Suspicious commands

Reverse shells, crypto mining, JNDI injection (Log4Shell), encoded payload execution, Docker socket mounts, SSH key injection, crontab injection, unsafe deserialization, remote code loading, paste service C2 URLs.

IOC extraction

Direct URLs from source code (localhost, internal, CDN, and package-registry URLs are filtered out)
Base64-encoded URLs (decoded and extracted automatically)
Paste service URLs flagged as potential C2
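The filtering and flagging described above might look roughly like the following. This is a sketch under stated assumptions: `UrlVerdict`, `host_of`, `matches_domain`, and `classify_url` are hypothetical names; the ignore list is a small illustrative sample (the crate's actual lists are not shown in this README); only `npoint.io` and `pastebin.com` are taken from the text; and URLs without a parseable host are treated as ignored.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum UrlVerdict {
    /// Filtered out (local, internal, or otherwise uninteresting).
    Ignored,
    /// Points at a paste service, flagged as potential C2.
    PasteService,
    /// Any other external URL, kept as an IOC.
    External,
}

/// Extracts the host part of `scheme://[user@]host[:port]/...`.
/// IPv6 literals are not handled in this sketch.
fn host_of(url: &str) -> Option<&str> {
    let (_, rest) = url.split_once("://")?;
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// True if `host` equals `domain` or is a subdomain of it (ASCII case-insensitive).
fn matches_domain(host: &str, domain: &str) -> bool {
    match host.len().checked_sub(domain.len()) {
        Some(idx) if host.is_char_boundary(idx) => {
            let (prefix, tail) = host.split_at(idx);
            tail.eq_ignore_ascii_case(domain) && (prefix.is_empty() || prefix.ends_with('.'))
        }
        _ => false,
    }
}

pub fn classify_url(url: &str) -> UrlVerdict {
    // Paste services named in this README.
    const PASTE_SERVICES: [&str; 2] = ["npoint.io", "pastebin.com"];
    // Illustrative sample only; the real filter lists are an unknown here.
    const IGNORED_HOSTS: [&str; 4] = ["localhost", "127.0.0.1", "registry.npmjs.org", "pypi.org"];

    let host = match host_of(url) {
        Some(h) => h,
        None => return UrlVerdict::Ignored,
    };
    if IGNORED_HOSTS.iter().any(|d| matches_domain(host, d)) {
        UrlVerdict::Ignored
    } else if PASTE_SERVICES.iter().any(|d| matches_domain(host, d)) {
        UrlVerdict::PasteService
    } else {
        UrlVerdict::External
    }
}
```

Matching on the domain suffix with a dot boundary means `api.npoint.io` is flagged while a lookalike such as `notpastebin.com` is not.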

Performance

The engine is designed for scanning thousands of files in repositories:

Zero-copy regex matching with the regex crate
Single-pass line scanning (all detection in one iteration)
No heap allocation for file context classification
No I/O: accepts &str content, caller controls file reading
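The single-pass, no-I/O design can be sketched as a loop that visits each line once and hands it to every detector. To stay dependency-free, the detectors below are hand-written matchers standing in for the crate's regex rules; `Hit`, `Detector`, `aws_key`, `reverse_shell`, and `scan` are hypothetical names and not part of the crate's API.

```rust
/// One match: which rule fired and on which 1-based line.
#[derive(Debug, PartialEq, Eq)]
pub struct Hit {
    pub rule: &'static str,
    pub line: usize,
}

/// A detector inspects a single line and returns the rule name if it matches.
pub type Detector = fn(&str) -> Option<&'static str>;

/// Stand-in for the regex `AKIA[0-9A-Z]{16}`.
pub fn aws_key(line: &str) -> Option<&'static str> {
    let bytes = line.as_bytes();
    line.match_indices("AKIA").find_map(|(start, _)| {
        let tail = bytes.get(start + 4..start + 20)?;
        tail.iter()
            .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
            .then_some("AWS Key")
    })
}

/// Rough stand-in for the reverse-shell regex: both markers on one line.
pub fn reverse_shell(line: &str) -> Option<&'static str> {
    (line.contains("bash") && line.contains("/dev/tcp")).then_some("Reverse Shell")
}

/// Visits every line exactly once and runs all detectors on it.
/// The caller supplies the content, so no file I/O happens here.
pub fn scan(content: &str, detectors: &[Detector]) -> Vec<Hit> {
    let mut hits = Vec::new();
    for (index, line) in content.lines().enumerate() {
        for detect in detectors {
            if let Some(rule) = detect(line) {
                hits.push(Hit { rule, line: index + 1 });
            }
        }
    }
    hits
}
```

Because `scan` takes `&str`, the caller decides how files are read, decoded, or cached, and the same function works for in-memory buffers in tests.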

Typical throughput: ~50k lines/second on a single core (depends on rule count).

Python bindings

This crate powers the Python package ThreatDeflect via PyO3 + maturin. The Python wrapper adds:

GitHub/GitLab repository cloning and traversal
API integrations (VirusTotal, AbuseIPDB, Shodan)
AI-powered finding validation
PDF/Excel report generation
Finding correlation (eval + external URL = severity boost)

License

GPL-3.0 — see LICENSE for details.

threatdeflect-core 0.1.3