Crate secret_scraper

§SecretScraper

Rust library for crawling web targets, discovering URLs and JavaScript links, and detecting secrets (API keys, credentials, internal IPs, PII, and more) with configurable regular-expression rules. Also scans local files and directories recursively.

§Quick Start

Crawl a website with the built-in detection rules:

use secret_scraper::{
    cli::{Config, Mode},
    facade::{CrawlerFacade, ScanFacade, ScanResult},
};

let mut config = Config::default_with_rules();
config.url = Some("https://example.com".to_string());
config.mode = Mode::Thorough;
config.detail = true;
config.outfile = Some("crawl.csv".into());

match Box::new(CrawlerFacade::new(config).unwrap()).scan().unwrap() {
    ScanResult::CrawlResult(result) => {
        println!(
            "{} domains, {} URL groups, {} secret-bearing URLs",
            result.hosts.len(),
            result.urls.len(),
            result.secrets.len()
        );
    }
    ScanResult::LocalScanResult(_) => unreachable!(),
}

Scan a local directory recursively:

use secret_scraper::{
    cli::Config,
    facade::{FileScannerFacade, ScanFacade, ScanResult},
};

let mut config = Config::default_with_rules();
config.local = Some("./samples".into());
config.outfile = Some("local-scan.yml".into());

match Box::new(FileScannerFacade::new(config).unwrap()).scan().unwrap() {
    ScanResult::LocalScanResult(result) => {
        println!("{} files scanned", result.len());
        for (path, secrets) in &result {
            println!("{}: {} secrets", path.display(), secrets.len());
        }
    }
    ScanResult::CrawlResult(_) => unreachable!(),
}

§Features

  • Web crawling — crawl seed URLs with configurable depth, following HTML links, JavaScript sources, and regex-discovered URLs.
  • Local file scanning — scan a single file or walk a directory tree recursively for secrets.
  • Built-in secret rules — detects Swagger docs, ID cards, phone numbers, email addresses, internal IPs, cloud keys, Shiro keys, API keys, and more.
  • Custom rules — add your own regex patterns for URL discovery, JavaScript link extraction, and secret detection.
  • Domain filtering — allow-list or block-list domains with wildcard patterns (*.example.com); see the sketch after this list.
  • Rate limiting — per-domain concurrency caps and minimum request intervals.
  • Proxy support — HTTP and SOCKS5 proxies.
  • Status filtering — filter displayed results by HTTP status codes or ranges.
  • Validation mode — verify discovered link statuses without crawling them.
  • Output formats — crawl results as CSV, local scan results as YAML.
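
The domain-filtering and rate-limiting features map directly onto Config fields documented in the next section. A minimal sketch, assuming the hostnames are purely illustrative:

use std::time::Duration;
use secret_scraper::cli::Config;

let mut config = Config::default_with_rules();
config.url = Some("https://app.example.com".to_string());
// Stay on the target's subdomains, but skip a static-asset host.
config.allow_domains = Some(vec!["*.example.com".into()]);
config.disallow_domains = Some(vec!["static.example.com".into()]);
// At most 5 in-flight requests per domain, spaced at least 500 ms apart.
config.max_concurrency_per_domain = 5;
config.min_request_interval = Duration::from_millis(500);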

§Configuration

Build a Config by starting from a default and setting fields directly on the struct. The layering order used by the CLI (defaults → YAML → CLI flags) is also available programmatically via apply_file_layer and apply_cli_layer, but for library usage direct field assignment is usually all you need.

Two constructors are available:

| Method | Description |
| --- | --- |
| Config::default() | Empty rule lists — add your own rules. |
| Config::default_with_rules() | Pre-populated with 5 URL-find, 3 JS-find, and 10 secret-detection rules. |
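
For a quick check of the difference, the rule counts from the table above can be asserted directly (a minimal sketch):

use secret_scraper::cli::Config;

let empty = Config::default();
assert!(empty.custom_rules.is_empty()); // add your own rules

let with_rules = Config::default_with_rules();
assert_eq!(with_rules.url_find_rules.len(), 5);  // URL-discovery rules
assert_eq!(with_rules.js_find_rules.len(), 3);   // JavaScript-link rules
assert_eq!(with_rules.custom_rules.len(), 10);   // secret-detection rules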

Key configuration fields on Config:

| Field | Type | Description |
| --- | --- | --- |
| url | Option<String> | Single seed URL for crawling. |
| url_file | Option<PathBuf> | Newline-delimited file of seed URLs. |
| local | Option<PathBuf> | File or directory for local scanning. |
| mode | Mode | Normal (depth 1) or Thorough (depth 2). |
| max_depth | Option<u32> | Override crawl depth; 0 = seed URLs only. |
| max_page | Option<u32> | Maximum pages to crawl (default 1000). |
| detail | bool | Show per-URL hierarchy in output. |
| validate | bool | Validate discovered link statuses. |
| follow_redirect | bool | Follow HTTP redirects. |
| hide_regex | bool | Suppress secret output. |
| outfile | Option<PathBuf> | Write results to file (CSV for crawl, YAML for scan). |
| timeout | Duration | Request timeout (default 30s). |
| proxy | Option<String> | Proxy URL (http://host:port or socks5://host:port). |
| user_agent | Option<String> | Override User-Agent header. |
| cookie | Option<String> | Set Cookie header. |
| allow_domains | Option<Vec<String>> | Domain allow-list with wildcards. |
| disallow_domains | Option<Vec<String>> | Domain block-list with wildcards. |
| max_concurrency_per_domain | usize | Concurrent request cap per domain (default 50). |
| min_request_interval | Duration | Minimum interval between requests to the same domain (default 200 ms). |
| dangerous_paths | Option<Vec<String>> | Path fragments to avoid requesting (e.g. logout, delete). |
| url_find_rules | Vec<Rule> | Regex rules for discovering URLs in response text. |
| js_find_rules | Vec<Rule> | Regex rules for discovering JavaScript URLs. |
| custom_rules | Vec<Rule> | Regex rules for secret detection. |
| custom_headers | Option<HeaderMap> | Extra HTTP headers sent with requests. |
| status_filter | Option<StatusRangeRule> | Filter output by response status. |
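
For example, seeding the crawler from a URL file and limiting the run to the seeds themselves might look like the sketch below. targets.txt is an illustrative path, and the effect of combining max_depth = 0 with validate (status-checking only the seed URLs) is an assumption based on the field descriptions above.

use secret_scraper::{
    cli::Config,
    facade::{CrawlerFacade, ScanFacade, ScanResult},
};

let mut config = Config::default_with_rules();
// Newline-delimited list of seed URLs instead of a single url field.
config.url_file = Some("targets.txt".into());
// Depth 0 keeps the crawl to the seed URLs only.
config.max_depth = Some(0);
// Check link statuses instead of crawling further.
config.validate = true;

match Box::new(CrawlerFacade::new(config).unwrap()).scan().unwrap() {
    ScanResult::CrawlResult(result) => {
        println!("checked {} URL groups", result.urls.len());
    }
    ScanResult::LocalScanResult(_) => unreachable!(),
}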

§Custom Rules

Use Rule::new to compile a named regex:

use secret_scraper::cli::{Config, Rule};

let mut config = Config::default();
config.url_find_rules.push(
    Rule::new_with_group("api_path".into(), r#""(/api/v[0-9]+/[^"]+)""#, true).unwrap()
);
config.custom_rules.push(
    Rule::new("Custom Token".into(), r"TOKEN_[A-Z0-9]{16}").unwrap()
);

Rule::new emits the full regex match. Use Rule::new_with_group when capture groups should be emitted instead, which is usually what URL-discovery rules need.

With Config::default(), the rule lists start empty. With Config::default_with_rules(), your custom rules are appended to the built-in lists.

§Result Handling

The high-level API uses ScanFacade::scan, which returns ScanStdResult — an alias for Result<ScanResult, SecretScraperError>.

use secret_scraper::{
    cli::Config,
    error::{Result as SsResult, SecretScraperError},
    facade::{FileScannerFacade, ScanFacade, ScanResult},
};

fn try_scan() -> SsResult<()> {
    let mut config = Config::default_with_rules();
    config.local = Some("./src".into());

    match Box::new(FileScannerFacade::new(config)?).scan() {
        Ok(ScanResult::LocalScanResult(files)) => {
            for (path, secrets) in &files {
                for s in secrets {
                    println!("{}: [{}] {}", path.display(), s.secret_type, s.data);
                }
            }
        }
        Ok(ScanResult::CrawlResult(_)) => unreachable!(),
        Err(SecretScraperError::Scanner(msg)) => eprintln!("scan failed: {msg}"),
        Err(e) => eprintln!("error: {e}"),
    }
    Ok(())
}

§Advanced: Crawl with Full Options

use std::time::Duration;
use secret_scraper::{
    cli::{Config, Mode, Rule},
    facade::{CrawlerFacade, ScanFacade, ScanResult},
};

let mut config = Config::default_with_rules();
config.url = Some("https://example.com".to_string());
config.mode = Mode::Thorough;
config.max_depth = Some(3);
config.max_page = Some(500);
config.max_concurrency_per_domain = 10;
config.min_request_interval = Duration::from_millis(500);
config.timeout = Duration::from_secs(15);
config.follow_redirect = true;
config.validate = true;
config.detail = true;
config.user_agent = Some("SecretScraper/0.1".into());
config.proxy = Some("http://127.0.0.1:8080".into());
config.allow_domains = Some(vec!["*.example.com".into()]);
config.dangerous_paths = Some(vec!["logout".into(), "delete".into()]);
config.outfile = Some("crawl.csv".into());
config.custom_rules.push(
    Rule::new("JWT".into(), r"eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+").unwrap()
);

match Box::new(CrawlerFacade::new(config).unwrap()).scan().unwrap() {
    ScanResult::CrawlResult(result) => {
        println!(
            "Done: {} domains, {} URLs, {} JS files, {} secrets",
            result.hosts.len(),
            result.urls.len(),
            result.js.len(),
            result.secrets.len(),
        );
    }
    ScanResult::LocalScanResult(_) => unreachable!(),
}

§Module Overview

| Module | Purpose |
| --- | --- |
| cli | Configuration types: Config, Mode, Rule. |
| facade | High-level entry points: CrawlerFacade, FileScannerFacade. |
| error | Error types: SecretScraperError and the Result alias. |
| handler | Secret detection: RegexHandler, Secret. |
| urlparser | URL representation: URLNode, ResponseStatus. |
| filter | Domain allow-list / block-list filter chain. |
| output | Human-readable and CSV output formatting. |
| rate_limiter | Per-domain request rate limiting. |
| scanner | Local file traversal and scanning engine. |
| scraper | Lower-level crawler actor implementation. |
| logging | Tracing and log subscriber initialization. |

Modules§

cli
Command-line, YAML, and runtime configuration types.
error
Library error and result types used by the public facade API.
facade
High-level crawler and file-scanner facades for web and local scanning workflows.
filter
URL filtering primitives used by the crawler.
handler
Secret detection handlers and result types.
logging
Tracing and logging initialization helpers.
output
Human-readable and CSV output formatting.
rate_limiter
Per-domain crawler rate limiting.
scanner
Local file scanning engine.
scraper
Actor-based crawler internals.
urlparser
URL node representation and link extraction helpers.