§SecretScraper
Rust library for crawling web targets, discovering URLs and JavaScript links, and detecting secrets (API keys, credentials, internal IPs, PII, and more) with configurable regular-expression rules. Also scans local files and directories recursively.
§Quick Start
Crawl a website with the built-in detection rules:
use secret_scraper::{
cli::{Config, Mode},
facade::{CrawlerFacade, ScanFacade, ScanResult},
};
let mut config = Config::default_with_rules();
config.url = Some("https://example.com".to_string());
config.mode = Mode::Thorough;
config.detail = true;
config.outfile = Some("crawl.csv".into());
match Box::new(CrawlerFacade::new(config).unwrap()).scan().unwrap() {
ScanResult::CrawlResult(result) => {
println!(
"{} domains, {} URL groups, {} secret-bearing URLs",
result.hosts.len(),
result.urls.len(),
result.secrets.len()
);
}
ScanResult::LocalScanResult(_) => unreachable!(),
}

Scan a local directory recursively:
use secret_scraper::{
cli::Config,
facade::{FileScannerFacade, ScanFacade, ScanResult},
};
let mut config = Config::default_with_rules();
config.local = Some("./samples".into());
config.outfile = Some("local-scan.yml".into());
match Box::new(FileScannerFacade::new(config).unwrap()).scan().unwrap() {
ScanResult::LocalScanResult(result) => {
println!("{} files scanned", result.len());
for (path, secrets) in &result {
println!("{}: {} secrets", path.display(), secrets.len());
}
}
ScanResult::CrawlResult(_) => unreachable!(),
}

§Features
- Web crawling — crawl seed URLs with configurable depth, following HTML links, JavaScript sources, and regex-discovered URLs.
- Local file scanning — scan a single file or walk a directory tree recursively for secrets.
- Built-in secret rules — detects Swagger docs, ID cards, phone numbers, email addresses, internal IPs, cloud keys, Shiro keys, API keys, and more.
- Custom rules — add your own regex patterns for URL discovery, JavaScript link extraction, and secret detection.
- Domain filtering — allow-list or block-list domains with wildcard patterns (*.example.com); see the sketch after this list.
- Rate limiting — per-domain concurrency caps and minimum request intervals.
- Proxy support — HTTP and SOCKS5 proxies.
- Status filtering — filter displayed results by HTTP status codes or ranges.
- Validation mode — verify discovered link statuses without crawling them.
- Output formats — crawl results as CSV, local scan results as YAML.
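Several of these features are plain fields on Config (field names follow the Configuration table below). A minimal sketch of domain filtering, rate limiting, and proxying, using hypothetical hosts and values:
use std::time::Duration;
use secret_scraper::cli::Config;
let mut config = Config::default_with_rules();
// Stay on example.com and its subdomains; skip the admin host entirely.
// The apex is listed explicitly in case the wildcard only matches subdomains.
config.allow_domains = Some(vec!["example.com".into(), "*.example.com".into()]);
config.disallow_domains = Some(vec!["admin.example.com".into()]);
// Be polite: at most 5 in-flight requests per domain, at least 500 ms apart.
config.max_concurrency_per_domain = 5;
config.min_request_interval = Duration::from_millis(500);
// Route all traffic through a local SOCKS5 proxy.
config.proxy = Some("socks5://127.0.0.1:1080".into());
Pass the resulting config to CrawlerFacade::new exactly as in the Quick Start.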
§Configuration
Build a Config by starting from one of the constructors below, then set fields on it. The layering order used by the CLI (defaults → YAML → CLI flags) is available programmatically via apply_file_layer and apply_cli_layer, but for library usage you typically set fields directly on the struct.
Two constructors are available:
| Method | Description |
|---|---|
| Config::default() | Empty rule lists — add your own rules. |
| Config::default_with_rules() | Pre-populated with 5 URL-find, 3 JS-find, and 10 secret-detection rules. |
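The difference is visible directly on the rule-list fields (the counts below are the ones quoted in the table above; a quick sketch, not a test of the exact built-ins):
use secret_scraper::cli::Config;
let empty = Config::default();
assert!(empty.custom_rules.is_empty());         // no secret-detection rules yet
let with_rules = Config::default_with_rules();
assert_eq!(with_rules.url_find_rules.len(), 5); // built-in URL-discovery rules
assert_eq!(with_rules.js_find_rules.len(), 3);  // built-in JS-link rules
assert_eq!(with_rules.custom_rules.len(), 10);  // built-in secret-detection rules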
Key configuration fields on Config:
| Field | Type | Description |
|---|---|---|
| url | Option<String> | Single seed URL for crawling. |
| url_file | Option<PathBuf> | Newline-delimited file of seed URLs. |
| local | Option<PathBuf> | File or directory for local scanning. |
| mode | Mode | Normal (depth 1) or Thorough (depth 2). |
| max_depth | Option<u32> | Override crawl depth; 0 = seed URLs only. |
| max_page | Option<u32> | Maximum pages to crawl (default 1000). |
| detail | bool | Show per-URL hierarchy in output. |
| validate | bool | Validate discovered link statuses. |
| follow_redirect | bool | Follow HTTP redirects. |
| hide_regex | bool | Suppress secret output. |
| outfile | Option<PathBuf> | Write results to file (CSV for crawl, YAML for scan). |
| timeout | Duration | Request timeout (default 30s). |
| proxy | Option<String> | Proxy URL (http://host:port or socks5://host:port). |
| user_agent | Option<String> | Override User-Agent header. |
| cookie | Option<String> | Set Cookie header. |
| allow_domains | Option<Vec<String>> | Domain allow-list with wildcards. |
| disallow_domains | Option<Vec<String>> | Domain block-list with wildcards. |
| max_concurrency_per_domain | usize | Concurrent request cap per domain (default 50). |
| min_request_interval | Duration | Minimum interval between requests to the same domain (default 200 ms). |
| dangerous_paths | Option<Vec<String>> | Path fragments to avoid requesting (e.g. logout, delete). |
| url_find_rules | Vec<Rule> | Regex rules for discovering URLs in response text. |
| js_find_rules | Vec<Rule> | Regex rules for discovering JavaScript URLs. |
| custom_rules | Vec<Rule> | Regex rules for secret detection. |
| custom_headers | Option<HeaderMap> | Extra HTTP headers sent with requests. |
| status_filter | Option<StatusRangeRule> | Filter output by response status. |
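Most fields are plain values; custom_headers is the main exception and takes a pre-built HeaderMap. A sketch, assuming the HeaderMap here is the http crate's type (the one re-exported by reqwest) and using placeholder header values; adjust the import to wherever the crate actually exposes it:
use std::time::Duration;
use http::header::{HeaderMap, HeaderValue, AUTHORIZATION};
use secret_scraper::cli::Config;
let mut config = Config::default_with_rules();
config.timeout = Duration::from_secs(10);
config.cookie = Some("session=abc123".into());
// Extra headers sent with every request.
let mut headers = HeaderMap::new();
headers.insert(AUTHORIZATION, HeaderValue::from_static("Bearer example-token"));
headers.insert("x-api-key", HeaderValue::from_static("example-key"));
config.custom_headers = Some(headers);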
§Custom Rules
Use Rule::new to compile a named regex:
use secret_scraper::cli::{Config, Rule};
let mut config = Config::default();
config.url_find_rules.push(
Rule::new_with_group("api_path".into(), r#""(/api/v[0-9]+/[^"]+)""#, true).unwrap()
);
config.custom_rules.push(
Rule::new("Custom Token".into(), r"TOKEN_[A-Z0-9]{16}").unwrap()
);

Rule::new emits the full regex match. Use
Rule::new_with_group when capture groups
should be emitted instead, which is usually what URL-discovery rules need.
The rule lists in Config::default() start empty. When using Config::default_with_rules(), your custom rules are appended to the built-in lists.
§Result Handling
The high-level API uses ScanFacade::scan, which
returns ScanStdResult — an alias for
Result<ScanResult, SecretScraperError>.
use secret_scraper::{
cli::Config,
error::{Result as SsResult, SecretScraperError},
facade::{FileScannerFacade, ScanFacade, ScanResult},
};
fn try_scan() -> SsResult<()> {
let mut config = Config::default_with_rules();
config.local = Some("./src".into());
match Box::new(FileScannerFacade::new(config)?).scan() {
Ok(ScanResult::LocalScanResult(files)) => {
for (path, secrets) in &files {
for s in secrets {
println!("{}: [{}] {}", path.display(), s.secret_type, s.data);
}
}
}
Ok(ScanResult::CrawlResult(_)) => unreachable!(),
Err(SecretScraperError::Scanner(msg)) => eprintln!("scan failed: {msg}"),
Err(e) => eprintln!("error: {e}"),
}
Ok(())
}

§Advanced: Crawl with Full Options
use std::time::Duration;
use secret_scraper::{
cli::{Config, Mode, Rule},
facade::{CrawlerFacade, ScanFacade, ScanResult},
};
let mut config = Config::default_with_rules();
config.url = Some("https://example.com".to_string());
config.mode = Mode::Thorough;
config.max_depth = Some(3);
config.max_page = Some(500);
config.max_concurrency_per_domain = 10;
config.min_request_interval = Duration::from_millis(500);
config.timeout = Duration::from_secs(15);
config.follow_redirect = true;
config.validate = true;
config.detail = true;
config.user_agent = Some("SecretScraper/0.1".into());
config.proxy = Some("http://127.0.0.1:8080".into());
config.allow_domains = Some(vec!["*.example.com".into()]);
config.dangerous_paths = Some(vec!["logout".into(), "delete".into()]);
config.outfile = Some("crawl.csv".into());
config.custom_rules.push(
Rule::new("JWT".into(), r"eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+").unwrap()
);
match Box::new(CrawlerFacade::new(config).unwrap()).scan().unwrap() {
ScanResult::CrawlResult(result) => {
println!(
"Done: {} domains, {} URLs, {} JS files, {} secrets",
result.hosts.len(),
result.urls.len(),
result.js.len(),
result.secrets.len(),
);
}
ScanResult::LocalScanResult(_) => unreachable!(),
}

§Module Overview
| Module | Purpose |
|---|---|
| cli | Configuration types: Config, Mode, Rule. |
| facade | High-level entry points: CrawlerFacade, FileScannerFacade. |
| error | Error types: SecretScraperError and the Result alias. |
| handler | Secret detection: RegexHandler, Secret. |
| urlparser | URL representation: URLNode, ResponseStatus. |
| filter | Domain allow-list / block-list filter chain. |
| output | Human-readable and CSV output formatting. |
| rate_limiter | Per-domain request rate limiting. |
| scanner | Local file traversal and scanning engine. |
| scraper | Lower-level crawler actor implementation. |
| logging | Tracing and log subscriber initialization. |
Modules§
- cli
- Command-line and YAML configuration types.
- error
- Library error and result types.
- facade
- High-level crawler and file-scanner facades.
- filter
- URL filtering primitives.
- handler
- Secret extraction handlers.
- logging
- Tracing/logging initialization helpers.
- output
- Human-readable and CSV output formatting.
- rate_limiter
- Per-domain crawler rate limiting.
- scanner
- Local file scanning engine.
- scraper
- Lower-level crawler actor implementation.
- urlparser
- URL node and link extraction utilities.