domain_status
domain_status is a concurrent tool for checking URL statuses and redirections. Built with async/await (Tokio), it processes URLs in parallel while capturing comprehensive metadata including TLS certificates, HTML content, DNS information, technology fingerprints, and redirect chains. Results are stored in a SQLite database for analysis.
Table of Contents
- Quick Start
- Installation
- Features
- Use Cases
- Usage
- Output & Results
- Database Schema
- Monitoring
- Advanced Topics
- Troubleshooting
- Library Usage
- Technical Details
- Architecture
- Security
- Development
- License
🚀 Quick Start
1. Create a file with URLs (one per line):
2. Run the scan:
3. Query results:
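A minimal walkthrough (the positional URL-file argument and the `sqlite3` invocation are illustrative):

```bash
# 1. Create a file with URLs (one per line)
cat > urls.txt <<'EOF'
example.com
rust-lang.org
old-website.gov
EOF

# 2. Run the scan
domain_status urls.txt

# 3. Query results
sqlite3 -column -header url_checker.db \
  "SELECT domain, status, title FROM url_status ORDER BY domain;"
```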
Example output:
domain | status | title
------------------|--------|--------------------------
example.com | 200 | Example Domain
rust-lang.org | 200 | Rust Programming Language
old-website.gov | 301 | (redirected)
The tool processes URLs concurrently (30 by default), stores all data in SQLite, and provides progress updates. See Installation below to get started.
📦 Installation
Option 1: Install via Cargo (Recommended for Rust users)
Requires Rust 1.85 or newer:
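```bash
cargo install domain_status
```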
This compiles from source and installs the binary to ~/.cargo/bin/domain_status (or %USERPROFILE%\.cargo\bin\domain_status.exe on Windows). That directory is normally already on your PATH if Rust was installed via rustup.
Benefits:
- ✅ No macOS Gatekeeper warnings (compiled locally)
- ✅ Works on any platform Rust supports
- ✅ Easy updates: `cargo install --force domain_status`
- ✅ No manual downloads or extraction needed
Note: This crate requires Rust 1.85 or newer (for edition 2024 support in dependencies). If installation fails, update your Rust toolchain: rustup update stable.
Option 2: Download Pre-built Binary
Download the latest release from the Releases page:
- Linux (x86_64): download the Linux archive and extract the `domain_status` binary
- macOS (Intel or Apple Silicon): download the archive matching your CPU and extract the binary
- macOS Gatekeeper warning (unsigned binary): either right-click the binary, select "Open", then click "Open" in the dialog, or remove the quarantine attribute from the command line (see below)
- Windows: download `domain_status-windows-x86_64.exe.zip` and extract
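Removing the quarantine attribute from a terminal (this assumes the extracted binary is named `domain_status` in the current directory):

```bash
xattr -d com.apple.quarantine ./domain_status
```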
Option 3: Build from Source
Requires Rust 1.85 or newer:
```bash
# Clone the repository (substitute the actual repository URL)
git clone <repository-url>
cd domain_status

# Build release binary
cargo build --release
```
This creates an executable in ./target/release/domain_status (or domain_status.exe on Windows).
Note: SQLite is bundled in the binary - no system SQLite installation required. The tool is completely self-contained.
🌟 Features
Data Collection
- Comprehensive URL Analysis: Captures HTTP status, response times, HTML metadata, TLS certificates, DNS information, technology fingerprints, GeoIP location data, WHOIS registration data, structured data (JSON-LD, Open Graph, Twitter Cards), security warnings, and complete redirect chains
- Technology Fingerprinting: Detects web technologies using community-maintained Wappalyzer rulesets via pattern matching (headers, cookies, HTML, script URLs). Does not execute JavaScript.
- Enhanced DNS Analysis: Queries NS, TXT, and MX records; automatically extracts SPF and DMARC policies
- Enhanced TLS Analysis: Captures cipher suite and key algorithm in addition to certificate details
- GeoIP Lookup: Automatic geographic and network information lookup using MaxMind GeoLite2 databases (auto-downloads if license key provided)
Performance
- Concurrent Processing: Async/await with configurable concurrency limits (default: 30 concurrent requests, 15 RPS)
- Adaptive Rate Limiting: Token-bucket rate limiting with automatic adjustment based on error rates (always enabled)
- Resource Efficiency: Shared HTTP clients, DNS resolver, and HTML parser instances across concurrent tasks
Reliability
- Intelligent Error Handling: Automatic retries with exponential backoff, error rate monitoring with dynamic throttling, and comprehensive processing statistics
- Robust Data Storage: SQLite database with WAL mode, UPSERT semantics, and unique constraints for idempotent processing
- Timeout Protection: Per-URL processing timeout (35 seconds) prevents hung requests
Integration
- Flexible Configuration: Extensive CLI options for logging, timeouts, concurrency, rate limits, database paths, and fingerprint rulesets
- Library API: Use as a Rust library in your own projects
- Status Server: Optional HTTP server for monitoring long-running jobs with Prometheus metrics
- Security Features: URL validation (http/https only), content-type filtering, response size limits, and redirect hop limits
💼 Use Cases
Domain Portfolio Management: Check status of multiple domains, track redirects, verify SSL certificates, monitor domain expiration dates (with WHOIS enabled).
Security Audits: Identify missing security headers (CSP, HSTS, etc.), detect expired certificates, inventory technology stacks to identify potential vulnerabilities.
Competitive Analysis: Track technology stacks across competitors, identify analytics tools and tracking IDs, gather structured data (Open Graph, JSON-LD) for comparison.
Monitoring: Integrate with Prometheus for ongoing status checks via the status server endpoint, track changes over time by querying run history.
Research: Bulk analysis of web technologies, DNS configurations, geographic distribution of infrastructure, technology adoption patterns.
Unlike single-purpose tools (curl, nmap, whois), domain_status consolidates many checks in one sweep, ensuring consistency and saving time.
📖 Usage
Command-Line Options
Common Options:
- `--log-level <LEVEL>`: Log level: `error`, `warn`, `info`, `debug`, or `trace` (default: `info`)
- `--log-format <FORMAT>`: Log format: `plain` or `json` (default: `plain`)
- `--db-path <PATH>`: SQLite database file path (default: `./url_checker.db`)
- `--max-concurrency <N>`: Maximum concurrent requests (default: 30)
- `--timeout-seconds <N>`: HTTP client timeout in seconds (default: 10). Note: the per-URL processing timeout is 35 seconds.
- `--rate-limit-rps <N>`: Initial requests per second (adaptive rate limiting is always enabled; default: 15)
- `--show-timing`: Display detailed timing metrics at the end of the run (default: disabled)
- `--status-port <PORT>`: Start the HTTP status server on the specified port (optional; disabled by default)
Advanced Options:
- `--user-agent <STRING>`: HTTP User-Agent header value (default: Chrome user agent)
- `--fingerprints <URL|PATH>`: Technology fingerprint ruleset source (URL or local path). Default: HTTP Archive Wappalyzer fork. Rules are cached locally for 7 days.
- `--geoip <PATH|URL>`: GeoIP database path (MaxMind GeoLite2 `.mmdb` file) or download URL. If not provided, the database is auto-downloaded when the `MAXMIND_LICENSE_KEY` environment variable is set.
- `--enable-whois`: Enable WHOIS/RDAP lookup for domain registration information. WHOIS data is cached for 7 days. Default: disabled.
Example:
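```bash
# urls.txt passed as a positional argument (illustrative); flags are documented above
domain_status urls.txt \
  --max-concurrency 50 \
  --rate-limit-rps 20 \
  --db-path results.db \
  --log-format json
```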
Environment Variables
- `MAXMIND_LICENSE_KEY`: MaxMind license key for automatic GeoIP database downloads. Get a free key from MaxMind. If not set, GeoIP lookup is disabled and the application continues normally.
- `URL_CHECKER_DB_PATH`: Override the default database path (alternative to `--db-path`)
URL Input
- URLs can be provided with or without an `http://` or `https://` prefix
- If no scheme is provided, `https://` is automatically prepended
- Only `http://` and `https://` URLs are accepted; other schemes are rejected
- Invalid URLs are skipped with a warning
📊 Output & Results
The tool provides detailed logging with progress updates and error summaries:
Plain format (default):
✔️ domain_status [INFO] Processed 88 lines in 128.61 seconds (~0.68 lines/sec)
✔️ domain_status [INFO] Run statistics: total=100, successful=88, failed=12
✔️ domain_status [INFO] Error Counts (21 total):
✔️ domain_status [INFO] Bot detection (403 Forbidden): 4
✔️ domain_status [INFO] Process URL timeout: 3
✔️ domain_status [INFO] DNS NS lookup error: 2
...
JSON format (`--log-format json`) emits the same information as structured JSON, suitable for log aggregation and machine parsing.
Performance Analysis (--show-timing):
Use the --show-timing flag to display detailed timing metrics:
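```bash
# Input file shown as a positional argument (illustrative)
domain_status urls.txt --show-timing
```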
Example output:
=== Timing Metrics Summary (88 URLs) ===
Average times per URL:
HTTP Request: 1287 ms (40.9%)
DNS Forward: 845 ms (26.8%)
TLS Handshake: 1035 ms (32.9%)
HTML Parsing: 36 ms (1.1%)
Tech Detection: 1788 ms (56.8%)
Total: 3148 ms
Note: Performance varies significantly based on rate limiting, network conditions, target server behavior, and error handling. Expect 0.5-2 lines/sec with default settings. Higher rates may trigger bot detection.
Querying Results
All results are stored in the SQLite database. You can query the database while the scan is running (WAL mode allows concurrent reads). Here are some useful queries:
Basic status overview:
SELECT domain, status, status_description, response_time
FROM url_status
ORDER BY domain;
Find all failed URLs:
SELECT domain, status, status_description
FROM url_status
WHERE status >= 400 OR status = 0
ORDER BY status;
Find all sites using a specific technology:
SELECT DISTINCT us.domain, us.status
FROM url_status us
JOIN url_technologies ut ON us.id = ut.url_status_id
WHERE ut.technology_name = 'WordPress'
ORDER BY us.domain;
Find sites with missing security headers:
SELECT DISTINCT us.domain
FROM url_status us
JOIN url_security_warnings usw ON us.id = usw.url_status_id
WHERE usw.warning_type LIKE '%missing%'
ORDER BY us.domain;
Find all redirects:
SELECT domain, final_domain, status, redirect_count
FROM url_status
WHERE redirect_count > 0
ORDER BY redirect_count DESC;
Compare runs by version:
SELECT version, COUNT(*) as runs,
SUM(total_urls) as total_urls,
AVG(elapsed_seconds) as avg_time
FROM runs
WHERE end_time IS NOT NULL
GROUP BY version
ORDER BY version DESC;
Get all URLs from a specific run:
SELECT domain, status, title, response_time
FROM url_status
WHERE run_id = 'run_1765150444953'
ORDER BY domain;
📊 Database Schema
The database uses a star schema design pattern with:
- Fact Table: `url_status` (main URL data)
- Dimension Table: `runs` (run-level metadata including version)
- Junction Tables: multi-valued fields (technologies, headers, DNS records, etc.)
- One-to-One Tables: `url_geoip`, `url_whois`
- Failure Tracking: `url_failures` with related tables for error context (redirect chains, request/response headers)
Key Features:
- WAL mode for concurrent reads/writes
- UPSERT semantics: `UNIQUE (final_domain, timestamp)` ensures idempotency
- Comprehensive indexes for fast queries
- Normalized structure for efficient storage and analytics
For complete database schema documentation including entity-relationship diagrams, table descriptions, indexes, constraints, and query examples, see DATABASE.md.
Querying Run History
All scan results are persisted in the database, so you can query past runs even after closing the terminal. The runs table stores summary statistics for each scan:
-- View all completed runs (most recent first)
SELECT
run_id,
version,
datetime(start_time/1000, 'unixepoch') as start_time,
datetime(end_time/1000, 'unixepoch') as end_time,
elapsed_seconds,
total_urls,
successful_urls,
failed_urls,
ROUND(100.0 * successful_urls / total_urls, 1) as success_rate
FROM runs
WHERE end_time IS NOT NULL
ORDER BY start_time DESC
LIMIT 10;
Example output:
run_id | version | start_time | end_time | elapsed_seconds | total_urls | successful_urls | failed_urls | success_rate
--------------------|---------|---------------------|---------------------|-----------------|------------|-----------------|------------|--------------
run_1765150444953 | 0.1.4 | 2025-01-07 23:33:59 | 2025-01-07 23:34:52 | 52.1 | 100 | 89 | 11 | 89.0
Using the library API:
```rust
use domain_status::query_run_history; // import path assumed; see the API docs
use sqlx::SqlitePool;

let pool = SqlitePool::connect("sqlite://url_checker.db").await?;
let runs = query_run_history(&pool).await?;
for run in runs {
    // Field names follow the `runs` table columns shown above.
    println!("{}: {}/{} successful", run.run_id, run.successful_urls, run.total_urls);
}
```
📈 Monitoring
For long-running jobs, you can monitor progress via an optional HTTP status server:
```bash
# Start with status server on port 8080 (URL file shown as a positional argument)
domain_status urls.txt --status-port 8080

# In another terminal, check progress:
curl http://localhost:8080/status | jq

# Or view Prometheus metrics:
curl http://localhost:8080/metrics
```
The status server provides:
- Real-time progress: Total URLs, completed, failed, percentage complete, processing rate
- Error breakdown: Detailed counts by error type
- Warning/info metrics: Track missing metadata, redirects, bot detection events
- Prometheus compatibility: Metrics endpoint ready for Prometheus scraping
Status Endpoint (/status)
Returns detailed JSON status with real-time progress information:
```bash
curl http://localhost:8080/status | jq
```
Response Format:
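An illustrative response body; the field names below are assumptions that mirror the Prometheus metric names documented in the next section, and the actual JSON shape may differ:

```json
{
  "total_urls": 100,
  "completed_urls": 42,
  "failed_urls": 3,
  "percentage_complete": 45.0,
  "rate_per_second": 0.8,
  "errors": {
    "Bot detection (403 Forbidden)": 2
  }
}
```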
Metrics Endpoint (/metrics)
Returns Prometheus-compatible metrics in text format:
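Sample output (values are illustrative; the exposed metrics are listed below):

```text
# TYPE domain_status_total_urls gauge
domain_status_total_urls 100
# TYPE domain_status_completed_urls gauge
domain_status_completed_urls 42
# TYPE domain_status_failed_urls gauge
domain_status_failed_urls 3
# TYPE domain_status_percentage_complete gauge
domain_status_percentage_complete 45
```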
Metrics:
- `domain_status_total_urls` (gauge): Total URLs to process
- `domain_status_completed_urls` (gauge): Successfully processed URLs
- `domain_status_failed_urls` (gauge): Failed URLs
- `domain_status_percentage_complete` (gauge): Completion percentage (0-100)
- `domain_status_rate_per_second` (gauge): Processing rate (URLs/sec)
- `domain_status_errors_total` (counter): Total error count
- `domain_status_warnings_total` (counter): Total warning count
- `domain_status_info_total` (counter): Total info event count
Prometheus Integration:
```yaml
scrape_configs:
  - job_name: 'domain_status'
    static_configs:
      - targets: ['localhost:8080']   # match the --status-port you chose
```
🔧 Advanced Topics
GeoIP Setup
To enable GeoIP, set the MAXMIND_LICENSE_KEY environment variable and the tool will automatically download the MaxMind GeoLite2 databases on first run:
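```bash
export MAXMIND_LICENSE_KEY=your_license_key
domain_status urls.txt   # URL file shown as a positional argument (illustrative)
```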
The databases are cached in .geoip_cache/ and reused for subsequent runs. Alternatively, download the .mmdb files yourself and use --geoip to point to them. GeoIP data is stored in the url_geoip table with fields for country, region, city, coordinates, and ASN.
If GeoIP fails or no key is provided, the tool safely skips GeoIP lookup with a warning and continues normally.
WHOIS/RDAP Details
The --enable-whois flag performs WHOIS/RDAP queries to fetch domain registration information. This significantly slows down processing (adds approximately 1 second per domain) due to rate limits imposed by registrars.
Rate Limiting: WHOIS queries are rate-limited to 0.5 queries/second (1 query per 2 seconds) to respect registrar limits. This is separate from HTTP rate limiting.
Caching: WHOIS data is cached in .whois_cache/ by domain name for 7 days to avoid redundant queries.
Limitations: Not all TLDs provide public WHOIS via port 43, and some registrars limit the data returned. RDAP fallback helps but is not universal. If a WHOIS server blocks you, you may see warnings in the logs.
Enable this flag only when you need registrar/expiration information. For faster scans, leave it disabled (default).
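For example (the positional URL-file argument is illustrative):

```bash
domain_status urls.txt --enable-whois
```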
Technology Fingerprinting
Technology detection uses pattern matching against:
- HTTP headers (Server, X-Powered-By, etc.)
- Cookies
- Meta tags (name, property, http-equiv)
- Script source URLs (from HTML, not fetched)
- HTML text content
- URL patterns
- Script tag IDs (e.g., `__NEXT_DATA__` for Next.js)
Important: The tool does NOT execute JavaScript or fetch external scripts. It only analyzes the initial HTML response, matching WappalyzerGo's behavior.
The default fingerprint ruleset comes from the HTTP Archive Wappalyzer fork and is cached locally for 7 days. You can update to the latest by pointing --fingerprints to a new JSON file (e.g., the official Wappalyzer technologies.json). The tool prints the fingerprints source and version (commit hash) in the runs table.
If you maintain your own fingerprint file (e.g., for internal technologies), you can use that too.
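For example, pointing `--fingerprints` at a local or remote ruleset (the file path and URL are illustrative):

```bash
# Use a custom or locally maintained ruleset
domain_status urls.txt --fingerprints ./technologies.json

# Or fetch a ruleset from a URL
domain_status urls.txt --fingerprints https://example.com/technologies.json
```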
Performance Tuning
Concurrency: The default is 30 concurrent requests. If you have good bandwidth and target sites can handle it, you can increase --max-concurrency. Monitor the /metrics endpoint's rate to see actual throughput. Conversely, if you encounter many timeouts or want to be gentle on servers, lower concurrency.
Rate Limiting: The default is 15 RPS with adaptive adjustment. The adaptive rate limiter:
- Starts at initial RPS (default: 15)
- Monitors 429 errors and timeouts in a sliding window
- Automatically reduces RPS by 50% when error rate exceeds threshold (default: 20%)
- Gradually increases RPS by 15% when error rate is below threshold
- Maximum RPS capped at 2x initial value
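A simplified sketch of that adjustment logic (illustrative only, not the tool's actual implementation; the 1 RPS floor is an assumption):

```rust
fn adjust_rps(current_rps: f64, initial_rps: f64, error_rate: f64, threshold: f64) -> f64 {
    let max_rps = initial_rps * 2.0; // cap at 2x the initial value
    if error_rate > threshold {
        (current_rps * 0.5).max(1.0) // back off by 50% when the error rate is too high
    } else {
        (current_rps * 1.15).min(max_rps) // otherwise ramp up by 15%, up to the cap
    }
}
```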
Memory: Each concurrent task consumes memory for HTML and data. With default settings, memory usage is moderate. If scanning extremely large pages, consider that response bodies are capped at 2MB and HTML text extraction is limited to 50KB.
Error Handling & Retries
The tool automatically retries failed HTTP requests up to 2 additional times (3 total attempts) with exponential backoff (initial: 500ms, max: 15s). If a domain consistently fails (e.g., DNS not resolved, or all attempts timed out), it will be marked in the url_failures table with details. The errors section of the status output counts these. You don't need to re-run for transient errors; they are retried on the fly.
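A sketch of the backoff schedule using the stated parameters (the doubling factor is an assumption, not confirmed by the source):

```rust
use std::time::Duration;

// 3 total attempts; exponential backoff starting at 500 ms, capped at 15 s.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let max_ms: u64 = 15_000;
    let delay_ms = base_ms.saturating_mul(2u64.saturating_pow(attempt)); // 500 ms, 1 s, 2 s, ...
    Duration::from_millis(delay_ms.min(max_ms))
}
```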
❓ Troubleshooting
Scan is very slow or stuck:
- Check if you hit a rate limit. domain_status automatically slows down on high error rate (adaptive rate limiting).
- Enabling WHOIS adds approximately 1 second per domain due to rate limits.
- If it's truly stuck, set the `RUST_LOG=debug` environment variable (or use `--log-level debug`) to see what it's doing.
I see 'bot_detection_403' in info metrics:
- Some sites actively block non-browser agents. Try using `--user-agent` to mimic a different browser, or reduce the rate with `--rate-limit-rps`.
Database is locked error:
- Opening the database in another tool while a scan is running is fine for reads, thanks to WAL mode.
- However, writing from two domain_status processes to the same DB is not supported (each run uses its own DB by default).
WHOIS data seems incomplete for some TLDs:
- Not all registrars provide the same info; domain_status tries its best.
- Some ccTLDs don't have public WHOIS via port 43.
- RDAP fallback helps but is not universal.
GeoIP shows "unknown" or is empty:
- Probably the MaxMind license key wasn't set or the download failed.
- Ensure internet access for the first run, or provide the `.mmdb` files manually with `--geoip`.
- Check that the license key is valid and has GeoLite2 access enabled.
Compilation fails (for users building from source):
- Make sure Rust is updated to latest stable (1.85 or newer required).
- If the error relates to a crate, run `cargo update`.
- On Windows, ensure OpenSSL or Schannel is available if reqwest needs it (rare, as rustls is used by default).
Technology detection seems wrong:
- Pattern matching can have false positives. Wappalyzer rules are community-maintained.
- If you get an obvious false positive, you can verify by visiting the site manually.
- You can update or customize the fingerprints to tweak this (see Technology Fingerprinting above).
How do I update the tool?
- If using `cargo install`, run `cargo install --force domain_status` to get the latest version.
- Check the CHANGELOG for any breaking changes in flags or output.
📚 Library Usage
You can also use domain_status as a Rust library in your own projects. Add it to your Cargo.toml:
```toml
[dependencies]
domain_status = "0.1.4"
tokio = { version = "1", features = ["full"] }
```
Then use it in your code:
```rust
// NOTE: the entry-point name and Config construction below are illustrative;
// see the API documentation for the actual exports and fields.
use domain_status::Config;
use std::path::PathBuf;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::default(); // adjust fields (concurrency, db path, ...) as needed
    domain_status::run(PathBuf::from("urls.txt"), config).await?; // function name assumed
    Ok(())
}
```
See the API documentation for details on Config options and usage.
Note: The library requires a Tokio runtime. Use #[tokio::main] in your application or ensure you're calling library functions within an async context.
🛠️ Technical Details
Dependencies:
- HTTP Client: `reqwest` with the `rustls` TLS backend
- DNS Resolution: `hickory-resolver` (async DNS with system config fallback)
- Domain Extraction: `psl` crate for accurate domain parsing (handles multi-part TLDs correctly)
- HTML Parsing: `scraper` (CSS selector-based extraction)
- TLS/Certificates: `tokio-rustls` and `x509-parser` for certificate analysis
- Technology Detection: custom implementation using Wappalyzer rulesets with pattern matching (headers, cookies, HTML, script URLs). Does not execute JavaScript.
- WHOIS/RDAP: `whois-service` crate for domain registration lookups
- GeoIP: `maxminddb` for geographic and network information
- Database: `sqlx` with SQLite (WAL mode enabled)
- Async Runtime: Tokio
Security:
- Security Audit: `cargo-audit` runs in CI to detect known vulnerabilities in dependencies (uses the RustSec advisory database)
- Secret Scanning: `gitleaks` scans commits and code for accidentally committed secrets, API keys, tokens, and credentials
- Code Quality: Clippy with `-D warnings` enforces strict linting rules and catches security issues
🏗️ Architecture
domain_status follows a pipeline architecture:
Input File → URL Validation → Concurrent Processing → Data Extraction → Direct Database Writes → SQLite Database
Core Components:
- Main Orchestrator: Reads URLs, validates, normalizes, manages concurrency
- HTTP Request Handler: Fetches URLs, follows redirects, extracts response data
- Data Extraction: Parses HTML, detects technologies, queries DNS/TLS/GeoIP/WHOIS (parallelized where possible)
- Database Writer: Direct writes to SQLite (WAL mode handles concurrency efficiently)
- Error Handling: Categorizes errors, implements retries with exponential backoff
- Rate Limiting: Token-bucket algorithm with adaptive adjustment
Concurrency Model:
- Async runtime: Tokio
- Concurrency control: Semaphore limits concurrent tasks
- Rate limiting: Token-bucket with adaptive adjustment
- Background tasks: Status server (optional), adaptive rate limiter
- Graceful shutdown: all background tasks cancellable via `CancellationToken`
- Parallel execution: technology detection and DNS/TLS fetching run in parallel (independent operations)
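A minimal sketch of the semaphore-bounded concurrency described above (illustrative only; the per-URL work is a placeholder, not the actual pipeline):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn process_all(urls: Vec<String>, max_concurrency: usize) {
    let semaphore = Arc::new(Semaphore::new(max_concurrency));
    let mut handles = Vec::new();
    for url in urls {
        // Each task holds a permit, so at most `max_concurrency` run at once.
        let permit = semaphore.clone().acquire_owned().await.unwrap();
        handles.push(tokio::spawn(async move {
            let _permit = permit; // released when the task finishes
            println!("processing {url}"); // placeholder for the real per-URL pipeline
        }));
    }
    for handle in handles {
        let _ = handle.await;
    }
}
```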
Performance Characteristics:
- Non-blocking I/O for all network operations
- Shared resources (HTTP client, DNS resolver, database pool) across tasks
- Bounded concurrency prevents resource exhaustion
- Direct database writes with SQLite WAL mode (efficient concurrent writes)
- Memory efficiency: Response bodies limited to 2MB, HTML text extraction limited to 50KB
🔒 Security & Secret Management
Preventing Credential Leaks:
- Pre-commit hooks (recommended): install pre-commit hooks to catch secrets before they're committed:

  ```bash
  # Install pre-commit (if not already installed), e.g.:
  pip install pre-commit   # or use your system package manager

  # Install hooks
  pre-commit install
  ```

  This will automatically scan for secrets before every commit.

- CI scanning: Gitleaks runs in CI to catch secrets in pull requests and scan git history.
- GitHub Secret Scanning: GitHub automatically scans public repositories for known secret patterns (enabled by default).
- Best practices:
  - Never commit `.env` files (already in `.gitignore`)
  - Use environment variables for all secrets
  - Use GitHub Secrets for CI/CD tokens
  - Review gitleaks output if CI fails
🔨 Development
See AI_AGENTS.md for development guidelines and conventions.
License
MIT License - see LICENSE file for details.