Urx is a command-line tool designed for collecting URLs from OSINT archives, such as the Wayback Machine and Common Crawl. Built with Rust for efficiency, it leverages asynchronous processing to rapidly query multiple data sources. This tool simplifies the process of gathering URL information for a specified domain, providing a comprehensive dataset that can be used for various purposes, including security testing and analysis.
Features
- Fetch URLs from multiple sources in parallel (Wayback Machine, Common Crawl, OTX)
- Filter results by file extensions, patterns, or predefined presets (e.g., "no-images" to exclude images)
- Support for multiple output formats: plain text, JSON, CSV
- Write results to the console or a file, or read domains from stdin for pipeline integration
- URL Testing:
  - Filter and validate URLs based on HTTP status codes and patterns
  - Extract additional links from collected URLs

Installation
From Cargo
From Homebrew (Tap)
From Source
The compiled binary will be available at target/release/urx.
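A minimal sketch of the build step, assuming the current directory is a local checkout of the urx source tree and that cargo is installed. Cloning the repository is left out, and the `bin` variable is only a convenience for this sketch.

```shell
# Path named in the installation notes above
bin="target/release/urx"

# Build only when a Cargo project and the cargo tool are both present
if [ -f Cargo.toml ] && command -v cargo >/dev/null 2>&1; then
  cargo build --release   # standard optimized Cargo build
  "./$bin" --version      # -V/--version is listed under Options
fi
```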
From Docker
Usage
Basic Usage
# Scan a single domain
# Scan multiple domains
# Scan domains from a file
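As a sketch of the three cases above, assuming urx is on PATH: example.com and example.org are placeholder domains, and domains.txt is a hypothetical file with one domain per line. The invocations follow the synopsis `urx [OPTIONS] [DOMAINS]...` and the stdin support listed under Features.

```shell
domain="example.com"   # placeholder target

# Run only when urx is installed
if command -v urx >/dev/null 2>&1; then
  # Scan a single domain
  urx "$domain"

  # Scan multiple domains
  urx "$domain" example.org

  # Scan domains from a file (hypothetical domains.txt, fed through stdin)
  cat domains.txt | urx
fi
```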
Options
Usage: urx [OPTIONS] [DOMAINS]...
Arguments:
[DOMAINS]... Domains to fetch URLs for
Options:
-c, --config <CONFIG> Config file to load
-h, --help Print help
-V, --version Print version
Output Options:
-o, --output <OUTPUT> Output file to write results
-f, --format <FORMAT> Output format (e.g., "plain", "json", "csv") [default: plain]
--merge-endpoint Merge endpoints with the same path and merge URL parameters
Provider Options:
--providers <PROVIDERS> Providers to use (comma-separated, e.g., "wayback,cc,otx,vt,urlscan") [default: wayback,cc,otx]
--subs Include subdomains when searching
--cc-index <CC_INDEX> Common Crawl index to use (e.g., CC-MAIN-2025-13) [default: CC-MAIN-2025-13]
--vt-api-key <VT_API_KEY> API key for VirusTotal (can also use URX_VT_API_KEY environment variable)
--urlscan-api-key <URLSCAN_API_KEY> API key for Urlscan (can also use URX_URLSCAN_API_KEY environment variable)
Discovery Options:
--exclude-robots Exclude robots.txt discovery
--exclude-sitemap Exclude sitemap.xml discovery
Display Options:
-v, --verbose Show verbose output
--silent Silent mode (no output)
--no-progress No progress bar
Filter Options:
-p, --preset <PRESET> Filter Presets (e.g., "no-resources,no-images,only-js,only-style")
-e, --extensions <EXTENSIONS> Filter URLs to only include those with specific extensions (comma-separated, e.g., "js,php,aspx")
--exclude-extensions <EXCLUDE_EXTENSIONS> Filter URLs to exclude those with specific extensions (comma-separated, e.g., "html,txt")
--patterns <PATTERNS> Filter URLs to only include those containing specific patterns (comma-separated)
--exclude-patterns <EXCLUDE_PATTERNS> Filter URLs to exclude those containing specific patterns (comma-separated)
--show-only-host Only show the host part of the URLs
--show-only-path Only show the path part of the URLs
--show-only-param Only show the parameters part of the URLs
--min-length <MIN_LENGTH> Minimum URL length to include
--max-length <MAX_LENGTH> Maximum URL length to include
--strict Enforce exact host validation (default)
Network Options:
--network-scope <NETWORK_SCOPE> Control which components network settings apply to (all, providers, testers, or providers,testers) [default: all]
--proxy <PROXY> Use proxy for HTTP requests (format: http://proxy.example.com:8080)
--proxy-auth <PROXY_AUTH> Proxy authentication credentials (format: username:password)
--insecure Skip SSL certificate verification (accept self-signed certs)
--random-agent Use a random User-Agent for HTTP requests
--timeout <TIMEOUT> Request timeout in seconds [default: 30]
--retries <RETRIES> Number of retries for failed requests [default: 3]
--parallel <PARALLEL> Maximum number of parallel requests per provider and maximum concurrent domain processing [default: 5]
--rate-limit <RATE_LIMIT> Rate limit (requests per second)
Testing Options:
--check-status Check HTTP status code of collected URLs [aliases: --cs]
--include-status <INCLUDE_STATUS> Include URLs with specific HTTP status codes or patterns (e.g., --is=200,30x) [aliases: --is]
--exclude-status <EXCLUDE_STATUS> Exclude URLs with specific HTTP status codes or patterns (e.g., --es=404,50x,5xx) [aliases: --es]
--extract-links Extract additional links from collected URLs (requires HTTP requests)
Examples
# Save results to a file
# Output in JSON format
# Filter for JavaScript files only
# Exclude HTML and text files
# Filter for API endpoints
# Exclude specific patterns
# Use Filter Preset (similar to --exclude-extensions=png,jpg,...)
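A sketch of how the output and filter examples above could be written, using only flags from the Options listing and assuming urx is on PATH. The domain, the results.txt file name, and the pattern values are illustrative placeholders.

```shell
domain="example.com"   # placeholder target

# Run only when urx is installed
if command -v urx >/dev/null 2>&1; then
  # Save results to a file (results.txt is a hypothetical name)
  urx "$domain" -o results.txt

  # Output in JSON format
  urx "$domain" -f json

  # Keep only JavaScript files
  urx "$domain" -e js

  # Exclude HTML and text files
  urx "$domain" --exclude-extensions html,txt

  # Keep URLs containing a pattern (illustrative value)
  urx "$domain" --patterns api

  # Drop URLs containing any of these patterns (illustrative values)
  urx "$domain" --exclude-patterns static,images

  # Apply a preset named in the -p/--preset help text
  urx "$domain" -p no-images
fi
```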
# Use specific providers
# Using VirusTotal and URLScan providers
# 1. Explicitly add to providers (with API keys via command line)
# 2. Using environment variables for API keys
URX_VT_API_KEY=*** URX_URLSCAN_API_KEY=***
# 3. Auto-enabling: providers are automatically added when API keys are provided
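The provider cases above might look like the following sketch, assuming urx is on PATH. The key values are placeholders to be replaced with real credentials, and the provider names come from the --providers help text.

```shell
domain="example.com"   # placeholder target

# Placeholder keys; the variable names are the ones documented under Provider Options
export URX_VT_API_KEY="your-virustotal-key"
export URX_URLSCAN_API_KEY="your-urlscan-key"

# Run only when urx is installed
if command -v urx >/dev/null 2>&1; then
  # Use specific providers only
  urx "$domain" --providers wayback,otx

  # Name the key-based providers explicitly; keys are read from the environment
  urx "$domain" --providers vt,urlscan

  # Pass a key on the command line and rely on the auto-enabling described above
  urx "$domain" --vt-api-key "$URX_VT_API_KEY"
fi
```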
# URLs from robots.txt and sitemap.xml are included by default
# Exclude URLs from robots.txt files
# Exclude URLs from sitemap
# Include subdomains
# Check status of collected URLs
# Extract additional links from collected URLs
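A sketch of the discovery and testing examples above, one documented flag per invocation, assuming urx is on PATH and example.com stands in for a real target. The status check and link extraction send HTTP requests to the collected URLs, so they are best limited to targets you are permitted to test.

```shell
domain="example.com"   # placeholder target

# Run only when urx is installed
if command -v urx >/dev/null 2>&1; then
  # Skip robots.txt discovery
  urx "$domain" --exclude-robots

  # Skip sitemap.xml discovery
  urx "$domain" --exclude-sitemap

  # Include subdomains
  urx "$domain" --subs

  # Check the HTTP status of collected URLs
  urx "$domain" --check-status

  # Extract additional links from collected URLs
  urx "$domain" --extract-links
fi
```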
# Network configuration
# Advanced filtering
# HTTP Status code based filtering
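The network, filtering, and status examples above could be combined along these lines. This is a sketch that assumes urx is on PATH. Every value shown (proxy address, timeout, parallelism, extensions, patterns, minimum length) is an illustrative choice, and the status patterns are the ones used in the help text.

```shell
domain="example.com"                      # placeholder target
proxy="http://proxy.example.com:8080"     # format shown in the --proxy help text

# Run only when urx is installed
if command -v urx >/dev/null 2>&1; then
  # Network configuration: proxy, longer timeout, more parallel requests
  urx "$domain" --proxy "$proxy" --timeout 60 --parallel 10

  # Advanced filtering: several filter options in one run
  urx "$domain" -e js,php --exclude-patterns static,assets --min-length 20

  # Keep only URLs whose status matches 200 or the 30x pattern
  urx "$domain" --include-status=200,30x

  # Drop URLs whose status matches 404 or the 5xx pattern
  urx "$domain" --exclude-status=404,5xx
fi
```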
# Disable host validation
Integration with Other Tools
Urx works well in pipelines with other security and reconnaissance tools:
# Find domains, then discover URLs
# Combine with other tools
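As a sketch of a downstream pipeline stage, the following uses only standard Unix tools and a small sample that stands in for urx output. It assumes the plain format prints one URL per line. The sample URLs and variable names are made up for illustration.

```shell
# Stand-in for the plain-text output of `urx example.com`
sample_urls='https://example.com/app.js?v=1
https://api.example.com/v1/users
https://example.com/app.js?v=1
http://example.com/login'

# Downstream stage: drop duplicates, then keep only HTTPS URLs
https_urls=$(printf '%s\n' "$sample_urls" | LC_ALL=C sort -u | grep '^https://')
printf '%s\n' "$https_urls"

# With urx installed, the same stage would read live output instead:
#   urx example.com | LC_ALL=C sort -u | grep '^https://'
```

Placing urx at the front of the pipe and plain text filters behind it keeps each stage replaceable by a more specialized tool.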
Inspiration
Urx was inspired by gau (GetAllUrls), a tool that fetches known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl. While sharing similar core functionality, Urx was built from the ground up in Rust with a focus on performance, concurrency, and expanded filtering capabilities.
Contribute
Urx is an open-source project made with ❤️. If you want to contribute, please see CONTRIBUTING.md and open a pull request with your changes.