# rexturl
A command-line tool for parsing and manipulating URLs with predictable output formats.
## Key Features

### Clean UX Design

- **One flag controls format:** `--format {plain,tsv,csv,json,jsonl,custom,sql}`
- **Precise field selection:** `--fields domain,path,url`
- **Custom templates:** `--template '{scheme}://{domain}{path}'`
- **SQL generation:** multi-dialect INSERT statements with proper escaping
- **Consistent output:** same field order across all formats
- **Machine-friendly:** proper headers, null handling, exit codes
### Technical Implementation
- Custom URL parser with optimized component extraction
- Zero-copy parsing with minimal allocations
- Parallel processing for bulk operations
- Multi-part TLD support (co.uk, com.au, etc.)
- Template engine with conditional logic and escaping modes
- SQL generation with dialect-specific type mapping
### Processing Features

- **Field extraction:** scheme, username, host, domain, subdomain, port, path, query, fragment
- **Data processing:** sort, deduplicate, filter
- **Input flexibility:** command-line args or stdin
## Installation

Clone the repository and build from source:

```shell
git clone <repository-url>
cd rexturl
cargo build --release
```
## Quick Start

Extract the domain from a URL:

```shell
rexturl --urls https://www.example.com --fields domain
# Output: example.com
```

TSV format with headers:

```shell
echo https://blog.example.co.uk/posts | rexturl --format tsv --header --fields subdomain,domain,path
# Output:
# subdomain domain path
# blog example.co.uk /posts
```

JSON output for APIs:

```shell
echo https://api.com | rexturl --format json --fields domain
# Output: {"urls":[{"domain":"api.com"}]}
```
## Usage

### Input Methods

- `--urls <URLS>` - specify URLs as command-line arguments
- stdin - pipe URLs from other commands (the default when `--urls` is absent)
- Supports single or multiple URLs
### Options

#### Core Options

| Option | Values | Description |
|---|---|---|
| `--format` | `plain`, `tsv`, `csv`, `json`, `jsonl`, `custom`, `sql` | Output format (default: `plain`) |
| `--fields` | e.g. `domain,path,url` | Comma-separated fields to extract |
| `--urls` | URL strings | Input URLs to process |
| `--header` | - | Include header row for tabular formats |
| `--sort` | - | Sort output by first field |
| `--unique` | - | Remove duplicate entries |
### Available Fields

| Field | Description | Example |
|---|---|---|
| `url` | Original URL string | `https://www.example.com/path` |
| `scheme` | Protocol | `https` |
| `username` | Username portion | `user` |
| `host` / `hostname` | Full hostname | `www.example.com` |
| `subdomain` | Subdomain only | `www` |
| `domain` | Registrable domain | `example.com` |
| `port` | Port number | `8080` |
| `path` | URL path | `/path` |
| `query` | Query parameters | `q=search` |
| `fragment` | Fragment identifier | `section` |
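As a cross-check on these field definitions, the same decomposition can be reproduced with Python's standard `urllib.parse` (this is a sketch of the field semantics only, not rexturl's parser; splitting `host` into `subdomain` and `domain` additionally requires the multi-part TLD logic described later):

```python
from urllib.parse import urlsplit

# A sample URL with every component populated.
parts = urlsplit("https://user@www.example.com:8080/path?q=search#section")

print(parts.scheme)    # scheme   -> https
print(parts.username)  # username -> user
print(parts.hostname)  # host     -> www.example.com
print(parts.port)      # port     -> 8080
print(parts.path)      # path     -> /path
print(parts.query)     # query    -> q=search
print(parts.fragment)  # fragment -> section
```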
### Advanced Options

| Option | Values | Description |
|---|---|---|
| `--pretty` | - | Pretty-print JSON output |
| `--strict` | - | Exit code 2 if any URL fails to parse |
| `--no-newline` | - | Suppress trailing newline |
| `--null-empty` | Custom string | Value for missing fields (default: `\N`) |
| `--color` | `auto`, `never`, `always` | Colored output for plain format |
### Custom Format Options

| Option | Values | Description |
|---|---|---|
| `--template` | Template string | Custom format template (e.g., `'{scheme}://{domain}{path}'`) |
| `--escape` | `none`, `shell`, `csv`, `json`, `sql` | Escaping mode for custom format |
### SQL Output Options

| Option | Values | Description |
|---|---|---|
| `--sql-table` | Table name | SQL table name (default: `urls`) |
| `--sql-create-table` | - | Include CREATE TABLE statement |
| `--sql-dialect` | `postgres`, `mysql`, `sqlite`, `generic` | SQL dialect for type mapping |
### Legacy Field Flags (Still Supported)

These flags automatically add fields; use `--fields` for explicit control:

| Flag | Equivalent | Description |
|---|---|---|
| `--domain` | `--fields domain` | Extract domain |
| `--host` | `--fields subdomain` | Extract subdomain |
| `--scheme` | `--fields scheme` | Extract scheme |
| `--path` | `--fields path` | Extract path |
### Deprecated Options

| Option | Use Instead | Description |
|---|---|---|
| `--json` | `--format json` | JSON output (deprecated) |
| `--all` | `--fields` with specific names | All fields (deprecated) |
| `--custom` | `--format` and `--fields` | Custom format (deprecated) |
## Examples

### Most Common Use Cases

1. Extract domains for analysis:

   ```shell
   cat urls.txt | rexturl --fields domain --sort --unique
   # Clean list of unique domains
   ```

2. Create a spreadsheet-ready CSV:

   ```shell
   rexturl --urls https://api.example.com/v1/users --format csv --header --fields subdomain,domain,path
   # subdomain,domain,path
   # api,example.com,/v1/users
   ```

3. JSON for APIs and scripts:

   ```shell
   echo https://api.com/endpoints | rexturl --format json --fields domain,path
   # {"urls":[{"domain":"api.com","path":"/endpoints"}]}
   ```
### All Format Examples

Plain (default):

```shell
rexturl --urls https://blog.example.com/posts --fields subdomain,domain,path
# blog example.com /posts
```

TSV with header:

```shell
echo https://api.example.com/v1 | rexturl --format tsv --header --fields subdomain,domain,path
# subdomain domain path
# api example.com /v1
```

CSV for spreadsheets:

```shell
rexturl --urls https://www.example.com --format csv --header --fields url,domain
# url,domain
# https://www.example.com,example.com
```

JSON for APIs:

```shell
echo https://example.com/ | rexturl --format json --pretty --fields domain,path
# {
#   "urls": [
#     {
#       "domain": "example.com",
#       "path": "/"
#     }
#   ]
# }
```

JSONL for streaming:

```shell
printf 'https://example.com\nhttps://api.com\nhttps://blog.net\n' | rexturl --format jsonl --fields domain
# {"domain":"example.com"}
# {"domain":"api.com"}
# {"domain":"blog.net"}
```
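One reason JSONL suits streaming is that a consumer can parse each line independently, without buffering the whole document. A minimal Python consumer sketch (the hard-coded sample lines stand in for rexturl's piped output):

```python
import json

# Each JSONL line is one complete JSON object.
stream = [
    '{"domain":"example.com"}',
    '{"domain":"api.com"}',
    '{"domain":"blog.net"}',
]

# Process incrementally, one record at a time.
domains = [json.loads(line)["domain"] for line in stream]
print(domains)  # ['example.com', 'api.com', 'blog.net']
```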
Custom format with templates:

```shell
rexturl --urls https://example.com/v1/users --format custom --template '{scheme}://{domain}{path}'
# https://example.com/v1/users
```

SQL INSERT statements:

```shell
rexturl --urls https://example.com/path --format sql --fields domain,path
# INSERT INTO urls (domain, path) VALUES ('example.com', '/path');
```
### Advanced Examples

Multi-part TLD handling:

```shell
rexturl --urls https://blog.example.co.uk/posts --fields subdomain,domain,path
# blog example.co.uk /posts
```

Handle missing values:

```shell
echo https://example.com | rexturl --fields domain,port --null-empty N/A
# example.com N/A
```

Error handling with strict mode:

```shell
echo not-a-url | rexturl --strict --fields domain
# Error: Failed to parse URL: not-a-url
# Exit code: 2
```

Legacy syntax (still works):

```shell
rexturl --urls https://example.com/ --domain --path
# example.com /
```
## Domain and Subdomain Extraction

rexturl includes intelligent handling for domains and subdomains:

- **Multi-part TLD support:** automatically detects complex TLDs like `co.uk`, `org.uk`, `com.au`, etc.
- **Domain extraction:** the `--domain` flag extracts the registrable domain name
- **Subdomain extraction:** when used alone, `--host` extracts the subdomain portion
- **Smart detection:** handles edge cases with nested subdomains and international domains

Supported multi-part TLDs include:

`co.uk`, `org.uk`, `ac.uk`, `gov.uk`, `me.uk`, `net.uk`, `sch.uk`, `com.au`, `net.au`, `org.au`, `edu.au`, `gov.au`, `co.nz`, `net.nz`, `org.nz`, `govt.nz`, `co.za`, `org.za`, `com.br`, `net.br`, `org.br`, `co.jp`, `com.mx`, `com.ar`, `com.sg`, `com.my`, `co.id`, `com.hk`, `co.th`, `in.th`
Examples:

```shell
# Using custom format for specific extraction
echo https://blog.example.co.uk | rexturl --format custom --template 'Subdomain: {subdomain}, Domain: {domain}'
# Output: Subdomain: blog, Domain: example.co.uk

# Extract all components (tab-separated format)
echo 'https://user@blog.example.co.uk:8080/posts?q=test#frag' | rexturl --format tsv --fields scheme,username,host,port,path,query,fragment,domain
# Output: https user blog.example.co.uk 8080 /posts q=test frag example.co.uk

# Extract components with the --urls flag
rexturl --urls https://blog.example.co.uk --fields domain
# Output: example.co.uk
```
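The registrable-domain logic described above amounts to matching the hostname's suffix against a multi-part TLD list before splitting off the subdomain. A simplified Python sketch under that assumption (`split_host` and the abbreviated TLD set are illustrative; rexturl's own implementation handles the full list and additional edge cases):

```python
# Abbreviated stand-in for the full multi-part TLD list above.
MULTI_PART_TLDS = {"co.uk", "org.uk", "com.au", "co.nz"}

def split_host(host: str) -> tuple[str, str]:
    """Return (subdomain, registrable_domain) for a hostname."""
    labels = host.lower().split(".")
    # A multi-part TLD consumes two labels; a plain TLD consumes one.
    tld_labels = 2 if ".".join(labels[-2:]) in MULTI_PART_TLDS else 1
    domain_labels = tld_labels + 1  # registrable domain = one label + TLD
    subdomain = ".".join(labels[:-domain_labels])
    domain = ".".join(labels[-domain_labels:])
    return subdomain, domain

print(split_host("blog.example.co.uk"))  # ('blog', 'example.co.uk')
print(split_host("www.example.com"))     # ('www', 'example.com')
```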
## Custom Templates

### Template Syntax

Use `--format custom --template` for flexible output formatting.

Basic fields:

- `{field}` - insert the field value, or an empty string if missing
- `{field:default}` - insert the field value, or the default if missing
- `{field?text}` - insert text only if the field has a value
- `{field!text}` - insert text only if the field is missing

Available fields:

- `{scheme}` - URL scheme (http, https, etc.)
- `{username}` - username portion of the URL
- `{host}` - full hostname
- `{hostname}` - alias for host
- `{subdomain}` - subdomain portion (e.g., "www" in www.example.com)
- `{domain}` - domain name (e.g., "example.com")
- `{port}` - port number
- `{path}` - URL path
- `{query}` - query string (without the leading ?)
- `{fragment}` - fragment identifier (without the leading #)

Escaping modes:

- `--escape none` - no escaping (default)
- `--escape shell` - shell-safe quoting
- `--escape csv` - CSV-compatible escaping
- `--escape json` - JSON string escaping
- `--escape sql` - SQL value escaping
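The four placeholder forms can be modeled with a small substitution function. This Python sketch mirrors the documented semantics (it illustrates the syntax only and is not rexturl's template engine):

```python
import re

def render(template: str, fields: dict) -> str:
    def sub(match: re.Match) -> str:
        name, sep, arg = match.group(1), match.group(2), match.group(3)
        value = fields.get(name, "")
        if sep == ":":   # {field:default} -> default when missing
            return value if value else arg
        if sep == "?":   # {field?text} -> text when present
            return arg if value else ""
        if sep == "!":   # {field!text} -> text when missing
            return "" if value else arg
        return value     # plain {field}
    return re.sub(r"\{(\w+)(?:([:?!])([^}]*))?\}", sub, template)

fields = {"scheme": "https", "domain": "example.com", "query": "q=1"}
print(render("{scheme}://{domain}:{port:80}", fields))  # https://example.com:80
print(render("{domain}{query?&found}", fields))         # example.com&found
```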
### Template Examples

```shell
# Basic template
rexturl --urls https://example.com/api --format custom --template 'Host: {host}, Path: {path}'
# Output: Host: example.com, Path: /api

# With defaults
rexturl --urls https://example.com --format custom --template '{scheme}://{domain}:{port:80}'
# Output: https://example.com:80

# Conditional text
echo 'https://example.com/?q=1' | rexturl --format custom --template '{domain}{query?&found}'
# Output: example.com&found

# Shell escaping
rexturl --urls 'https://example.com/path with spaces' --format custom --template '{url}' --escape shell
# Output: 'https://example.com/path with spaces'
```
## SQL Output

Generate SQL INSERT statements from URL data:

```shell
# Basic SQL output
rexturl --urls https://example.com/path --format sql --fields domain,path
# INSERT INTO urls (domain, path) VALUES ('example.com', '/path');

# With CREATE TABLE
rexturl --urls https://example.com --format sql --sql-create-table --fields domain
# CREATE TABLE IF NOT EXISTS urls (
#   id SERIAL PRIMARY KEY,
#   domain VARCHAR(253),
#   created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
# );
# INSERT INTO urls (domain) VALUES ('example.com');

# Custom table and dialect
rexturl --urls https://example.com:3306 --format sql --sql-table my_urls --sql-dialect mysql --fields domain,port
# INSERT INTO my_urls (domain, port) VALUES ('example.com', '3306');
```
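The safety property behind "proper escaping" is that a single quote inside a string literal must be doubled in standard SQL. A minimal Python sketch of assembling such an INSERT (helper names are illustrative; rexturl additionally applies dialect-specific type mapping per `--sql-dialect`):

```python
def sql_escape(value: str) -> str:
    """Escape a string for use inside a single-quoted SQL literal."""
    return value.replace("'", "''")

def make_insert(table: str, row: dict) -> str:
    cols = ", ".join(row)
    vals = ", ".join(f"'{sql_escape(v)}'" for v in row.values())
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"

stmt = make_insert("urls", {"domain": "example.com", "path": "/o'brien"})
print(stmt)
# INSERT INTO urls (domain, path) VALUES ('example.com', '/o''brien');
```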
## Performance & Architecture

### URL Parser Implementation
- Custom URL parser with optimized component extraction
- Zero-copy parsing with minimal memory allocations
- Parallel processing using Rayon for bulk operations
### Architecture

- **Unified data model:** a single `UrlRecord` struct for all formats
- **Template engine:** flexible custom formatting with conditional logic
- **SQL generation:** multi-dialect support with proper type mapping
- **Predictable output:** same field order across all formats
- **Proper error handling:** exit codes and stderr for failures
- **Streaming support:** memory-efficient for large datasets
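The unified-data-model idea, one record type that every formatter consumes, can be sketched as follows. This is a hypothetical Python mirror with a reduced field set; the actual `UrlRecord` struct lives in the Rust source:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class UrlRecord:
    # Reduced field set for illustration; the real struct carries all fields.
    scheme: str
    domain: str
    path: str

    def to_tsv(self) -> str:
        # Field order comes from the dataclass, so it is identical everywhere.
        return "\t".join(asdict(self).values())

    def to_json(self) -> str:
        return json.dumps(asdict(self), separators=(",", ":"))

rec = UrlRecord("https", "example.com", "/posts")
print(rec.to_tsv())   # https<TAB>example.com<TAB>/posts
print(rec.to_json())  # {"scheme":"https","domain":"example.com","path":"/posts"}
```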
### Benchmarks

```text
fast_url_parsing          time: [823.79 ns 827.53 ns 831.87 ns]
fast_url_component_access time: [69.100 ns 69.309 ns 69.527 ns]
```
### Technical Details

- **Modular design:** separate parsing, formatting, and domain intelligence
- **Multi-part TLD support:** handles complex domains like `example.co.uk`
- **Memory efficient:** less than 1 KB overhead per URL
## Changelog
For a detailed list of changes and version history, see CHANGELOG.md.
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with proper tests
4. Ensure all tests pass (`cargo test`)
5. Run formatting and linting (`cargo fmt && cargo clippy`)
6. Commit your changes (`git commit -m 'Add some amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.