# rexturl
[](https://github.com/vschwaberow/rexturl/releases/tag/v0.4.1)
[](https://opensource.org/licenses/MIT)
[](https://www.rust-lang.org/)
A command-line tool for parsing and manipulating URLs with predictable output formats.
## Key Features
### Clean UX Design
- One flag controls format: `--format {plain,tsv,csv,json,jsonl,custom,sql}`
- Precise field selection: `--fields domain,path,url`
- Custom templates: `--template '{scheme}://{domain}{path}'`
- SQL generation: Multi-dialect INSERT statements with proper escaping
- Consistent output: Same field order across all formats
- Machine-friendly: Proper headers, null handling, exit codes
### Technical Implementation
- Custom URL parser with optimized component extraction
- Zero-copy parsing with minimal allocations
- Parallel processing for bulk operations
- Multi-part TLD support (co.uk, com.au, etc.)
- Template engine with conditional logic and escaping modes
- SQL generation with dialect-specific type mapping
### Processing Features
- Field extraction: scheme, username, host, domain, subdomain, port, path, query, fragment
- Data processing: Sort, deduplicate, filter
- Input flexibility: Command line args or stdin
## Installation
```bash
cargo install rexturl
```
or clone the repository and build from source:
```bash
git clone https://github.com/vschwaberow/rexturl.git
cd rexturl
cargo build --release
```
## Quick Start
**Extract domain from URL:**
```bash
rexturl --urls "https://www.example.com/path" --fields domain
# Output: example.com
```
**TSV format with headers:**
```bash
# subdomain domain path
# blog example.co.uk /posts
```
**JSON output for APIs:**
```bash
```
## Usage
```bash
rexturl [OPTIONS]
```
### Input Methods
- `--urls <URLS>` - Specify URLs as command-line arguments
- **stdin** - Pipe URLs from other commands (default if no --urls)
- Supports single or multiple URLs
## Options
### Core Options
| `--format` | `plain`, `tsv`, `csv`, `json`, `jsonl`, `custom`, `sql` | Output format (default: `plain`) |
| `--fields` | `domain,path,url` | Comma-separated fields to extract |
| `--urls` | URL strings | Input URLs to process |
| `--header` | - | Include header row for tabular formats |
| `--sort` | - | Sort output by first field |
| `--unique` | - | Remove duplicate entries |
### Available Fields
| `url` | Original URL string | `https://www.example.com/path` |
| `scheme` | Protocol | `https` |
| `username` | Username portion | `user` |
| `host`/`hostname` | Full hostname | `www.example.com` |
| `subdomain` | Subdomain only | `www` |
| `domain` | Registrable domain | `example.com` |
| `port` | Port number | `8080` |
| `path` | URL path | `/path` |
| `query` | Query parameters | `q=search` |
| `fragment` | Fragment identifier | `section` |
### Advanced Options
| `--pretty` | - | Pretty-print JSON output |
| `--strict` | - | Exit code 2 if any URL fails to parse |
| `--no-newline` | - | Suppress trailing newline |
| `--null-empty` | Custom string | Value for missing fields (default: `\N`) |
| `--color` | `auto`, `never`, `always` | Colored output for plain format |
### Custom Format Options
| `--template` | Template string | Custom format template (e.g., `'{scheme}://{domain}{path}'`) |
| `--escape` | `none`, `shell`, `csv`, `json`, `sql` | Escaping mode for custom format |
### SQL Output Options
| `--sql-table` | Table name | SQL table name (default: `urls`) |
| `--sql-create-table` | - | Include CREATE TABLE statement |
| `--sql-dialect` | `postgres`, `mysql`, `sqlite`, `generic` | SQL dialect for type mapping |
### Legacy Field Flags (Still Supported)
These flags automatically add fields - use `--fields` for explicit control:
| `--domain` | `--fields domain` | Extract domain |
| `--host` | `--fields subdomain` | Extract subdomain |
| `--scheme` | `--fields scheme` | Extract scheme |
| `--path` | `--fields path` | Extract path |
### Deprecated Options
| `--json` | `--format json` | JSON output (deprecated) |
| `--all` | `--fields` with specific names | All fields (deprecated) |
| `--custom` | `--format` and `--fields` | Custom format (deprecated) |
## Examples
### Most Common Use Cases
**1. Extract domains for analysis:**
```bash
```
**2. Create a spreadsheet-ready CSV:**
```bash
rexturl --urls "https://api.example.com/v1/users" --fields subdomain,domain,path --format csv --header
# subdomain,domain,path
# api,example.com,/v1/users
```
**3. JSON for APIs and scripts:**
```bash
```
### All Format Examples
**Plain (default):**
```bash
rexturl --urls "https://blog.example.com/posts" --fields subdomain,domain,path
# blog example.com /posts
```
**TSV with header:**
```bash
# api example.com /v1
```
**CSV for spreadsheets:**
```bash
rexturl --fields url,domain --format csv --header < urls.txt
# url,domain
# https://www.example.com,example.com
```
**JSON for APIs:**
```bash
# "urls": [
# {
# "domain": "example.com",
# "path": "/"
# }
# ]
# }
```
**JSONL for streaming:**
```bash
# {"domain":"api.com"}
# {"domain":"blog.net"}
```
**Custom format with templates:**
```bash
rexturl --urls "https://api.example.com/v1/users" --format custom --template "{scheme}://{domain}{path}"
# https://example.com/v1/users
```
**SQL INSERT statements:**
```bash
rexturl --urls "https://www.example.com/path" --format sql --fields domain,path
# INSERT INTO urls (domain, path) VALUES ('example.com', '/path');
```
### Advanced Examples
**Multi-part TLD handling:**
```bash
rexturl --urls "https://blog.example.co.uk/posts" --fields subdomain,domain,path --format tsv
# blog example.co.uk /posts
```
**Handle missing values:**
```bash
```
**Error handling with strict mode:**
```bash
rexturl --urls "not-a-url" --strict --fields domain
# Error: Failed to parse URL: not-a-url
# Exit code: 2
```
**Legacy syntax (still works):**
```bash
rexturl --urls "https://www.example.com" --domain --path
# example.com /
```
## Domain and Subdomain Extraction
`rexturl` includes intelligent handling for domains and subdomains:
- **Multi-part TLD Support**: Automatically detects complex TLDs like `co.uk`, `org.uk`, `com.au`, etc.
- **Domain Extraction**: The `--domain` flag extracts the registrable domain name
- **Subdomain Extraction**: When using `--host` alone, it extracts the subdomain portion
- **Smart Detection**: Handles edge cases with nested subdomains and international domains
Supported multi-part TLDs include:
`co.uk`, `org.uk`, `ac.uk`, `gov.uk`, `me.uk`, `net.uk`, `sch.uk`, `com.au`, `net.au`, `org.au`, `edu.au`, `gov.au`, `co.nz`, `net.nz`, `org.nz`, `govt.nz`, `co.za`, `org.za`, `com.br`, `net.br`, `org.br`, `co.jp`, `com.mx`, `com.ar`, `com.sg`, `com.my`, `co.id`, `com.hk`, `co.th`, `in.th`
Examples:
```bash
# Using custom format for specific extraction
# Extract all components (tab-separated format)
rexturl --urls "https://user@blog.example.co.uk:8080/posts?q=test#frag" --fields scheme,username,hostname,port,path,query,fragment,domain --format tsv
# Output: https user blog.example.co.uk 8080 /posts q=test frag example.co.uk
# Extract components with URLs flag
rexturl --urls "https://blog.example.co.uk/posts" --fields domain
# Output: example.co.uk
```
## Custom Templates
### Template Syntax
Use `--format custom --template` for flexible output formatting:
**Basic fields:**
- `{field}` - Insert field value or empty string if missing
- `{field:default}` - Insert field value or default if missing
- `{field?text}` - Insert text only if field has a value
- `{field!text}` - Insert text only if field is missing
**Available fields:**
- `{scheme}` - URL scheme (http, https, etc.)
- `{username}` - Username portion of the URL
- `{host}` - Full hostname
- `{hostname}` - Alias for host
- `{subdomain}` - Subdomain portion (e.g., "www" in www.example.com)
- `{domain}` - Domain name (e.g., "example.com")
- `{port}` - Port number
- `{path}` - URL path
- `{query}` - Query string (without the leading ?)
- `{fragment}` - Fragment identifier (without the leading #)
**Escaping modes:**
- `--escape none` - No escaping (default)
- `--escape shell` - Shell-safe quoting
- `--escape csv` - CSV-compatible escaping
- `--escape json` - JSON string escaping
- `--escape sql` - SQL value escaping
### Template Examples
```bash
# Basic template
rexturl --urls "https://example.com/api" --format custom --template "Host: {host}, Path: {path}"
# Output: Host: example.com, Path: /api
# With defaults
rexturl --urls "https://example.com" --format custom --template "{scheme}://{domain}:{port:80}"
# Output: https://example.com:80
# Conditional text
rexturl --urls "https://example.com/path?q=test" --format custom --template "{domain}{query?&found}"
# Output: example.com&found
# Shell escaping
rexturl --urls "https://example.com/path with spaces" --format custom --template "{url}" --escape shell
# Output: 'https://example.com/path with spaces'
```
## SQL Output
Generate SQL INSERT statements from URL data:
```bash
# Basic SQL output
rexturl --urls "https://www.example.com/path" --format sql --fields domain,path
# INSERT INTO urls (domain, path) VALUES ('example.com', '/path');
# With CREATE TABLE
rexturl --urls "https://example.com" --format sql --fields domain --sql-create-table
# CREATE TABLE IF NOT EXISTS urls (
# id SERIAL PRIMARY KEY,
# domain VARCHAR(253),
# created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
# );
# INSERT INTO urls (domain) VALUES ('example.com');
# Custom table and dialect
rexturl --urls "https://example.com:3306" --format sql --fields domain,port --sql-table my_urls --sql-dialect mysql
# INSERT INTO my_urls (domain, port) VALUES ('example.com', '3306');
```
## Performance & Architecture
### URL Parser Implementation
- Custom URL parser with optimized component extraction
- Zero-copy parsing with minimal memory allocations
- Parallel processing using Rayon for bulk operations
### Architecture
- Unified data model: Single `UrlRecord` struct for all formats
- Template engine: Flexible custom formatting with conditional logic
- SQL generation: Multi-dialect support with proper type mapping
- Predictable output: Same field order across all formats
- Proper error handling: Exit codes and stderr for failures
- Streaming support: Memory-efficient for large datasets
### Benchmarks
```bash
cargo bench
# fast_url_parsing time: [823.79 ns 827.53 ns 831.87 ns]
# fast_url_component_access time: [69.100 ns 69.309 ns 69.527 ns]
```
### Technical Details
- Modular design: Separate parsing, formatting, and domain intelligence
- Multi-part TLD support: Handles complex domains like `example.co.uk`
- Memory efficient: <1KB overhead per URL
## Changelog
For a detailed list of changes and version history, see [CHANGELOG.md](CHANGELOG.md).
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with proper tests
4. Ensure all tests pass (`cargo test`)
5. Run formatting and linting (`cargo fmt && cargo clippy`)
6. Commit your changes (`git commit -m 'Add some amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.