SXURL: Slice eXact URL Identifier ("sixerl")
SXURL (pronounced "Sixerl") is a fixed-length, sliceable URL identifier system designed for efficient database storage and querying. It converts URLs into deterministic 256-bit identifiers where each URL component occupies a fixed position, enabling fast substring-based filtering and indexing.
Features
- Fixed-length: All SXURL identifiers are exactly 256 bits (64 hex characters)
- Sliceable: Each URL component has a fixed position for substring filtering
- Deterministic: Same input always produces the same output
- Collision-resistant: Uses SHA-256 hashing for component fingerprinting
- Standards-compliant: Supports IDNA, Public Suffix List, and standard URL schemes
- Zero-copy: Efficient parsing and encoding with minimal allocations
- Thoroughly tested: 100+ comprehensive tests covering edge cases
Quick Start
Add this to your Cargo.toml:
[]
= "0.1"
Basic Usage
use ;
// Encode a URL to SXURL
let sxurl_hex = encode_url_to_hex?;
println!; // 64 hex characters
// Decode and inspect components
let decoded = decode_hex?;
println!;
println!;
// Fast component matching for filtering
let is_docs_rs = matches_component?;
assert!;
let is_rs_tld = matches_component?;
assert!;
URL Parsing Utilities
use ;
// Parse URL into all components at once
let parts = split_url?;
println!; // "github"
println!; // Some("api")
println!; // Some("readme")
// Work with query parameters
let params = parse_query?;
println!; // Some("2")
// Handle anchors (fragments)
let anchor = get_anchor?; // Some("search")
let clean_url = strip_anchor?; // "https://example.com"
// Join URLs properly
let api_url = join_url_path?;
// Result: "https://api.example.com/v1/users"
Database Integration Example
use encode_url_to_hex;
// Store URLs efficiently in your database
let urls = vec!;
for url in urls
// Query by domain efficiently using substring matching
// SELECT * FROM urls WHERE sxurl LIKE '_______<15-hex-domain-hash>__________________________________________'
// `_` matches exactly one character: 7 before the domain slice (hex chars [7..22)) and 42 after it
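The `_` wildcard in `LIKE` matches exactly one character, so a positional pattern needs one wildcard for every hex character outside the slice being matched. A minimal sketch of building such a pattern follows; `like_pattern` is a hypothetical helper for illustration, not part of the crate's API, and the hash value used with it would be a placeholder.

```rust
/// Total length of an SXURL in hex characters.
const SXURL_HEX_LEN: usize = 64;

/// Build a SQL LIKE pattern that matches `value` at hex offset `start`
/// and accepts any single character at every other position.
/// Hypothetical helper for illustration; not part of the crate's API.
fn like_pattern(start: usize, value: &str) -> Option<String> {
    // Only hex digits are allowed, so `value` cannot smuggle in `%` or `_`.
    if !value.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let end = start.checked_add(value.len())?;
    if end > SXURL_HEX_LEN {
        return None;
    }
    let mut pattern = "_".repeat(start);
    pattern.push_str(value);
    pattern.push_str(&"_".repeat(SXURL_HEX_LEN - end));
    Some(pattern)
}
```

For the domain slice, `like_pattern(7, hash)` with a 15-character hash yields a 64-character pattern with 7 leading and 42 trailing wildcards. Binding the pattern as a query parameter, rather than concatenating it into the SQL text, is still the safer choice.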
SXURL Format
The 256-bit SXURL has this fixed layout:
| Component | Hex Range | Bits | Description |
|---|---|---|---|
| header | [0..3) | 12 | Version, scheme, flags |
| tld_hash | [3..7) | 16 | Top-level domain hash |
| domain | [7..22) | 60 | Domain name hash |
| subdomain | [22..30) | 32 | Subdomain hash |
| port | [30..34) | 16 | Port number |
| path | [34..49) | 60 | Path hash |
| params | [49..58) | 36 | Query parameters hash |
| fragment | [58..64) | 24 | Fragment hash |
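Because every component sits at a fixed hex offset, an identifier can be taken apart with plain string slicing. The sketch below mirrors the table above using only the standard library; `LAYOUT` and `slice_components` are illustrative names, not the crate's API.

```rust
use std::ops::Range;

/// Hex ranges from the layout table, as half-open [start..end) offsets.
static LAYOUT: [(&str, Range<usize>); 8] = [
    ("header", 0..3),
    ("tld_hash", 3..7),
    ("domain", 7..22),
    ("subdomain", 22..30),
    ("port", 30..34),
    ("path", 34..49),
    ("params", 49..58),
    ("fragment", 58..64),
];

/// Split a 64-character hex identifier into its named component slices.
/// Returns `None` if the input is not exactly 64 ASCII hex digits.
/// Illustrative helper; not part of the crate's API.
fn slice_components(hex: &str) -> Option<Vec<(&'static str, &str)>> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(
        LAYOUT
            .iter()
            .map(|(name, range)| (*name, &hex[range.clone()]))
            .collect(),
    )
}
```

The slice lengths add up to 64 hex characters, which matches the 256-bit total (4 bits per hex character).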
Header Format (12 bits)
- Version (4 bits): Currently always 1
- Scheme (3 bits): 0=https, 1=http, 2=ftp
- Flags (5 bits): Component presence indicators
  - Bit 4: Subdomain present
  - Bit 3: Query parameters present
  - Bit 2: Fragment present
  - Bit 1: Non-default port present
  - Bit 0: Reserved (always 0)
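The header fields can be packed into 12 bits with shifts and masks. The sketch below assumes the fields are laid out most-significant first in the order listed (version, then scheme, then flags); the list above does not state the bit order, so treat that as an assumption to check against SXURL-SPEC.md. All function and constant names are illustrative.

```rust
/// Flag bit positions within the 5-bit flags field.
const FLAG_SUBDOMAIN: u16 = 1 << 4;
const FLAG_PARAMS: u16 = 1 << 3;
const FLAG_FRAGMENT: u16 = 1 << 2;
const FLAG_PORT: u16 = 1 << 1;

/// Pack version (4 bits), scheme (3 bits) and flags (5 bits) into 12 bits.
/// ASSUMPTION: version occupies the top 4 bits, flags the bottom 5.
fn pack_header(version: u16, scheme: u16, flags: u16) -> Option<u16> {
    // Reject out-of-range fields and the reserved flag bit 0.
    if version > 0xF || scheme > 0x7 || flags > 0x1F || (flags & 1) != 0 {
        return None;
    }
    Some((version << 8) | (scheme << 5) | flags)
}

/// Inverse of `pack_header`: returns (version, scheme, flags).
fn unpack_header(header: u16) -> (u16, u16, u16) {
    ((header >> 8) & 0xF, (header >> 5) & 0x7, header & 0x1F)
}

/// Render the 12-bit header as the 3 hex characters at [0..3).
fn header_hex(header: u16) -> String {
    format!("{:03x}", header)
}
```

Under that assumed bit order, an https URL (scheme 0) with a subdomain and a fragment packs to `0x114`, and an http URL (scheme 1) with query parameters packs to `0x128`.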
Advanced Usage
Custom Encoder
use SxurlEncoder;
let encoder = new;
// Encode to bytes
let sxurl_bytes = encoder.encode?;
assert_eq!; // Always 32 bytes
// Encode to hex string
let sxurl_hex = encoder.encode_to_hex?;
assert_eq!; // Always 64 hex chars
Component Filtering
use ;
let urls = vec!;
// Find all GitHub URLs
for url in &urls
// Find all API endpoints
for url in &urls
Hash Function Access
use ComponentHasher;
// Access individual hash functions
let tld_hash = hash_tld?;
let domain_hash = hash_domain?;
let path_hash = hash_path?;
println!;
println!;
println!;
Supported URL Schemes
- https (scheme code 0) - Default port 443
- http (scheme code 1) - Default port 80
- ftp (scheme code 2) - Default port 21
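The scheme codes and default ports can be expressed as a small lookup. This is a sketch with hypothetical helper names; in particular, treating an explicitly written default port (such as `:443` on https) as "not non-default" is an assumption about how the port flag is derived, not something stated here.

```rust
/// Map a scheme name to (scheme code, default port).
/// Returns `None` for schemes outside the supported set.
fn scheme_info(scheme: &str) -> Option<(u8, u16)> {
    match scheme.to_ascii_lowercase().as_str() {
        "https" => Some((0, 443)),
        "http" => Some((1, 80)),
        "ftp" => Some((2, 21)),
        _ => None,
    }
}

/// True when an explicit port differs from the scheme's default.
/// ASSUMPTION: a missing port, or one equal to the default, is not flagged.
fn is_non_default_port(scheme: &str, port: Option<u16>) -> Option<bool> {
    let (_, default_port) = scheme_info(scheme)?;
    Some(matches!(port, Some(p) if p != default_port))
}
```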
Use Cases
Database Indexing
Store URLs as fixed-length identifiers with efficient B-tree indexing:
(
sxurl CHAR(64) PRIMARY KEY,
original_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Index by domain (characters 7-22)
(SUBSTRING(sxurl, 8, 15));
-- Index by TLD (characters 3-7)
(SUBSTRING(sxurl, 4, 4));
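SQL's `SUBSTRING` takes a 1-based start position and a length, while the format table uses 0-based half-open hex ranges. The conversion is simple arithmetic, sketched here with a hypothetical helper:

```rust
use std::ops::Range;

/// Convert a 0-based half-open hex range into the 1-based (start, length)
/// pair that SQL's SUBSTRING expects.
fn sql_substring_args(range: Range<usize>) -> (usize, usize) {
    (range.start + 1, range.end.saturating_sub(range.start))
}

/// Render the SUBSTRING expression for a given column and hex range.
fn sql_substring_expr(column: &str, range: Range<usize>) -> String {
    let (start, len) = sql_substring_args(range);
    format!("SUBSTRING({}, {}, {})", column, start, len)
}
```

The domain range `[7..22)` maps to `SUBSTRING(sxurl, 8, 15)` and the TLD range `[3..7)` to `SUBSTRING(sxurl, 4, 4)`, matching the index expressions above.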
URL Deduplication
Quickly identify duplicate URLs across different formats:
use encode_url_to_hex;
use HashSet;
let mut seen_urls = new;
let urls = vec!;
for url in urls
Fast Domain Filtering
Filter large URL datasets by domain without parsing:
// Filter millions of URLs by domain using simple string operations
let domain_filter = "1a2b3c4d5e6f789"; // 15-hex-char hash of target domain (placeholder value)
let urls_in_domain: = all_sxurls
.iter
.filter
.collect;
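A self-contained sketch of this kind of filter, using only the standard library, might look like the following. `filter_by_domain` is a hypothetical helper, and the domain hash passed to it is assumed to be the 15 hex characters stored at `[7..22)`.

```rust
/// Keep only the identifiers whose domain slice (hex chars 7..22)
/// equals `domain_hash`. No URL parsing or decoding is involved.
fn filter_by_domain<'a>(sxurls: &'a [String], domain_hash: &str) -> Vec<&'a str> {
    sxurls
        .iter()
        .map(String::as_str)
        // `get` returns None instead of panicking on malformed entries.
        .filter(|s| s.get(7..22) == Some(domain_hash))
        .collect()
}
```

Using `str::get` rather than direct indexing means entries that are too short are skipped rather than causing a panic, which matters when the input has not been validated.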
Error Handling
All functions return Result<T, SxurlError>. Common error cases:
use ;
match encode_url_to_hex
Performance
SXURL is designed for high-performance applications:
- Encoding: ~1-5 μs per URL (depending on complexity)
- Decoding: ~100-500 ns per SXURL
- Component matching: ~50-100 ns per comparison
- Memory usage: Fixed 32 bytes per SXURL, minimal temporary allocations
Benchmarks on a modern CPU:
encode_simple_url time: 1.2 μs
encode_complex_url time: 4.8 μs
decode_sxurl time: 245 ns
component_match time: 67 ns
Technical Details
Hash Function
SXURL uses labeled SHA-256 hashing: H_n(label, data) = lower_n(SHA256(label || 0x00 || data))
- Collision resistance: a collision becomes likely after roughly 2^(n/2) distinct inputs (birthday bound), where n is the bit width of the component
- Domain separation: Different labels produce different hashes for same data
- Deterministic: Same input always produces same output
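The pieces around the SHA-256 call can be sketched with the standard library alone. The code below builds the `label || 0x00 || data` input, truncates an already-computed 32-byte digest, and estimates the birthday bound. Two assumptions are worth flagging: `lower_n` is read here as "the least-significant n bits of the digest interpreted as a big-endian integer", and the label string is made up for illustration. The digest itself would come from a SHA-256 implementation such as the `sha2` crate, which is not shown.

```rust
/// Build the hash input for a labeled hash: label || 0x00 || data.
fn labeled_preimage(label: &str, data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(label.len() + 1 + data.len());
    buf.extend_from_slice(label.as_bytes());
    buf.push(0x00); // separator byte between label and data
    buf.extend_from_slice(data);
    buf
}

/// Keep the lowest `n` bits of a 32-byte digest and render them as hex.
/// ASSUMPTION: "lower_n" means the least-significant bits of the digest
/// read as a big-endian integer. Supports n in 4..=64, multiples of 4.
fn lower_n_hex(digest: &[u8; 32], n: u32) -> Option<String> {
    if n == 0 || n > 64 || n % 4 != 0 {
        return None;
    }
    let mut tail = [0u8; 8];
    tail.copy_from_slice(&digest[24..]);
    let value = u64::from_be_bytes(tail);
    let mask = if n == 64 { u64::MAX } else { (1u64 << n) - 1 };
    Some(format!("{:0width$x}", value & mask, width = (n / 4) as usize))
}

/// Approximate number of distinct inputs before a collision becomes
/// likely, 2^(n/2), for an even bit width n below 128.
fn birthday_bound(n: u32) -> u64 {
    1u64 << (n / 2)
}
```

For the widths in the format table, this gives about 256 inputs for the 16-bit TLD hash and about 2^30 (roughly one billion) for the 60-bit domain and path hashes.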
URL Normalization
- Scheme and host converted to lowercase
- IDNA (Internationalized Domain Names) support
- Public Suffix List (PSL) for proper domain/TLD splitting
- IP addresses rejected (DNS names only)
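Two of these rules, lowercasing the host and rejecting IP addresses, can be illustrated with the standard library. This is a partial sketch with a hypothetical function name; it does not attempt IDNA conversion or Public Suffix List splitting, both of which need external data or crates.

```rust
use std::net::IpAddr;

/// Lowercase a host name and reject IP literals (DNS names only).
/// Illustrative helper; IDNA and PSL handling are out of scope here.
fn normalize_host(host: &str) -> Option<String> {
    let trimmed = host.trim();
    // IPv6 literals appear in URLs wrapped in brackets, e.g. "[::1]".
    let unbracketed = trimmed.trim_start_matches('[').trim_end_matches(']');
    if trimmed.is_empty() || unbracketed.parse::<IpAddr>().is_ok() {
        return None;
    }
    Some(trimmed.to_ascii_lowercase())
}
```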
Component Handling
- Empty components stored as zero (not hashed)
- Default ports (80, 443, 21) stored explicitly
- Query parameters and fragments preserved as-is
- Path normalization preserves original structure
Testing
Run the comprehensive test suite with `cargo test`.
The library includes 100+ tests covering:
- Specification compliance
- Edge cases and error conditions
- Round-trip consistency
- Hash collision resistance
- Performance regression testing
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Specification
SXURL follows a formal specification available in SXURL-SPEC.md. The implementation is designed to be compatible with other SXURL implementations in different languages.
Changelog
See CHANGELOG.md for detailed release history.
License
Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed under the Apache License, Version 2.0, without any additional terms or conditions.
SXURL - Efficient, deterministic URL identifiers for modern applications.