# SXURL: Slice eXact URL Identifier ("sixerl")
[](https://crates.io/crates/sxurl)
[](https://docs.rs/sxurl)
[](#license)
[](https://github.com/hjpithadia/sxurl/actions)
**SXURL** (pronounced "Sixerl") is a fixed-length, sliceable URL identifier system designed for efficient database storage and querying. It converts URLs into deterministic 256-bit identifiers where each URL component occupies a fixed position, enabling fast substring-based filtering and indexing.
## Features
- **๐ Fixed-length**: All SXURL identifiers are exactly 256 bits (64 hex characters)
- **๐ Sliceable**: Each URL component has a fixed position for substring filtering
- **๐ Deterministic**: Same input always produces the same output
- **๐ก๏ธ Collision-resistant**: Uses SHA-256 hashing for component fingerprinting
- **๐ Standards-compliant**: Supports IDNA, Public Suffix List, and standard URL schemes
- **โก Zero-copy**: Efficient parsing and encoding with minimal allocations
- **๐งช Thoroughly tested**: 100+ comprehensive tests covering edge cases
## Quick Start
Add this to your `Cargo.toml`:
```toml
[dependencies]
sxurl = "0.1"
```
### Basic Usage
```rust
use sxurl::{encode_url_to_hex, decode_hex, matches_component};
// Encode a URL to SXURL
let sxurl_hex = encode_url_to_hex("https://docs.rs/serde")?;
println!("SXURL: {}", sxurl_hex); // 64 hex characters
// Decode and inspect components
let decoded = decode_hex(&sxurl_hex)?;
println!("Scheme: {}", decoded.header.scheme);
println!("Has subdomain: {}", decoded.header.subdomain_present);
// Fast component matching for filtering
let is_docs_rs = matches_component(&sxurl_hex, "domain", "docs")?;
assert!(is_docs_rs);
let is_rs_tld = matches_component(&sxurl_hex, "tld", "rs")?;
assert!(is_rs_tld);
```
### URL Parsing Utilities
```rust
use sxurl::{split_url, parse_query, get_anchor, strip_anchor, join_url_path};
// Parse URL into all components at once
let parts = split_url("https://api.github.com/repos?page=1#readme")?;
println!("Domain: {}", parts.domain); // "github"
println!("Subdomain: {:?}", parts.subdomain); // Some("api")
println!("Anchor: {:?}", parts.anchor); // Some("readme")
// Work with query parameters
let params = parse_query("https://example.com?foo=bar&page=2")?;
println!("Page: {:?}", params.get("page")); // Some("2")
// Handle anchors (fragments)
let anchor = get_anchor("https://docs.rs#search")?; // Some("search")
let clean_url = strip_anchor("https://example.com#top")?; // "https://example.com"
// Join URLs properly
let api_url = join_url_path("https://api.example.com/v1", "users")?;
// Result: "https://api.example.com/v1/users"
```
### Database Integration Example
```rust
use sxurl::encode_url_to_hex;
// Store URLs efficiently in your database
let urls = vec![
"https://docs.rs/serde",
"https://crates.io/crates/tokio",
"https://github.com/rust-lang/rust",
];
for url in urls {
let sxurl = encode_url_to_hex(url)?;
// Store sxurl as CHAR(64) or BINARY(32) in your database
println!("INSERT INTO urls (sxurl, original_url) VALUES ('{}', '{}')", sxurl, url);
}
// Query by domain efficiently using substring matching
// SELECT * FROM urls WHERE sxurl LIKE '_______________example_______________'
// ^domain slice position (chars 7-22)
```
## SXURL Format
The 256-bit SXURL has this fixed layout:
| **header** | [0..3) | 12 | Version, scheme, flags |
| **tld_hash** | [3..7) | 16 | Top-level domain hash |
| **domain** | [7..22) | 60 | Domain name hash |
| **subdomain**| [22..30) | 32 | Subdomain hash |
| **port** | [30..34) | 16 | Port number |
| **path** | [34..49) | 60 | Path hash |
| **params** | [49..58) | 36 | Query parameters hash |
| **fragment** | [58..64) | 24 | Fragment hash |
### Header Format (12 bits)
- **Version** (4 bits): Currently always `1`
- **Scheme** (3 bits): `0`=https, `1`=http, `2`=ftp
- **Flags** (5 bits): Component presence indicators
- Bit 4: Subdomain present
- Bit 3: Query parameters present
- Bit 2: Fragment present
- Bit 1: Non-default port present
- Bit 0: Reserved (always 0)
## Advanced Usage
### Custom Encoder
```rust
use sxurl::SxurlEncoder;
let encoder = SxurlEncoder::new();
// Encode to bytes
let sxurl_bytes = encoder.encode("https://example.com")?;
assert_eq!(sxurl_bytes.len(), 32); // Always 32 bytes
// Encode to hex string
let sxurl_hex = encoder.encode_to_hex("https://example.com")?;
assert_eq!(sxurl_hex.len(), 64); // Always 64 hex chars
```
### Component Filtering
```rust
use sxurl::{encode_url_to_hex, matches_component};
let urls = vec![
"https://api.github.com/repos/rust-lang/rust",
"https://docs.github.com/en/rest",
"https://github.blog/2023-01-01/announcement",
];
// Find all GitHub URLs
for url in &urls {
let sxurl = encode_url_to_hex(url)?;
if matches_component(&sxurl, "domain", "github")? {
println!("GitHub URL: {}", url);
}
}
// Find all API endpoints
for url in &urls {
let sxurl = encode_url_to_hex(url)?;
if matches_component(&sxurl, "subdomain", "api")? {
println!("API URL: {}", url);
}
}
```
### Hash Function Access
```rust
use sxurl::ComponentHasher;
// Access individual hash functions
let tld_hash = ComponentHasher::hash_tld("com")?;
let domain_hash = ComponentHasher::hash_domain("example")?;
let path_hash = ComponentHasher::hash_path("/api/v1/users")?;
println!("TLD hash: 0x{:04x}", tld_hash);
println!("Domain hash: 0x{:015x}", domain_hash);
println!("Path hash: 0x{:015x}", path_hash);
```
## Supported URL Schemes
- **`https`** (scheme code 0) - Default port 443
- **`http`** (scheme code 1) - Default port 80
- **`ftp`** (scheme code 2) - Default port 21
## Use Cases
### Database Indexing
Store URLs as fixed-length identifiers with efficient B-tree indexing:
```sql
CREATE TABLE url_index (
sxurl CHAR(64) PRIMARY KEY,
original_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Index by domain (characters 7-22)
CREATE INDEX idx_domain ON url_index (SUBSTRING(sxurl, 8, 15));
-- Index by TLD (characters 3-7)
CREATE INDEX idx_tld ON url_index (SUBSTRING(sxurl, 4, 4));
```
### URL Deduplication
Quickly identify duplicate URLs across different formats:
```rust
use sxurl::encode_url_to_hex;
use std::collections::HashSet;
let mut seen_urls = HashSet::new();
let urls = vec![
"https://Example.Com/Path",
"https://example.com/path", // Same after normalization
"https://example.com/path/", // Different path
];
for url in urls {
let sxurl = encode_url_to_hex(url)?;
if seen_urls.insert(sxurl) {
println!("New URL: {}", url);
} else {
println!("Duplicate URL: {}", url);
}
}
```
### Fast Domain Filtering
Filter large URL datasets by domain without parsing:
```rust
// Filter millions of URLs by domain using simple string operations
let domain_filter = "1a2b3c4d5e6f7890abc"; // Hash of target domain
let urls_in_domain: Vec<_> = all_sxurls
.iter()
.filter(|sxurl| &sxurl[7..22] == domain_filter)
.collect();
```
## Error Handling
All functions return `Result<T, SxurlError>`. Common error cases:
```rust
use sxurl::{encode_url_to_hex, SxurlError};
match encode_url_to_hex("invalid-url") {
Ok(sxurl) => println!("Success: {}", sxurl),
Err(SxurlError::InvalidScheme) => println!("Unsupported URL scheme"),
Err(SxurlError::HostNotDns) => println!("Host must be a DNS name, not IP"),
Err(SxurlError::InvalidLength) => println!("Invalid SXURL format"),
Err(e) => println!("Other error: {}", e),
}
```
## Performance
SXURL is designed for high-performance applications:
- **Encoding**: ~1-5 ฮผs per URL (depending on complexity)
- **Decoding**: ~100-500 ns per SXURL
- **Component matching**: ~50-100 ns per comparison
- **Memory usage**: Fixed 32 bytes per SXURL, minimal temporary allocations
Benchmarks on a modern CPU:
```
encode_simple_url time: 1.2 ฮผs
encode_complex_url time: 4.8 ฮผs
decode_sxurl time: 245 ns
component_match time: 67 ns
```
## Technical Details
### Hash Function
SXURL uses labeled SHA-256 hashing: `H_n(label, data) = lower_n(SHA256(label || 0x00 || data))`
- Collision resistance: ~2^(n/2) where n is the bit width
- Domain separation: Different labels produce different hashes for same data
- Deterministic: Same input always produces same output
### URL Normalization
- Scheme and host converted to lowercase
- IDNA (Internationalized Domain Names) support
- Public Suffix List (PSL) for proper domain/TLD splitting
- IP addresses rejected (DNS names only)
### Component Handling
- Empty components stored as zero (not hashed)
- Default ports (80, 443, 21) stored explicitly
- Query parameters and fragments preserved as-is
- Path normalization preserves original structure
## Testing
Run the comprehensive test suite:
```bash
cargo test # Run all tests
cargo test --doc # Test documentation examples
cargo test --release # Test optimized builds
cargo bench # Run performance benchmarks
```
The library includes 100+ tests covering:
- Specification compliance
- Edge cases and error conditions
- Round-trip consistency
- Hash collision resistance
- Performance regression testing
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
## Specification
SXURL follows a formal specification available in [SXURL-SPEC.md](SXURL-SPEC.md). The implementation is designed to be compatible with other SXURL implementations in different languages.
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for detailed release history.
## License
Licensed under the Apache License, Version 2.0 ([LICENSE](LICENSE) or http://www.apache.org/licenses/LICENSE-2.0).
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed under the Apache License, Version 2.0, without any additional terms or conditions.
---
**SXURL** - Efficient, deterministic URL identifiers for modern applications.