Crate sxurl

Crate sxurl 

Source
Expand description

SXURL - Fixed-length, sliceable URL identifier system

“Sixerl” is a fixed-length, sliceable URL identifier system for efficient database storage and querying.

This crate provides a way to convert URLs into fixed 256-bit identifiers where each URL component occupies a fixed position in the resulting hex string.

§Features

  • Fixed-length: All SXURL identifiers are exactly 256 bits (64 hex characters)
  • Sliceable: Each URL component has a fixed position for substring filtering
  • Deterministic: Same input always produces the same output
  • Collision-resistant: Uses SHA-256 hashing for component fingerprinting
  • Standards-compliant: Supports IDNA, Public Suffix List, and standard URL schemes

§Quick Start

use sxurl::{encode_url_to_hex, decode_hex, matches_component, split_url, parse_query};

// Encode a URL to SXURL
let sxurl_hex = encode_url_to_hex("https://docs.rs/sxurl")?;
println!("SXURL: {}", sxurl_hex); // 64 hex characters

// Decode and inspect
let decoded = decode_hex(&sxurl_hex)?;
println!("Scheme: {}", decoded.header.scheme);

// Filter by component
let is_rs_tld = matches_component(&sxurl_hex, "tld", "rs")?;
assert!(is_rs_tld);

// Parse URL into components
let parts = split_url("https://api.github.com/repos?page=1#readme")?;
println!("Domain: {}, Subdomain: {:?}", parts.domain, parts.subdomain);
println!("Anchor: {:?}", parts.anchor);

// Work with query parameters
let params = parse_query("https://example.com?foo=bar&page=2")?;
println!("Page: {:?}", params.get("page"));

§SXURL Format

The 256-bit SXURL has this fixed layout:

ComponentHex RangeBitsDescription
header[0..3)12Version, scheme, flags
tld_hash[3..7)16Top-level domain hash
domain_hash[7..22)60Domain name hash
sub_hash[22..30)32Subdomain hash
port[30..34)16Port number
path_hash[34..49)60Path hash
params_hash[49..58)36Query parameters hash
frag_hash[58..64)24Fragment hash

§Supported URL Schemes

  • https (scheme code 0)
  • http (scheme code 1)
  • ftp (scheme code 2)

§Error Handling

All functions return Result<T, SxurlError>. Common error cases:

  • Unsupported URL schemes (only https, http, ftp supported)
  • Invalid hostnames or IP addresses
  • Malformed URLs or SXURL hex strings

Re-exports§

pub use core::encode_url;
pub use core::encode_url_to_hex;
pub use core::encode_host_to_hex;
pub use core::extract_host_from_full_sxurl;
pub use core::extract_host_from_full_bytes;
pub use core::SxurlEncoder;
pub use core::decode_hex;
pub use core::decode_bytes;
pub use core::matches_component;
pub use core::DecodedSxurl;
pub use url::split_url;
pub use url::split_domain;
pub use url::get_path_segments;
pub use url::get_filename;
pub use url::parse_query;
pub use url::get_query_value;
pub use url::get_anchor;
pub use url::strip_anchor;
pub use url::join_url_path;
pub use url::is_https;
pub use url::has_query;
pub use url::has_anchor;
pub use url::UrlParts;
pub use url::get_url_component;
pub use url::strip_url_component;
pub use error::SxurlError;
pub use types::SxurlHeader;
pub use types::UrlComponents;
pub use types::UrlComponentType;
pub use core::hash_component;
pub use core::ComponentHasher;
pub use core::extract_lower_bits;
pub use url::normalize_url;
pub use url::normalize_host;
pub use url::validate_host;
pub use url::split_host_with_psl;
pub use url::extract_url_components;
pub use core::pack_sxurl;
pub use core::sxurl_to_hex;
pub use core::hex_to_sxurl;
pub use core::sxurl_contains;
pub use core::sxurl_has_domain;
pub use core::sxurl_has_subdomain;
pub use core::sxurl_has_tld;

Modules§

core
Core SXURL encoding and decoding functionality.
error
Error types for SXURL encoding and decoding operations.
types
Core data structures for SXURL encoding and decoding.
url
URL processing and manipulation utilities.