§urlexpand
A small library for expanding (“unshortening”) shortened URLs into their final destination.
The crate is built around resolver modules. Each resolver knows how to expand one shortener service (or a family of them), especially services that do not rely purely on HTTP 3xx redirects and instead use HTML/JS-based redirect pages.
§Goals
- Fast, reliable expansion for common shorteners
- Extensible resolver structure (add a new module for a new shortener)
- Non-JS resolution (no headless browser) using a mix of redirect following + parsing + API lookups
- Consistent Result<T>/Error handling across resolvers
- Unified API with both async and blocking support via feature flags
§How it works (high level)
- The caller provides a URL (potentially shortened).
- The library picks a resolver (or tries several in order).
- The resolver expands the URL using one of these strategies:
  - HTTP redirect following (3xx chains)
  - HTML pattern extraction (regex-based “click-through” / meta / JS hints)
  - Service API lookup (when the browser normally uses JS to fetch the destination)
- The final URL is returned as a String.
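The resolver-selection step can be sketched as a host-based lookup. This is a minimal, std-only illustration and not the crate's actual dispatcher: `Strategy`, `host_of`, `pick_strategy`, and the `.test` host names are all hypothetical, and the host parsing is deliberately naive (it does not handle bracketed IPv6 literals).

```rust
/// Hypothetical strategy kinds mirroring the three approaches listed above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Strategy {
    /// Follow plain HTTP 3xx redirects.
    Redirect,
    /// Fetch the page and extract the destination from HTML/JS hints.
    HtmlPattern,
    /// Ask the service's API for the destination.
    ApiLookup,
}

/// Extract the lowercase host from a URL without a full URL parser.
fn host_of(url: &str) -> Option<String> {
    // Drop the scheme if there is one.
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop userinfo ("user@") and port (":8080") if present.
    let host = authority.rsplit('@').next()?;
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Pick a strategy for a URL. The host table below is illustrative only;
/// unknown hosts fall back to plain redirect following.
pub fn pick_strategy(url: &str) -> Option<Strategy> {
    let host = host_of(url)?;
    let bare = host.strip_prefix("www.").unwrap_or(host.as_str());
    Some(match bare {
        "example-html.test" => Strategy::HtmlPattern,
        "example-api.test" => Strategy::ApiLookup,
        _ => Strategy::Redirect,
    })
}
```

Falling back to redirect following for unknown hosts keeps the common case cheap; only services known to need HTML parsing or an API call take the slower paths.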
§API Usage
The library provides a unified unshorten() function that adapts based on feature flags:
§Default (async-only)
use urlexpand::unshorten;
use std::time::Duration;
let final_url = unshorten("https://bit.ly/3alqLKi", Some(Duration::from_secs(10))).await?;
§With blocking feature
// Add to Cargo.toml: urlexpand = { version = "...", features = ["blocking"] }
use urlexpand::{unshorten, unshorten_async};
use std::time::Duration;
// Blocking version
let final_url = unshorten("https://bit.ly/3alqLKi", Some(Duration::from_secs(10)))?;
// Async version (still available when blocking feature is enabled)
let final_url = unshorten_async("https://bit.ly/3alqLKi", Some(Duration::from_secs(10))).await?;
§Module layout
A common structure looks like this:
- src/lib.rs
  - exports Result and Error
  - exports the public expansion API
- src/error.rs
  - defines Error and error conversions (e.g. From<reqwest::Error>)
- src/resolvers/
  - each file is a shortener-specific resolver (e.g. tinyurl.rs, urlshortdev.rs, etc.)
- src/resolvers/mod.rs
  - re-exports resolver functions and common helper utilities
§Common helper utilities
Many resolver modules share helpers such as:
- get_client_builder(timeout): returns a configured reqwest::ClientBuilder
- from_re(text, pattern): returns the first capture group match as Option<String>
These helpers keep each resolver tiny and consistent.
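To illustrate the shape of an extraction helper like from_re, here is a dependency-free stand-in. It does not use regular expressions: it returns the text between two literal markers, which is a simpler technique than the capture-group matching described above. The name `extract_between` and its parameters are assumptions made for this sketch, not part of the crate.

```rust
/// Return the text between the first occurrence of `prefix` and the next
/// occurrence of `suffix`, or `None` if either marker is missing or the
/// text between them is empty.
///
/// Hypothetical stand-in for a regex-based helper: it matches literal
/// markers only, not patterns.
pub fn extract_between(text: &str, prefix: &str, suffix: &str) -> Option<String> {
    // Position just past the prefix.
    let start = text.find(prefix)? + prefix.len();
    // Length of the captured span, measured from `start`.
    let len = text[start..].find(suffix)?;
    let found = &text[start..start + len];
    if found.is_empty() {
        None
    } else {
        Some(found.to_string())
    }
}
```

Returning Option<String> lets a resolver turn a missing destination into an error at the call site, for example with `.ok_or(Error::NoString)`.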
§Error handling model
Resolvers generally return:
- Ok(final_url) on success
- Err(Error::NoString) when a redirect page/API response doesn’t contain a destination URL
- Err(Error::...) for network/HTTP/parse errors
To make resolver modules ergonomic, it’s recommended that Error implements:
- From<reqwest::Error>
- (optionally) From<std::io::Error> or other error conversions you use
That lets resolvers freely use ? or .map_err(Error::from).
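A minimal sketch of this error model, using only the standard library. It shows the From<std::io::Error> conversion; a From<reqwest::Error> impl would follow the same pattern. The `Io` variant, the wrapping `error` module, and the `first_line` function are assumptions for illustration, and the crate's real Error type may have different variants.

```rust
pub mod error {
    use std::fmt;
    use std::io;

    /// Sketch of a crate-wide error type. Only `NoString` is named in the
    /// text above; the `Io` variant is an assumption for this example.
    #[derive(Debug)]
    pub enum Error {
        /// The page or API response contained no destination URL.
        NoString,
        /// An underlying I/O failure.
        Io(io::Error),
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::NoString => write!(f, "no destination URL found"),
                Error::Io(e) => write!(f, "I/O error: {e}"),
            }
        }
    }

    impl std::error::Error for Error {}

    /// This conversion is what allows `?` on `io::Result` values.
    impl From<io::Error> for Error {
        fn from(e: io::Error) -> Self {
            Error::Io(e)
        }
    }

    /// Crate-wide result alias.
    pub type Result<T> = std::result::Result<T, Error>;

    /// Hypothetical resolver step: take the first line of a fetched body
    /// as the destination, or report `NoString` if the body is empty.
    pub fn first_line(body: io::Result<String>) -> Result<String> {
        let body = body?; // io::Error is converted via From
        body.lines()
            .next()
            .map(str::to_owned)
            .ok_or(Error::NoString)
    }
}
```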
§Timeouts and redirect limits
Timeouts are typically passed into each resolver (Option<Duration>) and applied via the shared
HTTP client builder. Redirect limits should also be configured in one place (your builder) so all
resolvers behave consistently.
§Adding a new resolver
- Create src/resolvers/<service>.rs
- Implement:
pub(crate) async fn unshort(url: &str, timeout: Option<std::time::Duration>) -> crate::Result<String> {
    // resolve & return final URL
}
- Re-export it from src/resolvers/mod.rs
- Add it to your dispatcher/registry if you have one (e.g., “try resolvers in order”).
§Resolver style guideline
Keep resolvers small and focused:
- follow redirects first
- if the service stops on a non-redirect “intermediate page”, use either:
  - regex extraction (from_re) or
  - a small API call if the browser normally uses JS
§Testing
For deterministic tests, consider:
- unit testing regex extraction helpers (from_re) with fixed strings
- using a mock HTTP server (or recorded fixtures) for network calls
- keeping “live” integration tests behind a feature flag, since shortener behavior can change
§Security considerations
Expanding URLs can lead to untrusted destinations. Consider optional safeguards:
- maximum redirect depth
- domain allow/deny lists
- blocking private IP ranges (SSRF protection) if this runs server-side
- request method restrictions (typically GET only)
- size limits for downloaded bodies when parsing HTML
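The private-range safeguard could look roughly like the following std-only check, applied to every address a host resolves to before a request is sent. The function names are hypothetical and the list of blocked ranges is a starting point, not an exhaustive SSRF defense.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Returns true for addresses a server-side expander should refuse to contact.
pub fn is_forbidden_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_forbidden_v4(v4),
        // IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) are checked as IPv4.
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_forbidden_v4(v4),
            None => is_forbidden_v6(v6),
        },
    }
}

fn is_forbidden_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()          // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()  // 127/8
        || ip.is_link_local() // 169.254/16 (includes cloud metadata endpoints)
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast() // 255.255.255.255
}

fn is_forbidden_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()                    // ::1
        || ip.is_unspecified()          // ::
        || (first & 0xfe00) == 0xfc00   // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80   // link-local unicast, fe80::/10
}
```

Because redirects can point anywhere, the check needs to run on each hop of a redirect chain, not only on the initial URL.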
§Functions
- is_shortened: Check whether a given URL is a shortened URL
- unshorten: Unshorten a shortened URL (blocking version)
- unshorten_async: Unshorten a shortened URL (async version, available when the blocking feature is enabled)