webrisk_hash
URL canonicalization and hashing for the Google Web Risk API.
Implements the full Web Risk URL canonicalization spec (percent-encoding normalization, IP address parsing in decimal, hex, and octal forms, hostname normalization, and path resolution), plus suffix/prefix expression generation and SHA-256 hash prefix computation.
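As a rough illustration of the percent-encoding step, the sketch below follows the rule given in the Web Risk URL documentation: repeatedly unescape `%XX` sequences until none remain, then re-escape bytes that are <= 0x20, >= 0x7F, `#`, or `%` using uppercase hex. The function names (`normalize_escapes` and its helpers) are hypothetical and are not part of `webrisk_hash`; the sketch works on a single URL component and leaves host/path splitting to the caller.

```rust
/// Value of one ASCII hex digit, or `None` if `b` is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decodes every well-formed `%XX` escape once; malformed `%` sequences are kept verbatim.
fn unescape_once(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'%' && i + 2 < input.len() {
            if let (Some(hi), Some(lo)) = (hex_val(input[i + 1]), hex_val(input[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}

/// Unescapes until a pass changes nothing; every decoding pass shrinks the input, so this terminates.
fn unescape_fully(input: &[u8]) -> Vec<u8> {
    let mut current = input.to_vec();
    loop {
        let next = unescape_once(&current);
        if next == current {
            return current;
        }
        current = next;
    }
}

/// Re-escapes control bytes, space, non-ASCII bytes, '#' and '%' as uppercase `%XX`.
fn escape(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input {
        if b <= 0x20 || b >= 0x7f || b == b'#' || b == b'%' {
            out.push_str(&format!("%{:02X}", b));
        } else {
            out.push(b as char);
        }
    }
    out
}

/// Hypothetical helper: percent-encoding normalization of one URL component.
fn normalize_escapes(component: &str) -> String {
    escape(&unescape_fully(component.as_bytes()))
}
```

With this rule, `/%25%32%35` first decodes to `/%25`, then to `/%`, and the lone `%` is re-escaped, giving `/%25`.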
Installation
Usage
Canonicalize a URL
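```rust
use webrisk_hash::canonicalize;

// Hostname normalization (example input from the Web Risk docs)
let url = canonicalize("http://www.GOOgle.com/");
assert_eq!(url, Some("http://www.google.com/".to_string()));

// Integer IP normalization
let url = canonicalize("http://3279880203/blah");
assert_eq!(url, Some("http://195.127.0.11/blah".to_string()));

// Returns None for invalid URLs, e.g. one longer than 8192 bytes
let too_long = format!("http://example.com/{}", "a".repeat(9000));
assert_eq!(canonicalize(&too_long), None);
```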
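The integer IP normalization exercised above can be understood through `inet_aton`-style rules, which this sketch assumes: a host of one to four dot-separated numbers, each written in decimal, hex (`0x` prefix), or octal (leading `0`), where the final number fills all remaining low-order bytes. The helper names are hypothetical and not part of `webrisk_hash`.

```rust
/// Parses one IPv4 component: hex with a `0x`/`0X` prefix, octal with a leading `0`, otherwise decimal.
fn parse_component(s: &str) -> Option<u64> {
    let (digits, radix) = if let Some(hex) = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        (hex, 16)
    } else if s.len() > 1 && s.starts_with('0') {
        (&s[1..], 8)
    } else {
        (s, 10)
    };
    // Reject empty components and signs such as "+5", which from_str_radix would accept.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    u64::from_str_radix(digits, radix).ok()
}

/// Parses a host with one to four numeric components as an IPv4 address
/// (inet_aton-style rules, assumed here rather than taken from the crate).
fn parse_ipv4(host: &str) -> Option<[u8; 4]> {
    let parts: Vec<&str> = host.split('.').collect();
    if parts.len() > 4 {
        return None;
    }
    let mut nums = Vec::with_capacity(parts.len());
    for part in &parts {
        nums.push(parse_component(part)?);
    }
    // The last component fills every byte not claimed by the earlier ones.
    let last = nums.pop()?;
    let mut value: u64 = 0;
    for (i, n) in nums.iter().enumerate() {
        if *n > 255 {
            return None;
        }
        value |= *n << (24 - 8 * i);
    }
    let remaining_bits = 8 * (4 - nums.len());
    if last >= (1u64 << remaining_bits) {
        return None;
    }
    Some(((value | last) as u32).to_be_bytes())
}

/// Hypothetical helper: rewrites an IP-literal host as four dotted decimal octets.
fn normalize_ip_host(host: &str) -> Option<String> {
    let [a, b, c, d] = parse_ipv4(host)?;
    Some(format!("{}.{}.{}.{}", a, b, c, d))
}
```

For instance, `normalize_ip_host("3279880203")` returns `Some("195.127.0.11")`, and the three-part form `"1.2.3"` becomes `1.2.0.3` because the final `3` fills the last two bytes.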
Generate suffix/prefix expressions
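```rust
use webrisk_hash::suffix_postfix_expressions;
// Example URL from the Web Risk docs
let exprs = suffix_postfix_expressions("http://a.b.c/1/2.html?param=1");
assert!(exprs.contains(&"a.b.c/1/2.html?param=1".to_string()));
assert!(exprs.contains(&"b.c/".to_string()));
```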
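The Web Risk docs define these expressions as at most five host strings (the exact host, plus up to four suffixes built from the last five labels by successively dropping the leading one, never the bare top-level domain, and none at all for IP hosts) combined with at most six path strings (the exact path with and without the query, plus up to four directory prefixes starting at `/`). The sketch below is an assumption-based illustration of that combination, not the crate's code; it expects an already-canonicalized host and path, and `expressions` and its helpers are hypothetical names.

```rust
use std::net::Ipv4Addr;

/// Exact host plus up to four suffixes taken from the last five labels;
/// the bare top-level domain and IP hosts get no suffixes.
fn host_variants(host: &str) -> Vec<String> {
    let mut hosts = vec![host.to_string()];
    if host.parse::<Ipv4Addr>().is_ok() {
        return hosts;
    }
    let labels: Vec<&str> = host.split('.').collect();
    let start = labels.len().saturating_sub(5);
    // Stop before the last label so the top-level domain is never used alone.
    for i in start..labels.len() - 1 {
        let candidate = labels[i..].join(".");
        if !hosts.contains(&candidate) {
            hosts.push(candidate);
        }
    }
    hosts
}

/// Exact path with and without the query, plus up to four directory prefixes starting at "/".
fn path_variants(path_and_query: &str) -> Vec<String> {
    let mut paths = vec![path_and_query.to_string()];
    let path = path_and_query.split_once('?').map_or(path_and_query, |(p, _)| p);
    if path != path_and_query {
        paths.push(path.to_string());
    }
    let segments: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    let mut prefix = String::from("/");
    let mut prefixes = vec![prefix.clone()];
    // Every segment except the last (a file name, or "" after a trailing slash) is a directory.
    for segment in &segments[..segments.len() - 1] {
        if prefixes.len() == 4 {
            break;
        }
        prefix.push_str(segment);
        prefix.push('/');
        prefixes.push(prefix.clone());
    }
    for p in prefixes {
        if !paths.contains(&p) {
            paths.push(p);
        }
    }
    paths
}

/// Hypothetical helper: every host/path combination (at most 5 x 6 = 30).
fn expressions(host: &str, path_and_query: &str) -> Vec<String> {
    let paths = path_variants(path_and_query);
    let mut out = Vec::new();
    for h in host_variants(host) {
        for p in &paths {
            out.push(format!("{}{}", h, p));
        }
    }
    out
}
```

For host `a.b.c` with path `/1/2.html?param=1`, this yields 2 hosts times 4 paths, the eight strings the Web Risk docs list for that URL.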
Compute hash prefixes
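```rust
use webrisk_hash::{get_prefixes, truncated_sha256_prefix};

// Get 32-bit hash prefixes for all expressions of a URL
let prefixes = get_prefixes("http://a.b.c/1/2.html?param=1", 32);

// Or hash a single string: SHA-256("abc") starts with ba 78 16 bf
let hash = truncated_sha256_prefix("abc", 32);
assert_eq!(hash, vec![0xba, 0x78, 0x16, 0xbf]);
```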
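To show what `truncated_sha256_prefix` computes, here is a dependency-free SHA-256 (FIPS 180-4) followed by truncation to `bits / 8` bytes. Real code would normally use an established crate such as `sha2` instead; `sha256` and `truncated_prefix` are hypothetical names, and the 32-byte cap assumes the "at most 32" limit refers to the full digest length.

```rust
/// SHA-256 round constants (FIPS 180-4, section 4.2.2).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Plain SHA-256 over a byte slice; a teaching sketch, not a hardened implementation.
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, 48 derived words.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds over the working variables a..h.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            hh = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        for (word, add) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *word = word.wrapping_add(add);
        }
    }

    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Hypothetical helper: first `bits / 8` bytes of the digest, capped at 32 bytes.
fn truncated_prefix(s: &str, bits: usize) -> Vec<u8> {
    let len = (bits / 8).min(32);
    sha256(s.as_bytes())[..len].to_vec()
}
```

`truncated_prefix("abc", 32)` returns `[0xba, 0x78, 0x16, 0xbf]`, the first four bytes of the standard SHA-256 test vector for `"abc"`.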
End-to-end: URL to hash prefix set
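```rust
use webrisk_hash::get_prefixes;
// Per the Web Risk docs, this URL yields 8 expressions, hence 8 prefixes
let prefixes = get_prefixes("http://a.b.c/1/2.html?param=1", 32);
assert_eq!(prefixes.len(), 8);
```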
API
| Function | Description |
|---|---|
| `canonicalize(url)` | Canonicalize a URL per the Web Risk spec. Returns `Option<String>`. |
| `suffix_postfix_expressions(url)` | Generate up to 30 host-suffix / path-prefix combinations. |
| `truncated_sha256_prefix(s, bits)` | SHA-256 hash of `s`, truncated to `bits / 8` bytes (at most 32 bytes). |
| `get_prefixes(url, bits)` | Canonicalize, generate expressions, and hash each one. Returns `HashSet<Vec<u8>>`. |
| `get_prefix_map(url, bits)` | Like `get_prefixes`, but returns `Vec<(expression, hash)>`. |
Limitations
- Accepts `&str` input only (valid UTF-8); raw `&[u8]` byte sequences are not supported.
- URLs longer than 8192 bytes return `None`.
- Hostnames longer than 255 characters return `None`.
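Callers that hold raw bytes, or that want to reject oversized input before canonicalizing, could apply the limits listed above themselves. This sketch assumes the limits are exactly 8192 bytes for the whole URL and 255 characters for the host; `from_raw` and `check_limits` are hypothetical helpers, and the host extraction is deliberately crude (it does not strip userinfo or ports).

```rust
/// Assumed limits, mirroring the Limitations list.
const MAX_URL_BYTES: usize = 8192;
const MAX_HOST_CHARS: usize = 255;

/// Hypothetical helper: turns raw bytes into `&str`, rejecting invalid UTF-8.
fn from_raw(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Hypothetical pre-check: `None` if the URL exceeds the byte limit or its host
/// exceeds the character limit. Host extraction is crude (no userinfo/port handling).
fn check_limits(url: &str) -> Option<&str> {
    // str::len() counts UTF-8 bytes, matching the byte-based URL limit.
    if url.len() > MAX_URL_BYTES {
        return None;
    }
    let rest = url.split_once("://").map_or(url, |(_, after)| after);
    let end = rest.find(|c: char| c == '/' || c == '?' || c == '#').unwrap_or(rest.len());
    // chars().count() counts Unicode scalar values, matching the character-based host limit.
    if rest[..end].chars().count() > MAX_HOST_CHARS {
        return None;
    }
    Some(url)
}
```

The distinction matters for non-ASCII hosts: a host can fit within 255 characters while occupying more than 255 bytes.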
License
MIT