webrisk_hash 0.1.0

webrisk_hash

URL canonicalization and hashing for the Google Web Risk API.

Implements the full canonicalization spec: percent-encoding normalization, IP address parsing (decimal/hex/octal), hostname normalization, path resolution, and suffix/prefix expression generation with SHA-256 hash prefixes.

Installation

cargo add webrisk_hash

Usage

Canonicalize a URL

use webrisk_hash::canonicalize;

let url = canonicalize("http://www.GOOgle.com/foo/../bar");
assert_eq!(url, Some("http://www.google.com/bar".to_string()));

// Integer IP normalization
let url = canonicalize("http://3279880203/blah");
assert_eq!(url, Some("http://195.127.0.11/blah".to_string()));

// Returns None for invalid URLs
assert_eq!(canonicalize(""), None);
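To illustrate the integer IP normalization shown above, here is a minimal standalone sketch using only the standard library. The helper name `parse_integer_host` is hypothetical and not part of this crate; the sketch assumes the host is a single integer written in decimal, `0x`-prefixed hex, or `0`-prefixed octal, and it does not handle dotted forms with fewer than four components.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper (not part of webrisk_hash): interpret a host made of
/// a single integer as an IPv4 address.
fn parse_integer_host(host: &str) -> Option<Ipv4Addr> {
    // Reject anything with dots, signs, or other punctuation up front.
    if !host.bytes().all(|b| b.is_ascii_alphanumeric()) {
        return None;
    }
    let hex = host.strip_prefix("0x").or_else(|| host.strip_prefix("0X"));
    let value = if let Some(digits) = hex {
        // 0x-prefixed: hexadecimal
        u32::from_str_radix(digits, 16).ok()?
    } else if host.len() > 1 && host.starts_with('0') {
        // Leading zero: octal
        u32::from_str_radix(&host[1..], 8).ok()?
    } else {
        // Otherwise: decimal
        host.parse::<u32>().ok()?
    };
    // A u32 maps onto the four octets in big-endian order.
    Some(Ipv4Addr::from(value))
}
```

Under these assumptions, `parse_integer_host("3279880203")` yields the address `195.127.0.11`, which matches the canonicalized host in the example above. Hostnames such as `example.com` yield `None` and would go through ordinary hostname normalization instead.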

Generate suffix/prefix expressions

use webrisk_hash::suffix_postfix_expressions;

let exprs = suffix_postfix_expressions("http://a.b.c/1/2.html?param=1");
assert!(exprs.contains(&"a.b.c/1/2.html?param=1".to_string()));
assert!(exprs.contains(&"b.c/".to_string()));
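The expression set can be understood as the cross product of host suffixes and path prefixes. The following is a rough standalone sketch of that idea, not the crate's implementation. All function names here are hypothetical, and it assumes the host, path, and query have already been canonicalized and split, and that the host is not an IP address.

```rust
/// Append `s` only if it is not already present (keeps insertion order).
fn push_unique(out: &mut Vec<String>, s: String) {
    if !out.contains(&s) {
        out.push(s);
    }
}

/// Hypothetical helper: the exact host plus up to four suffixes taken from
/// the last five components, never the bare top-level domain.
fn host_suffixes(host: &str) -> Vec<String> {
    let parts: Vec<&str> = host.split('.').collect();
    let mut out = vec![host.to_string()];
    let start = parts.len().saturating_sub(5);
    for i in start..parts.len().saturating_sub(1) {
        push_unique(&mut out, parts[i..].join("."));
    }
    out
}

/// Hypothetical helper: the path with query, the path without query, then
/// the root and up to three directory prefixes, each with a trailing slash.
fn path_prefixes(path: &str, query: Option<&str>) -> Vec<String> {
    let mut out = Vec::new();
    if let Some(q) = query {
        out.push(format!("{}?{}", path, q));
    }
    push_unique(&mut out, path.to_string());
    let mut prefix = String::from("/");
    push_unique(&mut out, prefix.clone());
    let trimmed = path.strip_prefix('/').unwrap_or(path);
    let segments: Vec<&str> = trimmed.split('/').collect();
    // The last segment is the file name (or empty), so only the directories count.
    let dirs = &segments[..segments.len() - 1];
    for seg in dirs.iter().take(3) {
        prefix.push_str(seg);
        prefix.push('/');
        push_unique(&mut out, prefix.clone());
    }
    out
}

/// Combine every host suffix with every path prefix (at most 5 x 6 = 30).
fn expressions(host: &str, path: &str, query: Option<&str>) -> Vec<String> {
    let mut out = Vec::new();
    for h in host_suffixes(host) {
        for p in path_prefixes(path, query) {
            out.push(format!("{}{}", h, p));
        }
    }
    out
}
```

With this sketch, `expressions("a.b.c", "/1/2.html", Some("param=1"))` produces eight strings, including `a.b.c/1/2.html?param=1` and `b.c/` from the example above.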

Compute hash prefixes

use webrisk_hash::{get_prefixes, truncated_sha256_prefix};

// Get 32-bit hash prefixes for all expressions of a URL
let prefixes = get_prefixes("https://example.com/path", 32);

// Or hash a single string
let hash = truncated_sha256_prefix("abc", 32);
assert_eq!(hash, vec![0xba, 0x78, 0x16, 0xbf]);
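The truncation step itself is simple and can be sketched without a hashing dependency. The snippet below hard-codes the well-known SHA-256 digest of "abc" and keeps only the leading bytes. The name `truncate_digest` is hypothetical, and the rounding behaviour (integer division of `bits` by 8, clamped to the digest length) is an assumption about how a function like `truncated_sha256_prefix` might behave, not a statement about the crate's internals.

```rust
/// SHA-256("abc"), hard-coded so this sketch needs no hashing crate.
const SHA256_ABC: [u8; 32] = [
    0xba, 0x78, 0x16, 0xbf, 0x8f, 0x01, 0xcf, 0xea,
    0x41, 0x41, 0x40, 0xde, 0x5d, 0xae, 0x22, 0x23,
    0xb0, 0x03, 0x61, 0xa3, 0x96, 0x17, 0x7a, 0x9c,
    0xb4, 0x10, 0xff, 0x61, 0xf2, 0x00, 0x15, 0xad,
];

/// Hypothetical helper: keep the first `bits / 8` bytes of a 32-byte digest.
/// Assumption: requests longer than the digest are clamped to 32 bytes.
fn truncate_digest(digest: &[u8; 32], bits: usize) -> Vec<u8> {
    let len = (bits / 8).min(digest.len());
    digest[..len].to_vec()
}
```

Calling `truncate_digest(&SHA256_ABC, 32)` returns the same four bytes as the assertion above: `0xba, 0x78, 0x16, 0xbf`.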

End-to-end: URL to hash prefix set

use webrisk_hash::get_prefixes;

let prefixes = get_prefixes("https://google.com/a/test/index.html?abc123", 32);
assert_eq!(prefixes.len(), 5);

API

| Function | Description |
| --- | --- |
| canonicalize(url) | Canonicalize a URL per the Web Risk spec. Returns Option<String>. |
| suffix_postfix_expressions(url) | Generate up to 30 host suffix / path prefix combinations. |
| truncated_sha256_prefix(s, bits) | SHA-256 hash of s, truncated to bits/8 bytes (at most 32 bytes). |
| get_prefixes(url, bits) | Canonicalize, generate expressions, and hash each one. Returns HashSet<Vec<u8>>. |
| get_prefix_map(url, bits) | Like get_prefixes, but returns Vec<(expression, hash)>. |

Limitations

  • Accepts &str input only (valid UTF-8). Raw &[u8] byte sequences are not supported.
  • URLs longer than 8192 bytes return None.
  • Hostnames longer than 255 characters return None.
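If your URLs arrive as raw bytes, one way to work within the limits above is to validate them before calling into the crate. This is a minimal sketch using only the standard library; `prepare_input` and `MAX_URL_BYTES` are hypothetical names, and the length check merely mirrors the documented 8192-byte limit so that oversized input is rejected early.

```rust
/// Mirrors the documented limit: longer URLs return None from the crate.
const MAX_URL_BYTES: usize = 8192;

/// Hypothetical helper: turn raw bytes into a &str suitable for the crate's
/// &str-only API, or None if the input is too long or not valid UTF-8.
fn prepare_input(raw: &[u8]) -> Option<&str> {
    if raw.len() > MAX_URL_BYTES {
        return None;
    }
    std::str::from_utf8(raw).ok()
}
```

The returned `&str` borrows from the input slice, so no copy is made. How to treat byte sequences that are not valid UTF-8 (reject them, or percent-encode them first) is left to the caller.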

License

MIT