Summary
tldextract-rs is a high-performance effective top-level domain (eTLD) extraction module that splits a domain name into its subcomponents.
Hostname
- Cargo.toml:
= { = "https://github.com/emo-cat/tldextract-rs" }
- example code
use TLDExtract;
- ExtractResult
Implementation details
Why not split on "." and take the last element instead?
Splitting on "." and taking the last element works only for single-label eTLDs such as com. It fails for multi-label eTLDs such as oseto.nagasaki.jp, where it would return just jp.
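A minimal sketch of the naive approach makes the failure concrete. The function name `naive_suffix` and the hostnames are hypothetical, chosen only for illustration; none of this is part of the tldextract-rs API.

```rust
/// Naive approach: treat everything after the last '.' as the suffix.
/// `naive_suffix` is a hypothetical helper, not part of tldextract-rs.
fn naive_suffix(host: &str) -> &str {
    // `rsplit` yields the labels from right to left, so the first item
    // is the last label of the host.
    host.rsplit('.').next().unwrap_or(host)
}
```

For `www.example.com` this returns `com`, which happens to be correct. For `city.oseto.nagasaki.jp` it returns only `jp`, although the eTLD is `oseto.nagasaki.jp`.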
eTLD tries
tldextract-rs stores eTLDs in compressed tries.
Valid eTLDs from the Mozilla Public Suffix List are inserted into the compressed trie label by label in reverse order, so that nsw.edu.au is stored as the path au -> edu -> nsw.
The labels of the URL host are matched against the trie from right to left until no further matching node can be found. In this example, the path of matching nodes is au -> edu -> nsw. Reversing that path gives the extracted eTLD nsw.edu.au.
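The reverse-order insertion and right-to-left lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: it uses a plain `HashMap`-based trie rather than a compressed one, the names `Node`, `SuffixTrie`, `insert`, and `longest_suffix` are hypothetical, and the wildcard and exception rules of the Public Suffix List are not handled.

```rust
use std::collections::HashMap;

/// One trie node per domain label. This is a plain (uncompressed) trie,
/// used here only for illustration; all names are hypothetical.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    /// True if the path from the root to this node spells a complete eTLD.
    is_suffix: bool,
}

#[derive(Default)]
struct SuffixTrie {
    root: Node,
}

impl SuffixTrie {
    /// Insert an eTLD label by label in reverse order:
    /// "nsw.edu.au" becomes the path au -> edu -> nsw.
    fn insert(&mut self, suffix: &str) {
        let mut node = &mut self.root;
        for label in suffix.rsplit('.') {
            node = node.children.entry(label.to_string()).or_default();
        }
        node.is_suffix = true;
    }

    /// Walk the host's labels from right to left and return the longest
    /// matching eTLD as a slice of the host, if any.
    fn longest_suffix<'a>(&self, host: &'a str) -> Option<&'a str> {
        let mut node = &self.root;
        let mut best_start: Option<usize> = None;
        // Byte offset just past the label currently being examined.
        let mut end = host.len();
        for label in host.rsplit('.') {
            match node.children.get(label) {
                Some(child) => {
                    node = child;
                    let start = end - label.len();
                    if node.is_suffix {
                        best_start = Some(start);
                    }
                    // Step over the '.' that precedes this label.
                    end = start.saturating_sub(1);
                }
                // No matching node: stop walking.
                None => break,
            }
        }
        best_start.map(|start| &host[start..])
    }
}
```

With `com` and `nsw.edu.au` inserted, looking up `www.example.nsw.edu.au` follows au -> edu -> nsw, stops at `example`, and returns `nsw.edu.au`. A host such as `example.org` matches no node and yields `None`.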