Summary
tldextract-rs is a high-performance effective top-level domain (eTLD) extraction module that extracts subcomponents from a domain name.
Hostname
- Cargo.toml:
tldextract-rs = { git = "https://github.com/emo-cat/tldextract-rs" }
- example code
use tldextract_rs::TLDExtract;
- ExtractResult
Implementation details
Why not split on "." and take the last element instead?
Splitting on "." and taking the last element only works for simple eTLDs like com, but not for more complex ones like oseto.nagasaki.jp.
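A minimal sketch of the naive approach shows where it falls short. The function name `naive_suffix` is hypothetical and is not part of tldextract-rs; it only illustrates the point above.

```rust
/// Naive approach: treat the last dot-separated label as the suffix.
/// `naive_suffix` is a hypothetical helper, not part of tldextract-rs.
fn naive_suffix(host: &str) -> &str {
    host.rsplit('.').next().unwrap_or(host)
}

fn main() {
    // Works for a simple, single-label eTLD.
    assert_eq!(naive_suffix("example.com"), "com");

    // Falls short here: the eTLD is "oseto.nagasaki.jp", but only "jp" is returned.
    assert_eq!(naive_suffix("www.oseto.nagasaki.jp"), "jp");
}
```

Because an eTLD can span several labels, the number of labels to keep cannot be known without consulting the list of valid suffixes.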
eTLD tries
tldextract-rs stores eTLDs in compressed tries.
Valid eTLDs from the Mozilla Public Suffix List are inserted into the compressed trie label by label in reverse order, so nsw.edu.au is stored along the path au -> edu -> nsw.
The host's labels are parsed from right to left until no more matching nodes can be found. In this example, the path of matching nodes is au -> edu -> nsw. Reversing the nodes gives the extracted eTLD nsw.edu.au.
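The insert and lookup steps described above can be sketched roughly as follows. This is an illustration only, not the crate's actual implementation: it uses a plain, uncompressed trie built on `HashMap`, the names `Node`, `SuffixTrie`, `insert`, and `longest_suffix` are hypothetical, and the wildcard and exception rules of the Public Suffix List are ignored.

```rust
use std::collections::HashMap;

/// One trie node per domain label (hypothetical type, uncompressed for brevity).
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    /// True when the path from the root to this node spells a complete eTLD.
    is_suffix: bool,
}

#[derive(Default)]
struct SuffixTrie {
    root: Node,
}

impl SuffixTrie {
    /// Insert an eTLD label by label, rightmost label first.
    fn insert(&mut self, etld: &str) {
        let mut node = &mut self.root;
        for label in etld.rsplit('.') {
            node = node.children.entry(label.to_string()).or_default();
        }
        node.is_suffix = true;
    }

    /// Walk the host's labels from right to left and return the longest matching eTLD.
    fn longest_suffix(&self, host: &str) -> Option<String> {
        let mut node = &self.root;
        let mut path: Vec<&str> = Vec::new();
        let mut matched = 0; // number of labels in the longest eTLD seen so far
        for label in host.rsplit('.') {
            match node.children.get(label) {
                Some(child) => {
                    path.push(label);
                    node = child;
                    if node.is_suffix {
                        matched = path.len();
                    }
                }
                None => break, // no more matching nodes
            }
        }
        if matched == 0 {
            return None;
        }
        // The path was collected right to left, so reverse it to rebuild the eTLD.
        let labels: Vec<&str> = path[..matched].iter().rev().copied().collect();
        Some(labels.join("."))
    }
}

fn main() {
    let mut trie = SuffixTrie::default();
    for etld in ["com", "au", "edu.au", "nsw.edu.au", "oseto.nagasaki.jp"] {
        trie.insert(etld);
    }
    // Matching nodes: au -> edu -> nsw, reversed to "nsw.edu.au".
    assert_eq!(
        trie.longest_suffix("example.nsw.edu.au").as_deref(),
        Some("nsw.edu.au")
    );
    assert_eq!(trie.longest_suffix("example.org"), None);
}
```

Tracking the longest match seen so far, rather than the last node reached, keeps the result correct when an intermediate node such as jp or nagasaki.jp exists in the trie but is not itself an inserted eTLD.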