## Summary
**tldextract-rs** is a high-performance [effective top-level domain (eTLD)](https://wiki.mozilla.org/Public_Suffix_List) extraction module that splits domain names into their subcomponents (subdomain, domain, suffix and registered domain).
- Usage
```text
Usage: tldextract-cli [-j] [-t <target>] [-i] [--disable-private-domains]

Reach new heights.

Options:
  -j, --json        print format json
  -t, --target      target
  -i, --interactive interactive mode
  --disable-private-domains
                    disable private domains
  --help            display usage information
```
- Example
```bash
$ tldextract-cli -j -t mirrors.tuna.tsinghua.edu.cn
{
  "subdomain": "mirrors.tuna",
  "domain": "tsinghua",
  "suffix": "edu.cn",
  "registered_domain": "tsinghua.edu.cn"
}
```
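Once the eTLD (`suffix`) is known, the remaining JSON fields follow from plain string slicing. The sketch below illustrates that relationship only; `Parts` and `split_host` are hypothetical names rather than the crate's API, and the suffix is passed in by the caller instead of being looked up in the Public Suffix List.

```rust
/// Hypothetical result type mirroring the JSON fields printed by the CLI.
#[derive(Debug, PartialEq)]
struct Parts {
    subdomain: String,
    domain: String,
    suffix: String,
    registered_domain: String,
}

/// Splits `host` into its parts, assuming its eTLD (`suffix`) is already known.
/// Returns `None` if `host` does not end in `.<suffix>` with a non-empty label before it.
fn split_host(host: &str, suffix: &str) -> Option<Parts> {
    // Drop ".<suffix>" from the end; what remains is "<subdomain>.<domain>" or "<domain>".
    let rest = host.strip_suffix(suffix)?.strip_suffix('.')?;
    // The label immediately left of the suffix is the domain; everything before it is the subdomain.
    let (subdomain, domain) = rest.rsplit_once('.').unwrap_or(("", rest));
    if domain.is_empty() {
        return None;
    }
    Some(Parts {
        subdomain: subdomain.to_string(),
        domain: domain.to_string(),
        suffix: suffix.to_string(),
        registered_domain: format!("{domain}.{suffix}"),
    })
}

fn main() {
    let parts = split_host("mirrors.tuna.tsinghua.edu.cn", "edu.cn").unwrap();
    println!("{parts:#?}");
}
```

Requiring a `.` before the suffix keeps a host such as `xedu.cn` from being treated as ending in `edu.cn`.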
## Implementation details
### Why not split on "." and take the last element instead?
Splitting on "." and taking the last element only works for single-label eTLDs like `com`, not for multi-label ones like `oseto.nagasaki.jp`.
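A small comparison makes the failure concrete. The suffix set below is a hand-picked, illustrative subset rather than the real list, and `longest_suffix` is a hypothetical brute-force lookup (not the trie approach this crate uses). It tries each label boundary from the left, so the first candidate found in the set is the longest matching suffix.

```rust
use std::collections::HashSet;

/// Naive approach: take the last dot-separated label as the suffix.
fn naive_suffix(host: &str) -> &str {
    // `rsplit` always yields at least one item, so `unwrap` cannot panic.
    host.rsplit('.').next().unwrap()
}

/// Brute-force longest match against a set of known suffixes.
fn longest_suffix<'a>(host: &'a str, suffixes: &HashSet<&str>) -> Option<&'a str> {
    let mut start = 0;
    loop {
        let candidate = &host[start..];
        if suffixes.contains(candidate) {
            return Some(candidate);
        }
        // Move past the next '.' to try a shorter suffix; stop when none remain.
        match candidate.find('.') {
            Some(i) => start += i + 1,
            None => return None,
        }
    }
}

fn main() {
    // Illustrative subset only; the real Public Suffix List is far larger.
    let suffixes: HashSet<&str> = ["com", "jp", "nagasaki.jp", "oseto.nagasaki.jp"]
        .into_iter()
        .collect();
    let host = "city.oseto.nagasaki.jp";
    println!("naive:         {}", naive_suffix(host)); // jp
    println!("longest match: {:?}", longest_suffix(host, &suffixes)); // Some("oseto.nagasaki.jp")
}
```

This scan re-hashes every candidate suffix of the host; walking a trie instead visits each label once.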
### eTLD tries
**tldextract-rs** stores eTLDs in [compressed tries](https://en.wikipedia.org/wiki/Trie).
Valid eTLDs from the [Mozilla Public Suffix List](http://www.publicsuffix.org) are inserted into the compressed trie label by label in reverse order, so `nsw.edu.au` is stored as the path `au -> edu -> nsw`.
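A minimal sketch of reverse-order insertion, assuming a simple uncompressed trie with one `HashMap` entry per label. This is a hypothetical `Node` layout for illustration, not the crate's compressed representation. The `is_etld` flag plays the role of the 🚩 marker in the diagram that follows.

```rust
use std::collections::HashMap;

/// Hypothetical, uncompressed trie node: each edge holds a single label.
#[derive(Default, Debug)]
struct Node {
    children: HashMap<String, Node>,
    /// True if the path from the root to this node spells a valid eTLD.
    is_etld: bool,
}

impl Node {
    /// Inserts an eTLD such as "nsw.edu.au" as the path au -> edu -> nsw.
    fn insert(&mut self, etld: &str) {
        let mut node = self;
        // `rsplit` walks the labels right-to-left: "au", "edu", "nsw".
        for label in etld.rsplit('.') {
            node = node.children.entry(label.to_string()).or_default();
        }
        node.is_etld = true;
    }
}

fn main() {
    let mut root = Node::default();
    for etld in ["au", "nsw.edu.au", "com.ac", "edu.ac", "gov.ac"] {
        root.insert(etld);
    }
    let au = &root.children["au"];
    let edu = &au.children["edu"];
    // Prints "au: true, edu: false, nsw: true"
    println!("au: {}, edu: {}, nsw: {}", au.is_etld, edu.is_etld, edu.children["nsw"].is_etld);
    // Prints "ac: false"
    println!("ac: {}", root.children["ac"].is_etld);
}
```

Intermediate nodes such as `edu` under `au` exist only as part of a longer path and stay unflagged.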
```text
Given the following eTLDs
au
nsw.edu.au
com.ac
edu.ac
gov.ac

and the example URL host `example.nsw.edu.au`

The compressed trie will be structured as follows:

START
 ╠═ au 🚩 ✅
 ║  ╚═ edu ✅
 ║     ╚═ nsw 🚩 ✅
 ╚═ ac
     ╠═ com 🚩
     ╠═ edu 🚩
     ╚═ gov 🚩

=== Symbol meanings ===
🚩 : path to this node is a valid eTLD
✅ : path to this node found in example URL host `example.nsw.edu.au`
```
The labels of the URL host are matched from right to left until no more matching nodes can be found. In this example, the path of matching nodes is `au -> edu -> nsw`. Because the last matched node, `nsw`, is marked 🚩, reversing the path gives the extracted eTLD `nsw.edu.au`.
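The right-to-left walk can be sketched over the same kind of hypothetical, uncompressed `Node` trie (redefined here so the example stands alone). As an assumption, it keeps the deepest matched node that is flagged as an eTLD. This means a host like `example.edu.au`, whose walk stops at the unflagged `edu`, falls back to `au`. Real Public Suffix List matching also handles wildcard (`*`) and exception (`!`) rules, which this sketch ignores.

```rust
use std::collections::HashMap;

/// Hypothetical, uncompressed trie node (one label per edge).
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    is_etld: bool,
}

impl Node {
    fn insert(&mut self, etld: &str) {
        let mut node = self;
        for label in etld.rsplit('.') {
            node = node.children.entry(label.to_string()).or_default();
        }
        node.is_etld = true;
    }
}

/// Follows matching nodes from the rightmost label leftwards and returns the
/// longest suffix of `host` whose path ends on a flagged node.
fn find_etld<'a>(root: &Node, host: &'a str) -> Option<&'a str> {
    let mut node = root;
    let mut best: Option<usize> = None; // byte offset where the best eTLD starts
    let mut end = host.len(); // labels left of `end` are not yet visited
    loop {
        // The next label runs from just after the previous '.' up to `end`.
        let start = host[..end].rfind('.').map_or(0, |dot| dot + 1);
        match node.children.get(&host[start..end]) {
            Some(child) => {
                node = child;
                if node.is_etld {
                    best = Some(start);
                }
            }
            None => break, // no more matching nodes
        }
        if start == 0 {
            break; // every label has been consumed
        }
        end = start - 1; // step over the '.'
    }
    best.map(|start| &host[start..])
}

fn main() {
    let mut root = Node::default();
    for etld in ["au", "nsw.edu.au", "com.ac", "edu.ac", "gov.ac"] {
        root.insert(etld);
    }
    println!("{:?}", find_etld(&root, "example.nsw.edu.au")); // Some("nsw.edu.au")
    println!("{:?}", find_etld(&root, "example.edu.au")); // Some("au")
}
```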
## Acknowledgements
- [go-fasttld (Go)](https://github.com/elliotwutingfeng/go-fasttld)
- [tldextract (Python)](https://github.com/john-kurkowski/tldextract)