spider_firewall 2.35.0

Firewall to use for Spider Web Crawler.

A Rust library to shield your system from malicious and unwanted websites by categorizing and blocking them.

Installation

Add spider_firewall to your Cargo project with:

```shell
cargo add spider_firewall
```

Size Tiers

The small tier is enabled by default. Enable medium or large for broader coverage — each tier includes all sources from the tier(s) below it.

| Tier | FST Size | Focus | Feature Flag |
|---|---|---|---|
| small (default) | ~13 MB | Ads, tracking, malware, phishing, scams | `small` |
| medium | ~26 MB | + ransomware, fraud, abuse, threat intel | `medium` |
| large | ~52 MB | + redirect/typosquatting, extended ads/tracking, full URLhaus | `large` |
In `Cargo.toml`:

```toml
# Default — small tier, all categories:
spider_firewall = "2.35"

# Medium tier:
spider_firewall = { version = "2.35", features = ["medium"] }

# Large tier:
spider_firewall = { version = "2.35", features = ["large"] }

# Small tier, only bad + ads (no tracking/gambling):
spider_firewall = { version = "2.35", default-features = false, features = ["default-tls", "bad", "ads", "small"] }
```

Category Features

Categories can be toggled independently (all enabled by default):

| Feature | Description |
|---|---|
| `bad` | Malware, phishing, scams, fraud, ransomware, abuse |
| `ads` | Advertising domains |
| `tracking` | Tracking and analytics domains |
| `gambling` | Gambling domains |

Usage

Checking for Bad Websites

You can check whether a website is on the bad-websites list using the `is_bad_website_url` function.

```rust
// Requires the `url` crate for URL parsing.
use spider_firewall::is_bad_website_url;

fn main() {
    let u = url::Url::parse("https://badwebsite.com").expect("parse");
    let blocked = is_bad_website_url(u.host_str().unwrap_or_default());
    println!("Is blocked: {}", blocked);
}
```
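In a crawler, this check is typically applied to every URL in the frontier before fetching. The sketch below shows that pattern with only the standard library: `is_blocked_host` is a hard-coded stand-in for `spider_firewall::is_bad_website_url` (so the example runs without the crate), and `host_of` is a deliberately naive host extractor — real code should use the `url` crate as in the example above.

```rust
use std::collections::HashSet;

// Stand-in for spider_firewall::is_bad_website_url, backed by a
// hard-coded set so this sketch compiles without the crate.
fn is_blocked_host(host: &str) -> bool {
    let blocked: HashSet<&str> = ["badwebsite.com", "ads.com"].into_iter().collect();
    blocked.contains(host)
}

// Naive host extraction ("scheme://host/..."); use the `url` crate in real code.
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    rest.split(|c| c == '/' || c == '?' || c == '#').next().unwrap_or(rest)
}

fn main() {
    let frontier = [
        "https://example.com/page",
        "https://badwebsite.com/login",
        "https://ads.com/banner.js",
    ];
    // Keep only URLs whose host is not on the blocklist.
    let allowed: Vec<&str> = frontier
        .iter()
        .copied()
        .filter(|u| !is_blocked_host(host_of(u)))
        .collect();
    println!("{:?}", allowed); // ["https://example.com/page"]
}
```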

Adding a Custom Firewall

You can add your own websites to the block list using the define_firewall! macro. This allows you to categorize new websites under a predefined or new category.

```rust
// Import the macro alongside the check function.
use spider_firewall::{define_firewall, is_bad_website_url};

// Add "bad.com" to a custom category.
define_firewall!("unknown", "bad.com");

fn main() {
    let u = url::Url::parse("https://bad.com").expect("parse");
    let blocked = is_bad_website_url(u.host_str().unwrap_or_default());
    println!("Is blocked: {}", blocked);
}
```

Example with Custom Ads List

You can specify websites to be blocked under specific categories such as "ads".

```rust
// Import the macro alongside the ads check function.
use spider_firewall::{define_firewall, is_ad_website_url};

// Add "ads.com" to the ads category.
define_firewall!("ads", "ads.com");

fn main() {
    let u = url::Url::parse("https://ads.com").expect("parse");
    let blocked = is_ad_website_url(u.host_str().unwrap_or_default());
    println!("Is blocked: {}", blocked);
}
```
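All of the examples pass the output of `host_str()` straight to the check functions, so lookups are only as reliable as the host string itself. Below is a minimal, standard-library-only sketch of normalizing a host before a lookup — the `normalize_host` helper is hypothetical, not part of spider_firewall's API:

```rust
// Hypothetical helper: normalize a host string before a blocklist lookup.
// Lowercases the host, strips a trailing root dot, and drops an explicit
// port ("ADS.com:8080" -> "ads.com"). Does not handle IPv6 literals.
fn normalize_host(host: &str) -> String {
    let host = host.trim_end_matches('.');               // "example.com." -> "example.com"
    let host = host.split(':').next().unwrap_or(host);   // strip ":port"
    host.to_ascii_lowercase()
}

fn main() {
    println!("{}", normalize_host("ADS.com:8080"));  // ads.com
    println!("{}", normalize_host("Example.COM."));  // example.com
}
```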

Blocklist Sources

Small (default)

| Source | Categories | License |
|---|---|---|
| ShadowWhisperer/BlockLists | bad, ads, tracking, gambling | MIT |
| badmojr/1Hosts Lite | ads, tracking | MPL-2.0 |
| spider-rs/bad_websites | bad | MIT |
| Steven Black Unified Hosts | bad | MIT |
| Block List Project — Malware | bad | MIT |
| Block List Project — Phishing | bad | MIT |
| Block List Project — Scam | bad | MIT |
| URLhaus Filter (domains) | bad | CC0/MIT |

Medium (adds)

| Source | Categories | License |
|---|---|---|
| Block List Project — Ransomware | bad | MIT |
| Block List Project — Fraud | bad | MIT |
| Block List Project — Abuse | bad | MIT |
| Phishing.Database — Active Domains | bad | MIT |
| Stamparm/maltrail — Suspicious | bad | MIT |

Large (adds)

| Source | Categories | License |
|---|---|---|
| Block List Project — Redirect | bad | MIT |
| Block List Project — Tracking | tracking | MIT |
| Block List Project — Ads | ads | MIT |
| Stamparm/maltrail — Malware | bad | MIT |
| abuse.ch URLhaus Hostfile | bad | CC0 |

Build Time

The initial build can take approximately 5-10 minutes, as it compiles dependencies and generates the blocklist data files (the FSTs sized in the table above).

Contributing

Contributions and improvements are welcome. Feel free to open issues or submit pull requests on the GitHub repository.

License

This project is licensed under the MIT License.