adblock 0.3.17

Native Rust module for Adblock Plus syntax (e.g. EasyList, EasyPrivacy) filter parsing and matching.
Documentation

Ad Block engine in Rust

docs.rs crates.io npmjs.com Build Status

Native Rust module for Adblock Plus syntax (e.g. EasyList, EasyPrivacy) filter parsing and matching.

It uses a tokenisation approach for quickly reducing the potentially matching rule search space against a URL.

The algorithm is inspired by, and closely follows the algorithm of uBlock Origin and Cliqz.

Somewhat graphical explanation of the algorithm:

Ad Block Algorithm

Demo

Demo use in Rust:

use adblock::engine::Engine;
use adblock::lists::{FilterFormat, FilterSet};

fn main() {
    let rules = vec![
        String::from("-advertisement-icon."),
        String::from("-advertisement-management/"),
        String::from("-advertisement."),
        String::from("-advertisement/script."),
    ];

    let mut filter_set = FilterSet::new(true);
    filter_set.add_filters(&rules, FilterFormat::Standard);

    let blocker = Engine::from_filter_set(&filter_set, true);
    let blocker_result = blocker.check_network_urls("http://example.com/-advertisement-icon.", "http://example.com/helloworld", "image");

    println!("Blocker result: {:?}", blocker_result);
}

Node.js module demo

Note the Node.js module has overheads inherent to boundary crossing between JS and native code.

const AdBlockClient = require('adblock-rs');
let el_rules = fs.readFileSync('./data/easylist.to/easylist/easylist.txt', { encoding: 'utf-8' }).split('\n');
let ubo_unbreak_rules = fs.readFileSync('./data/uBlockOrigin/unbreak.txt', { encoding: 'utf-8' }).split('\n');
let rules = el_rules.concat(ubo_unbreak_rules);
let resources = AdBlockClient.uBlockResources('uBlockOrigin/src/web_accessible_resources', 'uBlockOrigin/src/js/redirect-engine.js', 'uBlockOrigin/assets/resources/scriptlets.js');

const filterSet = new AdBlockClient.FilterSet(true);
filterSet.addFilters(rules);
const client = new AdBlockClient.Engine(filterSet, true);
client.useResources(resources);

const serializedArrayBuffer = client.serializeRaw(); // Serialize the engine to an ArrayBuffer

console.log(`Engine size: ${(serializedArrayBuffer.byteLength / 1024 / 1024).toFixed(2)} MB`);

console.log("Matching:", client.check("http://example.com/-advertisement-icon.", "http://example.com/helloworld", "image"))
// Match with full debuging info
console.log("Matching:", client.check("http://example.com/-advertisement-icon.", "http://example.com/helloworld", "image", true))
// No, but still with debugging info
console.log("Matching:", client.check("https://github.githubassets.com/assets/frameworks-64831a3d.js", "https://github.com/AndriusA", "script", true))
// Example that inlcludes a redirect response
console.log("Matching:", client.check("https://bbci.co.uk/test/analytics.js", "https://bbc.co.uk", "script", true))

Optional features

CSS validation during rule parsing

When parsing cosmetic filter rules, it's possible to include a built-in implementation of CSS validation (through the selectors and cssparser crates) by enabling the css-validation feature. This will cause adblock-rust to reject cosmetic filter rules with invalid CSS syntax.

Content blocking format translation

Enabling the content-blocking feature gives adblock-rust support for conversion of standard ABP-style rules into Apple's content-blocking format, which can be exported for use on iOS and macOS platforms.

External domain resolution

By default, adblock-rust ships with a built-in domain resolution implementation (through the addr crate) that will generally suffice for standalone use-cases. For more advanced use-cases, disabling the embedded-domain-resolver feature will allow adblock-rust to use an external domain resolution implementation instead. This is extremely useful to reduce binary bloat and improve consistency when embedding adblock-rust within a browser.

Parsing resources from uBlock Origin's formats

adblock-rust uses uBlock Origin-compatible resources for scriptlet injection and redirect rules. The resource-assembler feature allows adblock-rust to parse these resources directly from the file formats used by the uBlock Origin repository.