url_cleaner_engine/tutorial/cleaner/
commons.rs

//! # [`Commons`]
//!
//! A "common" is basically a function, with `common_args` containing the arguments to that function.
//!
//! In the following cleaner, the `utm_*` family of query parameters is removed both before expanding bit.ly redirects and before outputting the result.
//!
//! ```Json
//! {
//!   "commons": {
//!     "actions": {
//!       "universal": {"RemoveQueryParams": ["utm_campaign", "utm_content", "utm_id", "utm_medium", "utm_source", "utm_term"]}
//!     }
//!   },
//!   "actions": [
//!     {"If": {
//!       "if": {"NormalizedHostIs": "bit.ly"},
//!       "then": {"All": [
//!         {"Common": "universal"},
//!         "ExpandRedirect"
//!       ]}
//!     }},
//!     {"Common": "universal"}
//!   ]
//! }
//! ```
//!
//! This keeps bit.ly from seeing the tracking parameters, and because both places invoke the same common, the list of parameters to remove stays in sync.
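//!
//! As a rough sketch of what the common buys here, and assuming `{"Common": "universal"}` behaves like pasting in the body of the `universal` action, the cleaner above should be equivalent to the following one, where the parameter list has to be written out (and kept in sync by hand) in both places.
//!
//! ```Json
//! {
//!   "actions": [
//!     {"If": {
//!       "if": {"NormalizedHostIs": "bit.ly"},
//!       "then": {"All": [
//!         {"RemoveQueryParams": ["utm_campaign", "utm_content", "utm_id", "utm_medium", "utm_source", "utm_term"]},
//!         "ExpandRedirect"
//!       ]}
//!     }},
//!     {"RemoveQueryParams": ["utm_campaign", "utm_content", "utm_id", "utm_medium", "utm_source", "utm_term"]}
//!   ]
//! }
//! ```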
//!
//! ## Common args
//!
//! The not stupid way for a website to do redirects is to return an HTTP 301 status code with a `Location` header saying "go to `https://example.com/whatever`".
//!
//! Some websites instead do stupid things like using the [`meta`](https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/meta/http-equiv) HTML element or JavaScript to do redirects.
//!
//! The default cleaner handles this stupidity by having a common action called `extract_from_page` that:
//!
//! 1. Takes a string modification.
//!
//! 2. Gets the body of the webpage.
//!
//! 3. Applies the provided string modification to the body.
//!
//! 4. Replaces the URL with the result.
//!
//! For example, to handle `smarturl.it` redirects, the default cleaner uses the `extract_from_page` common action to search for `"originalUrl":` and then extract the value of the JavaScript string literal immediately after it.
//!
//! ```Json
//! {
//!   "commons": {
//!     "actions": {
//!       "extract_from_page": {"SetWhole": {"Modified": {
//!         "value": {"HttpRequest": {}},
//!         "modification": {"CommonCallArg": "extractor"}
//!       }}}
//!     }
//!   },
//!   "actions": [
//!     {"If": {
//!       "if": {"NormalizedHostIs": "smarturl.it"},
//!       "then": {"Common": {
//!         "name": "extract_from_page",
//!         "args": {
//!           "string_modifications": {
//!             "extractor": {"All": [
//!               {"KeepAfter": "\"originalUrl\":"},
//!               "GetJsStringLiteralPrefix"
//!             ]}
//!           }
//!         }
//!       }}
//!     }}
//!   ]
//! }
//! ```
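//!
//! To make the argument passing concrete: assuming `{"CommonCallArg": "extractor"}` is simply replaced by the string modification passed under that name, the call above should behave like this hand-inlined version. This is an illustrative sketch, not code from the default cleaner.
//!
//! ```Json
//! {
//!   "actions": [
//!     {"If": {
//!       "if": {"NormalizedHostIs": "smarturl.it"},
//!       "then": {"SetWhole": {"Modified": {
//!         "value": {"HttpRequest": {}},
//!         "modification": {"All": [
//!           {"KeepAfter": "\"originalUrl\":"},
//!           "GetJsStringLiteralPrefix"
//!         ]}
//!       }}}
//!     }}
//!   ]
//! }
//! ```
//!
//! For a hypothetical page body containing `"originalUrl":"https://example.com/whatever"`, `KeepAfter` would leave `"https://example.com/whatever"` followed by the rest of the page, and `GetJsStringLiteralPrefix` would then yield `https://example.com/whatever`.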
//!
//! While the benefit of using a common is small here, the actual code in the default cleaner includes caching, applies the `universal` common action, and accounts for the `no_network` flag, which makes the common much more worthwhile there.
//!
//! A common can take flags, vars, conditions, actions, string sources, string modifications, and string matchers. These go in the `common_args` section seen in the [debugging](#Debugging) section.
//!
//! Additionally, conditions, actions, string sources, string modifications, and string matchers all have commons that can be invoked in the same way.

pub(crate) use super::*;