url_cleaner_engine/tutorial/cleaner/commons.rs
//! # [`Commons`]
//!
//! A "common" is basically a function, with `common_args` containing the arguments to that function.
//!
//! In the following cleaner, the `utm_*` query parameters are removed both before expanding bit.ly redirects and before outputting the result.
//!
//! ```Json
//! {
//!   "commons": {
//!     "actions": {
//!       "universal": {"RemoveQueryParams": ["utm_campaign", "utm_content", "utm_id", "utm_medium", "utm_source", "utm_term"]}
//!     }
//!   },
//!   "actions": [
//!     {"If": {
//!       "if": {"NormalizedHostIs": "bit.ly"},
//!       "then": {"All": [
//!         {"Common": "universal"},
//!         "ExpandRedirect"
//!       ]}
//!     }},
//!     {"Common": "universal"}
//!   ]
//! }
//! ```
//!
//! This prevents bit.ly from seeing the tracking parameters while keeping every place that removes them in sync.
//!
//! ## Common args
//!
//! The not stupid way for a website to do redirects is to return an HTTP 3xx status code such as 301 with a `Location` header saying "go to `https://example.com/whatever`".
//!
//! Some websites instead do stupid things like using the [`meta`](https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/meta/http-equiv) HTML element or JavaScript to do redirects.
//!
//! The default cleaner handles this stupidity by having a common action called `extract_from_page` that:
//!
//! 1. Takes a string modification.
//!
//! 2. Gets the body of the webpage.
//!
//! 3. Applies the provided string modification to the body.
//!
//! 4. Replaces the URL with the result.
//!
//! For example, to handle `smarturl.it` redirects, the default cleaner uses the `extract_from_page` common action to search for `"originalUrl":` and then extract the value of the JavaScript string literal immediately after it.
//!
//! ```Json
//! {
//!   "commons": {
//!     "actions": {
//!       "extract_from_page": {"SetWhole": {"Modified": {
//!         "value": {"HttpRequest": {}},
//!         "modification": {"CommonCallArg": "extractor"}
//!       }}}
//!     }
//!   },
//!   "actions": [
//!     {"If": {
//!       "if": {"NormalizedHostIs": "smarturl.it"},
//!       "then": {"Common": {
//!         "name": "extract_from_page",
//!         "args": {
//!           "string_modifications": {
//!             "extractor": {"All": [
//!               {"KeepAfter": "\"originalUrl\":"},
//!               "GetJsStringLiteralPrefix"
//!             ]}
//!           }
//!         }
//!       }}
//!     }}
//!   ]
//! }
//! ```
//!
//! While the benefit of using a common here is small, the actual code in the default cleaner also includes caching, applies the `universal` common action, and accounts for the `no_network` flag, which makes the common much more beneficial.
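//!
//! As a sketch of that reuse, a second site could call the same common with a different extractor. The host `redirect.example` and the `"target":` key below are hypothetical and only illustrate the call shape; the fragment is one more entry for the top-level `actions` list, and the `extract_from_page` definition in `commons` is assumed to be unchanged from the example above.
//!
//! ```Json
//! {"If": {
//!   "if": {"NormalizedHostIs": "redirect.example"},
//!   "then": {"Common": {
//!     "name": "extract_from_page",
//!     "args": {
//!       "string_modifications": {
//!         "extractor": {"All": [
//!           {"KeepAfter": "\"target\":"},
//!           "GetJsStringLiteralPrefix"
//!         ]}
//!       }
//!     }
//!   }}
//! }}
//! ```
//!
//! Only the `extractor` argument differs between the two calls, so a later change to how the page is fetched would only need to be made in the common.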
//!
//! A common can take flags, vars, conditions, actions, string sources, string modifications, and string matchers. These go in the `common_args` section seen in the [debugging](#Debugging) section.
//!
//! Additionally, conditions, actions, string sources, string modifications, and string matchers all have commons that can be invoked in the same way.

pub(crate) use super::*;