/*!
# HTML Minifier

This library can help you generate and minify your HTML code at the same time. It can also minify the JS and CSS inside `<script>` and `<style>` elements, and it leaves the content of `<pre>` and `<textarea>` elements (and, optionally, `<code>` elements) unminified.

HTML is minified by the following rules:

* ASCII control characters (0x00-0x08, 0x11-0x1F, 0x7F) are always removed.
* Comments can be optionally removed. (removed by default)
* **Useless** whitespace (spaces, tabs and newlines) is removed.
* Runs of whitespace (spaces, tabs and newlines) are converted to a single `'\x20'` or a single `'\n'`, if possible.
* Empty attribute values are collapsed. (e.g. `<input readonly="">` => `<input readonly>`)
* The inner HTML of all elements is minified except for the following elements:
    * `<pre>`
    * `<textarea>`
    * `<code>` (optionally, minified by default)
    * `<style>` (if the `type` attribute is unsupported)
    * `<script>` (if the `type` attribute is unsupported)
* JS code and CSS code in `<script>` and `<style>` elements are minified by [minifier](https://crates.io/crates/minifier).
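
The whitespace rule above can be pictured with a small standalone sketch. It is only a simplified model, not this crate's implementation: the function name `collapse_whitespace` is hypothetical, and the choice of `'\n'` whenever a run contains a newline is an assumption inferred from the examples in this document. It also ignores tags, attributes and the excluded elements.

```rust
// Simplified model of the whitespace rule (hypothetical helper, not part of this crate).
// Assumption: a run of whitespace that contains '\n' collapses to '\n', otherwise to ' '.
fn collapse_whitespace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_run = false; // currently inside a whitespace run
    let mut saw_newline = false; // the current run contains a '\n'

    for ch in input.chars() {
        if matches!(ch, ' ' | '\t' | '\n' | '\r') {
            in_run = true;
            saw_newline |= ch == '\n';
        } else {
            if in_run {
                // Emit exactly one character for the whole run.
                out.push(if saw_newline { '\n' } else { ' ' });
                in_run = false;
                saw_newline = false;
            }
            out.push(ch);
        }
    }

    // A trailing run is still collapsed to one character in this model.
    if in_run {
        out.push(if saw_newline { '\n' } else { ' ' });
    }

    out
}
```

Under these assumptions, `collapse_whitespace("a   b")` yields `"a b"` and `collapse_whitespace("a \n   b")` yields `"a\nb"`.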

The original (non-minified) HTML doesn't need to be completely generated before using this library because this library doesn't do any deserialization to create DOMs.
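
To illustrate why no DOM is needed, here is a minimal sketch of chunk-by-chunk processing. It is not this crate's code: `StreamingCollapser` and its methods are hypothetical names, and it models only whitespace collapsing. The point is that a small amount of state carried between calls is enough, so input can arrive in arbitrary pieces.

```rust
// Hypothetical type for illustration only; not part of this crate's API.
#[derive(Default)]
struct StreamingCollapser {
    in_run: bool,      // a whitespace run is pending, possibly from an earlier chunk
    saw_newline: bool, // the pending run contains b'\n'
}

impl StreamingCollapser {
    // Consume one chunk and append the collapsed bytes to `out`.
    fn digest(&mut self, chunk: &[u8], out: &mut Vec<u8>) {
        for &b in chunk {
            if matches!(b, b' ' | b'\t' | b'\n' | b'\r') {
                // Only remember the run; nothing is written yet.
                self.in_run = true;
                self.saw_newline |= b == b'\n';
            } else {
                self.flush(out);
                out.push(b);
            }
        }
    }

    // Write the pending run (if any) as a single byte and clear the state.
    fn flush(&mut self, out: &mut Vec<u8>) {
        if self.in_run {
            out.push(if self.saw_newline { b'\n' } else { b' ' });
            self.in_run = false;
            self.saw_newline = false;
        }
    }
}
```

In this sketch a whitespace run split across two `digest` calls still collapses to a single byte, because the pending run is only written once the next non-whitespace byte (or a final `flush`) arrives.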

In earlier versions, this library tried to make HTML inline (e.g. `<a>1</a>\n /\n <a>2</a>` => `<a>1</a> / <a>2</a>`). That feature required checking for CJ characters; otherwise `中\n文` would be minified to `中 文`, which is incorrect.

After version `3.0.0`, this library no longer tries to make HTML inline, which improves performance by removing the UTF-8 calculation. This change also allows the input text to be encoded not only in ASCII or UTF-8 but also in any other self-synchronizing encoding.

## Examples

```rust
extern crate html_minifier;

use html_minifier::HTMLMinifier;

let mut html_minifier = HTMLMinifier::new();

html_minifier.digest("<!DOCTYPE html>   <html  ").unwrap();
html_minifier.digest("lang=  en >").unwrap();
html_minifier.digest("
<head>
    <meta name=viewport>
</head>
").unwrap();
html_minifier.digest("
<body class=' container   bg-light '>
    <input type='text' value='123   456' readonly=''  />

    123456
    <b>big</b> 789
    ab
    c
    中文
    字
</body>
").unwrap();
html_minifier.digest("</html  >").unwrap();

assert_eq!("<!DOCTYPE html> <html lang=en>
<head>
<meta name=viewport>
</head>
<body class='container bg-light'>
<input type='text' value='123   456' readonly/>
123456
<b>big</b> 789
ab
c
中文
字
</body>
</html>".as_bytes(), html_minifier.get_html());
```

```rust
extern crate html_minifier;

use html_minifier::HTMLMinifier;

let mut html_minifier = HTMLMinifier::new();

html_minifier.digest("<pre  >   Hello  world!   </pre  >").unwrap();

assert_eq!(b"<pre>   Hello  world!   </pre>", html_minifier.get_html());
```

```rust
extern crate html_minifier;

use html_minifier::HTMLMinifier;

let mut html_minifier = HTMLMinifier::new();

html_minifier.digest("<script type='  application/javascript '>   alert('Hello!')    ;   </script>").unwrap();

assert_eq!("<script type='application/javascript'>alert('Hello!')</script>".as_bytes(), html_minifier.get_html());
```

## Write HTML to a Writer

If you don't want to store your HTML in memory (e.g. writing to a file instead), you can use the `HTMLMinifierHelper` struct which provides a low-level API that allows you to pass your output instance when invoking the `digest` method.

```rust
extern crate html_minifier;

use html_minifier::HTMLMinifierHelper;

# #[cfg(feature = "std")] {
use std::fs::File;
use std::io::Read;

let mut input_file = File::open("tests/data/w3schools.com_tryhow_css_example_website.htm").unwrap();
let mut output_file = File::create("tests/data/index.min.html").unwrap();

let mut buffer = [0u8; 256];

let mut html_minifier_helper = HTMLMinifierHelper::new();

loop {
    let c = input_file.read(&mut buffer).unwrap();

    if c == 0 {
        break;
    }

    html_minifier_helper.digest(&buffer[..c], &mut output_file).unwrap();
}
# }
```

## No Std

Disable the default features to compile this crate without std.

```toml
[dependencies.html-minifier]
version = "*"
default-features = false
```
*/

#![cfg_attr(not(feature = "std"), no_std)]

extern crate alloc;

#[macro_use]
extern crate educe;

mod errors;
mod functions;
mod html_minifier_helper;
mod html_writer;

use alloc::string::String;
use alloc::vec::Vec;

use crate::functions::*;

pub use errors::*;
pub use html_minifier_helper::*;
pub use html_writer::*;

/// This struct helps you generate and minify your HTML code at the same time. The output is stored inside this struct.
#[derive(Educe, Clone)]
#[educe(Debug, Default(new))]
pub struct HTMLMinifier {
    helper: HTMLMinifierHelper,
    #[educe(Debug(method = "str_bytes_fmt"))]
    out: Vec<u8>,
}

impl HTMLMinifier {
    /// Set whether to remove HTML comments.
    #[inline]
    pub fn set_remove_comments(&mut self, remove_comments: bool) {
        self.helper.remove_comments = remove_comments;
    }

    /// Set whether to minify the content in the `code` element.
    #[inline]
    pub fn set_minify_code(&mut self, minify_code: bool) {
        self.helper.minify_code = minify_code;
    }

    /// Get whether to remove HTML comments.
    #[inline]
    pub fn get_remove_comments(&self) -> bool {
        self.helper.remove_comments
    }

    /// Get whether to minify the content in the `code` element.
    #[inline]
    pub fn get_minify_code(&self) -> bool {
        self.helper.minify_code
    }
}

impl HTMLMinifier {
    /// Reset this HTML minifier. The option settings and allocated memory will be preserved.
    #[inline]
    pub fn reset(&mut self) {
        self.helper.reset();
        self.out.clear();
    }
}

impl HTMLMinifier {
    /// Input some text to generate HTML code. It is not necessary to input a full HTML text at once.
    #[inline]
    pub fn digest<S: AsRef<[u8]>>(&mut self, text: S) -> Result<(), HTMLMinifierError> {
        let text = text.as_ref();

        self.out.reserve(text.len());

        self.helper.digest(text, &mut self.out)
    }

    /// Directly input some text to generate HTML code. The text is appended to the output buffer as-is instead of being passed through the helper.
    ///
    /// # When to Use This?
    ///
    /// If the text has already been minified, consider using this method to get better performance.
    #[allow(clippy::missing_safety_doc)]
    #[inline]
    pub unsafe fn indigest<S: AsRef<[u8]>>(&mut self, text: S) {
        self.out.extend_from_slice(text.as_ref());
    }
}

impl HTMLMinifier {
    /// Get the HTML as a byte slice.
    #[inline]
    pub fn get_html(&mut self) -> &[u8] {
        self.out.as_slice()
    }
}

/// Minify HTML.
#[inline]
pub fn minify<S: AsRef<str>>(html: S) -> Result<String, HTMLMinifierError> {
    let mut minifier = HTMLMinifierHelper::new();

    let html = html.as_ref();

    let mut minified_html = String::with_capacity(html.len());

    minifier.digest(html, unsafe { minified_html.as_mut_vec() })?;

    Ok(minified_html)
}