//! Blank page detection utilities.
//!
//! Provides functions to determine if a page is blank based on its text content.
//! A page is considered blank if, once whitespace is ignored, it contains too few
//! characters to carry meaningful text.
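//!
//! As a rough illustration, the rule amounts to counting the characters for which
//! `char::is_whitespace` returns `false` and comparing that count with a threshold.
//! The sketch below uses a hypothetical standalone helper, `is_blank_sketch`, which
//! is not part of this module; it takes the threshold as a parameter instead of
//! reading the module's constant, and it only mirrors the documented rule.
//!
//! ```rust
//! // Hypothetical helper mirroring the documented rule; not this module's API.
//! fn is_blank_sketch(text: &str, min_non_whitespace: usize) -> bool {
//!     // Count only characters that are not Unicode whitespace.
//!     let count = text.chars().filter(|c| !c.is_whitespace()).count();
//!     count < min_non_whitespace
//! }
//!
//! // Whitespace-only text is blank.
//! assert!(is_blank_sketch(" \n\t ", 3));
//! // A lone two-digit page number stays below a threshold of 3.
//! assert!(is_blank_sketch("  12  ", 3));
//! // Three or more non-whitespace characters make the page non-blank.
//! assert!(!is_blank_sketch("123", 3));
//! assert!(!is_blank_sketch("Hello, world", 3));
//! ```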

/// Minimum number of non-whitespace characters to consider a page non-blank.
///
/// Pages with fewer than this many non-whitespace characters are considered blank.
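/// The threshold tolerates small artifacts on otherwise empty pages, such as a
/// stray mark or a one- or two-digit page number. A page containing only `123`
/// already has three non-whitespace characters and is therefore not blank.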
const MIN_NON_WHITESPACE_CHARS: usize = 3;

/// Determine if a page's text content indicates a blank page.
///
/// A page is blank if it has fewer than [`MIN_NON_WHITESPACE_CHARS`] non-whitespace characters.
///
/// # Arguments
///
/// * `text` - The extracted text content of the page
///
/// # Returns
///
/// `true` if the page is considered blank, `false` otherwise