1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
// Copyright (C) 2019 - 2024 Wilfred Bos
// Licensed under the MIT license. See the LICENSE file for the terms and conditions.
//! # Backward Nondeterministic Dawg Matching (BNDM)
//!
//! BNDM is an optimized string search algorithm designed for efficiently locating
//! patterns within a text. The BNDM algorithm was invented by Gonzalo Navarro and Mathieu
//! Raffinot.
//!
//! The BNDM algorithm works by preprocessing the pattern to generate a set of bitmasks.
//! These bitmasks are then used to efficiently scan the text for occurrences of the
//! pattern.
//!
//! Unlike the traditional BNDM algorithm, this implementation is optimized by scanning the
//! first 2 bytes outside the inner loop. Additionally, it does not have the limitation that
//! the pattern size is limited to the word size of the CPU. It's possible to search for
//! larger patterns than the word size and still benefit from the performance that BNDM offers.
//!
//! One of the key features of this implementation is its support for wildcard search.
//! This algorithm is ideally suited for this, as it does not impact the performance of the
//! pattern matching itself. The wildcard search only slightly affects the indexing, ensuring
//! that the overall efficiency of the pattern matching remains high.
//!
//! ## BndmConfig
//!
//! The `BndmConfig` struct is used to store the preprocessed pattern and the bitmasks.
//!
//! ## Functions
//!
//! The main function provided by this module is `find_pattern()`, which searches for the
//! pattern in a given text. It returns the index of the first occurrence of the pattern
//! in the text, or `None` if the pattern is not found.
//!
//! ## Usage
//!
//! Here is an example of how to use this module to search for a pattern in a text:
//!
//! ```rust
//! use bndm::{BndmConfig, find_pattern};
//!
//! let source = b"The quick brown fox jumps over the lazy dog";
//! let pattern = b"jumps";
//! let config = BndmConfig::new(pattern, None);
//! let index = find_pattern(source, &config);
//! assert_eq!(index, Some(20));
//! ```
//!
//! With wildcard:
//!
//! ```rust
//! use bndm::{BndmConfig, find_pattern};
//!
//! let source = b"The quick brown fox jumps over the lazy dog";
//! let pattern = b"ju??s";
//! let config = BndmConfig::new(pattern, Some(b'?'));
//! let index = find_pattern(source, &config);
//! assert_eq!(index, Some(20));
//! ```
use min;
const MASKS_TABLE_SIZE: usize = 256;
const WORD_SIZE_IN_BITS: usize = usizeBITS as usize;
/// The `BndmConfig` struct is used to store the pattern and the bitmasks.
/// Searches for the pattern in the source string using the BNDM algorithm.
///
/// The function takes a source string and a `BndmConfig` as input. The `BndmConfig`
/// contains the preprocessed pattern and the bitmasks. The function returns the index
/// of the first occurrence of the pattern in the text, or `None` if the pattern is not
/// found.
///
/// # Arguments
///
/// * `source` - The source string to search for the pattern.
/// * `config` - The configuration for the BNDM search, which includes the pattern and the
/// bitmasks.
///
/// # Returns
///
/// * `Option<usize>` - Returns the index of the first occurrence of the pattern in the text,
/// or `None` if the pattern is not found.
///
/// # Usage
///
/// Without wildcard:
///
/// ```rust
/// use bndm::{BndmConfig, find_pattern};
///
/// let source = b"The quick brown fox jumps over the lazy dog";
/// let pattern = b"jumps";
/// let config = BndmConfig::new(pattern, None);
/// let index = find_pattern(source, &config);
/// assert_eq!(index, Some(20));
/// ```
///
/// With wildcard:
///
/// ```rust
/// use bndm::{BndmConfig, find_pattern};
///
/// let source = b"The quick brown fox jumps over the lazy dog";
/// let pattern = b"ju??s";
/// let config = BndmConfig::new(pattern, Some(b'?'));
/// let index = find_pattern(source, &config);
/// assert_eq!(index, Some(20));
/// ```
/// Checks if the remaining part of the pattern matches the source string.
///
/// This function is used when the pattern is longer than the CPU word size.
/// It checks the remaining part of the pattern (after the first CPU word size characters)
/// against the corresponding part of the source string.
///
/// # Arguments
///
/// * `source` - The source string to search for the pattern.
/// * `config` - The configuration for the BNDM search, which includes the pattern and the
/// wildcard character.
/// * `start_index` - The index in the source string from where the remaining part of the
/// pattern should be checked.
///
/// # Returns
///
/// * `bool` - Returns `true` if the remaining part of the pattern matches the corresponding part of the source string, `false` otherwise.