//! Thread-local storage and text variant types for the transformation pipeline.
//!
//! [`TextVariant`] and [`ProcessedTextMasks`] are the output types of
//! [`super::graph::walk_process_tree`]. The string pool ([`STRING_POOL`]) and combined
//! traversal state ([`TRANSFORM_STATE`]) reduce allocation churn by recycling buffers
//! across matcher calls within each thread.
//!
//! # Safety model
//!
//! Both thread-local statics use `UnsafeCell` with `#[thread_local]` (a nightly feature)
//! to avoid the closure overhead of the `thread_local!` macro. Safety relies on two
//! invariants:
//!
//! 1. `#[thread_local]` guarantees single-threaded access, so data races cannot occur.
//! 2. No public function in this module is re-entrant: the reference derived from
//!    `UnsafeCell::get()` is always dropped before any call that could re-enter the same pool.
use std::borrow::Cow;
use std::cell::UnsafeCell;
/// Maximum number of [`String`] buffers retained in the pool between calls; any excess
/// buffers are dropped.
const STRING_POOL_MAX: usize = 128;
/// Maximum number of [`ProcessedTextMasks`] buffers retained in the pool between calls;
/// any excess buffers are dropped.
const MASKS_POOL_MAX: usize = 16;
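
// Illustrative sketch only; this is not the module's implementation. The docs in this
// file describe a bounded, per-thread buffer pool built on the nightly `#[thread_local]`
// attribute and `UnsafeCell`. The module below shows the same pooling contract on stable
// Rust using `thread_local!` and `RefCell` instead, which accepts the closure and
// borrow-flag overhead in exchange for checked access. Every name in it
// (`string_pool_sketch`, `SKETCH_POOL_MAX`, `take_string`, `give_back_string`,
// `pooled_len`) is hypothetical.
#[allow(dead_code)]
mod string_pool_sketch {
    use std::cell::RefCell;

    /// Hypothetical retention bound, playing the role of `STRING_POOL_MAX`.
    pub const SKETCH_POOL_MAX: usize = 4;

    thread_local! {
        /// One pool per thread; `RefCell` enforces exclusive access at runtime.
        static POOL: RefCell<Vec<String>> = RefCell::new(Vec::new());
    }

    /// Pops a recycled buffer or allocates a new one; `capacity` is a lower bound.
    pub fn take_string(capacity: usize) -> String {
        // The borrow ends inside the closure, before any other pool call can run.
        let recycled = POOL.with(|pool| pool.borrow_mut().pop());
        match recycled {
            Some(mut s) => {
                // Buffers are cleared on return, so this guarantees capacity >= `capacity`.
                s.reserve(capacity);
                s
            }
            None => String::with_capacity(capacity),
        }
    }

    /// Clears `s` and retains it only while the pool is below its bound.
    pub fn give_back_string(mut s: String) {
        s.clear();
        POOL.with(|pool| {
            let mut pool = pool.borrow_mut();
            if pool.len() < SKETCH_POOL_MAX {
                pool.push(s);
            }
            // Otherwise `s` is dropped here, keeping retained memory bounded.
        });
    }

    /// Number of buffers currently retained on this thread.
    pub fn pooled_len() -> usize {
        POOL.with(|pool| pool.borrow().len())
    }
}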
/// A single text variant produced by the transformation pipeline, paired with matching metadata.
///
/// [`walk_process_tree`](super::walk_process_tree) emits one `TextVariant` per unique
/// transformed string. The matcher scans each variant's `text` with the Aho-Corasick
/// automaton and uses `mask` to credit hits to the correct rules.
///
/// # Examples
///
/// ```rust
/// use std::collections::HashSet;
/// use matcher_rs::{ProcessType, TextVariant, build_process_type_tree, walk_process_tree};
///
/// let tree = build_process_type_tree(&HashSet::from([ProcessType::None]));
/// let (variants, _) = walk_process_tree::<false, _>(&tree, "hello", &mut |_, _, _, _| false);
///
/// assert_eq!(variants.len(), 1);
/// assert_eq!(variants[0].text, "hello");
/// assert!(variants[0].is_ascii);
/// ```
/// All text variants produced for a single input by the transformation pipeline.
///
/// Returned by [`walk_process_tree`](super::walk_process_tree). The number of elements
/// depends on the active [`ProcessType`](crate::ProcessType) configuration and how many
/// intermediate results are deduplicated (different trie paths that produce the same
/// string share a single entry with a merged `mask`).
///
/// # Examples
///
/// ```rust
/// use std::collections::HashSet;
/// use matcher_rs::{ProcessType, ProcessedTextMasks, build_process_type_tree, walk_process_tree};
///
/// let types = HashSet::from([ProcessType::None, ProcessType::Fanjian]);
/// let tree = build_process_type_tree(&types);
/// let (masks, _): (ProcessedTextMasks<'_>, _) =
///     walk_process_tree::<false, _>(&tree, "妳好", &mut |_, _, _, _| false);
///
/// // At least two variants: original + Fanjian-converted.
/// assert!(masks.len() >= 2);
/// ```
pub type ProcessedTextMasks<'a> = Vec<TextVariant<'a>>;
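
// Illustrative sketch only. The docs above say that different trie paths producing the
// same string share a single entry with a merged `mask`. The module below shows one way
// such deduplication could work. It assumes a `u64` bitmask with one bit per path; the
// real `mask` type is not shown in this file. `variant_dedup_sketch`, `SketchVariant`,
// and `push_or_merge` are hypothetical names.
#[allow(dead_code)]
mod variant_dedup_sketch {
    use std::borrow::Cow;

    /// Hypothetical stand-in for `TextVariant`, using the field names from its docs.
    #[derive(Debug)]
    pub struct SketchVariant<'a> {
        pub text: Cow<'a, str>,
        pub mask: u64, // assumed width
        pub is_ascii: bool,
    }

    /// Records `text` for path number `bit` (must be < 64) and returns its index.
    /// An existing entry with equal text has the bit OR-ed into its mask instead of
    /// a duplicate entry being pushed.
    pub fn push_or_merge<'a>(
        variants: &mut Vec<SketchVariant<'a>>,
        text: Cow<'a, str>,
        bit: u32,
    ) -> usize {
        let flag = 1u64 << bit;
        // Linear scan; reasonable only because the variant count is assumed small.
        let existing = variants.iter().position(|v| v.text == text);
        if let Some(index) = existing {
            variants[index].mask |= flag;
            return index;
        }
        let is_ascii = text.is_ascii();
        variants.push(SketchVariant { text, mask: flag, is_ascii });
        variants.len() - 1
    }
}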
/// Combined thread-local state for tree-walk scratch data and the masks buffer pool.
///
/// Keeping both in a single `#[thread_local]` static avoids a second TLS lookup on every
/// [`walk_process_tree`](super::walk_process_tree) call.
pub
/// Pool of reusable [`String`] buffers, one per thread.
///
/// Avoids repeated allocation during text transformation. Bounded to [`STRING_POOL_MAX`]
/// entries between calls; excess strings are dropped.
///
/// # Safety
///
/// Uses `#[thread_local]` + `UnsafeCell` to eliminate the `thread_local!` macro's
/// `.with()` closure overhead. Single-threaded access is guaranteed by the
/// `#[thread_local]` attribute. No function in this module is re-entrant while the
/// mutable reference from `UnsafeCell::get()` is live.
pub static STRING_POOL: = new;
/// Combined per-thread traversal state for [`walk_process_tree`](super::walk_process_tree).
///
/// Merges the trie-node-to-text-index map and the [`ProcessedTextMasks`] buffer pool into
/// one TLS slot to save a lookup on every matcher call.
///
/// # Safety
///
/// Same invariants as [`STRING_POOL`]: `#[thread_local]` guarantees single-threaded access,
/// and no function re-enters this static while a mutable reference is live.
pub static TRANSFORM_STATE: =
new;
/// Pops a reusable [`String`] from the thread-local pool, or allocates a new one.
///
/// The requested `capacity` is treated as a lower bound; a recycled string is reserved
/// upward if needed so callers can append without repeated growth.
///
/// # Safety
///
/// Accesses [`STRING_POOL`] through `UnsafeCell::get()`. This is safe because:
/// - `#[thread_local]` guarantees no concurrent access from other threads.
/// - No caller holds a mutable reference to the pool when this function is entered
///   (the pool functions are not re-entrant).
pub
/// Returns a [`String`] to the thread-local pool for future reuse.
///
/// The pool is intentionally bounded to [`STRING_POOL_MAX`]: large bursts can allocate
/// temporarily, but only the hottest buffers are retained to keep per-thread memory
/// usage predictable.
///
/// # Safety
///
/// Same safety model as [`get_string_from_pool`] — single-threaded, non-re-entrant
/// access to [`STRING_POOL`].
pub
/// Drains a [`ProcessedTextMasks`] collection, returns all owned strings to the string
/// pool, and stashes the emptied `Vec` in the masks pool for reuse.
///
/// This is used internally by [`crate::SimpleMatcher`] to recycle traversal output
/// between calls. External users of [`crate::walk_process_tree`] can simply drop the
/// returned vector — no manual recycling is needed.
///
/// # Safety
///
/// Contains two `unsafe` blocks:
///
/// 1. **`transmute` of the empty `Vec`**: after `drain()`, the `Vec` holds zero
///    elements, so no `Cow<'_, str>` borrows exist. Transmuting `Vec<TextVariant<'_>>`
///    to `Vec<TextVariant<'static>>` is sound because an empty `Vec` stores no values
///    and `Cow<'_, str>` has identical layout regardless of lifetime.
///
/// 2. **`TRANSFORM_STATE.get()`**: same TLS safety model as the string pool functions.
///    `#[thread_local]` guarantees single-threaded access, and no caller holds a mutable
///    reference when this function is entered.
pub
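
// Illustrative sketch only. The safety notes above argue that an emptied `Vec` can have
// its element lifetime erased so that its allocation can be stored in a `'static` pool.
// The module below demonstrates that argument on `Vec<Cow<'a, str>>` rather than on the
// real `ProcessedTextMasks`. `vec_recycle_sketch` and `recycle` are hypothetical names,
// and the `strings` parameter stands in for the thread-local string pool.
#[allow(dead_code)]
mod vec_recycle_sketch {
    use std::borrow::Cow;

    /// Drains `items`, moves each owned string (cleared) into `strings`, and returns
    /// the emptied allocation typed with a `'static` element lifetime.
    pub fn recycle<'a>(
        mut items: Vec<Cow<'a, str>>,
        strings: &mut Vec<String>,
    ) -> Vec<Cow<'static, str>> {
        for item in items.drain(..) {
            // Borrowed variants are simply dropped; only owned buffers are worth keeping.
            if let Cow::Owned(mut s) = item {
                s.clear();
                strings.push(s);
            }
        }
        debug_assert!(items.is_empty());
        // SAFETY: `items` is empty, so it contains no value that borrows for `'a`.
        // The source and target types differ only in a lifetime parameter, so they
        // have identical layout and the allocation is reused as-is.
        unsafe { std::mem::transmute::<Vec<Cow<'a, str>>, Vec<Cow<'static, str>>>(items) }
    }
}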