// SPDX-License-Identifier: Apache-2.0
//! Text analysis pipeline: tokenizers and analyzers.
//!
//! The [`Analyzer`] trait is the main entry point — a stateful, pull-based
//! token iterator. [`StandardAnalyzer`] provides Unicode-aware tokenization
//! with lowercase normalization, matching Lucene's `StandardAnalyzer`.
use std::fmt::Debug;
use std::io::Read;

use crate::TermOffset;

pub mod standard;

pub use standard::StandardAnalyzer;
pub use standard::StandardAnalyzerFactory;
/// A single token produced by the analyzer during tokenization.
pub struct Token<'a> {
    /// The token's text, borrowed from the analyzer's internal buffer.
    pub text: &'a str,
    /// Byte offset of the token's first character in the input.
    pub start_offset: TermOffset,
    /// Byte offset immediately past the token's last character.
    pub end_offset: TermOffset,
}
/// Breaks text into a stream of tokens for indexing.
///
/// The analyzer owns its input reader. Call [`set_reader`](Analyzer::set_reader)
/// to provide input for a new field, then call [`next_token`](Analyzer::next_token)
/// repeatedly until it returns `None`. Each `set_reader` call replaces the
/// previous reader and resets internal state.
///
/// The returned [`Token`] borrows from the analyzer's internal buffer, so the
/// caller must let each token drop before calling `next_token` again (which
/// the natural `while let` loop does).
pub trait Analyzer: Debug {
    /// Replaces the current reader and resets internal state for a new field.
    fn set_reader(&mut self, reader: Box<dyn Read>);

    /// Returns the next token, or `None` once the input is exhausted.
    fn next_token(&mut self) -> Option<Token<'_>>;
}
/// Creates [`Analyzer`] instances for indexing workers.
///
/// Each worker thread receives its own `Analyzer` via [`create`](AnalyzerFactory::create).
/// The factory is shared across threads via `Arc` in [`IndexWriterConfig`](crate::index::config::IndexWriterConfig).
///
/// # Example
///
/// ```
/// use bearing::analysis::{Analyzer, AnalyzerFactory, StandardAnalyzerFactory};
///
/// let factory = StandardAnalyzerFactory;
/// let mut analyzer = factory.create();
/// ```
pub trait AnalyzerFactory: Send + Sync {
    /// Creates a fresh `Analyzer` for a single indexing worker.
    fn create(&self) -> Box<dyn Analyzer>;
}