//! Token filter implementations for token transformation.
//!
//! This module provides various filters that transform token streams produced
//! by tokenizers. Filters can modify, remove, or add tokens to implement
//! features like lowercasing, stemming, stop word removal, and synonym expansion.
//!
//! # Available Filters
//!
//! - [`lowercase::LowercaseFilter`] - Converts tokens to lowercase
//! - [`stop::StopFilter`] - Removes stop words
//! - [`stem::StemFilter`] - Reduces words to their stem form
//! - [`synonym_graph::SynonymGraphFilter`] - Expands synonyms
//! - [`limit::LimitFilter`] - Limits number of tokens
//! - [`boost::BoostFilter`] - Adjusts token scoring weights
//! - [`strip::StripFilter`] - Removes specific characters
//! - [`remove_empty::RemoveEmptyFilter`] - Removes empty tokens
//! - [`flatten_graph::FlattenGraphFilter`] - Flattens token graphs
//!
//! # Examples
//!
//! ```
//! use laurus::analysis::token_filter::Filter;
//! use laurus::analysis::token_filter::lowercase::LowercaseFilter;
//! use laurus::analysis::token::Token;
//!
//! let filter = LowercaseFilter::new();
//! let tokens = vec![Token::new("Hello", 0), Token::new("WORLD", 1)];
//! let filtered: Vec<_> = filter
//!     .filter(Box::new(tokens.into_iter()))
//!     .unwrap()
//!     .collect();
//!
//! assert_eq!(filtered[0].text, "hello");
//! assert_eq!(filtered[1].text, "world");
//! ```
//!
//! # Filter Chaining
//!
//! Filters can be chained together in an analyzer to create complex
//! text processing pipelines:
//!
//! ```text
//! Tokenizer → Lowercase → Stop Words → Stemmer → Index
//! ```
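//!
//! A minimal sketch of chaining two filters by piping one filter's output
//! stream into the next. Note: `RemoveEmptyFilter::new()` is assumed here to
//! take no arguments; check the individual filter modules for their actual
//! constructors.
//!
//! ```ignore
//! use laurus::analysis::token_filter::Filter;
//! use laurus::analysis::token_filter::lowercase::LowercaseFilter;
//! use laurus::analysis::token_filter::remove_empty::RemoveEmptyFilter;
//! use laurus::analysis::token::Token;
//!
//! let tokens = vec![Token::new("Hello", 0), Token::new("", 1)];
//!
//! // First stage: lowercase every token.
//! let stream = LowercaseFilter::new()
//!     .filter(Box::new(tokens.into_iter()))
//!     .unwrap();
//!
//! // Second stage: drop tokens with empty text.
//! let filtered: Vec<_> = RemoveEmptyFilter::new()
//!     .filter(stream)
//!     .unwrap()
//!     .collect();
//!
//! assert_eq!(filtered.len(), 1);
//! assert_eq!(filtered[0].text, "hello");
//! ```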
use crate::analysis::token::TokenStream;
use crate::Result;

/// Trait for filters that transform token streams.
///
/// All token filters must implement this trait to be used in the analysis
/// pipeline. Filters receive a stream of tokens and produce a new stream,
/// allowing them to modify, filter, or augment tokens.
///
/// The trait requires `Send + Sync` to allow use in concurrent contexts.
///
/// # Examples
///
/// Implementing a custom filter:
///
/// ```
/// use laurus::analysis::token::{Token, TokenStream};
/// use laurus::analysis::token_filter::Filter;
/// use laurus::Result;
///
/// struct ReverseFilter;
///
/// impl Filter for ReverseFilter {
///     fn filter(&self, tokens: TokenStream) -> Result<TokenStream> {
///         let reversed: Vec<Token> = tokens
///             .map(|mut t| {
///                 t.text = t.text.chars().rev().collect();
///                 t
///             })
///             .collect();
///         Ok(Box::new(reversed.into_iter()))
///     }
///
///     fn name(&self) -> &'static str {
///         "reverse"
///     }
/// }
/// ```
// Individual filter modules