//! Char filter implementations for text normalization.
//!
//! This module provides filters that pre-process the text string before it is
//! passed to the tokenizer. This allows for normalization operations like
//! Unicode normalization or regex replacement.
//!
//! # Available Filters
//!
//! - [`unicode_normalize::UnicodeNormalizationCharFilter`] - Unicode normalization (NFC, NFD, etc.)
//! - [`pattern_replace::PatternReplaceCharFilter`] - Regex-based replacement
//! - [`japanese_iteration_mark::JapaneseIterationMarkCharFilter`] - Japanese iteration mark normalization
//! - [`mapping::MappingCharFilter`] - Character mapping replacement
//!
//! # Examples
//!
//! ```
//! use laurus::analysis::char_filter::CharFilter;
//! use laurus::analysis::char_filter::unicode_normalize::{UnicodeNormalizationCharFilter, NormalizationForm};
//!
//! let filter = UnicodeNormalizationCharFilter::new(NormalizationForm::NFKC);
//! // "\u{FB01}" is the single-character "fi" ligature; NFKC expands it to "fi".
//! let (normalized, _transformations) = filter.filter("\u{FB01}ne");
//! assert_eq!(normalized, "fine");
//! ```
/// Represents a character offset mapping between original and filtered text.
///
/// When a [`CharFilter`] modifies text (e.g., replacing characters, expanding ligatures,
/// or removing diacritics), the character positions in the filtered output no longer
/// correspond 1:1 to positions in the original input. A `Transformation` records one
/// such positional shift so that downstream components (tokenizer, highlighter, etc.)
/// can map offsets back to the original text.
///
/// Each transformation describes a contiguous region in the original text and the
/// corresponding region in the new (filtered) text that replaced it.
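///
/// # Examples
///
/// A minimal sketch of how such records could be used to map an offset in the
/// filtered text back to the original. `OffsetShift`, its field names, and
/// `to_original_offset` are hypothetical stand-ins for illustration, not this
/// crate's API. Offsets are assumed to be char offsets with exclusive ends,
/// and the records are assumed to be sorted by position.
///
/// ```rust
/// // Hypothetical record: one replaced region in the original text and the
/// // region that replaced it in the filtered text (field names are assumptions).
/// #[derive(Debug, Clone, Copy, PartialEq)]
/// struct OffsetShift {
///     original_start: usize,
///     original_end: usize,
///     new_start: usize,
///     new_end: usize,
/// }
///
/// // Maps a char offset in the filtered text back to the original text.
/// // Assumes `shifts` is sorted by `new_start` and regions do not overlap.
/// fn to_original_offset(shifts: &[OffsetShift], filtered: usize) -> usize {
///     // Accumulated (original length - new length) of regions before `filtered`.
///     let mut delta: isize = 0;
///     for s in shifts {
///         if filtered >= s.new_end {
///             // The whole replaced region lies before the offset.
///             delta += (s.original_end - s.original_start) as isize
///                 - (s.new_end - s.new_start) as isize;
///         } else if filtered >= s.new_start {
///             // The offset falls inside a replaced region: snap to its start.
///             return s.original_start;
///         } else {
///             // Remaining regions start after the offset.
///             break;
///         }
///     }
///     (filtered as isize + delta) as usize
/// }
/// ```
///
/// With a single record mapping original `0..1` to filtered `0..2` (a one-character
/// ligature expanded to two characters), filtered offset 2 maps back to original
/// offset 1, and filtered offset 1, which lies inside the expansion, snaps to 0.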
/// Trait for character filters that transform text before tokenization.
///
/// Implementations can modify the text content and return the modified text
/// along with a list of the transformations that were applied.
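///
/// # Examples
///
/// A minimal sketch of what an implementation might look like. `SketchFilter`,
/// `Shift`, and `AmpersandFilter` are hypothetical names defined only for this
/// example; the method shape (text in, filtered text plus a list of records
/// out) is an assumption modeled on the tuple returned in the module-level
/// example, and offsets are assumed to be char offsets.
///
/// ```rust
/// // Hypothetical offset record (field names are assumptions).
/// #[derive(Debug, PartialEq)]
/// struct Shift {
///     original_start: usize,
///     original_end: usize,
///     new_start: usize,
///     new_end: usize,
/// }
///
/// // Hypothetical stand-in for the filter trait.
/// trait SketchFilter {
///     fn filter(&self, input: &str) -> (String, Vec<Shift>);
/// }
///
/// // Replaces every '&' with "and" and records each replacement.
/// struct AmpersandFilter;
///
/// impl SketchFilter for AmpersandFilter {
///     fn filter(&self, input: &str) -> (String, Vec<Shift>) {
///         let mut out = String::new();
///         let mut shifts = Vec::new();
///         let mut new_pos = 0; // char offset in the filtered output
///         for (orig_pos, ch) in input.chars().enumerate() {
///             if ch == '&' {
///                 out.push_str("and");
///                 shifts.push(Shift {
///                     original_start: orig_pos,
///                     original_end: orig_pos + 1,
///                     new_start: new_pos,
///                     new_end: new_pos + 3,
///                 });
///                 new_pos += 3;
///             } else {
///                 out.push(ch);
///                 new_pos += 1;
///             }
///         }
///         (out, shifts)
///     }
/// }
/// ```
///
/// Filtering `"a&b"` with this sketch yields `"aandb"` and one record that maps
/// original chars `1..2` to filtered chars `1..4`.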