from enum import Enum, IntFlag
from typing import Dict, List, TypedDict, Union


class ProcessType(IntFlag):
    """
    An enumeration representing various types of text processing operations.

    Attributes:
        MatchNone (IntFlag): An operation that performs no matching (binary 00000001).
        MatchFanjian (IntFlag): An operation that matches traditional and simplified Chinese characters (binary 00000010).
        MatchDelete (IntFlag): An operation that deletes characters (binary 00000100).
        MatchNormalize (IntFlag): An operation that normalizes characters (binary 00001000).
        MatchDeleteNormalize (IntFlag): A combined operation that deletes and normalizes characters
            (MatchDelete | MatchNormalize, binary 00001100).
        MatchFanjianDeleteNormalize (IntFlag): A combined operation that matches traditional and simplified
            Chinese characters, deletes characters, and normalizes them
            (MatchFanjian | MatchDelete | MatchNormalize, binary 00001110).
        MatchPinYin (IntFlag): An operation that matches Pinyin representations of Chinese characters (binary 00010000).
        MatchPinYinChar (IntFlag): An operation that matches individual characters in the Pinyin representation
            (binary 00100000).
    """

    MatchNone = 0b00000001
    MatchFanjian = 0b00000010
    MatchDelete = 0b00000100
    MatchNormalize = 0b00001000
    MatchDeleteNormalize = 0b00001100
    MatchFanjianDeleteNormalize = 0b00001110
    MatchPinYin = 0b00010000
    MatchPinYinChar = 0b00100000
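Because `ProcessType` subclasses `IntFlag`, the composite members are bitwise ORs of the single-bit ones, and a membership test tells whether a combined flag includes a given step. The sketch below re-declares the flags locally so it runs on its own. It checks those relationships and does not exercise any library behavior.

```python
from enum import IntFlag


class ProcessType(IntFlag):
    # Local copy of the flag values documented above.
    MatchNone = 0b00000001
    MatchFanjian = 0b00000010
    MatchDelete = 0b00000100
    MatchNormalize = 0b00001000
    MatchDeleteNormalize = 0b00001100
    MatchFanjianDeleteNormalize = 0b00001110
    MatchPinYin = 0b00010000
    MatchPinYinChar = 0b00100000


# Composite members are the bitwise OR of their component flags.
combined = ProcessType.MatchDelete | ProcessType.MatchNormalize
print(combined == ProcessType.MatchDeleteNormalize)  # True

# Membership test: does a composite flag include a given step?
full = ProcessType.MatchFanjianDeleteNormalize
print(ProcessType.MatchFanjian in full)  # True
print(ProcessType.MatchPinYin in full)   # False

# Decompose a composite into its single-bit components (powers of two only).
single_bits = [
    p for p in ProcessType
    if (p.value & (p.value - 1)) == 0 and p in full
]
print([p.name for p in single_bits])  # ['MatchFanjian', 'MatchDelete', 'MatchNormalize']
```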
"""
An enumeration representing various types of regex matching operations.
Attributes:
MatchSimilarChar (str): An operation that matches characters that are similar in some way.
MatchAcrostic (str): An operation that matches acrostic patterns.
MatchRegex (str): An operation that matches using standard regular expressions.
"""
=
=
=
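`MatchAcrostic` is only described as matching acrostic patterns. The sketch below is a generic illustration of the idea, not the library's implementation. It assumes an acrostic is a word spelled by the first characters of consecutive non-empty lines. The function name `find_acrostics` is hypothetical.

```python
from typing import List


def find_acrostics(text: str, words: List[str]) -> List[str]:
    """Return the words spelled by the first characters of consecutive non-empty lines."""
    # Collect the first character of every non-empty line, in order.
    heads = "".join(line.strip()[0] for line in text.splitlines() if line.strip())
    # A word counts as a hit if it appears as a contiguous run of line heads.
    return [w for w in words if w and w in heads]


poem = """Happy days
Every morning
Light rain
Lazy cats
Open sky"""
print(find_acrostics(poem, ["HELLO", "ELL", "CAT"]))  # ['HELLO', 'ELL']
```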
"""
An enumeration representing various types of similarity matching operations.
Attributes:
MatchLevenshtein (str): An operation that matches using the Levenshtein distance metric.
"""
=
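The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. The sketch below computes it with the standard two-row dynamic program. It also converts the distance into a score in [0, 1] as `1 - distance / max(len(a), len(b))`, so the score can be compared against a `threshold` like the one in `Similar`. That conversion is an assumption: this section does not say how the library normalizes the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical (assumed scheme)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))           # 3
print(round(similarity("kitten", "sitting"), 3))  # 0.571
print(similarity("kitten", "sitting") >= 0.8)     # False
```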
class Simple(TypedDict):
    """
    A TypedDict representing a simple text processing operation.

    Attributes:
        process_type (ProcessType): The type of processing operation to be performed.
    """

    process_type: ProcessType
class Regex(TypedDict):
    """
    A TypedDict representing a regex-based text processing operation.

    Attributes:
        process_type (ProcessType): The type of processing operation to be performed.
        regex_match_type (RegexMatchType): The type of regex matching operation to be used.
    """

    process_type: ProcessType
    regex_match_type: RegexMatchType
class Similar(TypedDict):
    """
    A TypedDict representing a similarity-based text processing operation.

    Attributes:
        process_type (ProcessType): The type of processing operation to be performed.
        sim_match_type (SimMatchType): The type of similarity matching operation to be used.
        threshold (float): The threshold value for the similarity matching operation.
    """

    process_type: ProcessType
    sim_match_type: SimMatchType
    threshold: float
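`TypedDict` classes only guide static type checkers. Calling one builds an ordinary `dict` and performs no validation at runtime. The sketch below re-declares `Similar` with minimal stand-ins for its enums. The `"levenshtein"` string value is a hypothetical placeholder, since this section does not show the enum values.

```python
from enum import Enum, IntFlag
from typing import TypedDict


class ProcessType(IntFlag):
    MatchNormalize = 0b00001000  # only the flag this sketch needs


class SimMatchType(str, Enum):
    MatchLevenshtein = "levenshtein"  # hypothetical value


class Similar(TypedDict):
    process_type: ProcessType
    sim_match_type: SimMatchType
    threshold: float


op = Similar(
    process_type=ProcessType.MatchNormalize,
    sim_match_type=SimMatchType.MatchLevenshtein,
    threshold=0.8,
)
print(type(op) is dict)  # True: a TypedDict "instance" is a plain dict
print(op["threshold"])   # 0.8
# Keyword arguments map one-to-one onto keys; enum members compare equal to their values.
print(op == {"process_type": 8, "sim_match_type": "levenshtein", "threshold": 0.8})  # True
```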
"""
Create a dictionary representing a simple text processing operation.
Args:
process_type (ProcessType): The type of processing operation to be performed.
Returns:
Dict[str, Simple]: A dictionary with one key "simple" mapping to a Simple TypedDict
containing the provided process_type.
"""
return {"simple": Simple(process_type=process_type)}
"""
Create a dictionary representing a regex-based text processing operation.
Args:
process_type (ProcessType): The type of processing operation to be performed.
regex_match_type (RegexMatchType): The type of regex matching operation to be used.
Returns:
Dict[str, Regex]: A dictionary with one key "regex" mapping to a Regex TypedDict
containing the provided process_type and regex_match_type.
"""
return {"regex": Regex(process_type=process_type, regex_match_type=regex_match_type)}
"""
Create a dictionary representing a similarity-based text processing operation.
Args:
process_type (ProcessType): The type of processing operation to be performed.
sim_match_type (SimMatchType): The type of similarity matching operation to be used.
threshold (float): The threshold value for the similarity matching operation.
Returns:
Dict[str, Similar]: A dictionary with one key "similar" mapping to a Similar TypedDict
containing the provided process_type, sim_match_type, and threshold.
"""
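return {"similar": Similar(process_type=process_type, sim_match_type=sim_match_type, threshold=threshold)}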
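The three factory functions each wrap their `TypedDict` under a single key, so a consumer can tell the variants apart by that key. The sketch below rebuilds that pattern. The function names `simple_op`, `regex_op`, `similar_op`, and `describe` are hypothetical, and so are the enum string values, because this section does not show them.

```python
from enum import Enum, IntFlag
from typing import Dict, TypedDict, Union


class ProcessType(IntFlag):
    MatchNone = 0b00000001
    MatchFanjian = 0b00000010


class RegexMatchType(str, Enum):
    MatchRegex = "regex"  # hypothetical value


class SimMatchType(str, Enum):
    MatchLevenshtein = "levenshtein"  # hypothetical value


class Simple(TypedDict):
    process_type: ProcessType


class Regex(TypedDict):
    process_type: ProcessType
    regex_match_type: RegexMatchType


class Similar(TypedDict):
    process_type: ProcessType
    sim_match_type: SimMatchType
    threshold: float


# Hypothetical factory names; each wraps its TypedDict under one variant key.
def simple_op(process_type: ProcessType) -> Dict[str, Simple]:
    return {"simple": Simple(process_type=process_type)}


def regex_op(process_type: ProcessType, regex_match_type: RegexMatchType) -> Dict[str, Regex]:
    return {"regex": Regex(process_type=process_type, regex_match_type=regex_match_type)}


def similar_op(process_type: ProcessType, sim_match_type: SimMatchType,
               threshold: float) -> Dict[str, Similar]:
    return {"similar": Similar(process_type=process_type,
                               sim_match_type=sim_match_type, threshold=threshold)}


def describe(op: Union[Dict[str, Simple], Dict[str, Regex], Dict[str, Similar]]) -> str:
    # The single key tells a consumer which variant the payload holds.
    kind, payload = next(iter(op.items()))
    return f"{kind}: {sorted(payload)}"


print(describe(simple_op(ProcessType.MatchFanjian)))
# simple: ['process_type']
print(describe(similar_op(ProcessType.MatchNone, SimMatchType.MatchLevenshtein, 0.8)))
# similar: ['process_type', 'sim_match_type', 'threshold']
```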
class MatchTable(TypedDict):
    """
    A TypedDict representing a table for matching operations.

    Attributes:
        table_id (int): A unique identifier for the match table.
        match_table_type (Union[Dict[str, Simple], Dict[str, Regex], Dict[str, Similar]]):
            A dictionary that specifies the type of match operation to be performed. The key is a string indicating
            the match type ('simple', 'regex', 'similar'), and the value is the corresponding TypedDict describing
            the operation.
        word_list (List[str]): A list of words that are subject to the matching operations.
        exemption_process_type (ProcessType): The type of process for which certain words are exempt from matching.
        exemption_word_list (List[str]): A list of words that are exempt from the matching process.
    """

    table_id: int
    match_table_type: Union[Dict[str, Simple], Dict[str, Regex], Dict[str, Similar]]
    word_list: List[str]
    exemption_process_type: ProcessType
    exemption_word_list: List[str]
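At runtime a `MatchTable` is an ordinary dictionary, so one can be written as a literal. The sketch below uses hypothetical IDs and words. It stores the flags as plain integers (`0b00001110` is `MatchFanjianDeleteNormalize`, `0b00000001` is `MatchNone`) so it runs without the enum. It then groups tables under an integer key into the `Dict[int, List[MatchTable]]` shape described next. This section does not say how that key is chosen, so the hypothetical helper `group_tables` takes it as an explicit argument.

```python
from typing import Dict, List, Tuple

# Hypothetical tables; flags are plain ints (0b00001110 = MatchFanjianDeleteNormalize,
# 0b00000001 = MatchNone).
table_a = {
    "table_id": 1,
    "match_table_type": {"simple": {"process_type": 0b00001110}},
    "word_list": ["hello", "world"],
    "exemption_process_type": 0b00000001,
    "exemption_word_list": [],
}
table_b = {
    "table_id": 2,
    "match_table_type": {"simple": {"process_type": 0b00000001}},
    "word_list": ["foo"],
    "exemption_process_type": 0b00000001,
    "exemption_word_list": [],
}


def group_tables(entries: List[Tuple[int, dict]]) -> Dict[int, List[dict]]:
    """Collect (key, table) pairs into a Dict[int, List[table]] (hypothetical helper)."""
    grouped: Dict[int, List[dict]] = {}
    for key, table in entries:
        grouped.setdefault(key, []).append(table)
    return grouped


table_map = group_tables([(10, table_a), (10, table_b)])
for table in table_map[10]:
    kind = next(iter(table["match_table_type"]))  # the single variant key
    print(table["table_id"], kind, table["word_list"])
# 1 simple ['hello', 'world']
# 2 simple ['foo']
```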
=
"""
A type alias for mapping table identifiers to lists of MatchTable objects.
Type:
Dict[int, List[MatchTable]]
This dictionary maps an integer table ID to a list of MatchTable objects that correspond to the ID. It is used to
organize and retrieve match tables based on their unique identifiers.
"""
"""
A TypedDict representing the result of a matching operation.
Attributes:
match_id (int): A unique identifier for the match result.
table_id (int): The identifier of the match table where the matching operation was performed.
word_id (int): The identifier of the matched word within the word list.
word (str): The matched word.
similarity (float): The similarity score of the match operation.
"""
match_id: int
table_id: int
word_id: int
word: str
similarity: float
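Every result carries a `similarity` score and a `table_id`, so a caller can post-filter and group results. The sketch below assumes the class name `MatchResult`, because the section above leaves this TypedDict unnamed. It uses made-up results and keeps the words whose score meets a caller-chosen cut-off. The helper name `words_by_table` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class MatchResult(TypedDict):  # assumed name
    match_id: int
    table_id: int
    word_id: int
    word: str
    similarity: float


def words_by_table(results: List[MatchResult], min_similarity: float) -> Dict[int, List[str]]:
    """Group matched words by table_id, dropping results below the cut-off."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for r in results:
        if r["similarity"] >= min_similarity:
            grouped[r["table_id"]].append(r["word"])
    return dict(grouped)


results: List[MatchResult] = [
    {"match_id": 1, "table_id": 10, "word_id": 0, "word": "apple", "similarity": 1.0},
    {"match_id": 1, "table_id": 10, "word_id": 3, "word": "apply", "similarity": 0.8},
    {"match_id": 1, "table_id": 11, "word_id": 2, "word": "maple", "similarity": 0.6},
]
print(words_by_table(results, 0.8))  # {10: ['apple', 'apply']}
```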
=
"""
A type alias for representing a simple table structure for text processing.
This dictionary maps a `ProcessType` to another dictionary that maps an integer ID to a string.
The outer dictionary's keys represent different types of processing operations, while each inner
dictionary maps unique integer identifiers to the strings processed with that operation.
Type:
Dict[ProcessType, Dict[int, str]]
"""
"""
A TypedDict representing a simplified result of a text processing operation.
Attributes:
word_id (int): The identifier of the word within the word list.
word (str): The word corresponding to the word_id.
"""
word_id: int
word: str
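The sketch below builds a `Dict[ProcessType, Dict[int, str]]` from `(process_type, word_id, word)` triples and returns results shaped like the `TypedDict` above. The name `SimpleResult`, the helper names, and the matching step are all assumptions. `naive_process` does a plain substring search and ignores the processing flags, so it stands in for, and does not reproduce, the library's matcher. Flags are plain integers so the example runs without the enum.

```python
from typing import Dict, Iterable, List, Tuple, TypedDict


class SimpleResult(TypedDict):  # assumed name
    word_id: int
    word: str


def build_simple_table(entries: Iterable[Tuple[int, int, str]]) -> Dict[int, Dict[int, str]]:
    """Group (process_type, word_id, word) triples into the nested Dict shape."""
    table: Dict[int, Dict[int, str]] = {}
    for process_type, word_id, word in entries:
        # Outer key: processing flag; inner dict: word_id -> word.
        table.setdefault(process_type, {})[word_id] = word
    return table


def naive_process(table: Dict[int, Dict[int, str]], text: str) -> List[SimpleResult]:
    """Plain substring search; ignores the processing-type keys (stand-in only)."""
    results: List[SimpleResult] = []
    for words in table.values():
        for word_id, word in words.items():
            if word in text:
                results.append(SimpleResult(word_id=word_id, word=word))
    return results


simple_table = build_simple_table([
    (0b00000001, 1, "apple"),   # MatchNone
    (0b00000001, 2, "banana"),  # MatchNone
    (0b00000010, 3, "cherry"),  # MatchFanjian
])
print(simple_table)
# {1: {1: 'apple', 2: 'banana'}, 2: {3: 'cherry'}}
print(naive_process(simple_table, "an apple and a cherry"))
# [{'word_id': 1, 'word': 'apple'}, {'word_id': 3, 'word': 'cherry'}]
```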