"""
IntFlag representing different simple match types.
Attributes:
MatchNone (int): A match type indicating no specific match criteria (0b00000001).
MatchFanjian (int): A match type for matching between traditional and simplified Chinese characters (0b00000010).
MatchWordDelete (int): A match type where words are deleted for matching purposes (0b00000100).
MatchTextDelete (int): A match type where text is deleted for matching purposes (0b00001000).
MatchDelete (int): A combined match type where both word and text deletions are applied (0b00001100).
MatchNormalize (int): A match type where text normalization is applied (0b00010000).
MatchDeleteNormalize (int): A combined match type where deletion and normalization are both applied (0b00011100).
MatchFanjianDeleteNormalize (int): A combined match type that includes Fanjian matching, deletion, and normalization (0b00011110).
MatchPinYin (int): A match type using Pinyin for matching Chinese characters (0b00100000).
MatchPinYinChar (int): A match type using individual Pinyin characters for a finer granularity match (0b01000000).
"""
= 0b00000001
= 0b00000010
= 0b00000100
= 0b00001000
= 0b00001100
= 0b00010000
= 0b00011100
= 0b00011110
= 0b00100000
= 0b01000000
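

# Example (illustrative sketch): each composite SimpleMatchType member is the
# bitwise OR of single-step flags, e.g. MatchDelete == MatchWordDelete |
# MatchTextDelete == 0b00001100. The hypothetical helper `_enabled_steps` below
# (not part of the library's API) decomposes a match-type value into the
# single-step flags it enables. The bit values are copied from the documented
# SimpleMatchType values so the sketch runs on its own; because IntFlag members
# are ints, it accepts either plain ints or SimpleMatchType members.
def _enabled_steps(match_type: int) -> list:
    """Return the names of the single-step flags set in ``match_type``."""
    # Single-step bits only; MatchNone (0b00000001) is deliberately left out.
    single_steps = {
        "MatchFanjian": 0b00000010,
        "MatchWordDelete": 0b00000100,
        "MatchTextDelete": 0b00001000,
        "MatchNormalize": 0b00010000,
        "MatchPinYin": 0b00100000,
        "MatchPinYinChar": 0b01000000,
    }
    # A step is enabled when its bit is set in the (possibly composite) value.
    return [name for name, bit in single_steps.items() if match_type & bit]


# Usage: _enabled_steps(0b00011110), the MatchFanjianDeleteNormalize value,
# returns ["MatchFanjian", "MatchWordDelete", "MatchTextDelete", "MatchNormalize"].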
"""
Enum representing different regex match types.
Attributes:
MatchSimilarChar (str): A match type that finds characters similar to a given set ("similar_char").
MatchAcrostic (str): A match type that looks for acrostic patterns ("acrostic").
MatchRegex (str): A match type that uses regular expressions for matching ("regex").
"""
=
=
=
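

# Example (illustrative sketch, not the library's algorithm): one common reading
# of an acrostic match is that the target word is spelled out by the first
# character of consecutive lines or clauses. The helper name
# `_acrostic_contains` and the set of clause separators are hypothetical
# choices; the library's own segmentation rules are not described here.
def _acrostic_contains(text: str, word: str) -> bool:
    """Return True if ``word`` appears in the sequence of clause-initial characters."""
    import re

    # Split on line breaks and common Latin and Chinese sentence punctuation.
    clauses = re.split(r"[\n,.;!?，。；！？]+", text)
    # Keep the first non-space character of every non-empty clause.
    initials = "".join(c.strip()[0] for c in clauses if c.strip())
    return word in initials


# Usage: for "hello there\nis it\nme" the clause initials are "him", so the
# word "him" is found while "hem" is not.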
"""
Enum representing different similarity match types.
Attributes:
MatchLevenshtein (str): A match type using the Levenshtein distance algorithm for measuring the difference between two sequences ("levenshtein").
MatchDamrauLevenshtein (str): A match type using the Damerau-Levenshtein distance algorithm, an extension of Levenshtein with transpositions allowed ("damrau_levenshtein").
MatchIndel (str): A match type that uses insertion and deletion operations for matching purposes ("indel").
MatchJaro (str): A match type using the Jaro distance algorithm to compare the similarity between two strings ("jaro").
MatchJaroWinkler (str): A match type using the Jaro-Winkler distance algorithm, an extension of Jaro with added weight for matching starting characters ("jaro_winkler").
"""
=
=
=
=
=
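

# Example (illustrative sketch): textbook dynamic-programming versions of two of
# the distances listed in SimMatchType, to show how they differ. Levenshtein
# charges 1 for a substitution; Indel allows only insertions and deletions, so a
# substitution effectively costs 2. `_levenshtein` and `_indel` are hypothetical
# helpers, not the library's own implementation.
def _levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def _indel(a: str, b: str) -> int:
    """Insertions plus deletions turning a into b: len(a) + len(b) - 2 * LCS."""
    prev = [0] * (len(b) + 1)  # longest-common-subsequence lengths, previous row
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


# Usage: _levenshtein("kitten", "sitting") is 3 while _indel("kitten", "sitting")
# is 5, because each of the two substitutions becomes a deletion plus an insertion.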
"""
Represents a simple match configuration.
Attributes:
simple_match_type (SimpleMatchType): The type of simple match to be used, as defined in SimpleMatchType.
"""
:
"""
Represents a regular expression match configuration.
Attributes:
regex_match_type (RegexMatchType): The type of regular expression match to be used, as defined in RegexMatchType.
"""
:
"""
Represents a similarity match configuration.
Attributes:
sim_match_type (SimMatchType): The type of similarity match to be used, as defined in SimMatchType.
threshold (float): The threshold value for the similarity match. This value determines the minimum similarity score required for a match to be considered successful.
"""
:
:
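

# Example (illustrative sketch): a Similar configuration pairs an algorithm with
# a threshold, and a candidate counts as a match when its similarity score is at
# least that threshold. The hypothetical helpers `_jaro` and `_passes_threshold`
# use the textbook Jaro similarity, which already lies in [0, 1]; whether the
# library scores and compares candidates exactly this way is an assumption.
def _jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are half-transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def _passes_threshold(s1: str, s2: str, threshold: float) -> bool:
    """A candidate matches when its similarity reaches the configured threshold."""
    return _jaro(s1, s2) >= threshold


# Usage: _jaro("MARTHA", "MARHTA") is 17/18 (about 0.944), so it passes a
# threshold of 0.9 but not one of 0.95.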
"""
A class representing different types of match tables.
Attributes:
Simple (Simple): Represents a simple match configuration.
Regex (Regex): Represents a regular expression match configuration.
Similar (Similar): Represents a similarity match configuration.
"""
=
=
=
"""
Represents a match table configuration with various match types.
Attributes:
table_id (int): The unique identifier for the match table.
match_table_type (MatchTableType): The type of match table, can be one of Simple, Regex, or Similar as defined in MatchTableType.
word_list (List[str]): A list of words to be used for matching.
exemption_simple_match_type (SimpleMatchType): Specifies which simple match type(s) to exempt from the match operation.
exemption_word_list (List[str]): A list of words to exempt from the match operation.
"""
:
:
:
:
:
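

# Example (illustrative sketch): assuming the match table and its Simple, Regex
# and Similar configurations are TypedDicts, a match table at runtime is an
# ordinary dict carrying the five documented keys. The table id, the word lists
# and the helpers `_example_match_table` and `_has_match_table_keys` are
# hypothetical; the match types are written as the plain integer values of the
# SimpleMatchType members (MatchFanjianDeleteNormalize = 0b00011110,
# MatchNone = 0b00000001) so the sketch runs on its own.
def _example_match_table() -> dict:
    """Build a match-table-shaped dict with hypothetical values."""
    return {
        "table_id": 1,
        # A Simple configuration selecting MatchFanjianDeleteNormalize.
        "match_table_type": {"simple_match_type": 0b00011110},
        "word_list": ["hello", "world"],
        "exemption_simple_match_type": 0b00000001,  # MatchNone
        "exemption_word_list": ["word"],
    }


def _has_match_table_keys(table: dict) -> bool:
    """Check that ``table`` has exactly the five keys documented for a match table."""
    expected = {
        "table_id",
        "match_table_type",
        "word_list",
        "exemption_simple_match_type",
        "exemption_word_list",
    }
    return set(table) == expected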
=
:
:
=
:
:
=