# Lindera Filter
[](https://opensource.org/licenses/MIT) [](https://gitter.im/lindera-morphology/lindera?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [](https://crates.io/crates/lindera-filter)
Character and token filters for [Lindera](https://github.com/lindera-morphology/lindera).
## Character filters
### Japanese iteration mark character filter
Normalizes Japanese horizontal [iteration marks](https://en.wikipedia.org/wiki/Iteration_mark) (odoriji) to their expanded form.
Sequences of iteration marks are supported. In case an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is without considering its script. For example, with input "?ゝ", we get "??" even though the question mark isn't hiragana.
### Mapping character filter
Replace characters with the specified character mappings, and correcting the resulting changes to the offsets.
Matching is greedy (longest pattern matching at a given point wins). Replacement is allowed to be the empty string.
### Regex character filter
Character filter that uses a regular expression for the target of replace string.
### Unicode normalize character filter
Unicode normalization to normalize the input text, that using the specified normalization form, one of NFC, NFD, NFKC, or NFKD.
## Token filters
### Japanese base form token filter
Replace the term text with the base form registered in the morphological dictionary.
This acts as a lemmatizer for verbs and adjectives.
### Japanese compound word token filter
Compound consecutive tokens that have specified part-of-speech tags into a single token.
This is useful for handling compound words that are not registered in the morphological dictionary.
### Japanese katakana stem token filter
Normalizes common katakana spelling variations ending with a long sound (U+30FC) by removing that character.
Only katakana words longer than the minimum length are stemmed.
### Japanese keep tags token filter
Keep only tokens with the specified part-of-speech tag.
### Japanese number token filter
Convert tokens representing Japanese numerals, including Kanji numerals, to Arabic numerals.
### Japanese reading form token filter
Replace the text of a token with the reading of the text as registered in the morphological dictionary.
The reading is in katakana.
### Japanese stop tags token filter
Remove tokens with the specified part-of-speech tag.
### Keep words token filter
Keep only the tokens of the specified text.
### Korean keep tags token filter
Keep only tokens with the specified part-of-speech tag.
### Korean reading form token filter
Replace the text of a token with the reading of the text as registered in the morphological dictionary.
### Korean stop tags token filter
Remove tokens with the specified part-of-speech tag.
### Length token filter
Keep only tokens with the specified number of characters of text.
### Lowercase token filter
Normalizes token text to lowercase.
### Mapping token filter
Replace characters with the specified character mappings.
### Stop words token filter
Remove the tokens of the specified text.
### Uppercase token filter
Normalizes token text to uppercase.
## API reference
The API reference is available. Please see following URL:
- [lindera-filter](https://docs.rs/lindera-filter)