tktax-io
tktax-io is a Rust library that supplies text preprocessing utilities, tokenization and stemming routines, as well as configurable formatted header-printing functions. It is designed for integration within the TKTAX project but can also be adopted for general lexical cleansing or linguistic normalization workflows.
Features
- Punctuation Filtering: Uses regex to remove extraneous punctuation and special symbols.
- Case Normalization: Converts strings to lowercase for uniform comparisons.
- Tokenization & Stemming: Splits text using Unicode word boundaries and applies Snowball-based stemming to reduce words to canonical roots.
- Formatted Header Printing: Generates structured output lines with user-configurable width and character styles.
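As a rough illustration of the features listed above, the following sketch shows one way the cleaning, tokenizing, and header-printing steps could fit together. It uses only the standard library and is not the crate's implementation: the function names clean_text, tokenize, and format_header are hypothetical, ASCII punctuation matching stands in for the regex-based filter, whitespace splitting stands in for Unicode word-boundary segmentation, and Snowball stemming is left out because it requires an external crate.

```rust
// Std-only sketch of the feature set. All names below are hypothetical
// and are NOT tktax-io's actual API.

/// Replace ASCII punctuation with spaces, then lowercase the result.
/// (Assumption: the real crate uses a regex for this step.)
fn clean_text(input: &str) -> String {
    input
        .chars()
        .map(|c| if c.is_ascii_punctuation() { ' ' } else { c })
        .collect::<String>()
        .to_lowercase()
}

/// Clean the text and split it on whitespace.
/// (Assumption: a stand-in for Unicode word-boundary splitting; no stemming.)
fn tokenize(input: &str) -> Vec<String> {
    clean_text(input)
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

/// Center `title` in a line of `width` characters, padded with `fill`.
/// If the title does not fit, it is returned with single-space padding only.
fn format_header(title: &str, width: usize, fill: char) -> String {
    let label = format!(" {} ", title);
    let len = label.chars().count();
    if len >= width {
        return label;
    }
    let left = (width - len) / 2;
    let right = width - len - left;
    let mut line = String::with_capacity(width);
    line.extend(std::iter::repeat(fill).take(left));
    line.push_str(&label);
    line.extend(std::iter::repeat(fill).take(right));
    line
}

fn main() {
    println!("{}", format_header("Tokens", 24, '='));
    for token in tokenize("PAYMENT: Coffee-Shop #42, Seattle!") {
        println!("{token}");
    }
}
```

Replacing punctuation with spaces rather than deleting it keeps hyphenated or slash-separated words from fusing into a single token; whether the crate makes the same choice is not stated here.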
Example Usage
Below is a minimal example showing how to use the main functions in this crate:
use ;
Run the tests with cargo test.
Contributing
- Fork the repository and create a feature branch.
- Make changes, then open a pull request to the main repository.
- Provide a clear and detailed description of all modifications.
License
This project is licensed under either of:
- Apache License, Version 2.0
- MIT License
at your option.