tktax-io 0.2.2

A library providing text preprocessing, tokenization, and formatted header printing utilities for the TKTAX project.
Documentation
  • Coverage
  • 0%
    0 out of 7 items documented0 out of 0 items with examples
  • Size
  • Source code size: 77.78 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 1.27 MB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 1m 17s Average build duration of successful builds.
  • all releases: 1m 17s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • Repository
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • klebs6

tktax-io

tktax-io is a Rust library that supplies text preprocessing utilities, tokenization and stemming routines, as well as configurable formatted header-printing functions. It is designed for integration within the TKTAX project but can also be adopted for general lexical cleansing or linguistic normalization workflows.

Features

  • Punctuation Filtering: Uses regex to remove extraneous punctuation and special symbols.
  • Case Normalization: Converts strings to lowercase for uniform comparisons.
  • Tokenization & Stemming: Splits text using Unicode word boundaries and applies Snowball-based stemming to reduce words to canonical roots.
  • Formatted Header Printing: Generates structured output lines with user-configurable width and character styles.

Example Usage

Below is a minimal example showing how to use the main functions in this crate:

use tktax_io::{preprocess, tokenize_and_stem, print_header, print_thick_header};

fn main() {
    // Input text to preprocess
    let transaction_description = "7-ELEVEN!!!";

    // Remove punctuation and transform to lowercase
    let clean_text = preprocess(transaction_description);
    println!("Preprocessed: {}", clean_text);

    // Tokenize and stem the cleaned text
    let tokens = tokenize_and_stem(&clean_text);
    println!("Tokens: {:?}", tokens);

    // Print a couple of headers
    print_header("Light Header");
    print_thick_header("Heavy Header");
}

Run the tests with:

cargo test

Contributing

  1. Fork the repository and create a feature branch.
  2. Make changes, then open a pull request to the main repository.
  3. Provide a clear and detailed description of all modifications.

License

This project is licensed under either of:

  • Apache License, Version 2.0
  • MIT License

at your option.