tinytoken 0.1.4

Library for tokenizing text into words, numbers, symbols, and more, with customizable parsing options.
Homepage: luxluth/tinytoken

tinytoken

This library provides a tokenizer for parsing and categorizing different types of tokens, such as words, numbers, strings, characters, symbols, and operators. It includes configurable options to handle various tokenization rules and formats, enabling fine-grained control over how text input is parsed.

Example

use tinytoken::{Tokenizer, TokenizerBuilder, Choice};

fn main() {
    let tokenizer = TokenizerBuilder::new()
        .parse_char_as_string(true)
        .allow_digit_separator(Choice::Yes('_'))
        .add_symbol('$')
        .add_operators(&['+', '-'])
        .build("let x = 123_456 + 0xFF");

    match tokenizer.tokenize() {
        Ok(tokens) => {
            for token in tokens {
                println!("{:?}", token);
            }
        }
        Err(err) => {
            eprintln!("Tokenization error: {err}");
        }
    }
}
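To illustrate the kind of categorization described above (words, numbers, and symbols, with a digit separator such as the `_` enabled by `allow_digit_separator(Choice::Yes('_'))`), here is a minimal hand-rolled sketch in plain Rust. It is not tinytoken's API or implementation, just an illustration of the technique:

```rust
// Illustrative sketch only -- NOT tinytoken's API.
// Categorizes input into words, numbers, and symbols, optionally
// skipping a digit separator inside number literals.

#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    Number(String),
    Symbol(char),
}

fn tokenize(input: &str, digit_sep: Option<char>) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Collect a number, dropping the separator so
            // "123_456" is stored as the digits "123456".
            let mut num = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    num.push(d);
                    chars.next();
                } else if Some(d) == digit_sep {
                    chars.next(); // skip separator
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(num));
        } else if c.is_alphabetic() || c == '_' {
            // Collect an identifier-like word.
            let mut word = String::new();
            while let Some(&w) = chars.peek() {
                if w.is_alphanumeric() || w == '_' {
                    word.push(w);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Word(word));
        } else {
            // Anything else is a single-character symbol.
            tokens.push(Token::Symbol(c));
            chars.next();
        }
    }
    tokens
}
```

For example, `tokenize("x = 123_456", Some('_'))` yields a word `x`, the symbol `=`, and the number `123456` with the separator removed.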

Contributions

Feel free to send a PR to improve and/or extend the library's capabilities.