Crate tinytoken

§tinytoken

This library provides a tokenizer that parses input text into categorized tokens, such as words, numbers, strings, characters, symbols, and operators. It includes configurable options for various tokenization rules and formats, enabling fine-grained control over how the input is parsed.

§Example

use tinytoken::{Tokenizer, TokenizerBuilder, Choice};

fn main() {
    let tokenizer = TokenizerBuilder::new()
        .parse_char_as_string(true)
        .allow_digit_separator(Choice::Yes('_'))
        .add_symbol('$')
        .add_operators(&['+', '-'])
        .build("let x = 123_456 + 0xFF");

    match tokenizer.tokenize() {
        Ok(tokens) => {
            for token in tokens {
                println!("{:?}", token);
            }
        }
        Err(err) => {
            eprintln!("Tokenization error: {err}");
        }
    }
}
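
In this example the builder turns on parse_char_as_string, accepts _ as a digit separator (so 123_456 can be read as a single number), and registers $ as an additional symbol and + and - as operators. The input mixes an underscore-separated decimal with a hexadecimal literal (0xFF), and because tokenize() returns a Result, any failure is reported through the Err variant instead of a panic.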

§Contributions

Feel free to send a PR to improve and/or extend the library's capabilities.

Modules§

  • Contains error definitions specific to tokenization

Structs§

  • Represents the location of a token in the input text, with line and column values
  • Represents an individual token with type, value, and location
  • Primary struct for tokenizing an input string, with methods for parsing and generating tokens
  • A builder struct for creating a TokenizerConfig instance with customized options
  • Configuration struct for the tokenizer, allowing customization of tokenization behavior

Enums§

  • Configurable option for specific settings in TokenizerConfig
  • Represents the types of numeric tokens recognized by the tokenizer
  • Represents all possible token types that can be parsed by the tokenizer; a sketch of how these pieces fit together follows below
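
Taken together, the structs and enums above describe the data the tokenizer produces: each token carries a kind, a textual value, and a line/column location. The sketch below is illustrative only; the Kind, Location, and Token types it defines are hypothetical stand-ins that mirror the descriptions above, not the crate's actual definitions, and the kinds shown for the example input may differ from what tinytoken reports.

// Illustrative stand-ins only: Kind, Location, and Token here mirror the
// item descriptions above but are NOT the types exported by tinytoken.
#[derive(Debug)]
enum Kind {
    Word,
    Number,
    Operator,
    Symbol,
}

#[derive(Debug)]
struct Location {
    line: usize,
    column: usize,
}

#[derive(Debug)]
struct Token {
    kind: Kind,
    value: String,
    location: Location,
}

fn main() {
    // A hand-built stream roughly matching the example input
    // "let x = 123_456 + 0xFF"; the exact kinds the crate assigns may differ.
    let tokens = vec![
        Token { kind: Kind::Word, value: "let".into(), location: Location { line: 1, column: 1 } },
        Token { kind: Kind::Word, value: "x".into(), location: Location { line: 1, column: 5 } },
        Token { kind: Kind::Symbol, value: "=".into(), location: Location { line: 1, column: 7 } },
        Token { kind: Kind::Number, value: "123_456".into(), location: Location { line: 1, column: 9 } },
        Token { kind: Kind::Operator, value: "+".into(), location: Location { line: 1, column: 17 } },
        Token { kind: Kind::Number, value: "0xFF".into(), location: Location { line: 1, column: 19 } },
    ];

    // Downstream code typically branches on each token's kind.
    for token in &tokens {
        match token.kind {
            Kind::Number => println!(
                "number {} at {}:{}",
                token.value, token.location.line, token.location.column
            ),
            _ => println!("{:?} {}", token.kind, token.value),
        }
    }
}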