tinytoken
This library provides a tokenizer for parsing and categorizing different types of tokens, such as words, numbers, strings, characters, symbols, and operators. It includes configurable options to handle various tokenization rules and formats, enabling fine-grained control over how text input is parsed.
Example
use tinytoken::{Tokenizer, TokenizerBuilder, Choice};

fn main() {
    // Configure the tokenizer: parse char literals as strings, accept '_' as a
    // digit separator, and register '$' as a symbol and '+'/'-' as operators.
    let tokenizer = TokenizerBuilder::new()
        .parse_char_as_string(true)
        .allow_digit_separator(Choice::Yes('_'))
        .add_symbol('$')
        .add_operators(&['+', '-'])
        .build("let x = 123_456 + 0xFF");

    match tokenizer.tokenize() {
        Ok(tokens) => {
            for token in tokens {
                println!("{:?}", token);
            }
        }
        Err(err) => {
            eprintln!("Tokenization error: {err}");
        }
    }
}
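To make the effect of `allow_digit_separator(Choice::Yes('_'))` and the hex literal in the example input concrete, here is a minimal standalone sketch of how such numeric lexemes can be normalised. It is not tinytoken's actual implementation (the crate's internal number handling may differ); it only illustrates the kind of rule the option controls.

// Standalone sketch, NOT tinytoken's implementation: shows how a scanner can
// normalise a numeric lexeme once a digit separator is allowed.
fn parse_number(lexeme: &str, separator: char) -> Option<i64> {
    // Drop the configured separator, e.g. "123_456" -> "123456".
    let cleaned: String = lexeme.chars().filter(|&c| c != separator).collect();
    // Recognise a hex prefix, otherwise fall back to decimal.
    if let Some(hex) = cleaned.strip_prefix("0x").or_else(|| cleaned.strip_prefix("0X")) {
        i64::from_str_radix(hex, 16).ok()
    } else {
        cleaned.parse::<i64>().ok()
    }
}

fn main() {
    assert_eq!(parse_number("123_456", '_'), Some(123_456));
    assert_eq!(parse_number("0xFF", '_'), Some(255));
}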
Contributions
Feel free to send a PR to improve or extend the library's capabilities.
Modules
- Contains error definitions specific to tokenization
Structs
- Represents the location of a token in the input text, with line and column values
- Represents an individual token with type, value, and location
- Primary struct for tokenizing an input string, with methods for parsing and generating tokens
- A builder struct for creating a TokenizerConfig instance with customized options
- Configuration struct for the tokenizer, allowing customization of tokenization behavior
Enums
- Configurable option for specific settings in TokenizerConfig
- Represents the types of numeric tokens recognized by the tokenizer
- Represents all possible token types that can be parsed by the tokenizer (see the sketch below for how such an enum is typically consumed)
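The sketch below shows the kind of consuming match a caller typically writes over a token-type enum. `TokenKind` and its variants are hypothetical placeholders chosen for this illustration, not tinytoken's real identifiers; refer to the items listed above for the actual types.

// `TokenKind` is a hypothetical stand-in for illustration only; it is NOT
// tinytoken's actual enum.
enum TokenKind {
    Word(String),
    Number(f64),
    Operator(char),
    Symbol(char),
}

fn describe(kind: &TokenKind) -> String {
    match kind {
        TokenKind::Word(w) => format!("word `{w}`"),
        TokenKind::Number(n) => format!("number {n}"),
        TokenKind::Operator(op) => format!("operator `{op}`"),
        TokenKind::Symbol(s) => format!("symbol `{s}`"),
    }
}

fn main() {
    let kinds = [
        TokenKind::Word("let".into()),
        TokenKind::Number(123_456.0),
        TokenKind::Operator('+'),
        TokenKind::Symbol('$'),
    ];
    for kind in &kinds {
        println!("{}", describe(kind));
    }
}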