tinytoken
This library provides a tokenizer for parsing and categorizing different types of tokens, such as words, numbers, strings, characters, symbols, and operators. It includes configurable options to handle various tokenization rules and formats, enabling fine-grained control over how text input is parsed.
Example
use tinytoken::{Tokenizer, TokenizerBuilder, Choice};

fn main() {
    // Configure the tokenizer: parse char literals as strings, accept '_' as a
    // digit separator, and register '$' as a symbol and '+'/'-' as operators.
    let tokenizer = TokenizerBuilder::new()
        .parse_char_as_string(true)
        .allow_digit_separator(Choice::Yes('_'))
        .add_symbol('$')
        .add_operators(&['+', '-'])
        .build("let x = 123_456 + 0xFF");

    match tokenizer.tokenize() {
        Ok(tokens) => {
            for token in tokens {
                println!("{:?}", token);
            }
        }
        Err(err) => {
            eprintln!("Tokenization error: {err}");
        }
    }
}
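To make the effect of `allow_digit_separator(Choice::Yes('_'))` and the hex literal in the example input concrete, here is a minimal standalone sketch of how such numeric lexemes can be normalised. It is not tinytoken's actual implementation (the crate's internal number handling may differ); it only illustrates the kind of rule the option controls.

// Standalone sketch, NOT tinytoken's implementation: shows how a scanner can
// normalise a numeric lexeme once a digit separator is allowed.
fn parse_number(lexeme: &str, separator: char) -> Option<i64> {
    // Drop the configured separator, e.g. "123_456" -> "123456".
    let cleaned: String = lexeme.chars().filter(|&c| c != separator).collect();
    // Recognise a hex prefix, otherwise fall back to decimal.
    if let Some(hex) = cleaned.strip_prefix("0x").or_else(|| cleaned.strip_prefix("0X")) {
        i64::from_str_radix(hex, 16).ok()
    } else {
        cleaned.parse::<i64>().ok()
    }
}

fn main() {
    assert_eq!(parse_number("123_456", '_'), Some(123_456));
    assert_eq!(parse_number("0xFF", '_'), Some(255));
}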
Contributions
Feel free to send a PR to improve or extend the library's capabilities.
Modules
- Contains error definitions specific to tokenization
Structs
- Represents the location of a token in the input text, with line and column values
- Represents an individual token with type, value, and location
- Primary struct for tokenizing an input string, with methods for parsing and generating tokens
- A builder struct for creating a TokenizerConfig instance with customized options
- Configuration struct for the tokenizer, allowing customization of tokenization behavior
Enums
- Configurable option for specific settings in TokenizerConfig
- Represents the types of numeric tokens recognized by the tokenizer
- Represents all possible token types that can be parsed by the tokenizer (see the sketch below for how such an enum is typically consumed)
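The sketch below shows the kind of consuming match a caller typically writes over a token-type enum. `TokenKind` and its variants are hypothetical placeholders chosen for this illustration, not tinytoken's real identifiers; refer to the items listed above for the actual types.

// `TokenKind` is a hypothetical stand-in for illustration only; it is NOT
// tinytoken's actual enum.
enum TokenKind {
    Word(String),
    Number(f64),
    Operator(char),
    Symbol(char),
}

fn describe(kind: &TokenKind) -> String {
    match kind {
        TokenKind::Word(w) => format!("word `{w}`"),
        TokenKind::Number(n) => format!("number {n}"),
        TokenKind::Operator(op) => format!("operator `{op}`"),
        TokenKind::Symbol(s) => format!("symbol `{s}`"),
    }
}

fn main() {
    let kinds = [
        TokenKind::Word("let".into()),
        TokenKind::Number(123_456.0),
        TokenKind::Operator('+'),
        TokenKind::Symbol('$'),
    ];
    for kind in &kinds {
        println!("{}", describe(kind));
    }
}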