tokenizer_py-0.1.2 has been yanked.
Python-like Tokenizer in Rust
This project implements a Python-like tokenizer in Rust. It can tokenize a string into a sequence of tokens, which are
represented by the Token enum. The supported tokens are:
- Token::Name: a name token, such as a function or variable name
- Token::Number: a number token, such as a literal integer or floating-point number
- Token::String: a string token, such as a single- or double-quoted string
- Token::OP: an operator token, such as an arithmetic or comparison operator
- Token::Indent: an indent token, indicating that a block of code is being indented
- Token::Dedent: a dedent token, indicating that a block of code is being dedented
- Token::Comment: a comment token, such as a single-line or multi-line comment
- Token::NewLine: a newline token, indicating a new line in the source code
- Token::NL: a token indicating a new line, for compatibility with the original tokenizer
- Token::EndMarker: an end-of-file marker
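The variants above could be modeled roughly as follows. This is an illustrative sketch only, not the crate's actual definition; the payload types are assumptions:

```rust
// Illustrative sketch of a token enum matching the variants listed above.
// The String payloads are assumptions, not the crate's actual definition.
#[derive(Debug, PartialEq)]
enum Token {
    Name(String),    // identifiers and keywords
    Number(String),  // integer and floating-point literals
    String(String),  // string literals
    OP(String),      // operators
    Indent,          // start of an indented block
    Dedent,          // end of an indented block
    Comment(String), // comments
    NewLine,         // logical newline
    NL,              // non-logical newline, kept for compatibility
    EndMarker,       // end of input
}
```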
The tokenizer uses a simple state machine to tokenize the input text. It recognizes the following tokens:
- Whitespace: spaces, tabs, and newlines
- Numbers: integers and floating-point numbers
  - float: floating-point numbers
  - int: integer numbers
- Names: identifiers and keywords
- Strings: single- and double-quoted strings
  - basic string: single- and double-quoted strings
  - format string: Python format strings (f-strings)
  - byte string: Python byte strings
  - raw string: raw strings
  - multi-line string: single- and double-quoted multi-line strings
- Operators: arithmetic, comparison, and other operators
- Comments: single-line comments
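The state-machine idea can be illustrated with a small self-contained example: the scanner picks a state from the first character of each token and keeps consuming characters while that state still matches. This is a simplified sketch for explanation only, not the crate's implementation; it covers only names, numbers, and single-character operators:

```rust
// Minimal state-machine scanner: the first character selects a state
// (number, name, or operator), and the loop consumes characters until
// the state no longer matches. Sketch only, not the crate's implementation.
#[derive(Debug, PartialEq)]
enum Tok {
    Name(String),
    Number(String),
    Op(String),
}

fn scan(input: &str) -> Vec<Tok> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1; // skip whitespace between tokens
        } else if c.is_ascii_digit() {
            // number state: digits plus at most one decimal point
            let start = i;
            let mut seen_dot = false;
            while i < chars.len()
                && (chars[i].is_ascii_digit() || (chars[i] == '.' && !seen_dot))
            {
                if chars[i] == '.' {
                    seen_dot = true;
                }
                i += 1;
            }
            tokens.push(Tok::Number(chars[start..i].iter().collect()));
        } else if c.is_alphabetic() || c == '_' {
            // name state: identifiers and keywords
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Tok::Name(chars[start..i].iter().collect()));
        } else {
            // operator state: single-character operators only, for brevity
            tokens.push(Tok::Op(c.to_string()));
            i += 1;
        }
    }
    tokens
}
```

For example, `scan("x = 1.5")` produces `Name("x")`, `Op("=")`, `Number("1.5")`.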
The tokenizer also provides a tokenize method that takes a string as input and returns a Result containing a vector
of tokens.
Here is an example of how to use the tokenizer (reconstructed, since the code spans were stripped from this README; the exact item names may differ from the crate's API):

```rust
use tokenizer_py::{Tokenizer, Token};

let tokenizer = Tokenizer::new("hello_world");
let tokens = tokenizer.tokenize().unwrap();
// every token stream ends with an end-of-file marker
assert_eq!(tokens.last(), Some(&Token::EndMarker));
```
Usage
Add this to your Cargo.toml:
```toml
[dependencies]
tokenizer_py = "0.1.1"
```
Error Handling
The tokenizer uses the Result type to indicate possible errors during tokenization. The possible errors are:
- TokenizerError::Operator: an invalid operator was encountered
- TokenizerError::Number: an invalid number was encountered
- TokenizerError::Indent: an invalid indent was encountered
- TokenizerError::String: an invalid string was encountered
Here is an example of how to handle these errors (reconstructed; the variants' payloads, if any, are not shown):

```rust
match tokenizer.tokenize() {
    Ok(tokens) => println!("{:?}", tokens),
    // err is one of the TokenizerError variants listed above
    Err(err) => eprintln!("tokenization failed: {:?}", err),
}
```