> **Note:** tokenizer_py 0.1.0 has been yanked.
# Python-like Tokenizer in Rust
This project implements a Python-like tokenizer in Rust. It tokenizes a string into a sequence of tokens, each represented by the `Token` enum. The supported tokens are:
- `Name`: a name token, such as a function or variable name
- `Number`: a number token, such as a literal integer or floating-point number
- `String`: a string token, such as a single- or double-quoted string
- `OP`: an operator token, such as an arithmetic or comparison operator
- `Indent`: an indent token, indicating that a block of code is being indented
- `Dedent`: a dedent token, indicating that a block of code is being dedented
- `Comment`: a comment token, such as a single-line or multi-line comment
- `NewLine`: a newline token, indicating a new line in the source code
- `NL`: a token indicating a new line, for compatibility with the original tokenizer
- `EndMarker`: an end-of-file marker
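The variants above can be pictured as a plain Rust enum. The following is a hypothetical sketch based solely on the list above, not the crate's actual definition; the payload types are assumptions:

```rust
// Hypothetical sketch of the Token enum described above; the real crate
// may use different payload types or variant names.
#[derive(Debug, PartialEq)]
pub enum Token {
    Name(String),    // identifier, e.g. a function or variable name
    Number(String),  // integer or floating-point literal
    String(String),  // single- or double-quoted string literal
    OP(String),      // operator such as "+" or "=="
    Indent,          // a block of code is being indented
    Dedent,          // a block of code is being dedented
    Comment(String), // single-line or multi-line comment
    NewLine,         // logical newline in the source code
    NL,              // extra newline, kept for compatibility
    EndMarker,       // end-of-file marker
}

fn main() {
    // A token stream for the input "hello" might look like this.
    let tokens = vec![Token::Name("hello".to_string()), Token::NewLine, Token::EndMarker];
    assert_eq!(tokens[0], Token::Name("hello".to_string()));
    println!("{:?}", tokens);
}
```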
The tokenizer uses a simple state machine to tokenize the input text. It recognizes the following tokens:
- Whitespace: spaces, tabs, and newlines
- Numbers: integers and floating-point numbers
- Names: identifiers and keywords
- Strings: single- and double-quoted strings
- Operators: arithmetic, comparison, and other operators
- Comments: single- and multi-line comments
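To make the state-machine idea concrete, here is an illustrative dispatch function (not the crate's actual implementation) that looks at the next character to decide which scanning state to enter:

```rust
// Illustrative sketch only: classify the next character to pick the
// token-specific scanning state a tokenizer would transition into.
fn classify(c: char) -> &'static str {
    match c {
        ' ' | '\t' | '\n' => "whitespace",
        '0'..='9' => "number",
        'a'..='z' | 'A'..='Z' | '_' => "name",
        '"' | '\'' => "string",
        '#' => "comment",
        _ => "operator",
    }
}

fn main() {
    assert_eq!(classify('x'), "name");
    assert_eq!(classify('7'), "number");
    assert_eq!(classify('#'), "comment");
    println!("ok");
}
```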
The tokenizer also provides a `tokenize` method that takes a string as input and returns a `Result` containing a vector of tokens.
Here is an example of how to use the tokenizer (the import path, method names, and expected tokens below are reconstructed from this description and may differ from the crate's actual API):

```rust
use tokenizer_py::{Tokenizer, Token};

let tokenizer = Tokenizer::new("hello");
let tokens = tokenizer.tokenize().unwrap();
assert_eq!(tokens.last(), Some(&Token::EndMarker)); // stream ends with the end-of-file marker
```
## Error Handling
The tokenizer uses the `Result` type to indicate possible errors during tokenization. The possible errors are:

- `Operator`: an invalid operator was encountered
- `Number`: an invalid number was encountered
- `Indent`: an invalid indent was encountered
- `String`: an invalid string was encountered
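These variants suggest an error enum along the following lines. This is a hypothetical sketch inferred from the list above; the crate's actual error type may carry different payloads, such as source positions:

```rust
// Hypothetical sketch of the tokenizer's error type, inferred from the
// variant list above; the crate's real definition may differ.
#[derive(Debug, PartialEq)]
pub enum TokenizeError {
    Operator(String), // invalid operator encountered
    Number(String),   // malformed numeric literal
    Indent,           // inconsistent indentation
    String(String),   // unterminated or malformed string literal
}

fn main() {
    let err = TokenizeError::Number("1._2".to_string());
    // The Debug form identifies the offending variant and its payload.
    assert_eq!(format!("{:?}", err), "Number(\"1._2\")");
    println!("{:?}", err);
}
```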
Here is an example of how to handle these errors:

```rust
match tokenizer.tokenize() {
    Ok(tokens) => println!("tokens: {:?}", tokens),
    Err(err) => eprintln!("tokenization failed: {:?}", err),
}
```