tokenizer_py 0.1.3

A crate providing a tokenizer that works like Python's tokenizer.
tokenizer_py-0.1.3 has been yanked.

Python-like Tokenizer in Rust


This project implements a Python-like tokenizer in Rust. It can tokenize a string into a sequence of tokens, which are represented by the Token enum. The supported tokens are:

  • Token::Name: a name token, such as a function or variable name
  • Token::Number: a number token, such as a literal integer or floating-point number
  • Token::String: a string token, such as a single or double-quoted string
  • Token::OP: an operator token, such as an arithmetic or comparison operator
  • Token::Indent: an indent token, indicating that a block of code is being indented
  • Token::Dedent: a dedent token, indicating that a block of code is being dedented
  • Token::Comment: a comment token, such as a single-line or multi-line comment
  • Token::NewLine: a newline token, indicating a new line in the source code
  • Token::NL: a non-logical newline token, kept for compatibility with Python's tokenize module
  • Token::EndMarker: an end-of-file marker
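The Indent/Dedent pair above follows Python's indent-stack model: each new indentation width pushes an Indent, and returning to a shallower width pops one Dedent per closed block. The sketch below illustrates that mechanism in plain Rust; it is independent of this crate, and the `Block` enum and `indent_events` function are illustrative names, not part of tokenizer_py's API.

```rust
// Minimal sketch of Python-style Indent/Dedent derivation.
// An indent stack records the active indentation widths; each line's
// leading whitespace is compared against the top of the stack.
#[derive(Debug, PartialEq)]
enum Block {
    Indent,
    Dedent,
}

fn indent_events(lines: &[&str]) -> Vec<Block> {
    let mut stack = vec![0usize]; // column 0 is always active
    let mut events = Vec::new();
    for line in lines {
        let width = line.len() - line.trim_start().len();
        if width > *stack.last().unwrap() {
            // Deeper than the current block: open a new one.
            stack.push(width);
            events.push(Block::Indent);
        } else {
            // Shallower: close blocks until the widths match.
            while width < *stack.last().unwrap() {
                stack.pop();
                events.push(Block::Dedent);
            }
        }
    }
    // At end of input, every still-open block is closed with a Dedent.
    while stack.len() > 1 {
        stack.pop();
        events.push(Block::Dedent);
    }
    events
}

fn main() {
    let src = ["if x:", "    y = 1", "    z = 2", "done = True"];
    // One Indent entering the block, one Dedent leaving it.
    assert_eq!(indent_events(&src), vec![Block::Indent, Block::Dedent]);
}
```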

The tokenizer recognizes the following tokens:

  • Whitespace: spaces, tabs, and newlines
  • Numbers: integers and floating-point numbers
    • float: floating-point numbers
    • int: integers
  • Names: identifiers and keywords
  • Strings: single- and double-quoted strings
    • basic-String: single- and double-quoted strings
    • format-String: Python format strings (f-strings)
    • byte-String: Python byte strings (b-strings)
    • raw-String: raw strings (r-strings)
    • multi-line-String: single- and double-quoted multi-line strings
  • Operators: arithmetic, comparison, and other operators
  • Comments: single-line comments
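The string variants listed above are distinguished by their literal prefix, as in Python. A rough sketch of how such classification might work is shown below; `StringKind` and `classify` are hypothetical names for illustration only, not part of tokenizer_py's API.

```rust
// Rough sketch of classifying Python string literals by prefix.
#[derive(Debug, PartialEq)]
enum StringKind {
    Basic,     // 'x' or "x"
    Format,    // f"x"
    Byte,      // b"x"
    Raw,       // r"x"
    MultiLine, // '''x''' or """x"""
}

fn classify(literal: &str) -> Option<StringKind> {
    // Python prefixes are case-insensitive (F"..." == f"...").
    let lower = literal.to_ascii_lowercase();
    if lower.starts_with("f\"") || lower.starts_with("f'") {
        Some(StringKind::Format)
    } else if lower.starts_with("b\"") || lower.starts_with("b'") {
        Some(StringKind::Byte)
    } else if lower.starts_with("r\"") || lower.starts_with("r'") {
        Some(StringKind::Raw)
    } else if lower.starts_with("\"\"\"") || lower.starts_with("'''") {
        Some(StringKind::MultiLine)
    } else if lower.starts_with('"') || lower.starts_with('\'') {
        Some(StringKind::Basic)
    } else {
        None
    }
}

fn main() {
    assert_eq!(classify("f\"hi {name}\""), Some(StringKind::Format));
    assert_eq!(classify("'''doc'''"), Some(StringKind::MultiLine));
    assert_eq!(classify("\"plain\""), Some(StringKind::Basic));
}
```

Note that real Python also allows combined prefixes such as `rb"..."`, which this sketch does not handle.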

The tokenizer also provides a tokenize method that takes a string as input and returns a Result containing either a vector of tokens or a tokenization error.

Here is an example of how to use the tokenizer:

use tokenizer_py::{Tokenizer, Token};

let tokenizer = Tokenizer::new("hello world");
let tokens = tokenizer.tokenize().unwrap();
assert_eq!(tokens, vec![
    Token::Name("hello".to_string()),
    Token::Name("world".to_string()),
    Token::NewLine,
    Token::EndMarker,
]);

Usage

Add this to your Cargo.toml:

[dependencies]
tokenizer_py = "0.1.3"