Crate plexer


Pattern matching LEXER[1] implementation.

Principle

This lexer uses the Pattern trait to find tokens.
The idea is to declare the Token variants and, for each one, give a Pattern that matches it and a closure that builds it from the matched String value.

lexer!(
    // Ordered by priority
    NAME(optional types, ...) {
        impl Pattern => |value: String| -> Token,
        ...,
    },
    ...,
);
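The rule shape `impl Pattern => |value: String| -> Token` means that anything implementing Pattern can act as a matcher. The crate's Pattern trait and its methods are not documented in this section, so the sketch below uses a hypothetical `MatchPattern` trait with a single `is_match` method to illustrate how the pattern kinds used in the example (a char, an array of chars, a predicate closure) can share one interface. Regex patterns are left out, and `build_if_match` is likewise a hypothetical helper.

```rust
/// Hypothetical stand-in for the crate's `Pattern` trait (its real methods
/// are not documented here): does the whole candidate string match?
pub trait MatchPattern {
    fn is_match(&self, candidate: &str) -> bool;
}

/// A `char` pattern matches exactly that one-character string, e.g. '+'.
impl MatchPattern for char {
    fn is_match(&self, candidate: &str) -> bool {
        let mut chars = candidate.chars();
        chars.next() == Some(*self) && chars.next().is_none()
    }
}

/// A char array matches any single character from the array, e.g. [' ', '\n'].
impl<const N: usize> MatchPattern for [char; N] {
    fn is_match(&self, candidate: &str) -> bool {
        self.iter().any(|c| c.is_match(candidate))
    }
}

/// Any predicate closure over `&str` is a pattern too.
impl<F> MatchPattern for F
where
    F: Fn(&str) -> bool,
{
    fn is_match(&self, candidate: &str) -> bool {
        self(candidate)
    }
}

/// Hypothetical helper mirroring `impl Pattern => |value: String| -> Token`:
/// if the pattern matches, build a token from the matched text.
pub fn build_if_match<P: MatchPattern, T>(
    pattern: &P,
    candidate: &str,
    build: impl Fn(String) -> T,
) -> Option<T> {
    if pattern.is_match(candidate) {
        Some(build(candidate.to_string()))
    } else {
        None
    }
}
```

The blanket impl over `Fn(&str) -> bool` lets plain closures be used as patterns without a wrapper type; it can coexist with the `char` and array impls because the `Fn` traits are `#[fundamental]`, so the compiler may assume `char` and `[char; N]` never implement them.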

The lexer! macro generates a module named lexer, which contains Token, LexerError, LexerResult and Lexer.

You can then call Token::tokenize on a &str; it returns a Lexer instance, which implements Iterator.
On each iteration, the Lexer tries to match the given Patterns and returns a LexerResult<Token> built from the best match.
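The text above does not define "best match". A common convention, assumed here, is maximal munch: at the current position take the longest prefix that some pattern accepts, and among patterns accepting the same prefix prefer the one declared first ("Ordered by priority"). The sketch below is a naive, hypothetical illustration of that loop, not plexer's actual algorithm; `Rule`, `best_match`, `tokenize` and the predicates are invented names, and unlike the real Lexer it collects into a Vec and reports a byte offset instead of yielding LexerResult items.

```rust
/// One hypothetical token rule: a name plus a predicate on the candidate text.
pub struct Rule {
    pub name: &'static str,
    pub accepts: fn(&str) -> bool,
}

/// Longest prefix of `input` accepted by some rule, as (rule name, byte length).
/// Assumption: "best" means longest; for a given length the first rule in
/// declaration order (highest priority) wins.
pub fn best_match(rules: &[Rule], input: &str) -> Option<(&'static str, usize)> {
    let mut best = None;
    // Try every non-empty prefix ending on a char boundary, shortest first,
    // so any later hit is strictly longer than the previous one.
    for (start, ch) in input.char_indices() {
        let end = start + ch.len_utf8();
        if let Some(rule) = rules.iter().find(|r| (r.accepts)(&input[..end])) {
            best = Some((rule.name, end));
        }
    }
    best
}

/// Splits `input` into (rule name, matched text) pairs, or returns the byte
/// offset of the first position where no rule matches.
pub fn tokenize<'a>(rules: &[Rule], input: &'a str) -> Result<Vec<(&'static str, &'a str)>, usize> {
    let mut pos = 0;
    let mut tokens = Vec::new();
    while pos < input.len() {
        let (name, len) = best_match(rules, &input[pos..]).ok_or(pos)?;
        tokens.push((name, &input[pos..pos + len]));
        pos += len;
    }
    Ok(tokens)
}

// Hand-written predicates mirroring the math example's patterns.
pub fn is_operator(s: &str) -> bool {
    matches!(s, "+" | "-" | "*" | "/" | "=")
}

pub fn is_number(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
}

/// Equivalent of [a-zA-Z_$][a-zA-Z_$0-9]* applied to the whole string.
pub fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == '$' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
        }
        _ => false,
    }
}

pub fn is_whitespace(s: &str) -> bool {
    s == " " || s == "\n"
}
```

Checking every prefix makes each step quadratic in the remaining input length, which is acceptable for a sketch; a production lexer would typically stop extending a candidate once no pattern can still match.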

Example

Here is an example of a simple math lexer.

lexer!(
    // Different operators
    OPERATOR(char) {
        '+' => |_| Token::OPERATOR('+'),
        '-' => |_| Token::OPERATOR('-'),
        '*' => |_| Token::OPERATOR('*'),
        '/' => |_| Token::OPERATOR('/'),
        '=' => |_| Token::OPERATOR('='),
    },
    // Integer numbers
    NUMBER(usize) {
        |s: &str| !s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
            => |v: String| Token::NUMBER(v.parse().unwrap()),
    },
    // Variable names
    IDENTIFIER(String) {
        regex!(r"[a-zA-Z_$][a-zA-Z_$0-9]*")
            => |v: String| Token::IDENTIFIER(v),
    },
    WHITESPACE {
        [' ', '\n'] => |_| Token::WHITESPACE,
    },
);
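Closure patterns such as NUMBER's are plain predicates on &str, so two edge cases can be checked in isolation. `Iterator::all` returns true for an empty iterator, so an unguarded `s.chars().all(...)` accepts the empty string; whether the lexer ever offers an empty candidate is not stated here, so a non-empty guard is a defensive choice. Separately, `v.parse().unwrap()` panics when a digit run exceeds `usize::MAX`. The hypothetical helpers below show both behaviors; how a builder should turn a failed parse into a Token is left open, since builders must return a Token.

```rust
/// Unguarded predicate: `all` is vacuously true for an empty string.
/// (`is_digit(10)` accepts only ASCII '0'..='9', like `is_ascii_digit`.)
pub fn number_unguarded(s: &str) -> bool {
    s.chars().all(|c| c.is_digit(10))
}

/// Guarded predicate: requires at least one digit.
pub fn number_guarded(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
}

/// Non-panicking parse: `None` instead of a panic when the digits overflow `usize`.
pub fn parse_number(v: &str) -> Option<usize> {
    v.parse().ok()
}
```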

This expands to the following enum, structs and type alias (bodies elided).

mod lexer {
    pub enum Token {
        OPERATOR(char),
        NUMBER(usize),
        IDENTIFIER(String),
        WHITESPACE,
    }

    pub struct Lexer {...}
    pub struct LexerError {...}
    pub type LexerResult<T> = Result<T, LexerError>;
}
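The expansion above elides bodies and attributes, but one requirement can be read off the usage example: comparing `lex.nth(2)` with `Some(Ok(Token::OPERATOR('=')))` through assert_eq! needs `Option<Result<Token, LexerError>>` to implement PartialEq and Debug, which in turn requires both Token and LexerError to implement them. The sketch below is a hand-written stand-in for the generated module, not the macro's real output; the derive list and LexerError's `position` field are assumptions.

```rust
/// Hand-written stand-in for the module generated by `lexer!` in the math
/// example. Only the Token variants and the alias come from the documentation.
pub mod lexer {
    /// Debug + PartialEq are required by the assert_eq! calls in the usage example.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Token {
        OPERATOR(char),
        NUMBER(usize),
        IDENTIFIER(String),
        WHITESPACE,
    }

    /// Hypothetical error payload: byte offset where no pattern matched.
    #[derive(Debug, Clone, PartialEq)]
    pub struct LexerError {
        pub position: usize,
    }

    /// Same alias shape as in the documented expansion.
    pub type LexerResult<T> = Result<T, LexerError>;
}
```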

You can then use them:

use lexer::*;

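let mut lex = Token::tokenize("x_4 = 1 + 3 = 2 * 2");
// Token stream: x_4, ' ', '=', ' ', 1, ' ', '+', ' ', 3, ...
// nth(2) consumes the first three tokens and returns the third.
assert_eq!(lex.nth(2), Some(Ok(Token::OPERATOR('='))));
// nth(5) skips five more tokens and returns the sixth one, the number 3.
assert_eq!(lex.nth(5), Some(Ok(Token::NUMBER(3))));

// Our lexer doesn't handle parentheses, so the fifth token is an error.
let mut err = Token::tokenize("x_4 = (1 + 3)");
assert!(err.nth(4).is_some_and(|res| res.is_err()));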
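Because the Lexer is an Iterator of LexerResult<Token>, standard iterator adapters compose with it. A parser often wants whitespace dropped and the first error surfaced; collecting into `Result<Vec<_>, _>` stops at the first Err. The helper below, `significant`, is a hypothetical name; it is generic over any iterator of Results, so it is tested here on plain vectors rather than on plexer itself.

```rust
/// Drops the tokens that `keep` rejects and stops at the first error.
/// Generic over any iterator of `Result`s, such as one yielding `LexerResult<Token>`.
pub fn significant<T, E, I>(tokens: I, keep: impl Fn(&T) -> bool) -> Result<Vec<T>, E>
where
    I: IntoIterator<Item = Result<T, E>>,
{
    tokens
        .into_iter()
        // Errors always pass the filter so they are never silently dropped.
        .filter(|res| res.as_ref().map_or(true, |tok| keep(tok)))
        // Collecting into Result<Vec<_>, _> short-circuits on the first Err.
        .collect()
}
```

With the crate, a call might look like `significant(Token::tokenize(src), |t| !matches!(t, Token::WHITESPACE))`; this assumes the generated types match the documentation above and has not been checked against plexer.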

  [1] More details on lexical analysis.

Modules

  • Module for Pattern matching.

Macros

  • lexer: Macro to build your own plugin-based lexer.
  • regex: Macro to build a Regex.