Pattern matching LEXER[1] implementation.
§Principle
This lexer uses the Pattern trait to find tokens.
The idea is to declare the Tokens, describe how to match each one with a Pattern, and build each one from the matched String value.
lexer!(
// Ordered by priority
NAME(optional types, ...) {
impl Pattern => |value: String| -> Token,
...,
},
...,
);
The lexer! macro generates a module named lexer, which contains Token, LexerError, LexerResult and Lexer.
You can then call Token::tokenize on a &str;
it returns a Lexer instance that implements Iterator.
On each iteration, the Lexer tries to match one of the given Patterns and returns a LexerResult<Token> built from the best match.
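To make the iteration model concrete, here is a minimal hand-written sketch of the same idea in plain Rust, without the lexer! macro. It is not the code the macro generates: the names Rule, LexError, rules and the free function tokenize are hypothetical, and "best match" is assumed here to mean the longest match, with the earlier rule winning a tie. The crate may pick its best match differently.

```rust
// Hypothetical stand-ins for illustration; none of these come from the crate.
#[derive(Debug, PartialEq)]
enum Token {
    Operator(char),
    Number(usize),
    Whitespace,
}

#[derive(Debug, PartialEq)]
struct LexError {
    // Byte offset of the character that no rule could match.
    position: usize,
}

// A rule pairs a matcher with a builder. The matcher returns the byte
// length of the prefix it accepts; the builder turns that prefix into a Token.
struct Rule {
    matcher: fn(&str) -> Option<usize>,
    build: fn(&str) -> Token,
}

fn match_operator(input: &str) -> Option<usize> {
    let c = input.chars().next()?;
    "+-*/=".contains(c).then_some(c.len_utf8())
}

fn match_number(input: &str) -> Option<usize> {
    // ASCII digits are one byte each, so the char count is the byte length.
    let len = input.chars().take_while(|c| c.is_ascii_digit()).count();
    (len > 0).then_some(len)
}

fn match_whitespace(input: &str) -> Option<usize> {
    let c = input.chars().next()?;
    (c == ' ' || c == '\n').then_some(1)
}

// Rules listed by priority, highest first.
fn rules() -> Vec<Rule> {
    vec![
        Rule { matcher: match_operator, build: |s: &str| Token::Operator(s.chars().next().unwrap()) },
        // Like the documented example, this unwrap panics on overflow.
        Rule { matcher: match_number, build: |s: &str| Token::Number(s.parse().unwrap()) },
        Rule { matcher: match_whitespace, build: |_: &str| Token::Whitespace },
    ]
}

struct Lexer<'a> {
    rest: &'a str,
    position: usize,
    rules: Vec<Rule>,
}

fn tokenize(input: &str) -> Lexer<'_> {
    Lexer { rest: input, position: 0, rules: rules() }
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Result<Token, LexError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.rest.is_empty() {
            return None;
        }
        // Assumed policy: keep the longest match; on a tie the earlier rule wins.
        let mut best: Option<(usize, &Rule)> = None;
        for rule in &self.rules {
            if let Some(len) = (rule.matcher)(self.rest) {
                if best.map_or(true, |(best_len, _)| len > best_len) {
                    best = Some((len, rule));
                }
            }
        }
        match best {
            Some((len, rule)) => {
                let (matched, rest) = self.rest.split_at(len);
                self.rest = rest;
                self.position += len;
                Some(Ok((rule.build)(matched)))
            }
            None => {
                // Report the error, then skip one character so iteration can continue.
                let skip = self.rest.chars().next().map_or(0, |c| c.len_utf8());
                let err = LexError { position: self.position };
                self.rest = &self.rest[skip..];
                self.position += skip;
                Some(Err(err))
            }
        }
    }
}
```

With this sketch, collecting tokenize("12 + 3") yields Number(12), Whitespace, Operator('+'), Whitespace and Number(3), each wrapped in Ok. An input character that no rule accepts produces a single Err item, and iteration then resumes at the next character.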
§Example
Here is an example of a simple math lexer.
lexer!(
// Different operators
OPERATOR(char) {
'+' => |_| Token::OPERATOR('+'),
'-' => |_| Token::OPERATOR('-'),
'*' => |_| Token::OPERATOR('*'),
'/' => |_| Token::OPERATOR('/'),
'=' => |_| Token::OPERATOR('='),
},
// Integer numbers
NUMBER(usize) {
|s: &str| s.chars().all(|c| c.is_digit(10))
=> |v: String| Token::NUMBER(v.parse().unwrap()),
},
// Variable names
IDENTIFIER(String) {
regex!(r"[a-zA-Z_$][a-zA-Z_$0-9]*")
=> |v: String| Token::IDENTIFIER(v),
},
WHITESPACE {
[' ', '\n'] => |_| Token::WHITESPACE,
},
);
This expands to the following enum and structs.
mod lexer {
pub enum Token {
OPERATOR(char),
NUMBER(usize),
IDENTIFIER(String),
WHITESPACE,
}
pub struct Lexer {...}
pub struct LexerError {...}
pub type LexerResult<T> = Result<T, LexerError>;
}
And you can use them afterwards.
use lexer::*;
let mut lex = Token::tokenize("x_4 = 1 + 3 = 2 * 2");
assert_eq!(lex.nth(2), Some(Ok(Token::OPERATOR('='))));
assert_eq!(lex.nth(5), Some(Ok(Token::NUMBER(3))));
// Our lexer doesn't handle parentheses...
let mut err = Token::tokenize("x_4 = (1 + 3)");
assert!(err.nth(4).is_some_and(|res| res.is_err()));
[1] More details on Lexical analysis.
Modules§
- pattern
- Module for Pattern matching.