# Pattern Lexer
My personal implementation of a [lexer](https://en.wikipedia.org/wiki/Lexical_analysis).
## Principle
This lexer uses the `Pattern` trait to find tokens. \
The idea is to declare `Token`s, describe how to match them with a `Pattern`, and build them from the matched `String` value.
### Pattern
A string `Pattern` trait.
Any type implementing it can be used as a pattern for `&str`.
By default, it is implemented for the following types; a minimal sketch of the trait's possible shape follows the table.
| Type | Matches when |
| --- | --- |
| `char` | it is contained in the string |
| `&str` | it is a substring |
| `String` | it is a substring |
| `&[char]` | any of the `char`s match |
| `&[&str]` | any of the `&str`s match |
| `F: Fn(&str) -> bool` | `F` returns `true` for the substring (slow) |
| `Regex` | the `Regex` matches the substring |
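For illustration, here is a minimal sketch of what such a trait could look like, together with two of the default implementations. The method name `matches` and its signature are assumptions made for this sketch; the actual trait in this crate may differ.

```rust
/// Hypothetical shape of the `Pattern` trait (assumed for illustration).
pub trait Pattern {
    /// Returns `true` if `s` matches this pattern.
    fn matches(&self, s: &str) -> bool;
}

// `char`: matches if the char is contained in the string.
impl Pattern for char {
    fn matches(&self, s: &str) -> bool {
        s.contains(*self)
    }
}

// `&str`: matches if it is a substring.
impl Pattern for &str {
    fn matches(&self, s: &str) -> bool {
        s.contains(*self)
    }
}
```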
### Usage
The `lexer!` macro accepts the following syntax.
```rust
lexer!(
    // Ordered by priority
    NAME(optional types, ...) {
        impl Pattern => |value: String| -> Token,
        ...,
    },
    ...,
);
```
It generates a module `gen`, which contains `Token`, `LexerError`, `LexerResult` and `Lexer`.
You can then call `Token::tokenize` on a `&str`;
it returns a `Lexer` instance that implements `Iterator`. \
On each iteration, the `Lexer` tries to match one of the given `Pattern`s and returns a `LexerResult<Token>` built from the best match.
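Because the iterator yields `LexerResult<Token>` items, a whole input can be collected into a single `Result` that fails on the first lexing error. A minimal sketch, assuming the generated `gen` module behaves as described above:

```rust
use gen::*;

/// Collects every token, or returns the first `LexerError` encountered.
fn lex_all(input: &str) -> Result<Vec<Token>, LexerError> {
    Token::tokenize(input).collect()
}
```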
### Example
Here is an example of a simple math lexer.
```rust
lexer!(
    // Different operators
    OPERATOR(char) {
        '+' => |_| Token::OPERATOR('+'),
        '-' => |_| Token::OPERATOR('-'),
        '*' => |_| Token::OPERATOR('*'),
        '/' => |_| Token::OPERATOR('/'),
        '=' => |_| Token::OPERATOR('='),
    },
    // Integer numbers
    NUMBER(usize) {
        |s: &str| s.chars().all(|c| c.is_digit(10))
            => |v: String| Token::NUMBER(v.parse().unwrap()),
    },
    // Variable names
    IDENTIFIER(String) {
        regex!(r"[a-zA-Z_$][a-zA-Z_$0-9]*")
            => |v: String| Token::IDENTIFIER(v),
    },
    // Spaces and newlines
    WHITESPACE {
        [' ', '\n'] => |_| Token::WHITESPACE,
    },
);
```
This expands to the following enum and structs.
```rust
mod gen {
    pub enum Token {
        OPERATOR(char),
        NUMBER(usize),
        IDENTIFIER(String),
        WHITESPACE,
    }
    pub struct Lexer {...}
    pub struct LexerError {...}
    pub type LexerResult<T> = Result<T, LexerError>;
}
```
You can then use them as follows.
```rust
use gen::*;
let mut lex = Token::tokenize("x_4 = 1 + 3 = 2 * 2");
assert_eq!(lex.nth(2), Some(Ok(Token::OPERATOR('='))));
assert_eq!(lex.nth(5), Some(Ok(Token::NUMBER(3))));
// Our lexer doesn't handle parentheses...
let mut err = Token::tokenize("x_4 = (1 + 3)");
// ...so some item should be an `Err(LexerError)` once the '(' is reached.
assert!(err.any(|result| result.is_err()));
```