Logos
Create ridiculously fast Lexers.
Logos has two goals:
- To make it easy to create a Lexer, so you can focus on more complex problems.
- To make the generated Lexer faster than anything you'd write by hand.
To achieve those, Logos:
- Combines all token definitions into a single deterministic state machine.
- Optimizes branches into lookup tables or jump tables.
- Prevents backtracking inside token definitions.
- Unrolls loops, and batches reads to minimize bounds checking.
- Does all of that heavy lifting at compile time.
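To make the lookup-table idea concrete, here is a small hand-written sketch in plain Rust. It is not the code Logos generates and it uses no Logos API. The `Period` and `Error` tokens, the whitespace skipping, and the names `CLASS`, `build_table` and `lex` are all assumptions made for illustration. A real generated state machine would also decide between `fast` and `[a-zA-Z]+` in a single pass, while this sketch compares the finished slice.

```rust
// Hand-written illustration only; NOT the code Logos generates.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Token {
    Fast,   // the literal `fast`
    Period, // the literal `.` (assumed for this sketch)
    Text,   // the class `[a-zA-Z]+`
    Error,  // any character that starts no token
}

// Byte classes stored in the lookup table.
const OTHER: u8 = 0;
const ALPHA: u8 = 1;
const DOT: u8 = 2;
const SPACE: u8 = 3;

// Build the 256-entry byte-class table at compile time.
const fn build_table() -> [u8; 256] {
    let mut table = [OTHER; 256];
    let mut b = 0usize;
    while b < 256 {
        let byte = b as u8;
        if (byte >= b'a' && byte <= b'z') || (byte >= b'A' && byte <= b'Z') {
            table[b] = ALPHA;
        } else if byte == b'.' {
            table[b] = DOT;
        } else if byte == b' ' || byte == b'\t' || byte == b'\n' {
            table[b] = SPACE;
        }
        b += 1;
    }
    table
}

static CLASS: [u8; 256] = build_table();

// Split `input` into tokens, classifying each byte with one table lookup.
fn lex(input: &str) -> Vec<(Token, &str)> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let start = pos;
        match CLASS[bytes[pos] as usize] {
            SPACE => pos += 1, // whitespace is skipped, not emitted
            DOT => {
                pos += 1;
                tokens.push((Token::Period, &input[start..pos]));
            }
            ALPHA => {
                // Consume the longest run of letters.
                while pos < bytes.len() && CLASS[bytes[pos] as usize] == ALPHA {
                    pos += 1;
                }
                let slice = &input[start..pos];
                // The specific literal wins over the generic class,
                // but only when it covers the whole run.
                let token = if slice == "fast" { Token::Fast } else { Token::Text };
                tokens.push((token, slice));
            }
            _ => {
                // Step over one whole UTF-8 character so slicing stays valid.
                pos += 1;
                while pos < bytes.len() && !input.is_char_boundary(pos) {
                    pos += 1;
                }
                tokens.push((Token::Error, &input[start..pos]));
            }
        }
    }
    tokens
}
```

Calling `lex("fast faster")` in this sketch yields a `Token::Fast` followed by a `Token::Text`, because the run `faster` is longer than the literal `fast`.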
Example
use logos::Logos;
Callbacks
Logos can also call arbitrary functions whenever a pattern is matched, which can be used to put data into a variant:
use ;
// Note: callbacks can return `Option` or `Result`
Logos can handle callbacks with the following return types:
Return type | Produces |
---|---|
`()` | `Token::Unit` |
`bool` | `Token::Unit` or `<Token as Logos>::ERROR` |
`Result<(), _>` | `Token::Unit` or `<Token as Logos>::ERROR` |
`T` | `Token::Value(T)` |
`Option<T>` | `Token::Value(T)` or `<Token as Logos>::ERROR` |
`Result<T, _>` | `Token::Value(T)` or `<Token as Logos>::ERROR` |
`Skip` | skips matched input |
`Filter<T>` | `Token::Value(T)` or skips matched input |
Callbacks can also be used to perform more specialized lexing in places where regular expressions are too limiting. For specifics, see `Lexer::remainder` and `Lexer::bump`.
Token disambiguation
The rule of thumb is:
- Longer beats shorter.
- Specific beats generic.
If any two definitions could match the same input, like `fast` and `[a-zA-Z]+` in the example above, the longer and more specific definition, `Token::Fast`, is the one that is produced.
This is done by comparing the numeric priority attached to each definition. Every consecutive, non-repeating single byte adds 2 to the priority, while every range or regex class adds 1. Loops and optional blocks are ignored, while alternations count the shortest alternative:
- `[a-zA-Z]+` has a priority of 1 (lowest possible), because at minimum it can match a single byte to a class.
- `foobar` has a priority of 12.
- `(foo|hello)(bar)?` has a priority of 6, `foo` being its shortest possible match.
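The arithmetic above can be checked with a minimal sketch in plain Rust. The `Pattern` type and the `priority` and `literal` functions are hypothetical helpers written for this illustration; they are not part of the Logos crate and do not claim to mirror its internals. One reading is assumed here: a `+` loop contributes its single mandatory iteration (which is how `[a-zA-Z]+` reaches 1), while `*` and `?` contribute nothing.

```rust
// Hypothetical, simplified pattern tree; NOT a type from the Logos crate.
#[allow(dead_code)]
#[derive(Debug)]
enum Pattern {
    Byte(u8),                  // one literal byte, e.g. `f`
    Class,                     // a range or regex class, e.g. `[a-zA-Z]`
    Concat(Vec<Pattern>),      // patterns matched one after another
    Alternation(Vec<Pattern>), // `a|b`
    Optional(Box<Pattern>),    // `a?`
    ZeroOrMore(Box<Pattern>),  // `a*`
    OneOrMore(Box<Pattern>),   // `a+`
}

// Priority as described in the text: 2 per literal byte, 1 per class,
// the cheapest branch of an alternation, nothing for optional parts.
fn priority(pattern: &Pattern) -> usize {
    match pattern {
        Pattern::Byte(_) => 2,
        Pattern::Class => 1,
        Pattern::Concat(parts) => parts.iter().map(priority).sum::<usize>(),
        Pattern::Alternation(alts) => alts.iter().map(priority).min().unwrap_or(0),
        Pattern::Optional(_) | Pattern::ZeroOrMore(_) => 0,
        // Assumption: only the one mandatory iteration of `+` counts.
        Pattern::OneOrMore(inner) => priority(inner),
    }
}

// Helper: turn a literal string into a sequence of single bytes.
fn literal(text: &str) -> Pattern {
    Pattern::Concat(text.bytes().map(Pattern::Byte).collect())
}
```

With these helpers, `priority(&literal("fast"))` is 8, which is higher than the 1 computed for `Pattern::OneOrMore(Box::new(Pattern::Class))`. That matches the rule that the specific `fast` beats the generic `[a-zA-Z]+`.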