Crate logos

source ·
Expand description
Logos logo

Logos

Create ridiculously fast Lexers.

Logos has two goals:

  • To make it easy to create a Lexer, so you can focus on more complex problems.
  • To make the generated Lexer faster than anything you’d write by hand.

To achieve those, Logos:

Example

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+")] // Ignore this regex pattern between tokens
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("fast")]
    Fast,

    #[token(".")]
    Period,

    // Or regular expressions.
    #[regex("[a-zA-Z]+")]
    Text,
}

fn main() {
    let mut lex = Token::lexer("Create ridiculously fast Lexers.");

    assert_eq!(lex.next(), Some(Ok(Token::Text)));
    assert_eq!(lex.span(), 0..6);
    assert_eq!(lex.slice(), "Create");

    assert_eq!(lex.next(), Some(Ok(Token::Text)));
    assert_eq!(lex.span(), 7..19);
    assert_eq!(lex.slice(), "ridiculously");

    assert_eq!(lex.next(), Some(Ok(Token::Fast)));
    assert_eq!(lex.span(), 20..24);
    assert_eq!(lex.slice(), "fast");

    assert_eq!(lex.next(), Some(Ok(Token::Text)));
    assert_eq!(lex.slice(), "Lexers");
    assert_eq!(lex.span(), 25..31);

    assert_eq!(lex.next(), Some(Ok(Token::Period)));
    assert_eq!(lex.span(), 31..32);
    assert_eq!(lex.slice(), ".");

    assert_eq!(lex.next(), None);
}

Callbacks

Logos can also call arbitrary functions whenever a pattern is matched, which can be used to put data into a variant:

use logos::{Logos, Lexer};

// Note: callbacks can return `Option` or `Result`
fn kilo(lex: &mut Lexer<Token>) -> Option<u64> {
    let slice = lex.slice();
    let n: u64 = slice[..slice.len() - 1].parse().ok()?; // skip 'k'
    Some(n * 1_000)
}

fn mega(lex: &mut Lexer<Token>) -> Option<u64> {
    let slice = lex.slice();
    let n: u64 = slice[..slice.len() - 1].parse().ok()?; // skip 'm'
    Some(n * 1_000_000)
}

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+")]
enum Token {
    // Callbacks can use closure syntax, or refer
    // to a function defined elsewhere.
    //
    // Each pattern can have it's own callback.
    #[regex("[0-9]+", |lex| lex.slice().parse().ok())]
    #[regex("[0-9]+k", kilo)]
    #[regex("[0-9]+m", mega)]
    Number(u64),
}

fn main() {
    let mut lex = Token::lexer("5 42k 75m");

    assert_eq!(lex.next(), Some(Ok(Token::Number(5))));
    assert_eq!(lex.slice(), "5");

    assert_eq!(lex.next(), Some(Ok(Token::Number(42_000))));
    assert_eq!(lex.slice(), "42k");

    assert_eq!(lex.next(), Some(Ok(Token::Number(75_000_000))));
    assert_eq!(lex.slice(), "75m");

    assert_eq!(lex.next(), None);
}

Logos can handle callbacks with following return types:

Return typeProduces
()Ok(Token::Unit)
boolOk(Token::Unit) or Err(<Token as Logos>::Error::default())
Result<(), E>Ok(Token::Unit) or Err(<Token as Logos>::Error::from(err))
TOk(Token::Value(T))
Option<T>Ok(Token::Value(T)) or Err(<Token as Logos>::Error::default())
Result<T, E>Ok(Token::Value(T)) or Err(<Token as Logos>::Error::from(err))
Skipskips matched input
Filter<T>Ok(Token::Value(T)) or skips matched input
FilterResult<T, E>Ok(Token::Value(T)) or Err(<Token as Logos>::Error::from(err)) or skips matched input

Callbacks can be also used to do perform more specialized lexing in place where regular expressions are too limiting. For specifics look at Lexer::remainder and Lexer::bump.

Errors

By default, Logos uses () as the error type, which means that it doesn’t store any information about the error. This can be changed by using #[logos(error = T)] attribute on the enum. The type T can be any type that implements Clone, PartialEq, Default and From<E> for each callback’s error type.

Token disambiguation

Rule of thumb is:

  • Longer beats shorter.
  • Specific beats generic.

If any two definitions could match the same input, like fast and [a-zA-Z]+ in the example above, it’s the longer and more specific definition of Token::Fast that will be the result.

This is done by comparing numeric priority attached to each definition. Every consecutive, non-repeating single byte adds 2 to the priority, while every range or regex class adds 1. Loops or optional blocks are ignored, while alternations count the shortest alternative:

  • [a-zA-Z]+ has a priority of 1 (lowest possible), because at minimum it can match a single byte to a class.
  • foobar has a priority of 12.
  • (foo|hello)(bar)? has a priority of 6, foo being it’s shortest possible match.

Re-exports

  • pub use crate::source::Source;

Modules

  • This module contains a bunch of traits necessary for processing byte strings.

Structs

  • Lexer is the main struct of the crate that allows you to read through a Source and produce tokens for enums implementing the Logos trait.
  • Type that can be returned from a callback, informing the Lexer, to skip current token match. See also logos::skip.
  • Iterator that pairs tokens with their position in the source.

Enums

  • Type that can be returned from a callback, either producing a field for a token, or skipping it.
  • Type that can be returned from a callback, either producing a field for a token, skipping it, or emitting an error.

Traits

  • Trait implemented for an enum representing all tokens. You should never have to implement it manually, use the #[derive(Logos)] attribute on your enum.

Functions

  • Predefined callback that will inform the Lexer to skip a definition.

Type Definitions

  • Byte range in the source.

Derive Macros