Crate gramatika


§Gramatika

A minimal toolkit for writing parsers with Rust

§Getting Started

Add the dependency to your Cargo.toml:

[dependencies]
gramatika = "0.5"

Define an enum for your tokens and derive the Token trait:

#[macro_use]
extern crate gramatika;

use gramatika::{Span, Substr};

#[derive(Token)]
enum Token {
    #[pattern = "print"]
    Keyword(Substr, Span),

    #[pattern = r#"".*?""#]
    StringLiteral(Substr, Span),

    #[pattern = ";"]
    Punct(Substr, Span),

    #[pattern = r"\S+"]
    Unrecognized(Substr, Span),
}

Next, you’ll probably find it useful to declare type aliases for your TokenStream and ParseStream:

// ...
type Lexer = gramatika::TokenStream<Token>;
type ParseStream = gramatika::ParseStream<Token, Lexer>;

Then define your syntax tree structure:

// ...
struct Program {
    statements: Vec<Stmt>,
}

enum Stmt {
    Empty(Token),
    Print(PrintStmt),
}

struct PrintStmt {
    pub keyword: Token,
    pub string: Token,
    pub terminator: Token,
}

Finally, implement the Parse trait for each node of your syntax tree:

use gramatika::{Parse, ParseStreamer, Span, SpannedError, Substr, Token as _};

#[derive(Debug, Token)]
enum Token {
    // ...
}

// ...

impl Parse for Program {
    type Stream = ParseStream;

    fn parse(input: &mut Self::Stream) -> gramatika::Result<Self> {
        let mut statements = vec![];
        while !input.is_empty() {
            // We can call `ParseStream::parse::<T>()` for any `T: Parse`
            statements.push(input.parse()?);
        }
        Ok(Self { statements })
    }
}

impl Parse for Stmt {
    type Stream = ParseStream;

    fn parse(input: &mut Self::Stream) -> gramatika::Result<Self> {
        // Check the next token without advancing the stream
        match input.peek() {
            // `as_matchable()` cheaply decomposes a token into parts that
                // can be pattern-matched: `(TokenKind, &str, Span)`
            Some(token) => match token.as_matchable() {
                // `ParseStream` implements `Iterator<Item = Token>`, so we
                // can call `next()` to consume the token and advance the stream
                (TokenKind::Punct, ";", _) => Ok(Stmt::Empty(input.next().unwrap())),
                // Recursively calling `input.parse()` lets us manage the
                // complexity of parsing deep and complex syntax trees by
                // taking it one small step at a time
                (TokenKind::Keyword, "print", _) => Ok(Stmt::Print(input.parse()?)),
                // Not what we were expecting? `SpannedError` has a `Display`
                // implementation that's perfect for giving useful feedback
                // to users of our language, highlighting the exact place in
                // the source code where the error occurred.
                (_, _, span) => Err(SpannedError {
                    message: "Expected `;` or `print`".into(),
                    source: input.source(),
                    span: Some(span),
                }),
            }
            None => Err(SpannedError {
                message: "Unexpected end of input".into(),
                source: input.source(),
                span: None,
            }),
        }
    }
}

impl Parse for PrintStmt {
    type Stream = ParseStream;

    fn parse(input: &mut Self::Stream) -> gramatika::Result<Self> {
        // All of the `ParseStream` methods that return a `Result` will emit
        // a `SpannedError` much like the examples above, so we can use the
        // `?` operator to consume the tokens we're expecting without
        // needing to specify an error message for every single one.
        let keyword = input.consume(keyword![print])?;
        let string = input.consume_kind(TokenKind::StringLiteral)?;
        let terminator = input.consume(punct![;])?;

        Ok(PrintStmt {
            keyword,
            string,
            terminator,
        })
    }
}

And now we can give it a proper test run:

use gramatika::{
    Parse, ParseStreamer, Span, Spanned, SpannedError, Substr, Token as _,
};
// ...
let input = r#"
print "Hello, world!";
"#;

let mut parser = ParseStream::from(input);
let program = parser.parse::<Program>();
assert!(program.is_ok());

let program = program.unwrap();
let stmt = &program.statements[0];
assert!(matches!(stmt, Stmt::Print(_)));

let Stmt::Print(stmt) = stmt else {
    unreachable!();
};
assert_eq!(stmt.string.lexeme(), "\"Hello, world!\"");
assert_eq!(stmt.string.span(), span!(2:7..2:22));

But what does it look like when things don’t go so smoothly?

// ...

#[derive(Debug)]
struct Program {
    statements: Vec<Stmt>,
}

#[derive(Debug)]
enum Stmt {
    // ...
}

#[derive(Debug)]
struct PrintStmt {
    // ...
}

// ...

let input = r#"
pritn "Hello, world!";
"#;

let mut parser = ParseStream::from(input);
let program = parser.parse::<Program>();
assert!(program.is_err());

let error = program.unwrap_err();
assert_eq!(format!("{error}"), r#"
ERROR: Expected `;` or `print`
  |
2 | pritn "Hello, world!";
  | ^----
"#);
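To build intuition for how a display like this can be derived from a span, here is a standalone sketch that renders a caret underline from a 1-indexed line number and character range. This is illustrative only, not Gramatika’s actual `SpannedError` implementation; the function name and layout are assumptions.

```rust
// Standalone sketch: render a compiler-style error snippet from a source
// string and a (line, start_char, end_char) span. NOT Gramatika's
// implementation; names and layout here are illustrative only.
fn render_error(message: &str, source: &str, line: usize, start: usize, end: usize) -> String {
    // `line` is 1-indexed, matching the printed output above.
    let text = source.lines().nth(line - 1).unwrap_or("");
    // Width of the line-number gutter.
    let gutter = line.to_string().len();
    // One `^` at the start of the span, dashes for the remainder.
    let underline = format!(
        "{}^{}",
        " ".repeat(start),
        "-".repeat(end.saturating_sub(start + 1))
    );
    format!(
        "ERROR: {message}\n{pad} |\n{line} | {text}\n{pad} | {underline}",
        pad = " ".repeat(gutter),
    )
}

fn main() {
    let source = "pritn \"Hello, world!\";";
    println!("{}", render_error("Expected `;` or `print`", source, 1, 0, 5));
}
```

The span only needs to carry a line and a character range for this rendering to work, which is why `SpannedError` can produce useful feedback from a `Span` plus the original source string.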

§Next Steps

This toy example only scratches the surface. To continue exploring, learn more about Token and Lexer generation in the lexer module, and check out the parse module to learn about all the tools available to you when implementing the Parse trait.

You can also explore two fully working, non-trivial example projects in the GitHub repository:

  • examples/lox is a parser for the Lox programming language implemented with Gramatika’s derive macros.

  • examples/lox_manual_impl is a parser that manually implements Gramatika’s traits by hand-writing all of the code that’s normally generated by the derive macros.

    This is a great place to start if you’re curious about the implementation details, or if you need to manually implement any of Gramatika’s traits to cover a special use case.

Re-exports§

pub use arcstr;
pub use debug::*;
pub use error::*;
pub use lexer::*;
pub use parse::*;

Modules§

debug
error
lexer
This module defines the Lexer, Token, and TokenStream types that lay the groundwork for parsing with Gramatika.
parse

Macros§

span

Structs§

ArcStr
A better atomically-reference counted string type.
Position
Represents a cursor position within a string. Line and character offsets are zero-indexed in the internal representation, but lines will be printed as 1-indexed by SpannedError for consistency with IDEs. Character offsets are relative to the current line.
Span
A simple representation of the location of some substring within a larger string. Primarily used by SpannedError to provide user-friendly error formatting.
Substr
A low-cost string type representing a view into an ArcStr.
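To build intuition for how `Position` and `Span` fit together, here is a standalone sketch of a zero-indexed line/character position and the span between two of them. These are simplified stand-in types, not the crate’s actual definitions, and `span_of` is a hypothetical helper.

```rust
// Standalone sketch of the Position/Span idea described above; the real
// types live in the `gramatika` crate and may differ in detail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Position {
    line: usize,      // zero-indexed line offset
    character: usize, // zero-indexed character offset within the line
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: Position,
    end: Position,
}

/// Compute the span of the first occurrence of `needle` within `haystack`.
fn span_of(haystack: &str, needle: &str) -> Option<Span> {
    let offset = haystack.find(needle)?;
    // Convert a byte offset into a zero-indexed (line, character) pair.
    let to_pos = |byte: usize| {
        let before = &haystack[..byte];
        let line = before.matches('\n').count();
        let character = byte - before.rfind('\n').map_or(0, |i| i + 1);
        Position { line, character }
    };
    Some(Span {
        start: to_pos(offset),
        end: to_pos(offset + needle.len()),
    })
}

fn main() {
    let source = "\nprint \"Hello, world!\";\n";
    let span = span_of(source, "\"Hello, world!\"").unwrap();
    // Zero-indexed internally this is line 1, chars 6..21; rendered
    // 1-indexed it corresponds to the `span!(2:7..2:22)` assertion in
    // the example above.
    println!("{span:?}");
}
```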

Traits§

Spanned

Type Aliases§

SourceStr
A type alias for arcstr::ArcStr (by default) or arcstr::Substr (if the substr-source feature is enabled).

Derive Macros§

DebugLisp
DebugLispToken
Lexer
Token