parsit 0.1.1

A very simple parser-combinator library: a recursive descent parser that uses Logos as its lexer.

ParseIt

Description

This library provides a very simple, lightweight recursive descent (LL(1)) parser for combining rules to express a given grammar.

The library uses Logos as a lexical analyzer and tokenizer.

The premise

The library's main goals are:

  • lightweight: very small and does not require a deep dive
  • transparency: literally 3 structs with a handful of methods
  • speed: reasonably fast (thanks to Logos)

The steps to implement

Create a set of tokens using Logos

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("fast")]
    Fast,

    #[token(".")]
    Period,

    // Or regular expressions.
    #[regex("[a-zA-Z]+")]
    Text,

    // Logos requires one token variant to handle errors,
    // it can be named anything you wish.
    #[error]
    // We can also use this variant to define whitespace,
    // or any other matches we wish to skip.
    #[regex(r"[ \t\n\f]+", logos::skip)]
    Error,
}
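Logos generates the lexer from these attributes at compile time. To show what this particular enum produces without depending on the logos crate, here is a hand-written stand-in. It is a sketch that assumes Logos's usual matching rules (the longest match wins; on an equal-length match the literal `fast` beats the `[a-zA-Z]+` regex). The function `lex` is hypothetical and is part of neither Logos nor this library.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Fast,
    Period,
    Text,
    Error,
}

/// Hypothetical hand-written lexer mimicking the Logos-derived `Token` above.
fn lex(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if matches!(c, ' ' | '\t' | '\n' | '\u{000C}') {
            // Whitespace is dropped, like the `logos::skip` callback.
            i += 1;
        } else if c == '.' {
            tokens.push(Token::Period);
            i += 1;
        } else if c.is_ascii_alphabetic() {
            // Consume the longest run of ASCII letters (the `[a-zA-Z]+` regex).
            let start = i;
            while i < chars.len() && chars[i].is_ascii_alphabetic() {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            // On an equal-length match the literal "fast" wins over the regex.
            tokens.push(if word == "fast" { Token::Fast } else { Token::Text });
        } else {
            // In this sketch every unrecognised character becomes the error variant.
            tokens.push(Token::Error);
            i += 1;
        }
    }
    tokens
}
```

Under these rules, `lex("Create ridiculously fast Lexers.")` yields `[Text, Text, Fast, Text, Period]`, while `faster` lexes as a single `Text` because the regex match is longer than the literal.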

Create a parser that will be able to parse the given set of tokens

The library provides a ParseIt<'a, T> struct that holds the token sequence and offers auxiliary methods. Wrap it in your own parser struct:


    struct Parser<'a> {
        inner: ParseIt<'a, Token<'a>>,
    }
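Conceptually, the wrapped instance owns the token sequence and hands tokens out by position, so every rule is a function from a position to a result. The sketch below models that with a hypothetical `TokenStream` and a simplified `Step`; these names, the trivial one-character "lexer", and the assumption that the text is lexed once, up front, are illustrative only and are not this library's API.

```rust
/// Simplified, hypothetical result of applying a rule at a position.
#[derive(Debug, PartialEq)]
enum Step<T> {
    /// The rule matched: the value and the position after it.
    Success(T, usize),
    /// The rule did not match at this position.
    Fail(usize),
}

/// Hypothetical stand-in for a ParseIt-like struct: owns the tokens.
struct TokenStream<T> {
    tokens: Vec<T>,
}

impl<T> TokenStream<T> {
    fn new(tokens: Vec<T>) -> Self {
        TokenStream { tokens }
    }

    /// Returns the token at `pos`, or fails when the input is exhausted.
    fn token(&self, pos: usize) -> Step<&T> {
        match self.tokens.get(pos) {
            Some(t) => Step::Success(t, pos + 1),
            None => Step::Fail(pos),
        }
    }

    /// True once every token has been consumed.
    fn is_eof(&self, pos: usize) -> bool {
        pos >= self.tokens.len()
    }
}

/// A user parser wraps the stream, mirroring `Parser { inner: ParseIt<...> }`.
struct Parser {
    inner: TokenStream<char>,
}

impl Parser {
    /// Lexes the text once; here the "lexer" keeps every non-whitespace char.
    fn new(text: &str) -> Parser {
        let tokens = text.chars().filter(|c| !c.is_whitespace()).collect();
        Parser { inner: TokenStream::new(tokens) }
    }
}
```

Because rules only receive and return positions, the same token can be inspected again by simply calling `token` with an earlier position, which is what makes backtracking cheap in this design.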

Implement the parsing functions using the ParseIt instance and the auxiliary methods of Step

The helpers:

  • the token! macro, which simplifies comparing and matching single tokens
  • methods then, then_zip and others from Step
  • methods one_or_more, zero_or_more from ParseIt
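To make the token! helper more concrete, here is a hypothetical macro in the same spirit, working on a simplified `Step` that carries tokens by value. Its call syntax mirrors the complete example below (a single pattern, or several `pattern => value` arms), but the definition, together with `token_at` and `word`, is an assumption rather than the library's actual macro.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Comma,
    Dot,
}

#[derive(Debug, PartialEq)]
enum Item<'a> {
    Word(&'a str),
    Comma,
}

/// Simplified, hypothetical step: value plus next position, or failure position.
#[derive(Debug, PartialEq)]
enum Step<T> {
    Success(T, usize),
    Fail(usize),
}

/// Hypothetical `token!`-style macro (not the library's definition).
macro_rules! token {
    // Single pattern: succeed with the token itself if it matches.
    ($step:expr => $pat:pat) => {
        match $step {
            Step::Success(t @ $pat, next) => Step::Success(t, next),
            // A mismatch fails at the token's own position.
            Step::Success(_, next) => Step::Fail(next - 1),
            Step::Fail(pos) => Step::Fail(pos),
        }
    };
    // Several `pattern => value` arms: map the matching token to a value.
    ($step:expr => $($pat:pat => $val:expr),+ $(,)?) => {
        match $step {
            $(Step::Success($pat, next) => Step::Success($val, next),)+
            Step::Success(_, next) => Step::Fail(next - 1),
            Step::Fail(pos) => Step::Fail(pos),
        }
    };
}

/// Returns the token at `pos` by value (tokens are `Copy`).
fn token_at<'a>(tokens: &[Token<'a>], pos: usize) -> Step<Token<'a>> {
    match tokens.get(pos) {
        Some(t) => Step::Success(*t, pos + 1),
        None => Step::Fail(pos),
    }
}

/// A `word` rule: a word or a comma becomes an `Item`.
fn word<'a>(tokens: &[Token<'a>], pos: usize) -> Step<Item<'a>> {
    token!(token_at(tokens, pos) =>
        Token::Word(v) => Item::Word(v),
        Token::Comma => Item::Comma
    )
}
```

Matching by pattern keeps each rule a single expression; the payload of `Token::Word` is bound directly and moved into the result.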

Transform the result into Result<Structure, ParseError<'a>>

    fn text(&self, pos: usize) -> Result<Vec<Sentence<'a>>, ParseError<'a>> {
        self.inner.zero_or_more(pos, |p| self.sentence(p)).into()
    }
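The final into() turns a finished step into a plain Result. A minimal sketch of how such a conversion can be written with a From impl follows. The `Step` and `ParseError` types here, including the variant names `FailedOnPosition` and `Custom`, are hypothetical simplifications rather than this library's definitions.

```rust
/// Simplified, hypothetical step type.
#[derive(Debug, PartialEq)]
enum Step<T> {
    Success(T, usize),
    Fail(usize),
    Error(String),
}

/// Simplified, hypothetical error type.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// No rule matched at this token position.
    FailedOnPosition(usize),
    /// A rule reported an explicit error.
    Custom(String),
}

/// Converting a finished step into a `Result` drops the position bookkeeping.
impl<T> From<Step<T>> for Result<T, ParseError> {
    fn from(step: Step<T>) -> Self {
        match step {
            Step::Success(value, _) => Ok(value),
            Step::Fail(pos) => Err(ParseError::FailedOnPosition(pos)),
            Step::Error(msg) => Err(ParseError::Custom(msg)),
        }
    }
}
```

With this impl in place, a rule can end in `.into()` exactly like `text` does, and the caller only sees `Ok(value)` or an error.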

Complete example

    use crate::parser::ParseIt;
    use crate::token;
    use crate::step::Step;
    use crate::parser::EmptyToken;
    use crate::error::ParseError;
    use logos::Logos;


    #[derive(Logos, Debug, Copy, Clone, PartialEq)]
    pub enum Token<'a> {
        #[regex(r"[a-zA-Z-]+")]
        Word(&'a str),

        #[token(",")]
        Comma,
        #[token(".")]
        Dot,

        #[token("!")]
        Bang,
        #[token("?")]
        Question,

        #[regex(r"[ \t\r\n\u000C\f]+", logos::skip)]
        Whitespace,
        #[error]
        Error,
    }

    #[derive(Debug, Copy, Clone, PartialEq)]
    enum Item<'a> {
        Word(&'a str),
        Comma,
    }

    #[derive(Debug, Clone, PartialEq)]
    enum Sentence<'a> {
        Sentence(Vec<Item<'a>>),
        Question(Vec<Item<'a>>),
        Exclamation(Vec<Item<'a>>),
    }

    struct Parser<'a> {
        inner: ParseIt<'a, Token<'a>>,
    }

    impl<'a> Parser<'a> {
        fn new(text: &'a str) -> Parser<'a> {
            let delegate: ParseIt<Token> = ParseIt::new(text).unwrap();
            Parser { inner: delegate }
        }

        fn sentence(&self, pos: usize) -> Step<'a, Sentence<'a>> {
            // A sentence body is one or more words or commas.
            let items = |p| self.inner.one_or_more(p, |p| self.word(p));

            // Each kind of sentence is the body followed by its terminator;
            // take_left keeps the items and drops the terminator token.
            let sentence = |p| items(p)
                .then_zip(|p| token!(self.inner.token(p) => Token::Dot))
                .take_left()
                .map(Sentence::Sentence);

            let exclamation = |p| items(p)
                .then_zip(|p| token!(self.inner.token(p) => Token::Bang))
                .take_left()
                .map(Sentence::Exclamation);

            let question = |p| items(p)
                .then_zip(|p| token!(self.inner.token(p) => Token::Question))
                .take_left()
                .map(Sentence::Question);

            // or_from(pos) enables backtracking to pos before trying the alternatives.
            sentence(pos)
                .or_from(pos)
                .or(exclamation)
                .or(question)
                .into()
        }

        fn word(&self, pos: usize) -> Step<'a, Item<'a>> {
            token!(self.inner.token(pos) =>
                Token::Word(v) => Item::Word(v),
                Token::Comma => Item::Comma
            )
        }

        fn text(&self, pos: usize) -> Result<Vec<Sentence<'a>>, ParseError<'a>> {
            self.inner.zero_or_more(pos, |p| self.sentence(p)).into()
        }
    }


    #[test]
    fn test() {
        let parser = Parser::new(r#"
            I have a strange addiction,
            It often sets off sparks!
            I really cannot seem to stop,
            Using exclamation marks!
            Anyone heard of the interrobang?
            The poem is for kids.
        "#);

        let result = parser.text(0).unwrap();
        println!("{:?}", result);
    }

The base auxiliary methods

On parser

  • token - pulls out the token at the current position
  • one_or_more - applies a rule one or more times
  • zero_or_more - applies a rule zero or more times
  • validate_eof - ensures the parser has reached the end of the input
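The repetition helpers can be read as loops over a rule. This sketch, assuming a simplified hypothetical `Step` and free functions instead of methods on the parser, shows one plausible reading of one_or_more, zero_or_more and validate_eof. It also assumes that a successful rule always consumes at least one token; otherwise the loop would never terminate.

```rust
/// Simplified, hypothetical step type.
#[derive(Debug, PartialEq)]
enum Step<T> {
    Success(T, usize),
    Fail(usize),
}

/// Applies `rule` from `pos` until it fails, collecting the values.
/// Always succeeds, possibly with an empty vector.
fn zero_or_more<T>(pos: usize, rule: impl Fn(usize) -> Step<T>) -> Step<Vec<T>> {
    let mut values = Vec::new();
    let mut cur = pos;
    while let Step::Success(v, next) = rule(cur) {
        values.push(v);
        cur = next;
    }
    Step::Success(values, cur)
}

/// Like `zero_or_more`, but fails at `pos` if the rule never matches.
fn one_or_more<T>(pos: usize, rule: impl Fn(usize) -> Step<T>) -> Step<Vec<T>> {
    match zero_or_more(pos, rule) {
        Step::Success(values, _) if values.is_empty() => Step::Fail(pos),
        other => other,
    }
}

/// Turns a success into a failure if tokens remain after it.
fn validate_eof<T>(step: Step<T>, token_count: usize) -> Step<T> {
    match step {
        Step::Success(_, next) if next < token_count => Step::Fail(next),
        other => other,
    }
}
```

For tokens `a a b` and a rule that matches only `a`, zero_or_more from position 0 collects two values and stops at position 2, so validate_eof reports the leftover `b` as a failure at position 2.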

On step

To alternate
  • or - tries an alternative, with a lookahead of one token
  • or_from - enables backtracking to a given position
To combine
  • then - combines with the next rule, discarding the current result
  • then_zip - combines the current result and the next one into a pair
  • then_or_none - combines the current result with the next one as an option, yielding none if the next rule does not match
To collect
  • take_left - drops the right value of a pair
  • take_right - drops the left value of a pair
  • merge - merges a value into a list
  • to_map - transforms a list of pairs into a map
To transform
  • or_val - replaces the value with a default if it is not present
  • or_none - replaces the value with none if it is not present
To work with value
  • ok - transforms the value into an option
  • error - transforms the error into an option
  • map - transforms the value
  • combine - combines the value with the value from another step
  • validate - validates the value and turns the step into an error if validation fails
To print
  • print - prints the step
  • print_with - prints the step with a given prefix
  • print_as - prints the step, transforming its value first
  • print_with_as - prints the step with a given prefix, transforming its value first
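Several of these combinators can be sketched on a simplified step type to show how they fit together. In this hypothetical version, `or` retries an alternative from the position where the previous rule failed, which is only right when the failure happened on the first token. `or_from(pos)` instead wraps the step in an `Alt` that remembers `pos`, so every alternative restarts there, and `into()` unwraps it. This design is an assumption chosen to match how the complete example chains `or_from(pos).or(...).or(...).into()`; it is not the library's implementation, and `Alt`, `expect` and `ab_then` are hypothetical names.

```rust
/// Simplified, hypothetical step: value plus next position, or failure position.
#[derive(Debug, PartialEq)]
enum Step<T> {
    Success(T, usize),
    Fail(usize),
}

impl<T> Step<T> {
    /// map: transforms the value of a successful step.
    fn map<U>(self, f: impl FnOnce(T) -> U) -> Step<U> {
        match self {
            Step::Success(v, p) => Step::Success(f(v), p),
            Step::Fail(p) => Step::Fail(p),
        }
    }

    /// then_zip: runs `next` where this step stopped and pairs both values.
    fn then_zip<U>(self, next: impl FnOnce(usize) -> Step<U>) -> Step<(T, U)> {
        match self {
            Step::Success(v, p) => next(p).map(|u| (v, u)),
            Step::Fail(p) => Step::Fail(p),
        }
    }

    /// or: on failure, tries `alt` from the position where this step failed.
    fn or(self, alt: impl FnOnce(usize) -> Step<T>) -> Step<T> {
        match self {
            Step::Fail(p) => alt(p),
            ok => ok,
        }
    }

    /// or_from: remembers `pos` so that later alternatives restart from it.
    fn or_from(self, pos: usize) -> Alt<T> {
        Alt { step: self, pos }
    }
}

impl<L, R> Step<(L, R)> {
    /// take_left: drops the right value of a pair.
    fn take_left(self) -> Step<L> {
        self.map(|(l, _)| l)
    }
}

/// Backtracking wrapper produced by `or_from`.
struct Alt<T> {
    step: Step<T>,
    pos: usize,
}

impl<T> Alt<T> {
    /// Tries `alt` from the remembered position if nothing has matched yet.
    fn or(self, alt: impl FnOnce(usize) -> Step<T>) -> Alt<T> {
        let pos = self.pos;
        match self.step {
            Step::Fail(_) => Alt { step: alt(pos), pos },
            ok => Alt { step: ok, pos },
        }
    }
}

impl<T> From<Alt<T>> for Step<T> {
    fn from(alt: Alt<T>) -> Step<T> {
        alt.step
    }
}

/// Single-token rule: matches the character `c` at `pos`.
fn expect(tokens: &[char], c: char, pos: usize) -> Step<char> {
    match tokens.get(pos) {
        Some(&t) if t == c => Step::Success(t, pos + 1),
        _ => Step::Fail(pos),
    }
}

/// Matches `a`, `b`, then `end`; keeps the pair, then reports the terminator.
fn ab_then(t: &[char], end: char, pos: usize) -> Step<char> {
    expect(t, 'a', pos)
        .then_zip(|p| expect(t, 'b', p))
        .then_zip(|p| expect(t, end, p))
        .take_left()
        .map(|_| end)
}
```

With the input `ab?`, `ab_then(&toks, '!', 0)` fails at position 2. Chaining `.or(|p| ab_then(&toks, '?', p))` directly retries from position 2 and fails again, whereas `.or_from(0).or(...)` restarts at 0 and, after `into()`, yields `Success('?', 3)`.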