Crate combine

This crate contains parser combinators, roughly based on the Haskell library Parsec.

A parser in this library can be described as a function which takes some input and, if it is successful, returns a value together with the remaining input. A parser combinator is a function which takes one or more parsers and returns a new parser. For instance, the many parser can be used to convert a parser for single digits into one that parses multiple digits. Modeling parsers in this way makes it simple to compose complex parsers in an almost declarative way.
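The description above can be pictured with a small hand-rolled sketch that uses only the standard library. It is not how combine is implemented: the Parsed alias and the digit and many functions below are hypothetical stand-ins that merely share names with combine's items, and the sketch assumes the wrapped parser always consumes input when it succeeds.

```rust
// Hypothetical stand-in, not combine's API: a parser is a function from
// input to an optional (value, remaining input) pair.
type Parsed<'a, T> = Option<(T, &'a str)>;

/// Parses a single ASCII digit.
fn digit(input: &str) -> Parsed<'_, char> {
    let c = input.chars().next()?;
    if c.is_ascii_digit() {
        Some((c, &input[c.len_utf8()..]))
    } else {
        None
    }
}

/// Combinator: takes a parser and returns a new parser that applies it
/// zero or more times, collecting the values.
/// Assumes `p` consumes input whenever it succeeds (otherwise this loops).
fn many<'a, T>(
    p: impl Fn(&'a str) -> Parsed<'a, T>,
) -> impl Fn(&'a str) -> Parsed<'a, Vec<T>> {
    move |mut input| {
        let mut values = Vec::new();
        while let Some((value, rest)) = p(input) {
            values.push(value);
            input = rest;
        }
        Some((values, input))
    }
}
```

With these definitions, many(digit) is itself a parser: applied to "123abc" it yields the three digits together with the remaining input "abc".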

Overview

combine limits itself to creating LL(1) parsers (it is possible to opt in to LL(k) parsing using the try combinator). This makes the parsers easy to reason about in terms of both behavior and performance, at the cost of some generality. Besides making the parsers you construct easier to reason about, combine uses the knowledge that it is an LL parser to automatically construct good error messages.

extern crate combine;
use combine::{digit, letter, Parser, ParserExt};

fn main() {
    if let Err(err) = digit().or(letter()).parse("|") {
        println!("{}", err);
        // The println! call above prints
        //
        // Parse error at line: 1, column: 1
        // Unexpected '|'
        // Expected 'digit' or 'letter'
    }
}
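The LL(1) restriction and the try opt-in described in the overview can be pictured as a distinction between failing before and failing after input has been consumed. The following is a minimal standard-library sketch under that assumption; Outcome, literal, or and attempt are hypothetical names for illustration and do not reflect combine's actual types.

```rust
// Hypothetical result type: distinguishes failures by whether input was consumed.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Parsed(&'a str), // success, carrying the remaining input
    EmptyErr,        // failed without consuming any input
    ConsumedErr,     // failed after consuming some input
}

/// Matches the literal `lit` one character at a time.
fn literal<'a>(lit: &'static str) -> impl Fn(&'a str) -> Outcome<'a> {
    move |input| {
        let mut rest = input;
        for expected in lit.chars() {
            match rest.chars().next() {
                Some(c) if c == expected => rest = &rest[c.len_utf8()..],
                // Nothing matched yet: the failure did not consume input.
                _ if rest.len() == input.len() => return Outcome::EmptyErr,
                _ => return Outcome::ConsumedErr,
            }
        }
        Outcome::Parsed(rest)
    }
}

/// LL(1) choice: `b` is tried only if `a` failed without consuming input.
fn or<'a>(
    a: impl Fn(&'a str) -> Outcome<'a>,
    b: impl Fn(&'a str) -> Outcome<'a>,
) -> impl Fn(&'a str) -> Outcome<'a> {
    move |input| match a(input) {
        Outcome::EmptyErr => b(input),
        other => other,
    }
}

/// Backtracking opt-in: reports a consumed failure as an empty one,
/// so that a following alternative gets a chance to run.
fn attempt<'a>(p: impl Fn(&'a str) -> Outcome<'a>) -> impl Fn(&'a str) -> Outcome<'a> {
    move |input| match p(input) {
        Outcome::ConsumedErr => Outcome::EmptyErr,
        other => other,
    }
}
```

In this sketch, or(literal("let"), literal("lambda")) fails on the input "lambda" because the first alternative consumes the l before failing, while wrapping the first alternative in attempt lets the second one succeed.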

This library currently contains three modules.

  • combinator contains the aforementioned parser combinators and thus the main building blocks for creating any sort of complex parser. It consists of free functions such as many and satisfy as well as the ParserExt trait, which provides a few functions such as or that are more natural to use as method calls.

  • char provides parsers that work specifically with streams of characters. For example, it has parsers for accepting digits, letters or whitespace.

  • primitives contains the Parser and Stream traits which are the core abstractions in combine as well as various structs dealing with input streams and errors. You usually only need to use this module if you want more control over parsing and input streams.

Examples

extern crate combine;
use combine::{spaces, many1, sep_by, digit, char, Parser, ParserExt, ParseError};

fn main() {
    //Parse spaces first and use the with method to only keep the result of the next parser
    let integer = spaces()
        //parse a string of digits into an i32
        .with(many1(digit()).map(|string: String| string.parse::<i32>().unwrap()));

    //Parse integers separated by commas, skipping whitespace
    let mut integer_list = sep_by(integer, spaces().skip(char(',')));

    //Call parse with the input to execute the parser
    let input = "1234, 45,78";
    let result: Result<(Vec<i32>, &str), ParseError<&str>> = integer_list.parse(input);
    match result {
        Ok((value, _remaining_input)) => println!("{:?}", value),
        Err(err) => println!("{}", err)
    }
}

If we need a parser that is mutually recursive, we can define a free function and turn it into a parser with the parser function, which converts a function with the correct signature into a parser. In this case we define expr to work on any type of Stream, which is combine's way of abstracting over different data sources such as array slices, string slices and iterators. If you only needed to parse strings already in memory, you could instead define expr as fn expr(input: &str) -> ParseResult<Expr, &str>

extern crate combine;
use combine::{between, char, letter, spaces, many1, parser, sep_by, Parser, ParserExt};
use combine::primitives::{State, Stream, ParseResult};

#[derive(Debug, PartialEq)]
enum Expr {
    Id(String),
    Array(Vec<Expr>),
    Pair(Box<Expr>, Box<Expr>)
}

fn expr<I>(input: I) -> ParseResult<Expr, I>
    where I: Stream<Item=char>
{
    let word = many1(letter());

    //Creates a parser which parses a char and skips any trailing whitespace
    let lex_char = |c| char(c).skip(spaces());

    let comma_list = sep_by(parser(expr::<I>), lex_char(','));
    let array = between(lex_char('['), lex_char(']'), comma_list);

    //We can use tuples to run several parsers in sequence
    //The resulting type is a tuple containing each parser's output
    let pair = (lex_char('('),
                parser(expr::<I>),
                lex_char(','),
                parser(expr::<I>),
                lex_char(')'))
                   .map(|t| Expr::Pair(Box::new(t.1), Box::new(t.3)));

    word.map(Expr::Id)
        .or(array.map(Expr::Array))
        .or(pair)
        .skip(spaces())
        .parse_state(input)
}

fn main() {
    let result = parser(expr)
        .parse("[[], (hello, world), [rust]]");
    let expr = Expr::Array(vec![
          Expr::Array(Vec::new())
        , Expr::Pair(Box::new(Expr::Id("hello".to_string())),
                     Box::new(Expr::Id("world".to_string())))
        , Expr::Array(vec![Expr::Id("rust".to_string())])
    ]);
    assert_eq!(result, Ok((expr, "")));
}

Modules

char

Module containing parsers specialized for character streams

combinator

Module containing all specific parsers

primitives

Module containing the primitive types which are used to create and compose more advanced parsers

Structs

ParseError

Struct which holds information about an error that occurred at a specific position. Can hold multiple instances of Error if more than one error occurred at the same position.

State

The State<I> struct keeps track of the current position in the stream I

Traits

Parser

By implementing the Parser trait a type says that it can be used to parse an input stream into the type Output.

ParserExt

Extension trait which provides functions that are more conveniently used through method calls

Stream

A stream of tokens which can be duplicated

StreamOnce

StreamOnce represents a sequence of items that can be extracted one by one.

Functions

alpha_num

Parses either an alphabetic letter or a digit

any

Parses any token

between

Parses open, followed by parser, followed by close. Returns the value of parser.

chainl1

Parses p one or more times separated by op. The value returned is the one produced by the left-associative application of op.

chainr1

Parses p one or more times separated by op. The value returned is the one produced by the right-associative application of op.
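The difference between left- and right-associative application in chainl1 and chainr1 can be illustrated without any parsing at all. The sketch below assumes the operands have already been parsed and that a single operator is used throughout; apply_left and apply_right are hypothetical helper names, not part of combine.

```rust
/// Left-associative application: ((a op b) op c) ...
/// Returns None when there are no operands (one or more are required).
fn apply_left(op: impl Fn(i64, i64) -> i64, operands: &[i64]) -> Option<i64> {
    let (&first, rest) = operands.split_first()?;
    Some(rest.iter().fold(first, |acc, &x| op(acc, x)))
}

/// Right-associative application: a op (b op (c ...))
/// Returns None when there are no operands (one or more are required).
fn apply_right(op: impl Fn(i64, i64) -> i64, operands: &[i64]) -> Option<i64> {
    let (&last, init) = operands.split_last()?;
    Some(init.iter().rev().fold(last, |acc, &x| op(x, acc)))
}
```

For subtraction over the operands 10, 3 and 2, apply_left computes (10 - 3) - 2 while apply_right computes 10 - (3 - 2), so the two groupings give different results for operators that are not associative.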

char

Parses a character and succeeds if the character is equal to c

choice

Takes an array of parsers and tries to apply each of them in order. Fails if all parsers fail or if an applied parser consumes input before failing.

crlf

Parses carriage return and newline, returning the newline character.

digit

Parses a digit from a stream containing characters

env_parser

Constructs a parser out of an environment and a function which needs the given environment to do the parsing. This is commonly useful to allow multiple parsers to share some environment while still allowing the parsers to be written in separate functions.

eof

Succeeds only if the stream is at end of input, fails otherwise.

from_iter

Converts an Iterator into a stream.

hex_digit

Parses a hexadecimal digit, accepting both uppercase and lowercase letters

letter

Parses an alphabetic letter

look_ahead

look_ahead acts as p but doesn't consume input on success.

lower

Parses a lowercase letter

many

Parses p zero or more times, returning a collection with the values from p. If the type of the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting binding, let collection: Vec<_> = ..., or by specializing the call to many, many::<Vec<_>, _>(...)

many1

Parses p one or more times, returning a collection with the values from p. If the type of the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting binding, let collection: Vec<_> = ..., or by specializing the call to many1, many1::<Vec<_>, _>(...)
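The need for type annotations on many and many1 comes from the result being generic over the collection type. A minimal standard-library sketch of the same situation follows; many_digits is a hypothetical function written for illustration, and it only assumes that the collection is built through FromIterator.

```rust
use std::iter::FromIterator;

/// Collects the leading ASCII digits of `input` into any collection `F`
/// and returns it together with the remaining input.
/// The caller has to pin down `F`, since nothing in the arguments determines it.
fn many_digits<F: FromIterator<char>>(input: &str) -> (F, &str) {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    (input[..end].chars().collect(), &input[end..])
}
```

The collection type can then be chosen either by annotating the binding, as in let (s, rest): (String, &str) = many_digits("42abc"), or by specializing the call, as in many_digits::<Vec<char>>("42abc").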

newline

Parses a newline character

not_followed_by

Succeeds only if parser fails. Never consumes any input.
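The non-consuming behavior of look_ahead and not_followed_by can be sketched with plain functions over string slices. This is an illustration under simplifying assumptions (no error information, input is a &str), not combine's implementation; the Parsed alias and the functions below are hypothetical and merely share names with combine's items.

```rust
// Hypothetical stand-in: a parser returns a value plus the remaining input.
type Parsed<'a, T> = Option<(T, &'a str)>;

/// Parses a single alphabetic character.
fn letter(input: &str) -> Parsed<'_, char> {
    let c = input.chars().next()?;
    if c.is_alphabetic() {
        Some((c, &input[c.len_utf8()..]))
    } else {
        None
    }
}

/// Runs `p` but hands back the original input on success,
/// so nothing is consumed.
fn look_ahead<'a, T>(
    p: impl Fn(&'a str) -> Parsed<'a, T>,
) -> impl Fn(&'a str) -> Parsed<'a, T> {
    move |input| p(input).map(|(value, _rest)| (value, input))
}

/// Succeeds only if `p` fails, and never consumes input.
fn not_followed_by<'a, T>(
    p: impl Fn(&'a str) -> Parsed<'a, T>,
) -> impl Fn(&'a str) -> Parsed<'a, ()> {
    move |input| match p(input) {
        Some(_) => None,
        None => Some(((), input)),
    }
}
```

In both cases the remaining input returned on success is the same slice that was passed in, which is what allows a following parser to inspect the same characters again.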

oct_digit

Parses an octal digit

optional

Returns Some(value) on success and None on parse failure (always succeeds)

parser

Wraps a function, turning it into a parser. Mainly needed to turn closures into parsers, as function types can be cast to function pointers to make them usable as a parser.

satisfy

Parses a token and succeeds depending on the result of predicate

sep_by

Parses parser zero or more times separated by separator, returning a collection with the values from parser. If the type of the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting binding, let collection: Vec<_> = ..., or by specializing the call to sep_by, sep_by::<Vec<_>, _, _>(...)

sep_by1

Parses parser one or more times separated by separator, returning a collection with the values from parser. If the type of the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting binding, let collection: Vec<_> = ..., or by specializing the call to sep_by1, sep_by1::<Vec<_>, _, _>(...)

sep_end_by

Parses parser zero or more times separated and optionally ended by separator, returning a collection with the values from parser. If the type of the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting binding, let collection: Vec<_> = ..., or by specializing the call to sep_end_by, sep_end_by::<Vec<_>, _, _>(...)

sep_end_by1

Parses parser one or more times separated and optionally ended by separator, returning a collection with the values from parser. If the type of the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting binding, let collection: Vec<_> = ..., or by specializing the call to sep_end_by1, sep_end_by1::<Vec<_>, _, _>(...)

skip_many

Parses p zero or more times ignoring the result

skip_many1

Parses p one or more times ignoring the result

space

Parses whitespace

spaces

Skips over zero or more spaces

string

Parses the string s

tab

Parses a tab character

token

Parses a token and succeeds if the token is equal to c

try

try acts as p, except that if p returns an error after consuming input, it behaves as if the parser had not consumed any input.

unexpected

Always fails with message as an unexpected error. Never consumes any input.

upper

Parses an uppercase letter

value

Always returns the value v without consuming any input.

Type Definitions

ParseResult

A type alias over the specific Result type used by parsers to indicate whether they were successful or not. O is the type that is output on success. I is the specific stream type used in the parser.