Crate combine
This crate contains parser combinators, roughly based on the Haskell library parsec.
A parser in this library can be described as a function which takes some input and, if it
is successful, returns a value together with the remaining input.
A parser combinator is a function which takes one or more parsers and returns a new parser.
For instance, the many parser can be used to convert a parser for single digits into one that
parses multiple digits. By modeling parsers in this way it becomes simple to compose complex
parsers in an almost declarative way.
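The description above can be sketched without the library at all. The following is a minimal illustration using only the standard library, not combine's actual implementation: the function-based parser type and the standalone digit and many functions here are hypothetical stand-ins that merely share names with the library's parsers.

```rust
// A "parser" in this sketch is any function from input to an optional
// (value, remaining input) pair. This is a stand-in, not combine's Parser trait.
fn digit(input: &str) -> Option<(char, &str)> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_digit() => Some((c, chars.as_str())),
        _ => None,
    }
}

// A combinator: takes a parser and returns a new parser which applies it
// zero or more times and collects the results.
// Assumes `p` consumes input whenever it succeeds, otherwise the loop never ends.
fn many<'a, T>(
    p: impl Fn(&'a str) -> Option<(T, &'a str)>,
) -> impl Fn(&'a str) -> Option<(Vec<T>, &'a str)> {
    move |mut input: &'a str| {
        let mut out = Vec::new();
        while let Some((value, rest)) = p(input) {
            out.push(value);
            input = rest;
        }
        Some((out, input))
    }
}

fn main() {
    // Turn the single-digit parser into one that parses multiple digits
    let digits = many(digit);
    let (value, rest) = digits("123abc").unwrap();
    assert_eq!(value, vec!['1', '2', '3']);
    assert_eq!(rest, "abc");
    println!("{:?} {:?}", value, rest); // prints ['1', '2', '3'] "abc"
}
```

Because many returns something with the same shape as its argument, the result can be passed to further combinators, which is what makes this style compose.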
Overview
combine limits itself to creating LL(1) parsers (it is possible to opt in to LL(k) parsing
using the try combinator), which makes the parsers easy to reason about in both function and
performance while sacrificing some generality. Besides making the parsers you construct
easier to reason about, the library also uses the knowledge that it is an LL parser to
automatically construct good error messages.
    extern crate combine;
    use combine::{digit, letter, Parser, ParserExt};

    fn main() {
        if let Err(err) = digit().or(letter()).parse("|") {
            println!("{}", err);
            // The println! call above prints
            //
            // Parse error at line: 1, column: 1
            // Unexpected '|'
            // Expected 'digit' or 'letter'
        }
    }
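To illustrate what the LL(1) restriction and the try combinator mean in practice, here is a standard-library-only sketch of a choice combinator that commits once input has been consumed. It assumes the parsec-style rule that an alternative is only tried when the first parser fails without consuming input. The Error type, the Res alias and the functions string, or and attempt are hypothetical stand-ins written for this example (attempt plays the role of try, which is a reserved word in current Rust); they are not the library's own definitions.

```rust
// Hypothetical error type: a failure records whether input was consumed first.
#[derive(Debug, PartialEq)]
enum Error {
    Empty,    // failed without consuming any input
    Consumed, // failed after consuming some input
}

type Res<'a, T> = Result<(T, &'a str), Error>;

// Parses the literal `lit` one character at a time.
fn string<'a>(lit: &'static str) -> impl Fn(&'a str) -> Res<'a, &'static str> {
    move |input: &'a str| {
        let mut rest = input.chars();
        for (i, expected) in lit.chars().enumerate() {
            if rest.next() != Some(expected) {
                // A mismatch on the first character consumed nothing
                return Err(if i == 0 { Error::Empty } else { Error::Consumed });
            }
        }
        Ok((lit, rest.as_str()))
    }
}

// LL(1)-style choice: `q` is only tried if `p` failed without consuming input.
fn or<'a, T>(
    p: impl Fn(&'a str) -> Res<'a, T>,
    q: impl Fn(&'a str) -> Res<'a, T>,
) -> impl Fn(&'a str) -> Res<'a, T> {
    move |input: &'a str| match p(input) {
        Err(Error::Empty) => q(input),
        other => other,
    }
}

// Opt in to backtracking: report any failure of `p` as if nothing was consumed.
fn attempt<'a, T>(p: impl Fn(&'a str) -> Res<'a, T>) -> impl Fn(&'a str) -> Res<'a, T> {
    move |input: &'a str| p(input).map_err(|_| Error::Empty)
}

fn main() {
    // "let" and "lambda" share their first character, so plain `or` commits to "let"
    let committed = or(string("let"), string("lambda"));
    assert_eq!(committed("lambda"), Err(Error::Consumed));

    // Wrapping the first alternative in `attempt` lets the second one run
    let backtracking = or(attempt(string("let")), string("lambda"));
    assert_eq!(backtracking("lambda"), Ok(("lambda", "")));
}
```

Committing by default keeps the cost of a choice predictable and lets an error point at the place where parsing actually went wrong; backtracking is paid for only where it is requested.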
This library currently contains three modules.
combinator contains the aforementioned parser combinators and thus contains the main building blocks for creating any sort of complex parser. It consists of free functions such as many and satisfy, as well as the ParserExt trait, which provides a few functions such as or which are more natural to use as method calls.
char provides parsers specifically working with streams of characters. As a few examples, it has parsers for accepting digits, letters or whitespace.
primitives contains the Parser and Stream traits, which are the core abstractions in combine, as well as various structs dealing with input streams and errors. You usually only need to use this module if you want more control over parsing and input streams.
Examples
    extern crate combine;
    use combine::{spaces, many1, sep_by, digit, char, Parser, ParserExt, ParseError};

    fn main() {
        //Parse spaces first and use the with method to only keep the result of the next parser
        let integer = spaces()
            //parse a string of digits into an i32
            .with(many1(digit()).map(|string: String| string.parse::<i32>().unwrap()));

        //Parse integers separated by commas, skipping whitespace
        let mut integer_list = sep_by(integer, spaces().skip(char(',')));

        //Call parse with the input to execute the parser
        let input = "1234, 45,78";
        let result: Result<(Vec<i32>, &str), ParseError<&str>> = integer_list.parse(input);
        match result {
            Ok((value, _remaining_input)) => println!("{:?}", value),
            Err(err) => println!("{}", err)
        }
    }
If we need a parser that is mutually recursive, we can define a free function, which can in
turn be used as a parser through the parser function, which turns a function with the
correct signature into a parser. In this case we define expr to work on any type of Stream,
which is combine's way of abstracting over different data sources such as array slices, string
slices, iterators etc. If you only need to parse strings already in memory, you could instead
define expr as fn expr(input: State<&str>) -> ParseResult<Expr, &str>.
    extern crate combine;
    use combine::{between, char, letter, spaces, many1, parser, sep_by, Parser, ParserExt};
    use combine::primitives::{State, Stream, ParseResult};

    #[derive(Debug, PartialEq)]
    enum Expr {
        Id(String),
        Array(Vec<Expr>),
        Pair(Box<Expr>, Box<Expr>)
    }

    fn expr<I>(input: State<I>) -> ParseResult<Expr, I>
        where I: Stream<Item=char>
    {
        let word = many1(letter());

        //Creates a parser which parses a char and skips any trailing whitespace
        let lex_char = |c| char(c).skip(spaces());

        let comma_list = sep_by(parser(expr::<I>), lex_char(','));
        let array = between(lex_char('['), lex_char(']'), comma_list);

        //We can use tuples to run several parsers in sequence
        //The resulting type is a tuple containing each parsers output
        let pair = (lex_char('('), parser(expr::<I>), lex_char(','), parser(expr::<I>), lex_char(')'))
            .map(|t| Expr::Pair(Box::new(t.1), Box::new(t.3)));

        word.map(Expr::Id)
            .or(array.map(Expr::Array))
            .or(pair)
            .skip(spaces())
            .parse_state(input)
    }

    fn main() {
        let result = parser(expr)
            .parse("[[], (hello, world), [rust]]");
        let expr = Expr::Array(vec![
              Expr::Array(Vec::new())
            , Expr::Pair(Box::new(Expr::Id("hello".to_string())),
                         Box::new(Expr::Id("world".to_string())))
            , Expr::Array(vec![Expr::Id("rust".to_string())])
        ]);
        assert_eq!(result, Ok((expr, "")));
    }
Modules
char |
Module containing parsers specialized on character streams |
combinator |
Module containing all specific parsers |
primitives |
Module containing the primitive types which are used to create and compose more advanced parsers |
Structs
ParseError |
Struct which holds information about an error that occurred at a specific position.
Can hold multiple instances of |
State |
The |
Traits
Parser |
By implementing the |
ParserExt |
Extension trait which provides functions that are more conveniently used through method calls |
Functions
alpha_num |
Parses either an alphabet letter or digit |
any |
Parses any token |
between |
Parses |
chainl1 |
Parses |
chainr1 |
Parses |
char |
Parses a character and succeeds if the character is equal to |
choice |
Takes an array of parsers and tries to apply each of them in order. Fails if all parsers fail or if an applied parser consumes input before failing. |
crlf |
Parses carriage return and newline, returning the newline character. |
digit |
Parses a digit from a stream containing characters |
env_parser |
Constructs a parser out of an environment and a function which needs the given environment to do the parsing. This is commonly useful to allow multiple parsers to share some environment while still allowing the parsers to be written in separate functions. |
from_iter |
Converts an |
hex_digit |
Parses a hexadecimal digit, in either uppercase or lowercase |
letter |
Parses an alphabet letter |
look_ahead |
look_ahead acts as p but doesn't consume input on success. |
lower |
Parses a lowercase letter |
many |
Parses |
many1 |
Parses |
newline |
Parses a newline character |
not_followed_by |
Succeeds only if |
oct_digit |
Parses an octal digit |
optional |
Returns |
parser |
Wraps a function, turning it into a parser. Mainly needed to turn closures into parsers, as function types can be cast to function pointers to make them usable as a parser |
satisfy |
Parses a token and succeeds depending on the result of |
sep_by |
Parses |
sep_by1 |
Parses |
sep_end_by |
Parses |
sep_end_by1 |
Parses |
skip_many |
Parses |
skip_many1 |
Parses |
space |
Parses whitespace |
spaces |
Skips over zero or more spaces |
string |
Parses the string |
tab |
Parses a tab character |
token |
Parses a character and succeeds if the character is equal to |
try |
Try acts as |
unexpected |
Always fails with |
upper |
Parses an uppercase letter |
value |
Always returns the value |
Type Definitions
ParseResult |
A type alias over the specific |