This crate contains parser combinators, roughly based on the Haskell library parsec.
A parser in this library can be described as a function which takes some input and, if it
is successful, returns a value together with the remaining input.
A parser combinator is a function which takes one or more parsers and returns a new parser.
For instance, the many parser can be used to convert a parser for single digits into one that
parses multiple digits.
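The description above can be illustrated with a minimal, std-only sketch. It does not use this crate: the names ParseResult, digit and many below are hypothetical stand-ins chosen to mirror the prose, and failure is simplified to None rather than a real error type.

```rust
// Hypothetical std-only sketch; these definitions are NOT the crate's API.

/// A parser is modeled as a function from input to an optional
/// (value, remaining input) pair. `None` signals failure.
type ParseResult<'a, T> = Option<(T, &'a str)>;

/// Parses a single ASCII digit and returns it with the remaining input.
fn digit(input: &str) -> ParseResult<'_, char> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_digit() => Some((c, chars.as_str())),
        _ => None,
    }
}

/// A combinator: takes a parser `p` and returns a new parser that applies
/// `p` zero or more times, collecting the values it produced.
fn many<'a, T>(
    p: impl Fn(&'a str) -> ParseResult<'a, T>,
) -> impl Fn(&'a str) -> ParseResult<'a, Vec<T>> {
    move |mut input| {
        let mut values = Vec::new();
        while let Some((value, rest)) = p(input) {
            // Guard against looping forever on a parser that consumes nothing.
            if rest.len() == input.len() {
                break;
            }
            values.push(value);
            input = rest;
        }
        // Zero matches is still a success for `many`.
        Some((values, input))
    }
}
```

In this sketch, many(digit) applied to "123abc" yields the digits '1', '2' and '3' with "abc" left over, while applying it to "abc" succeeds with an empty collection.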
§Overview
This library is currently split into three modules.
- primitives contains the Parser trait as well as various structs dealing with input streams and errors.
- combinator contains the aforementioned parser combinators and thus the main building blocks for creating more complex parsers. It consists of free functions as well as the ParserExt trait, which provides a few functions that are more naturally used through method calls.
- char is the last module. It provides parsers that work specifically with streams of characters. For example, it has parsers for accepting digits, letters or whitespace.
§Examples
extern crate parser_combinators;
use parser_combinators::{spaces, many1, sep_by, digit, char, Parser, ParserExt, ParseError};

fn main() {
    let input = "1234, 45,78";
    let spaces = spaces();
    // Parse spaces first and use the with method to keep only the result of the next parser
    let integer = spaces.clone()
        // Parse a string of digits into an i32
        .with(many1(digit()).map(|string: String| string.parse::<i32>().unwrap()));
    // Parse integers separated by commas, skipping whitespace
    let mut integer_list = sep_by(integer, spaces.skip(char(',')));
    // Call parse with the input to execute the parser
    let result: Result<(Vec<i32>, &str), ParseError<char>> = integer_list.parse(input);
    match result {
        Ok((value, _remaining_input)) => println!("{:?}", value),
        Err(err) => println!("{}", err)
    }
}
If we need a parser that is mutually recursive, we can define a free function which can in turn be used internally as a parser. (Note that we need to explicitly cast the function; this should no longer be necessary once the changes in rustc that make orphan checking less restrictive are implemented.)
expr is written fully general here, which may not be necessary in a specific implementation.
The Stream trait is predefined to work with array slices, string slices and iterators,
meaning that in this case it could be defined as
fn expr(input: State<&str>) -> ParseResult<Expr, &str>
extern crate parser_combinators;
use parser_combinators::{between, char, letter, spaces, many1, parser, sep_by, Parser, ParserExt,
                         ParseResult};
use parser_combinators::primitives::{State, Stream};

#[derive(Debug, PartialEq)]
enum Expr {
    Id(String),
    Array(Vec<Expr>),
    Pair(Box<Expr>, Box<Expr>)
}

fn expr<I>(input: State<I>) -> ParseResult<Expr, I>
    where I: Stream<Item=char> {
    let word = many1(letter());
    // Creates a parser which parses a char and skips any trailing whitespace
    let lex_char = |c| char(c).skip(spaces());
    let comma_list = sep_by(parser(expr::<I>), lex_char(','));
    let array = between(lex_char('['), lex_char(']'), comma_list);
    // We can use tuples to run several parsers in sequence
    // The resulting type is a tuple containing each parser's output
    let pair = (lex_char('('), parser(expr::<I>), lex_char(','), parser(expr::<I>), lex_char(')'))
        .map(|t| Expr::Pair(Box::new(t.1), Box::new(t.3)));
    word.map(Expr::Id)
        .or(array.map(Expr::Array))
        .or(pair)
        .skip(spaces())
        .parse_state(input)
}

fn main() {
    let result = parser(expr)
        .parse("[[], (hello, world), [rust]]");
    let expr = Expr::Array(vec![
        Expr::Array(Vec::new()),
        Expr::Pair(Box::new(Expr::Id("hello".to_string())),
                   Box::new(Expr::Id("world".to_string()))),
        Expr::Array(vec![Expr::Id("rust".to_string())])
    ]);
    assert_eq!(result, Ok((expr, "")));
}
Modules§
- char: Module containing parsers specialized on character streams
- combinator: Module containing all specific parsers
- primitives: Module containing the primitive types which are used to create and compose more advanced parsers
Structs§
- ParseError: Struct which holds information about an error that occurred at a specific position. Can hold multiple instances of Error if more than one error occurred at the position.
- State: The State<I> struct keeps track of the current position in the stream I
Traits§
- Parser: By implementing the Parser trait, a type says that it can be used to parse an input stream into the type Output.
- ParserExt: Extension trait which provides functions that are more conveniently used through method calls
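The idea behind these two traits can be sketched without the crate. The following is a simplified, hypothetical model: MiniParser, Digit and Map are illustrative names, the input is fixed to &str, errors are plain strings, and the method-style map is placed directly on the trait instead of in a separate extension trait as the crate does.

```rust
// Hypothetical sketch; `MiniParser`, `Digit` and `Map` are illustrative
// names, not the crate's actual definitions.

/// A type implementing this trait can parse a `&str` into `Self::Output`.
trait MiniParser {
    type Output;

    /// On success, returns the parsed value and the remaining input.
    fn parse<'a>(&mut self, input: &'a str) -> Result<(Self::Output, &'a str), String>;

    /// Offered as a method because it reads naturally in a call chain.
    fn map<F, B>(self, f: F) -> Map<Self, F>
    where
        Self: Sized,
        F: FnMut(Self::Output) -> B,
    {
        Map { parser: self, f }
    }
}

/// A primitive parser accepting one ASCII digit.
struct Digit;

impl MiniParser for Digit {
    type Output = char;

    fn parse<'a>(&mut self, input: &'a str) -> Result<(char, &'a str), String> {
        let mut chars = input.chars();
        match chars.next() {
            Some(c) if c.is_ascii_digit() => Ok((c, chars.as_str())),
            Some(c) => Err(format!("unexpected '{}', expected a digit", c)),
            None => Err("unexpected end of input, expected a digit".to_string()),
        }
    }
}

/// The parser returned by `map`: runs `parser`, then applies `f` to its value.
struct Map<P, F> {
    parser: P,
    f: F,
}

impl<P, F, B> MiniParser for Map<P, F>
where
    P: MiniParser,
    F: FnMut(P::Output) -> B,
{
    type Output = B;

    fn parse<'a>(&mut self, input: &'a str) -> Result<(B, &'a str), String> {
        let (value, rest) = self.parser.parse(input)?;
        Ok(((self.f)(value), rest))
    }
}
```

With these definitions, Digit.map(|c: char| c.to_digit(10).unwrap()) is itself a parser whose Output is u32, which is the kind of chaining the method-call style is meant to make convenient.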
Functions§
- alpha_num: Parses either an alphabet letter or a digit
- any: Parses any token
- between: Parses open followed by parser followed by close. Returns the value of parser.
- chainl1: Parses p 1 or more times separated by op. The value returned is the one produced by the left associative application of op.
- char: Parses a character and succeeds if the character is equal to c
- choice: Takes an array of parsers and tries each of them in turn. Fails if all parsers fail or when a parser fails with a consumed state.
- crlf: Parses carriage return and newline, returning the newline character.
- digit: Parses a digit from a stream containing characters
- from_iter: Converts an Iterator into a stream.
- hex_digit: Parses a hexadecimal digit, uppercase or lowercase
- letter: Parses an alphabet letter
- lower: Parses a lowercase letter
- many: Parses p zero or more times, returning a collection with the values from p. If the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting type binding, let collection: Vec<_> = ..., or by specializing when calling many, many::<Vec<_>, _>(...)
- many1: Parses p one or more times, returning a collection with the values from p. If the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting type binding, let collection: Vec<_> = ..., or by specializing when calling many1, many1::<Vec<_>, _>(...)
- newline: Parses a newline character
- not_followed_by: Succeeds only if parser fails. Never consumes any input.
- oct_digit: Parses an octal digit
- optional: Returns Some(value) on success and None on parse failure (always succeeds)
- parser: Wraps a function, turning it into a parser. Mainly needed to turn closures into parsers, as function types can be cast to function pointers to make them usable as a parser.
- satisfy: Parses a token and succeeds depending on the result of predicate
- sep_by: Parses parser zero or more times separated by separator, returning a collection with the values from parser. If the returned collection cannot be inferred, type annotations must be supplied, either by annotating the resulting type binding, let collection: Vec<_> = ..., or by specializing when calling sep_by, sep_by::<Vec<_>, _, _>(...)
- skip_many: Parses p zero or more times, ignoring the result
- skip_many1: Parses p one or more times, ignoring the result
- space: Parses whitespace
- spaces: Skips over zero or more spaces
- string: Parses the string s
- tab: Parses a tab character
- token: Parses a character and succeeds if the character is equal to c
- try: Acts as p, except that it behaves as if the parser hadn’t consumed any input if p returns an error after consuming input
- unexpected: Always fails with message as the error. Never consumes any input.
- upper: Parses an uppercase letter
- value: Always returns the value v without consuming any input.
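The entries for choice and try above both refer to failing with a consumed state. The following std-only sketch models that distinction under simplifying assumptions: Failure, Outcome, literal, or and attempt are hypothetical names (attempt stands in for try, which is a reserved word in current Rust), and a parser counts as having consumed input as soon as its first character matched.

```rust
// Hypothetical model of consumed vs. empty failures; NOT the crate's API.

#[derive(Debug, PartialEq)]
enum Failure {
    /// The parser failed before consuming any input.
    Empty,
    /// The parser failed after consuming some input.
    Consumed,
}

type Outcome<'a, T> = Result<(T, &'a str), Failure>;

/// Parses the literal `expected` at the start of `input`.
fn literal<'a>(expected: &'static str, input: &'a str) -> Outcome<'a, &'static str> {
    if input.starts_with(expected) {
        return Ok((expected, &input[expected.len()..]));
    }
    // A mismatch after the first character matched counts as a consumed failure.
    match (expected.chars().next(), input.chars().next()) {
        (Some(e), Some(i)) if e == i => Err(Failure::Consumed),
        _ => Err(Failure::Empty),
    }
}

/// Runs `second` only if `first` failed without consuming any input.
fn or<'a, T>(
    first: impl Fn(&'a str) -> Outcome<'a, T>,
    second: impl Fn(&'a str) -> Outcome<'a, T>,
    input: &'a str,
) -> Outcome<'a, T> {
    match first(input) {
        Err(Failure::Empty) => second(input),
        other => other,
    }
}

/// Behaves like `p`, but reports any failure as if no input had been consumed.
fn attempt<'a, T>(p: impl Fn(&'a str) -> Outcome<'a, T>, input: &'a str) -> Outcome<'a, T> {
    p(input).map_err(|_| Failure::Empty)
}
```

In this model, choosing between the literals "let" and "loop" on the input "loop {}" fails with a consumed error, because the first alternative already matched the leading 'l'. Wrapping the first alternative in attempt lets the second alternative run and succeed.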