This crate contains parser combinators, roughly based on the Haskell libraries parsec and attoparsec.
A parser in this library can be described as a function which takes some input and if it
is successful, returns a value together with the remaining input.
A parser combinator is a function which takes one or more parsers and returns a new parser.
For instance the `many` parser can be used to convert a parser for single digits into one that
parses multiple digits. By modeling parsers in this way it becomes easy to compose complex
parsers in an almost declarative way.
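The description above can be made concrete with a small sketch that does not use combine at all. It assumes a deliberately simplified model in which a parser is a plain function from `&str` to `Option<(value, remaining_input)>`; the names `digit` and `many` here are stand-ins written for illustration and are not combine's actual implementations.

```rust
/// A parser for a single ASCII digit: on success it returns the digit
/// together with the remaining input, otherwise `None`.
fn digit(input: &str) -> Option<(char, &str)> {
    let c = input.chars().next()?;
    if c.is_ascii_digit() {
        Some((c, &input[c.len_utf8()..]))
    } else {
        None
    }
}

/// A parser combinator: takes a parser and returns a new parser which applies
/// it zero or more times, collecting the outputs.
/// (Illustrative only: `parser` must consume input on success, or this loops forever.)
fn many<'a, T, P>(parser: P) -> impl Fn(&'a str) -> Option<(Vec<T>, &'a str)>
where
    P: Fn(&'a str) -> Option<(T, &'a str)>,
{
    move |mut input: &'a str| {
        let mut values = Vec::new();
        // Keep applying the inner parser until it fails, threading the
        // remaining input from one application to the next.
        while let Some((value, rest)) = parser(input) {
            values.push(value);
            input = rest;
        }
        Some((values, input))
    }
}
```

In this model `many(digit)` applied to `"123abc"` produces the digits `1`, `2` and `3` and leaves `"abc"` as the remaining input. combine's real `Parser` trait additionally tracks errors and whether input was consumed, which this sketch leaves out.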
Overview
`combine` limits itself to creating LL(1) parsers (it is possible to opt in to LL(k) parsing
using the `attempt` combinator), which makes the parsers easy to reason about in both function
and performance while sacrificing some generality. Besides making the parsers you construct
easier to reason about, `combine` also uses the knowledge of being an LL parser to
automatically construct good error messages.
extern crate combine;
use combine::Parser;
use combine::stream::state::State;
use combine::parser::char::{digit, letter};

const MSG: &'static str = r#"Parse error at line: 1, column: 1
Unexpected `|`
Expected `digit` or `letter`
"#;

fn main() {
    // Wrapping a `&str` with `State` provides automatic line and column tracking. If `State`
    // was not used the positions would instead only be pointers into the `&str`
    if let Err(err) = digit().or(letter()).easy_parse(State::new("|")) {
        assert_eq!(MSG, format!("{}", err));
    }
}
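To make the LL(1) restriction and the `attempt` opt-in mentioned above more concrete, here is a minimal std-only model of the usual rule: an alternative is only tried when the first parser failed without consuming input, and `attempt` makes a failure look as if nothing had been consumed. The `Outcome` type and the `literal`, `or` and `attempt` functions below are hypothetical simplifications that operate on results rather than on parser values; they are not combine's types or signatures.

```rust
/// Result of running a parser. On failure, `consumed` records whether any
/// input had already been matched before the failure.
#[derive(Debug, PartialEq)]
enum Outcome<'a, T> {
    Ok(T, &'a str),
    Err { consumed: bool },
}

/// Matches `lit` character by character, so it can fail part-way through,
/// after having consumed some input.
fn literal<'a>(lit: &'static str, input: &'a str) -> Outcome<'a, &'static str> {
    let matched = lit
        .chars()
        .zip(input.chars())
        .take_while(|(a, b)| a == b)
        .count();
    if matched == lit.chars().count() {
        Outcome::Ok(lit, &input[lit.len()..])
    } else {
        Outcome::Err { consumed: matched > 0 }
    }
}

/// LL(1)-style choice: the second alternative only runs if the first one
/// failed without consuming input.
fn or<'a, T>(first: Outcome<'a, T>, second: impl FnOnce() -> Outcome<'a, T>) -> Outcome<'a, T> {
    match first {
        Outcome::Err { consumed: false } => second(),
        other => other,
    }
}

/// Opt-in backtracking: report any failure as if no input had been consumed.
fn attempt<'a, T>(result: Outcome<'a, T>) -> Outcome<'a, T> {
    match result {
        Outcome::Err { .. } => Outcome::Err { consumed: false },
        ok => ok,
    }
}
```

With the input `"let x"`, `literal("lend", input)` fails after matching `le`, so `or` refuses to try `literal("let", input)`. Wrapping the first alternative in `attempt` lets the second one run and succeed. This is the extra lookahead the overview describes as opting in to LL(k).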
This library is currently split into a few core modules:
- `parser` is where you will find all the parsers that combine provides. It contains the core `Parser` trait as well as several submodules such as `sequence` or `choice` which each contain several parsers aimed at a specific niche.
- `stream` contains the second most important trait next to `Parser`. Streams represent the data source which is being parsed such as `&[u8]`, `&str` or iterators.
- `easy` contains combine’s default “easy” error and stream handling. If you use the `easy_parse` method to start your parsing these are the types that are used.
- `error` contains the types and traits that make up combine’s error handling. Unless you need to customize the errors your parsers return you should not need to use this module much.
Examples
extern crate combine;
use combine::parser::char::{spaces, digit, char};
use combine::{many1, sep_by, Parser};
use combine::stream::easy;

fn main() {
    //Parse spaces first and use the with method to only keep the result of the next parser
    let integer = spaces()
        //parse a string of digits into an i32
        .with(many1(digit()).map(|string: String| string.parse::<i32>().unwrap()));

    //Parse integers separated by commas, skipping whitespace
    let mut integer_list = sep_by(integer, spaces().skip(char(',')));

    //Call parse with the input to execute the parser
    let input = "1234, 45,78";
    let result: Result<(Vec<i32>, &str), easy::ParseError<&str>> =
        integer_list.easy_parse(input);
    match result {
        Ok((value, _remaining_input)) => println!("{:?}", value),
        Err(err) => println!("{}", err)
    }
}
If we need a parser that is mutually recursive, or if we want to export a reusable parser, the
`parser!` macro can be used. In effect it makes it possible to return a parser without naming
the type of the parser (which can be very large due to combine’s trait based approach). While
it is possible to avoid naming the type without the macro, those solutions require either
allocation (`Box<Parser<Input = I, Output = O, PartialState = P>>`) or `impl Trait`, which
needs Rust 1.26 or later (nightly before that). The macro thus threads the needle and makes it
possible to have non-allocating, anonymous parsers on stable rust, including versions older
than 1.26.
#[macro_use]
extern crate combine;
use combine::parser::char::{char, letter, spaces};
use combine::{between, choice, many1, parser, sep_by, Parser};
use combine::error::{ParseError, ParseResult};
use combine::stream::{Stream, Positioned};
use combine::stream::state::State;

#[derive(Debug, PartialEq)]
pub enum Expr {
    Id(String),
    Array(Vec<Expr>),
    Pair(Box<Expr>, Box<Expr>)
}

// `impl Parser` can be used to create reusable parsers with zero overhead
fn expr_<I>() -> impl Parser<Input = I, Output = Expr>
    where I: Stream<Item = char>,
          // Necessary due to rust-lang/rust#24159
          I::Error: ParseError<I::Item, I::Range, I::Position>,
{
    let word = many1(letter());

    // A parser which skips past whitespace.
    // Since we aren't interested in knowing that our expression parser
    // could have accepted additional whitespace between the tokens we also silence the error.
    let skip_spaces = || spaces().silent();

    //Creates a parser which parses a char and skips any trailing whitespace
    let lex_char = |c| char(c).skip(skip_spaces());

    let comma_list = sep_by(expr(), lex_char(','));
    let array = between(lex_char('['), lex_char(']'), comma_list);

    //We can use tuples to run several parsers in sequence
    //The resulting type is a tuple containing each parser's output
    let pair = (lex_char('('),
                expr(),
                lex_char(','),
                expr(),
                lex_char(')'))
        .map(|t| Expr::Pair(Box::new(t.1), Box::new(t.3)));

    choice((
        word.map(Expr::Id),
        array.map(Expr::Array),
        pair,
    ))
        .skip(skip_spaces())
}

// As this expression parser needs to be able to call itself recursively `impl Parser` can't
// be used on its own as that would cause an infinitely large type. We can avoid this by using
// the `parser!` macro which erases the inner type and the size of that type entirely which
// lets it be used recursively.
//
// (This macro does not use `impl Trait` which means it can be used in rust < 1.26 as well to
// emulate `impl Parser`)
parser!{
    fn expr[I]()(I) -> Expr
    where [I: Stream<Item = char>]
    {
        expr_()
    }
}

fn main() {
    let result = expr()
        .parse("[[], (hello, world), [rust]]");
    let expr = Expr::Array(vec![
        Expr::Array(Vec::new()),
        Expr::Pair(Box::new(Expr::Id("hello".to_string())),
                   Box::new(Expr::Id("world".to_string()))),
        Expr::Array(vec![Expr::Id("rust".to_string())]),
    ]);
    assert_eq!(result, Ok((expr, "")));
}
Traits
- By implementing the `Parser` trait a type says that it can be used to parse an input stream into the type `Output`.
- A `RangeStream` is an extension of `Stream` which allows for zero copy parsing.
- A `RangeStream` is an extension of `StreamOnce` which allows for zero copy parsing.
- `StreamOnce` represents a sequence of items that can be extracted one by one.
Functions
- `attempt(p)` behaves as `p` except it acts as if the parser hadn’t consumed any input if `p` fails after consuming input. (alias for `try`)
- Parses `open` followed by `parser` followed by `close`. Returns the value of `parser`.
- Parses `p` 1 or more times separated by `op`. The value returned is the one produced by the left associative application of the function returned by the parser `op`.
- Parses `p` one or more times separated by `op`. The value returned is the one produced by the right associative application of the function returned by `op`.
- Parses `parser` from zero up to `count` times.
- Parses `parser` from `min` to `max` times (including `min` and `max`).
- … (`&str`, `String`, `&[u8]` or `Vec<u8>`) and parses it using `std::str::FromStr`. Errors if the output of `parser` is not UTF-8 or if `FromStr::from_str` returns an error.
- `look_ahead(p)` acts as `p` but doesn’t consume input on success.
- Parses `p` zero or more times returning a collection with the values from `p`.
- Parses `p` one or more times returning a collection with the values from `p`.
- … `tokens`.
- Succeeds only if `parser` fails. Never consumes any input.
- … `tokens`.
- Parses `parser` and outputs `Some(value)` if it succeeds, `None` if it fails without consuming any input. Fails if `parser` fails after having consumed some input.
- … `predicate`.
- … `predicate`. If `predicate` returns `Some` the parser succeeds and returns the value inside the `Option`. If `predicate` returns `None` the parser fails without consuming any input.
- Parses `parser` zero or more times separated by `separator`, returning a collection with the values from `parser`.
- Parses `parser` one or more times separated by `separator`, returning a collection with the values from `parser`.
- Parses `parser` zero or more times separated and ended by `separator`, returning a collection with the values from `parser`.
- Parses `parser` one or more times separated and ended by `separator`, returning a collection with the values from `parser`.
- Parses `parser` from zero up to `count` times skipping the output of `parser`.
- Parses `parser` from `min` to `max` times (including `min` and `max`) skipping the output of `parser`.
- Parses `p` zero or more times ignoring the result.
- Parses `p` one or more times ignoring the result.
- … `c`.
- `try(p)` behaves as `p` except it acts as if the parser hadn’t consumed any input if `p` fails after consuming input.
- Fails with `message` as an unexpected error. Never consumes any input.
- Fails with `message` as an unexpected error. Never consumes any input.
- Returns `v` without consuming any input.
Type Definitions
- A `Result` type which has the consumed status flattened into the result. Conversions to and from `std::result::Result` can be done using `result.into()` or `From::from(result)`.
- The `Result` type used by parsers to indicate whether they were successful or not. `O` is the type that is output on success. `I` is the specific stream type used in the parser.