Chomp is a fast monadic-style parser combinator library for the Rust programming language. It was written as the culmination of the experiments detailed in these blog posts:
For its current capabilities, you will find that Chomp performs consistently
as well as, if not better than, optimized C parsers, while being vastly more
expressive. For an example that builds a performant HTTP parser out of
smaller parsers, see http_parser.rs.
§Example
use chomp1::prelude::*;

#[derive(Debug, Eq, PartialEq)]
struct Name<B: Buffer> {
    first: B,
    last: B,
}

fn name<I: U8Input>(i: I) -> SimpleResult<I, Name<I::Buffer>> {
    parse! {i;
        let first = take_while1(|c| c != b' ');
        token(b' '); // skipping this char
        let last = take_while1(|c| c != b'\n');
        ret Name{
            first: first,
            last: last,
        }
    }
}

assert_eq!(
    parse_only(name, "Martin Wernstål\n".as_bytes()),
    Ok(Name {
        first: &b"Martin"[..],
        last: "Wernstål".as_bytes()
    })
);
§Usage
Chomp’s functionality is split between three modules:
- parsers contains the basic parsers used to parse streams of input.
- combinators contains functions which take parsers and return new ones.
- primitives contains the building blocks used to make new parsers. This is advanced usage and is far more involved than using the pre-existing parsers, but is sometimes unavoidable.
A parser is, at its simplest, a function that takes a slice of input and
returns a ParseResult<I, T, E>, where I, T, and E are the input,
output, and error types, respectively. Parsers are usually parameterized
over values or other parsers as well, so these appear as extra arguments in
the parsing function. As an example, here is the signature of the
token parser, which matches a particular input token.
fn token<I: Input>(i: I, t: I::Token) -> ParseResult<I, I::Token, Error<I::Token>> { ... }
Notice that the first argument is the input i, whose type I implements Input,
and the second argument is the token to match, of type I::Token. The input
carries the current state of the parser together with the remaining input,
and prevents the parser writer from accidentally mutating the state of the
parser. Later, when we introduce the parse! macro, we will see that using a
parser in this macro just means supplying all of the arguments but the input,
like so:
token(b'T');
Note that you cannot do this outside of the parse! macro. SimpleResult<I, T> is a convenience type alias over ParseResult<I, T, Error<u8>>, and
Error<I> is just a convenient “default” error type that will be sufficient
for most uses. For more sophisticated usage, one can always write a custom
error type.
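To make the idea of "a parser is a function from input to a result" concrete, here is a minimal std-only sketch. It is not Chomp's implementation: SketchError and SketchResult are hypothetical names invented for this illustration, the input is fixed to &[u8], and the remaining input is returned next to the value instead of being wrapped in a ParseResult.

```rust
/// Hypothetical error type for this sketch (Chomp's own `Error` differs).
#[derive(Debug, PartialEq)]
pub enum SketchError {
    /// The input ended before a token could be read.
    UnexpectedEof,
    /// A token other than the expected one was found.
    Unexpected(u8),
}

/// Hypothetical result type: on success, the remaining input and the value.
pub type SketchResult<'a, T> = Result<(&'a [u8], T), SketchError>;

/// Matches exactly one byte equal to `t` and hands back the rest of the input.
pub fn token(i: &[u8], t: u8) -> SketchResult<'_, u8> {
    match i.split_first() {
        Some((&b, rest)) if b == t => Ok((rest, b)),
        Some((&b, _)) => Err(SketchError::Unexpected(b)),
        None => Err(SketchError::UnexpectedEof),
    }
}
```

Because the function returns the remaining input instead of mutating shared state, the caller decides what to parse next, which is the property the paragraph above attributes to the input type.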
A very useful parser is the satisfy parser:
fn satisfy<I: Input, F>(mut i: I, f: F) -> ParseResult<I, I::Token, Error<I::Token>>
    where F: FnOnce(I::Token) -> bool { ... }
Besides the input state, satisfy’s only parameter is a predicate function;
the parser succeeds only if the next piece of input satisfies the supplied
predicate. Here’s an example that might be used in the parse! macro:
satisfy(|c| match c {
    b'c' | b'h' | b'a' | b'r' => true,
    _ => false,
})
This parser will only succeed if the character is one of the characters in “char”.
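The behaviour of such a predicate parser can be sketched with std alone. This is an illustration under simplifying assumptions, not Chomp's code: the input is a plain &[u8], failure is reported as None instead of an error value, and is_in_char is a hypothetical helper holding the predicate from the example above.

```rust
/// Std-only stand-in for `satisfy`: consumes one byte if `f` accepts it.
/// Returns the remaining input and the matched byte, or `None` on failure.
pub fn satisfy<F>(i: &[u8], f: F) -> Option<(&[u8], u8)>
where
    F: FnOnce(u8) -> bool,
{
    // An empty input fails immediately.
    let (&b, rest) = i.split_first()?;
    if f(b) {
        Some((rest, b))
    } else {
        None
    }
}

/// Hypothetical helper: true for any byte that occurs in the word "char".
pub fn is_in_char(c: u8) -> bool {
    matches!(c, b'c' | b'h' | b'a' | b'r')
}
```

Note that on failure nothing is consumed: the caller still holds the original slice and can try another parser on it.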
Lastly, here is the parser combinator count, which will attempt to run a
parser a number of times on its input.
pub fn count<I: Input, T, E, F, U>(i: I, num: usize, p: F) -> ParseResult<I, T, E>
    where F: FnMut(I) -> ParseResult<I, U, E>,
          T: FromIterator<U> { ... }
Using parsers is almost entirely done using the parse! macro, which
enables us to do three distinct things:
- Sequence parsers over the remaining input
- Store intermediate results into datatypes
- Return a datatype at the end, which may be the result of any arbitrary computation over the intermediate results.
In other words, just as a normal Rust function usually looks something like this:
fn f() -> (u8, u8, u8) {
    let a = read_number();
    let b = read_number();
    launch_missiles();
    return (a, b, a + b);
}
A Chomp parser with a similar structure looks like this:
fn f<I: U8Input>(i: I) -> SimpleResult<I, (u8, u8, u8)> {
    parse! {i;
        let a = digit();
        let b = digit();
        string(b"missiles");
        ret (a, b, a + b)
    }
}
fn digit<I: U8Input>(i: I) -> SimpleResult<I, u8> {
    satisfy(i, |c| b'0' <= c && c <= b'9').map(|c| c - b'0')
}
Readers familiar with Haskell or F# will recognize this as a “monadic computation” or “computation expression”.
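What the parse! block in f does can be approximated by threading the remaining input through each parser by hand. The sketch below uses only std and is not the macro's actual expansion: Parsed is a hypothetical alias, and digit, string, f, and count are simplified stand-ins over &[u8] for the Chomp functions of the same names (here string returns () and failures are reported as None).

```rust
use std::iter::FromIterator;

/// Hypothetical stand-in for a parser result: remaining input plus value.
type Parsed<'a, T> = Option<(&'a [u8], T)>;

/// Parses one ASCII digit and returns its numeric value.
fn digit(i: &[u8]) -> Parsed<'_, u8> {
    let (&c, rest) = i.split_first()?;
    if c.is_ascii_digit() {
        Some((rest, c - b'0'))
    } else {
        None
    }
}

/// Matches the literal byte string `s` at the start of the input.
fn string<'a>(i: &'a [u8], s: &[u8]) -> Parsed<'a, ()> {
    i.strip_prefix(s).map(|rest| (rest, ()))
}

/// Hand-threaded counterpart of the `parse!` block in `f`.
/// Each `?` stops at the first failing parser.
fn f(i: &[u8]) -> Parsed<'_, (u8, u8, u8)> {
    let (i, a) = digit(i)?; // let a = digit();
    let (i, b) = digit(i)?; // let b = digit();
    let (i, ()) = string(i, b"missiles")?; // string(b"missiles");
    Some((i, (a, b, a + b))) // ret (a, b, a + b)
}

/// Loose analogue of `count`: runs `p` exactly `num` times, feeding each
/// run the input left over by the previous one, and collects the values.
fn count<'a, T, U, F>(mut i: &'a [u8], num: usize, mut p: F) -> Parsed<'a, T>
where
    F: FnMut(&'a [u8]) -> Parsed<'a, U>,
    T: FromIterator<U>,
{
    let mut items = Vec::with_capacity(num);
    for _ in 0..num {
        let (rest, item) = p(i)?;
        i = rest;
        items.push(item);
    }
    Some((i, items.into_iter().collect()))
}
```

Each let rebinding of i plays the role of the input that parse! threads implicitly; the macro exists to remove exactly this repetition.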
You use the parse! macro as follows:
- Write the input parameter first, with a semicolon.
- Write any number of valid parser actions or identifier bindings, where:
  - a parser action takes the form parser(params*), with the input parameter omitted.
  - an identifier binding takes the form let identifier = parser(params*);, with the input parameter omitted.
- Write the final line of the macro, which must always be either a parser action, or a return statement which takes the form ret expression. The type of expression becomes the return type of the entire parser, should it succeed.
The entire grammar for the macro is listed elsewhere in this documentation.
§Features
- backtrace: disabled (default). This feature enables backtraces for parse errors, either by calling Error::trace or by printing it using fmt::Debug.
  This incurs a performance hit every time a chomp1::parsers parser fails, since a backtrace must be collected.
  In the dev and test profiles backtraces will always be enabled. This does not incur any cost when built using the release profile unless the backtrace feature is enabled.
- noop_error: disabled (default). When enabled, the built-in chomp1::parsers::Error type is zero-sized and carries no error information. This increases performance somewhat.
- std: enabled (default). Chomp includes all features which rely on Rust’s std library. If this is disabled Chomp will use the no_std feature, only using Rust’s core library.
  Items excluded when std is disabled:
  - ascii::float support for types::Buffer implementations other than &[u8].
  - buffer module.
  - combinators::choice combinator.
  - parsers::Error no longer implements the std::error::Error trait.
  - types::Buffer::to_vec
  - types::Buffer::into_vec
Modules§
- ascii
- Utilities and parsers for dealing with ASCII data in u8 format.
- buffer
- Utilities for parsing streams of data.
- combinators
- Basic combinators.
- parsers
- Basic parsers.
- prelude
- Basic prelude.
- primitives
- Module used to construct fundamental parsers and combinators.
- types
- Types which facilitate the chaining of parsers and their results.
Macros§
- parse
- Macro emulating do-notation for the parser monad, automatically threading the linear type.
- parser
- Macro wrapping an invocation to parse! in a closure, useful for creating parsers inline.
Functions§
- parse_only
- Runs the given parser on the supplied finite input.
- parse_only_str
- Runs the given parser on the supplied string.
- run_parser
- Runs the supplied parser over the input.