§Overview
Most parsing tasks boil down to extracting meaning from arbitrary bytes. Regardless of how this is done, you’ll have code that looks at a series of bytes and does something based on what was seen. This crate simplifies the task of repeatedly recognizing bytes in a byte slice.
The crate is made up of three parts:
- A Pattern trait for types that are able to recognize byte sequences.
- A list of common functions and combinators for composing Patterns together.
- The Bytes struct, a Cursor-like wrapper around some input that uses patterns to advance the position.
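To make the first two parts concrete, here is a minimal sketch of the general idea in plain Rust, using only the standard library. It is not this crate's definition: the Recognize trait, its matched_len method, and the OneOf and Then types below are hypothetical names chosen for illustration. The sketch assumes a pattern reports how many leading bytes it recognized, and that a combinator is just another type implementing the same trait.

```rust
/// Hypothetical stand-in for the idea behind a pattern trait:
/// on success, report how many leading bytes of `input` were recognized.
trait Recognize {
    fn matched_len(&self, input: &[u8]) -> Option<usize>;
}

/// A string literal recognizes itself at the start of the input.
impl Recognize for &str {
    fn matched_len(&self, input: &[u8]) -> Option<usize> {
        input.starts_with(self.as_bytes()).then_some(self.len())
    }
}

/// Recognizes any single byte from a fixed set.
struct OneOf<'a>(&'a [u8]);

impl Recognize for OneOf<'_> {
    fn matched_len(&self, input: &[u8]) -> Option<usize> {
        match input.first() {
            Some(b) if self.0.contains(b) => Some(1),
            _ => None,
        }
    }
}

/// Combinator: recognizes `A` immediately followed by `B`.
struct Then<A, B>(A, B);

impl<A: Recognize, B: Recognize> Recognize for Then<A, B> {
    fn matched_len(&self, input: &[u8]) -> Option<usize> {
        // The second recognizer only sees what the first one left over.
        let a = self.0.matched_len(input)?;
        let b = self.1.matched_len(&input[a..])?;
        Some(a + b)
    }
}
```

With these definitions, Then("GET", OneOf(b" \t")) recognizes the first four bytes of "GET /index.html" and rejects "GET/index.html". Because Then is itself a recognizer, it can be nested to build larger patterns out of smaller ones.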
§Creating Patterns
Spaces in HTTP start lines:
The elements of an HTTP request line are usually separated by a single space. The spec is more permissive and allows an arbitrary number of spaces or tabs. Here is a pattern that can be used to skip heterogeneous spaces:
use bparse::{oneof, at_least};
at_least(1, oneof(b" \t"));
JSON numbers:
Recognizing JSON numbers can get tricky.
The spec allows for numbers like 12, -398.42, and even 12.4e-3.
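For comparison, a hand-written recognizer for this grammar, without any combinators, might look like the sketch below. It uses only the standard library and follows the JSON grammar directly (optional minus, integer part, optional fraction, optional exponent). The names json_number_len and count_digits are hypothetical and are not part of this crate.

```rust
/// Hypothetical helper: returns the length of the JSON number at the
/// start of `input`, or `None` if the input does not start with one.
fn json_number_len(input: &[u8]) -> Option<usize> {
    let mut i = 0;

    // Optional leading minus sign (JSON does not allow a leading '+').
    if input.get(i) == Some(&b'-') {
        i += 1;
    }

    // Integer part: a lone '0', or a non-zero digit followed by more digits.
    match input.get(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => {
            i += 1;
            i += count_digits(&input[i..]);
        }
        _ => return None,
    }

    // Optional fraction: '.' followed by at least one digit.
    if input.get(i) == Some(&b'.') {
        let n = count_digits(&input[i + 1..]);
        if n > 0 {
            i += 1 + n;
        }
    }

    // Optional exponent: 'e' or 'E', an optional sign, at least one digit.
    if matches!(input.get(i), Some(b'e' | b'E')) {
        let mut j = i + 1;
        if matches!(input.get(j), Some(b'+' | b'-')) {
            j += 1;
        }
        let n = count_digits(&input[j..]);
        if n > 0 {
            i = j + n;
        }
    }

    Some(i)
}

/// Counts the ASCII digits at the start of `input`.
fn count_digits(input: &[u8]) -> usize {
    input.iter().take_while(|b| b.is_ascii_digit()).count()
}
```

Every optional part needs its own index bookkeeping and its own "did this actually match" check, which is the kind of repetition that composable patterns are meant to remove.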
Here we incrementally build up a pattern called number that can recognize all JSON number forms:
use bparse::{Pattern, oneof, range, at_least, optional};
let sign = optional(oneof(b"-+"));
let onenine = range(b'1', b'9');
let digit = "0".or(onenine);
let digits = at_least(1, digit);
let fraction = optional(".".then(digits));
let exponent = optional("E".then(sign).then(digits).or("e".then(sign).then(digits)));
let integer = onenine
    .then(digits)
    .or("-".then(onenine).then(digits))
    .or("-".then(digit))
    .or(digit);
let number = integer.then(fraction).then(exponent);
§Using Patterns
If you have written parsers before, you have probably implemented a wrapper around your raw input
with methods such as peek, accept, next() etc…
We do this because it simplifies keeping track of our position and asserting things about the input.
The Bytes struct does exactly that.
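As a reminder of what such a hand-rolled wrapper usually looks like, here is a minimal sketch using only the standard library. The Cursor type and its methods are hypothetical and only illustrate the general idea; they are not the crate's Bytes type or its API.

```rust
/// Hypothetical cursor over a byte slice: the input plus a position.
struct Cursor<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a [u8]) -> Self {
        Cursor { input, pos: 0 }
    }

    /// Looks at the next byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    /// Consumes and returns the next byte.
    fn next(&mut self) -> Option<u8> {
        let b = self.peek()?;
        self.pos += 1;
        Some(b)
    }

    /// Consumes `expected` if the remaining input starts with it.
    /// The position is left untouched when there is no match.
    fn accept(&mut self, expected: &[u8]) -> bool {
        if self.input[self.pos..].starts_with(expected) {
            self.pos += expected.len();
            true
        } else {
            false
        }
    }

    /// Returns true once every byte has been consumed.
    fn eof(&self) -> bool {
        self.pos >= self.input.len()
    }
}
```

The important property is that a failed accept does not move the position, so the caller can try the next alternative against the same input.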
Here is a contrived example of parsing a Set-Cookie header value.
If you were actually doing this, the code would be a bit more structured (a state machine perhaps?), but you would still use Bytes in a similar manner.
use std::str::from_utf8;
use bparse::{Bytes, oneof, noneof, at_least};
let cookie = " id=b839d87df;Domain=foo.com; HttpOnly;";
let mut bytes = Bytes::from(cookie);
let mut is_http_only = false;
let mut domain = None;
let mut name = "";
let mut value = "";
let until_semicolon = at_least(1, noneof(b";"));
let until_eql = at_least(1, noneof(b"="));
let optional_ws = at_least(0, oneof(b"\t "));
loop {
    if bytes.eof() {
        break;
    }
    let _ = bytes.parse(optional_ws);
    if bytes.parse("Domain=").is_some() {
        domain = bytes.parse(until_semicolon).map(|b| from_utf8(b).unwrap());
        let _ = bytes.parse(";");
        continue;
    }
    if bytes.parse("HttpOnly;").is_some() {
        is_http_only = true;
        continue;
    }
    if let Some(cookie_name) = bytes.parse(until_eql) {
        let _ = bytes.parse("=");
        name = from_utf8(cookie_name).unwrap();
        let Some(cookie_value) = bytes.parse(until_semicolon) else {
            panic!("missing cookie value");
        };
        value = from_utf8(cookie_value).unwrap();
        let _ = bytes.parse(";");
        continue;
    }
}
assert!(is_http_only);
assert_eq!(domain, Some("foo.com"));
assert_eq!(name, "id");
assert_eq!(value, "b839d87df");
Structs§
- See range()
- See bytes
- A byte slice with a movable cursor.
- See Pattern::or
- See Pattern::and
- See end
- See oneof
- See not
- See Pattern::then
- See byte
Traits§
- Expresses that the implementing type may be used to match part of a slice of bytes.
Functions§
- Returns a pattern that will match any ASCII letter at the start of the input.
- Returns a new pattern that matches as many repetitions as possible of the given pattern, including 0.
- Returns a new pattern that matches at least n repetitions of pattern.
- Returns a new pattern that matches at most n repetitions of pattern.
- Returns a new pattern that matches between lo and hi repetitions of pattern.
- Returns a pattern that will match any single byte in the input.
- Returns a pattern that will match slice if it occurs at the start of the input.
- Returns a new pattern that matches exactly n repetitions of pattern.
- Returns a pattern that will match any ASCII digit at the start of the input.
- Returns a pattern that matches if the input is empty.
- Returns a pattern that will match any hexadecimal character at the start of the input.
- Inverse of oneof.
- Returns a new pattern that matches only if pattern does not match.
- Returns a pattern that will match any byte in bytes at the start of the input.
- Returns a new pattern that matches 0 or 1 repetitions of pattern.
- Returns a pattern that will match any byte in the closed interval [lo, hi].
- Returns a pattern that will match the string slice s at the start of the input.
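Several of the functions above differ only in how many repetitions they accept. The sketch below shows one way to think about that family in plain Rust. It is not the crate's implementation: repeat_between is a hypothetical helper, and it assumes repetition is greedy (it keeps matching until the inner step fails or the upper bound is reached), which the summaries above state only for the "as many repetitions as possible" case.

```rust
/// Hypothetical helper: applies `step` repeatedly to the front of `input`,
/// stopping after `hi` matches or at the first failure. Returns the number
/// of bytes consumed if at least `lo` repetitions matched.
///
/// `step` reports how many leading bytes it matched, if any.
fn repeat_between(
    input: &[u8],
    lo: usize,
    hi: usize,
    step: impl Fn(&[u8]) -> Option<usize>,
) -> Option<usize> {
    let mut consumed = 0;
    let mut reps = 0;
    while reps < hi {
        match step(&input[consumed..]) {
            // Ignore zero-length matches, which would never make progress.
            Some(n) if n > 0 => {
                consumed += n;
                reps += 1;
            }
            _ => break,
        }
    }
    (reps >= lo).then_some(consumed)
}
```

Under these assumptions, "at least n" corresponds to the bounds (n, usize::MAX), "at most n" to (0, n), "exactly n" to (n, n), and "0 or 1" to (0, 1).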