§Overview
Most parsing tasks can be boiled down to extracting meaning out of arbitrary bytes. Regardless of how this is done, you’ll have code that looks at a series of bytes and does something based on what was seen. This crate simplifies the task of repeatedly recognizing bytes in a byte slice.
The crate is made up of three parts:
- A Pattern trait for types that are able to recognize byte sequences.
- A list of common functions and combinators for composing Patterns together.
- The Bytes struct; a Cursor-like wrapper around some input that uses patterns to advance the position.
§Creating Patterns
Spaces in HTTP start lines:
The elements of an HTTP request line are usually separated by a single space. The spec is more permissive and allows for an arbitrary number of spaces or tabs. Here is a pattern that can be used to skip heterogeneous spaces:
use bparse::{oneof, at_least};
at_least(1, oneof(b" \t"));
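For comparison, the same idea can be sketched by hand using only the standard library. This is not how the crate implements it; the function names `skip_spaces` and `split_request_line` are hypothetical and exist only for this illustration.

```rust
/// Returns the length of the leading run of spaces and tabs in `input`,
/// or `None` if the input does not start with at least one of them.
/// (Hypothetical helper, not part of bparse.)
fn skip_spaces(input: &[u8]) -> Option<usize> {
    let n = input
        .iter()
        .take_while(|&&b| b == b' ' || b == b'\t')
        .count();
    if n >= 1 { Some(n) } else { None }
}

/// Splits a request line into (method, target, version), tolerating
/// any mix of spaces and tabs between the three elements.
fn split_request_line(line: &[u8]) -> Option<(&[u8], &[u8], &[u8])> {
    let is_sep = |b: &u8| *b == b' ' || *b == b'\t';

    // Everything up to the first separator is the method.
    let method_end = line.iter().position(is_sep)?;
    let (method, rest) = line.split_at(method_end);
    let rest = &rest[skip_spaces(rest)?..];

    // Everything up to the next separator is the request target.
    let target_end = rest.iter().position(is_sep)?;
    let (target, rest) = rest.split_at(target_end);

    // Whatever follows the second run of separators is the version.
    let version = &rest[skip_spaces(rest)?..];
    Some((method, target, version))
}
```

The position bookkeeping in this version is done by hand at every step, which is the repetition that composing patterns is meant to remove.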
JSON numbers:
Recognizing JSON numbers can get tricky.
The spec allows for numbers like 12, -398.42, and even 12.4e-3.
Here we incrementally build up a pattern called number that can recognize all JSON number forms:
use bparse::{Pattern, oneof, range, at_least, optional};

let sign = optional(oneof(b"-+"));
let onenine = range(b'1', b'9');
let digit = "0".or(onenine);
let digits = at_least(1, digit);
let fraction = optional(".".then(digits));
let exponent = optional("E".then(sign).then(digits).or("e".then(sign).then(digits)));
let integer = onenine
    .then(digits)
    .or("-".then(onenine).then(digits))
    .or("-".then(digit))
    .or(digit);
let number = integer.then(fraction).then(exponent);
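To make the grammar concrete, here is a rough hand-written recognizer for the same integer, fraction, and exponent structure, using only the standard library. It follows the JSON number grammar directly rather than the crate's combinators, so its behavior on edge cases may differ from the number pattern above. The names `digits` and `json_number_len` are hypothetical.

```rust
/// Counts the leading ASCII digits in `input`.
fn digits(input: &[u8]) -> usize {
    input.iter().take_while(|b| b.is_ascii_digit()).count()
}

/// Returns the length of the longest JSON number at the start of `input`,
/// or `None` if the input does not start with one.
/// (Hypothetical helper, not part of bparse.)
fn json_number_len(input: &[u8]) -> Option<usize> {
    let mut pos = 0;

    // integer: optional '-', then either a lone '0' or a run of digits
    // that starts with 1-9.
    if input.first() == Some(&b'-') {
        pos += 1;
    }
    match input.get(pos) {
        Some(b'0') => pos += 1,
        Some(b'1'..=b'9') => pos += digits(&input[pos..]),
        _ => return None,
    }

    // fraction: '.' followed by at least one digit; otherwise left unconsumed.
    if input.get(pos) == Some(&b'.') {
        let n = digits(&input[pos + 1..]);
        if n > 0 {
            pos += 1 + n;
        }
    }

    // exponent: 'e' or 'E', an optional sign, then at least one digit;
    // otherwise left unconsumed.
    if matches!(input.get(pos), Some(b'e' | b'E')) {
        let mut p = pos + 1;
        if matches!(input.get(p), Some(b'+' | b'-')) {
            p += 1;
        }
        let n = digits(&input[p..]);
        if n > 0 {
            pos = p + n;
        }
    }

    Some(pos)
}
```

Each commented step corresponds to one of the named sub-patterns (integer, fraction, exponent), which is why building the pattern incrementally keeps the grammar readable.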
§Using Patterns
If you have written parsers before, you have probably implemented a wrapper around your raw input with methods such as peek, accept, next, etc.
We do this because it simplifies keeping track of our position and asserting things about the input.
The Bytes struct does exactly that.
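As a reminder of what such a hand-rolled wrapper tends to look like, here is a minimal sketch using only the standard library. The `Cursor` type and its methods are hypothetical and are not the crate's Bytes API; they only illustrate the peek/accept/next style described above.

```rust
/// A hypothetical cursor over a byte slice (not the crate's `Bytes` type).
struct Cursor<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a [u8]) -> Self {
        Cursor { input, pos: 0 }
    }

    /// Returns the next byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    /// Consumes and returns the next byte.
    fn next(&mut self) -> Option<u8> {
        let b = self.peek()?;
        self.pos += 1;
        Some(b)
    }

    /// Consumes `expected` if the remaining input starts with it.
    fn accept(&mut self, expected: &[u8]) -> bool {
        if self.input[self.pos..].starts_with(expected) {
            self.pos += expected.len();
            true
        } else {
            false
        }
    }

    /// Returns true once every byte has been consumed.
    fn eof(&self) -> bool {
        self.pos >= self.input.len()
    }
}
```

A wrapper like this only knows how to compare literal bytes; driving the cursor with composable patterns is what lets a single parse method cover all of these cases.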
Here is a contrived example of parsing a Set-Cookie header value.
If you were actually doing this, the code would be a bit more structured (a state machine perhaps?), but you would still use Bytes in a similar manner.
use std::str::from_utf8;
use bparse::{Bytes, oneof, noneof, at_least};

let cookie = " id=b839d87df;Domain=foo.com; HttpOnly;";
let mut bytes = Bytes::from(cookie);
let mut is_http_only = false;
let mut domain = None;
let mut name = "";
let mut value = "";
let until_semicolon = at_least(1, noneof(b";"));
let until_eql = at_least(1, noneof(b"="));
let optional_ws = at_least(0, oneof(b"\t "));

loop {
    if bytes.eof() {
        break;
    }
    let _ = bytes.parse(optional_ws);
    if bytes.parse("Domain=").is_some() {
        domain = bytes.parse(until_semicolon).map(|b| from_utf8(b).unwrap());
        let _ = bytes.parse(";");
        continue;
    }
    if bytes.parse("HttpOnly;").is_some() {
        is_http_only = true;
        continue;
    }
    if let Some(cookie_name) = bytes.parse(until_eql) {
        let _ = bytes.parse("=");
        name = from_utf8(cookie_name).unwrap();
        let Some(cookie_value) = bytes.parse(until_semicolon) else {
            panic!("missing cookie value");
        };
        value = from_utf8(cookie_value).unwrap();
        let _ = bytes.parse(";");
        continue;
    }
}

assert!(is_http_only);
assert_eq!(domain, Some("foo.com"));
assert_eq!(name, "id");
assert_eq!(value, "b839d87df");
Structs§
- See range()
- See bytes
- A byte slice with a movable cursor.
- See Pattern::or
- See Pattern::and
- See end
- See oneof
- See not
- See Pattern::then
- See byte
Traits§
- Expresses that the implementing type may be used to match part of a slice of bytes.
Functions§
- Returns a pattern that will match any ASCII letter at the start of the input.
- Returns a new pattern that matches as many repetitions as possible of the given pattern, including 0.
- Returns a new pattern that matches at least n repetitions of pattern.
- Returns a new pattern that matches at most n repetitions of pattern.
- Returns a new pattern that matches between lo and hi repetitions of pattern.
- Returns a pattern that will match any single byte in the input.
- Returns a pattern that will match slice if it occurs at the start of the input.
- Returns a new pattern that matches exactly n repetitions of pattern.
- Returns a pattern that will match any ASCII digit at the start of the input.
- Returns a pattern that matches if the input is empty.
- Returns a pattern that will match any hexadecimal character at the start of the input.
- Inverse of oneof.
- Returns a new pattern that matches only if pattern does not match.
- Returns a pattern that will match any byte in bytes at the start of the input.
- Returns a new pattern that matches 0 or 1 repetitions of pattern.
- Returns a pattern that will match any byte in the closed interval [lo, hi].
- Returns a pattern that will match the string slice s at the start of the input.