Crate bparse

§Overview

Most parsing tasks boil down to extracting meaning from arbitrary bytes. However you approach it, you end up with code that looks at a series of bytes and does something based on what it saw. This crate simplifies the task of repeatedly recognizing bytes in a byte slice.

The crate is made up of three parts:

  1. A Pattern trait for types that are able to recognize byte sequences.
  2. A list of common functions and combinators for composing Patterns together.
  3. The Bytes struct: a Cursor-like wrapper around some input that uses patterns to advance the position.
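To make the first two parts concrete, here is a minimal standard-library sketch of how a pattern trait and a repetition combinator could fit together. The names `Matcher`, `OneOf`, and `AtLeast` are hypothetical and chosen for illustration; this is not how bparse defines Pattern, only one plausible shape for the idea.

```rust
/// Hypothetical stand-in for a pattern trait; not bparse's actual definition.
/// Returns the number of bytes matched at the start of `input`, if any.
trait Matcher {
    fn match_len(&self, input: &[u8]) -> Option<usize>;
}

/// Matches a single byte taken from a fixed set.
struct OneOf<'a>(&'a [u8]);

impl Matcher for OneOf<'_> {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        match input.first() {
            Some(b) if self.0.contains(b) => Some(1),
            _ => None,
        }
    }
}

/// Combinator: matches the inner matcher as often as possible and
/// succeeds only if it matched at least `self.0` times.
struct AtLeast<M>(usize, M);

impl<M: Matcher> Matcher for AtLeast<M> {
    fn match_len(&self, input: &[u8]) -> Option<usize> {
        let mut offset = 0;
        let mut count = 0;
        while let Some(n) = self.1.match_len(&input[offset..]) {
            if n == 0 {
                break; // stop on empty matches so the loop always terminates
            }
            offset += n;
            count += 1;
        }
        if count >= self.0 { Some(offset) } else { None }
    }
}
```

In this sketch, `AtLeast(1, OneOf(b" \t")).match_len(b" \t GET")` reports how many leading bytes are spaces or tabs. Returning a length rather than a sub-slice is a design choice of the sketch: it keeps composition simple because a combinator only has to add up offsets.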

§Creating Patterns

Spaces in HTTP start lines:

The elements of an HTTP request line are usually separated by a single space. The spec is more permissive and allows an arbitrary number of tabs or spaces. Here is a pattern that can be used to skip heterogeneous spaces:

use bparse::{oneof, at_least};
at_least(1, oneof(b" \t"));
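For comparison, the same tolerance for mixed separators can be sketched with the standard library alone. The function name `request_line_parts` is hypothetical, and the sketch does not use bparse; it only shows what "any run of spaces and tabs counts as one separator" means in practice.

```rust
/// Splits an HTTP request line into its elements, treating any run of
/// spaces and tabs as a single separator. Standard library only.
fn request_line_parts(line: &[u8]) -> Vec<&[u8]> {
    line.split(|b| *b == b' ' || *b == b'\t')
        // Adjacent separators produce empty pieces; drop them.
        .filter(|part| !part.is_empty())
        .collect()
}
```

Calling it on `b"GET \t /index.html  HTTP/1.1"` yields the method, the target, and the version as three separate slices.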

JSON numbers:

Recognizing JSON numbers can get tricky. The spec allows for numbers like 12, -398.42, and even 12.4e-3. Here we incrementally build up a pattern called number that can recognize all JSON number forms:

use bparse::{Pattern, oneof, range, at_least, optional};

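// Optional sign, used inside the exponent.
let sign = optional(oneof(b"-+"));
// A nonzero digit, then any digit.
let onenine = range(b'1', b'9');
let digit = "0".or(onenine);
// One or more digits.
let digits = at_least(1, digit);
// Optional fractional part, such as ".42".
let fraction = optional(".".then(digits));
// Optional exponent, such as "e-3" or "E10".
let exponent = optional("E".then(sign).then(digits).or("e".then(sign).then(digits)));
// Integer part: multi-digit forms first, then single digits, each with an optional minus.
let integer = onenine
    .then(digits)
    .or("-".then(onenine).then(digits))
    .or("-".then(digit))
    .or(digit);
// A full number is an integer followed by the optional parts.
let number = integer.then(fraction).then(exponent);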
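To see what such a pattern has to accept, here is a hand-rolled recognizer for the same number forms, written against the JSON number grammar as commonly stated (optional minus, integer part, optional fraction, optional exponent). The function name `json_number_len` is hypothetical and the code uses only the standard library. Treat it as a sketch for cross-checking inputs, not as a description of how bparse matches.

```rust
/// Returns the length of the JSON number at the start of `input`, if any.
fn json_number_len(input: &[u8]) -> Option<usize> {
    // Counts consecutive ASCII digits starting at index `from`.
    let digits = |from: usize| {
        input[from..].iter().take_while(|b| b.is_ascii_digit()).count()
    };

    let mut pos = 0;
    // Optional leading minus sign.
    if input.first() == Some(&b'-') {
        pos += 1;
    }
    // Integer part: a lone zero, or a nonzero digit followed by more digits.
    match input.get(pos) {
        Some(b'0') => pos += 1,
        Some(b'1'..=b'9') => pos += digits(pos),
        _ => return None,
    }
    // Optional fraction: '.' followed by at least one digit.
    if input.get(pos) == Some(&b'.') {
        let n = digits(pos + 1);
        if n > 0 {
            pos += 1 + n;
        }
    }
    // Optional exponent: 'e' or 'E', an optional sign, at least one digit.
    if matches!(input.get(pos), Some(b'e' | b'E')) {
        let mut exp = pos + 1;
        if matches!(input.get(exp), Some(b'+' | b'-')) {
            exp += 1;
        }
        let n = digits(exp);
        if n > 0 {
            pos = exp + n;
        }
    }
    Some(pos)
}
```

The amount of index bookkeeping here is the point of comparison: each optional part needs its own lookahead and rollback, which is the work that composable patterns are meant to absorb.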

§Using Patterns

If you have written parsers before, you have probably implemented a wrapper around your raw input with methods such as peek, accept, and next. Such a wrapper simplifies keeping track of the current position and asserting things about the input. The Bytes struct does exactly that.
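A minimal sketch of the kind of hand-written wrapper described above might look like the following. The `Cursor` type and its methods are hypothetical and illustrative only; they are not the API of Bytes.

```rust
/// Hypothetical cursor over a byte slice; illustrative only, not bparse's `Bytes`.
struct Cursor<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a [u8]) -> Self {
        Cursor { input, pos: 0 }
    }

    /// Looks at the next byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    /// Consumes and returns the next byte.
    fn next(&mut self) -> Option<u8> {
        let byte = self.peek()?;
        self.pos += 1;
        Some(byte)
    }

    /// Consumes `prefix` if the remaining input starts with it.
    fn accept(&mut self, prefix: &[u8]) -> bool {
        if self.input[self.pos..].starts_with(prefix) {
            self.pos += prefix.len();
            true
        } else {
            false
        }
    }
}
```

Every method either leaves the position untouched or advances it past exactly what was recognized, so the caller never has to do index arithmetic directly.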

Here is a contrived example of parsing a Set-Cookie header value. If you were actually doing this, the code would be a bit more structured (a state machine, perhaps), but you would still use Bytes in a similar manner.

use std::str::from_utf8;
use bparse::{Bytes, oneof, noneof, at_least};

let cookie = " id=b839d87df;Domain=foo.com;   HttpOnly;";

let mut bytes = Bytes::from(cookie);

let mut is_http_only = false;
let mut domain = None;
let mut name = "";
let mut value = "";

let until_semicolon = at_least(1, noneof(b";"));
let until_eql = at_least(1, noneof(b"="));
let optional_ws = at_least(0, oneof(b"\t "));

loop {
    if bytes.eof() {
        break;
    }

    let _ = bytes.parse(optional_ws);

    if bytes.parse("Domain=").is_some() {
        domain = bytes.parse(until_semicolon).map(|b| from_utf8(b).unwrap());
        let _ = bytes.parse(";");
        continue;
    }

    if bytes.parse("HttpOnly;").is_some() {
        is_http_only = true;
        continue;
    }

    if let Some(cookie_name) = bytes.parse(until_eql) {
        let _ = bytes.parse("=");
        name = from_utf8(cookie_name).unwrap();
        let Some(cookie_value) = bytes.parse(until_semicolon) else {
            panic!("missing cookie value");
        };
        value = from_utf8(cookie_value).unwrap();
        let _ = bytes.parse(";");
        continue;
    }
}

assert!(is_http_only);
assert_eq!(domain, Some("foo.com"));
assert_eq!(name, "id");
assert_eq!(value, "b839d87df");

Structs§

  • A Cursor-like wrapper around some input that uses patterns to advance the position.

Traits§

  • Expresses that the implementing type may be used to match part of a slice of bytes.

Functions§

  • Returns a pattern that will match any ascii letter at the start of the input
  • Returns a new pattern that matches as many repetitions as possible of the given pattern, including 0.
  • Returns a new pattern that matches at least n repetitions of pattern.
  • Returns a new pattern that matches at most n repetitions of pattern.
  • Returns a new pattern that matches between lo and hi repetitions of pattern.
  • Returns a pattern that will match any single byte in the input
  • Returns a pattern that will match slice if it occurs at the start of the input.
  • Returns a new pattern that matches exactly n repetitions of pattern.
  • Returns a pattern that will match any ascii digit at the start of the input
  • Returns a pattern that matches if the input is empty.
  • Returns a pattern that will match any hexadecimal character at the start of the input
  • Inverse of oneof.
  • Returns a new pattern that matches only if pattern does not match
  • Returns a pattern that will match any byte in bytes at the start of the input
  • Returns a new pattern that matches 0 or 1 repetitions of pattern
  • Returns a pattern that will match any byte in the closed interval [lo, hi]
  • Returns a pattern that will match the string slice s at the start of the input.