Crate sipp

Simple parser package.

This package provides a ByteBuffer which wraps a byte stream, decoders such as Utf8Decoder which wrap a ByteBuffer, and a Parser which wraps a decoder. The end result is a Parser which lets you peek at characters to see what’s coming next, and then read the characters you expect. All fallible methods return a Result, and no method in this package should ever panic.
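The peek-then-read pattern described above can be illustrated without sipp at all. The following is a minimal sketch using only the standard library's Peekable iterator over an in-memory string; it is not how sipp is implemented, and read_digits is a hypothetical helper invented for this illustration.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: reads a run of ASCII digits, peeking at each
/// character before deciding whether to consume it.
/// Returns None if the next character is not a digit.
fn read_digits(chars: &mut Peekable<Chars<'_>>) -> Option<String> {
    let mut digits = String::new();
    // Peek first: a character is only consumed once we know we want it.
    while let Some(&c) = chars.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        digits.push(c);
        chars.next();
    }
    if digits.is_empty() {
        None
    } else {
        Some(digits)
    }
}
```

Because the non-digit character is only peeked at, it is still available to whatever reads from the iterator next.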

§Examples

§Acornsoft Logo parser

Suppose you want to parse a simplified set of Acornsoft Logo instructions which accepts only the “FORWARD”, “LEFT”, and “RIGHT” instructions. Each instruction must appear on a line of its own (lines are separated by a newline character), and each instruction name is followed by one or more space characters and then a numeric amount. Example input might look like this:

FORWARD 10
RIGHT 45
FORWARD 20
RIGHT 10
FORWARD 5
LEFT 3

You could use sipp to parse these instructions using code like this:

let input = "FORWARD 10\nRIGHT 45\nFORWARD 20\nRIGHT 10\nFORWARD 5\nLEFT 3";
// We know that Rust strings are UTF-8 encoded, so wrap the input bytes with a Utf8Decoder.
let decoder = Utf8Decoder::wrap(input.as_bytes());
// Now wrap the decoder with a Parser to give us useful methods for reading through the input.
let mut parser = Parser::wrap(decoder);
// Keep reading while there is still input available.
while parser.has_more()? {
    // Read the command by reading everything up to (but not including) the next space.
    let command = parser.read_up_to(' ')?;
    // Skip past the space characters (one or more) that follow the command.
    parser.skip_while(|c| c == ' ')?;
    // Read until the next newline (or the end of input, whichever comes first).
    let number = parser.read_up_to('\n')?;
    // Now either there is no further input, or the next character must be a newline.
    // If the next character is a newline, skip past it.
    parser.accept('\n')?;
}
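The loop above reads command and number but does not yet act on them. As a hedged sketch of one possible next step, the two values could be converted into a typed instruction. The Instruction enum and the to_instruction function below are hypothetical names that are not part of sipp, and the sketch assumes that amounts are non-negative whole numbers and that both values arrive as Option<String> (the return type of read_up_to since 0.2.0).

```rust
/// Hypothetical type: one instruction from the simplified Logo dialect.
#[derive(Debug, PartialEq)]
enum Instruction {
    Forward(u32),
    Left(u32),
    Right(u32),
}

/// Hypothetical helper: validates the two values read for one line.
/// Assumes the amount is a non-negative whole number.
fn to_instruction(
    command: Option<String>,
    number: Option<String>,
) -> Result<Instruction, String> {
    // A None value means nothing was read before the delimiter.
    let command = command.ok_or_else(|| "missing command".to_string())?;
    let number = number.ok_or_else(|| "missing amount".to_string())?;
    let amount: u32 = number
        .trim()
        .parse()
        .map_err(|e| format!("invalid amount {number:?}: {e}"))?;
    // Only the three accepted instruction names are recognised.
    match command.as_str() {
        "FORWARD" => Ok(Instruction::Forward(amount)),
        "LEFT" => Ok(Instruction::Left(amount)),
        "RIGHT" => Ok(Instruction::Right(amount)),
        other => Err(format!("unknown command {other:?}")),
    }
}
```

Inside the loop, a call such as to_instruction(command, number) could then be matched on, or its result pushed into a Vec of instructions.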

§Comma-separated list parser

Given a hardcoded string which represents a comma-separated list, you could use this package to parse it like so:

let input = "first value,second value,third,fourth,fifth,etc";
let buffer = ByteBuffer::wrap(input.as_bytes());
let decoder = Utf8Decoder::wrap_buffer(buffer);
let mut parser = Parser::wrap(decoder);
let mut value_list = Vec::new();
// Keep reading while input is available.
while parser.has_more()? {
    // Read up to the next comma, or until the end of input (whichever comes first).
    // If there is nothing between two commas then just insert an empty string.
    let value = parser.read_up_to(',')?.unwrap_or_default();
    value_list.push(value);
    // Now either there is no further input, or the next character must be a comma.
    // If the next character is a comma, skip past it.
    parser.accept(',')?;
}

assert_eq!(value_list,
vec!["first value", "second value", "third", "fourth", "fifth", "etc"]);

§Release notes

§0.1.0

Initial release.

§0.1.1

  • Added has_more method to Parser.
  • Adjusted rustdoc to follow the Rust API Guidelines, primarily by separating error descriptions from the lede and moving them into a dedicated “Errors” section within each method’s rustdoc comment.

§0.2.0

Altered return type of public method Parser.read_up_to(char) so that it now returns None instead of an empty String. Adjusted examples and unit tests accordingly.
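For callers migrating across this change, the Option<String> result can be handled in at least two ways. The sketch below uses only the standard library; or_empty and required are hypothetical helper names and are not part of sipp.

```rust
/// Hypothetical helper: restores the pre-0.2.0 behaviour by turning
/// an absent value into an empty String.
fn or_empty(value: Option<String>) -> String {
    value.unwrap_or_default()
}

/// Hypothetical helper: treats an absent value as an error, which the
/// Option return type now makes easy to detect.
fn required(value: Option<String>, name: &str) -> Result<String, String> {
    value.ok_or_else(|| format!("{name} must not be empty"))
}
```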

§Modules

  • A buffered wrapper around a Read type, providing read and peek methods.
  • A decoder of bytes into characters. The trait ByteStreamCharDecoder can be implemented for different character encodings, and implementations are included for UTF-8, and UTF-16 (in big-endian and little-endian byte order).
  • A parser which can wrap a character stream and provide methods to read and peek at the stream.
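To make the big-endian and little-endian distinction mentioned above concrete, here is a minimal standard-library sketch of decoding UTF-16 bytes in either byte order. It does not use or reproduce sipp's decoders or the ByteStreamCharDecoder trait; the ByteOrder enum and the decode_utf16_bytes function are hypothetical names chosen for this illustration.

```rust
/// Hypothetical type: byte order of a UTF-16 encoded byte sequence.
#[derive(Clone, Copy)]
enum ByteOrder {
    BigEndian,
    LittleEndian,
}

/// Hypothetical helper: decodes UTF-16 bytes into a String,
/// returning an error message if the input is malformed.
fn decode_utf16_bytes(bytes: &[u8], order: ByteOrder) -> Result<String, String> {
    if bytes.len() % 2 != 0 {
        return Err("UTF-16 input must contain an even number of bytes".to_string());
    }
    // Combine each pair of bytes into one 16-bit code unit,
    // honouring the requested byte order.
    let units = bytes.chunks_exact(2).map(|pair| match order {
        ByteOrder::BigEndian => u16::from_be_bytes([pair[0], pair[1]]),
        ByteOrder::LittleEndian => u16::from_le_bytes([pair[0], pair[1]]),
    });
    // char::decode_utf16 joins surrogate pairs and reports unpaired
    // surrogates as errors.
    char::decode_utf16(units)
        .collect::<Result<String, _>>()
        .map_err(|e| format!("invalid UTF-16: {e}"))
}
```

The same two characters, "Hi", are the bytes 00 48 00 69 in big-endian order and 48 00 69 00 in little-endian order, which is why a decoder must be told (or must detect) the byte order.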