Crate flussab[][src]

The Flussab crate is a collection of utlities for writing parsers.

Currently Flussab aims to provide just enough to write parsers with a certain combination of constraints for which Flussab’s author did not find a suitable existing solution. It is not intended as a replacement for any such existing solution targeting a different set of constraints.

The target use-case are efficient, continuously streaming, interactive, error reporting, non-backtracking, recursive-descent parsers for text-based, binary and mixed formats:

  • Efficient: The straight forward implementation of a parser should be fast, and the library should not get in the way when performing more complex optimizations that are possible in hand-rolled parsers (e.g. using data parallelism to scan multiple input bytes at once).

    The efficiency is realized by a) using an input stream with a mutable cursor instead of threading the current position through return values and b) (when using ByteReader) deferring IO error checks. This together greatly simplifies the control and data flow of the parsers, which makes it easier for the compiler to optimize the code and often results in overall faster code.

    Note that deferred error checking does not mean that IO errors are ignored or handled imprecisely. Instead when an IO error occurs the parsing logic will exhaust all current choices which forces a parse error. At the point where a parse error is generated, outside of the hot path, we can then check whether an IO error that occured caused this.

  • Continuously streaming: It should be possible to parse and process data which does not fit into memory, i.e. it is possible to read more data as it is parsed and processed, without buffering everything read so far. In particular it is not sufficient to require restarting the parser when the input was incomplete nor to require external code that splits a potentially infinite input stream into individual chunks.

  • Interactive: The parser should be usable in a REPL with multi-line input without negatively affecting efficiency when parsing data in bulk. This means it must be able to handle input data reads that return less bytes than requested, so that line-buffering is usable.

  • Error reporting: When parsing fails, a parser must be able to generate useful error messages. The goal here isn’t to replicate the excellent error reporting of e.g. rustc, but to offer enough information that e.g. a user that generated a gigabyte sized input file using a bunch of println!s can quickly find the typo they made.

  • Non-backtracking (a.k.a. predictive): Continuous streaming already requires some limits to back tracking. Avoiding backtracking completely also avoids many ways to accidentally making a recursive-descent parser become very slow for some inputs. This does mean that some form of look-ahead is required to parse most formats. This can be realized either by using the provided ByteReader which has dynamic lookahead, and/or by using a tokenizer for formats where no look-ahead is required after tokenization.

    (Note that Flussab doesn’t stop you from handling backtracking yourself, it just does not provide any help for that.)

  • Recursive-descent: Parsers are written as simple Rust functions that take a mutable reference to the input stream and either return the parsed value after consuming some input or alternatively indicate failure. This also enables writing parser-combinators as higher level functions, although currently Flussab provides only a minimal set of combinators.

    The provided infrastructure for combining parsers is entirely agnostic of how the input stream is handled, as long as it itself keeps track of the input position. This crate provides ByteReader as one choice, but a Peekable iterator of tokens would work as well.

Using Flussab

Parsers are written very much like manual recursive descent parsers. If our format has a something we would have a function like this:

fn something(input: &mut ByteReader) -> Parsed<Something, ParseError> {
    // Check whether `input` contains a _something_ at the current position
    if !input_as_expected {
        return Fallthrough;
    }
    // Read _something_ from `input` and advance the input position.
    if parsing_failed_after_advancing_the_input {
        return Res(Err(some_error));
    }

    Res(Ok(something))
}

Here Parsed and ByteReader are provided by this crate and are good entry points for the documentation. A Parsed value is just a wrapper for a Result which adds Fallthrough as a third option besides Ok and Err that indicates that the input does not match and that the parser did not consume any input.

Instead of manually returning Fallthrough or Res(Err(..)), often the provided methods of Parsed and Result are used to combine smaller parsers into larger ones.

Re-exports

pub use Parsed::*;

Modules

text

Utilities for parsing text based formats using a ByteReader.

Structs

ByteReader

A buffered reader optimized for efficient parsing.

ByteWriter

A buffered writer with deferred error checking.

Enums

Parsed

Return type for recursive descent parser that can fall through to another parser on unexpected input.

Traits

ResultExt

This trait implements additional methods for Result values, which are useful for parsing.

Type Definitions

Result

Return type for recursive descent parsers that fail irrecoverably on unexpected input.