Crate binator

§Binator

Binator is a parser combinator library like nom or combine. It requires nightly Rust because it uses the experimental try trait and trait alias features.

§Example

The very same hex color example from nom but using binator:

use binator::{
  Parse,
  Parsed,
  base::{
    is,
    uint_radix,
    IntRadixParse,
    Radix,
  },
  context::Ignore,
  utils::Utils,
};

#[derive(Debug, PartialEq)]
pub struct Color {
  pub red: u8,
  pub green: u8,
  pub blue: u8,
}

fn hex_primary<Stream, Context>(stream: Stream) -> Parsed<u8, Stream, Context>
where
  (): IntRadixParse<Stream, Context, u8>,
{
  uint_radix(2, Radix::HEX).parse(stream)
}

fn hex_color<Stream, Context>(stream: Stream) -> Parsed<Color, Stream, Context>
where
  (): IntRadixParse<Stream, Context, u8>,
{
  (is(b'#'), hex_primary, hex_primary, hex_primary)
    .map(|(_, red, green, blue)| Color { red, green, blue })
    .parse(stream)
}

assert_eq!(
  hex_color::<_, Ignore>.parse("#2F14DF".as_bytes()),
  Parsed::Success {
    stream: "".as_bytes(),
    token: Color {
      red: 0x2F,
      green: 0x14,
      blue: 0xDF,
    }
  }
);

For bigger examples, see the small JSON parser or the network packet parser written with binator.

§Influence

  • This project has been heavily influenced by nom. However, it is very different: it requires nightly and is very experimental, while nom is far more stable.
  • combine has also influenced this project, but far less than nom.

§Difference with nom

binator uses trait aliases and the try trait to provide a better experience, but this requires nightly.

nom can handle both octets and chars; binator only takes octets. Don’t run yet ! binator chooses to include a utf8 combinator. This means that where nom needs two versions of each combinator, one for characters and one for octets, binator needs only one, for octets, and you use our utf8 combinator (or write your own) wherever you expect utf8 in your data. We do not want you to first validate that your data is valid utf8 and then parse it. This approach also handles incomplete data much better. As a bonus, it should be faster in theory.
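To make the idea concrete, here is a minimal sketch in plain Rust of decoding a single utf8 character straight from an octet stream, without validating the whole input first. The function name utf8_char and its Option return type are assumptions made for this illustration; they are not binator's actual utf8 combinator.

```rust
/// Hypothetical helper (not binator's API): decode one UTF-8 character from
/// the front of a byte slice and return it with the remaining bytes.
fn utf8_char(stream: &[u8]) -> Option<(char, &[u8])> {
    // The leading byte tells us how many bytes the encoded character takes.
    let first = *stream.first()?;
    let len = match first {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        // A continuation byte or an invalid leading byte.
        _ => return None,
    };
    // Not enough bytes available: a streaming parser could report
    // "incomplete" here instead of giving up.
    let head = stream.get(..len)?;
    // Validate only these `len` bytes, never the whole input.
    let c = std::str::from_utf8(head).ok()?.chars().next()?;
    Some((c, &stream[len..]))
}
```

With this shape, the same octet stream can mix binary fields and text: the bytes before and after the character are never checked for utf8 validity.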

Errors in binator are far more flexible than in nom: you can create your own error, and it will be added to the pool of errors of the big parser you are building. All errors are flattened no matter where you create them, which means your custom error sits at the same level as binator’s errors; there is no difference between them. This is done with generics, which can make binator hard to work with. nom chose to be simpler here, at the cost of limiting how much users can customize errors.
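The flattening described above can be pictured with a small sketch in plain Rust. Every name here (LibraryAtom, ChecksumAtom, Element, Stack) is hypothetical and only illustrates the idea of one flat enumeration fed through generic conversions; binator's real traits and types differ.

```rust
/// An atom a library might define (hypothetical, not a binator type).
#[derive(Debug, PartialEq)]
enum LibraryAtom {
    UnexpectedByte(u8),
}

/// An atom defined by the user for their own parser.
#[derive(Debug, PartialEq)]
enum ChecksumAtom {
    Mismatch { expected: u8, found: u8 },
}

/// One flat enumeration holding every kind of atom side by side.
#[derive(Debug, PartialEq)]
enum Element {
    Library(LibraryAtom),
    Checksum(ChecksumAtom),
}

impl From<LibraryAtom> for Element {
    fn from(atom: LibraryAtom) -> Self {
        Element::Library(atom)
    }
}

impl From<ChecksumAtom> for Element {
    fn from(atom: ChecksumAtom) -> Self {
        Element::Checksum(atom)
    }
}

/// A context that simply stacks every element it is given.
#[derive(Debug, PartialEq)]
struct Stack<E>(Vec<E>);

impl<E> Stack<E> {
    /// Start a context from any atom that converts into `E`.
    fn new<A: Into<E>>(atom: A) -> Self {
        Stack(vec![atom.into()])
    }

    /// Add another atom on top, whatever parser it came from.
    fn add<A: Into<E>>(mut self, atom: A) -> Self {
        self.0.push(atom.into());
        self
    }
}
```

In this sketch the user's ChecksumAtom and the library's LibraryAtom end up as sibling variants of the same Element, so neither is nested inside the other. The price is the generic bound A: Into<E> on every function that reports an atom.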

The core trait of binator is Streaming. The main operation of this trait is split_first, which simply takes one Item from your Stream, so 99% of the time it’s one octet from your data. While nom has multiple traits you need to implement to use a custom Stream, binator has only one, and it is very simple.
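As a rough illustration of a single-trait stream, here is a simplified sketch in plain Rust. SimpleStreaming and count_items are names invented for this example, and the Option return type is an assumption; binator's real Streaming trait has a richer signature.

```rust
/// Simplified, hypothetical stand-in for a stream trait.
trait SimpleStreaming: Sized {
    type Item;

    /// Take one item from the front and return it with the rest of the stream.
    fn split_first(self) -> Option<(Self::Item, Self)>;
}

impl<'a> SimpleStreaming for &'a [u8] {
    type Item = u8;

    fn split_first(self) -> Option<(u8, Self)> {
        match self {
            [first, rest @ ..] => Some((*first, rest)),
            [] => None,
        }
    }
}

/// Generic over any stream: count how many items it yields.
fn count_items<S: SimpleStreaming>(mut stream: S) -> usize {
    let mut count = 0;
    while let Some((_, rest)) = stream.split_first() {
        stream = rest;
        count += 1;
    }
    count
}
```

Slices already have an inherent split_first method, so calling the trait version directly on a slice is clearest with the qualified form SimpleStreaming::split_first(slice). Generic code such as count_items only sees the trait method.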

§Limitation

Currently, arrays are used as “or” branches. An empty array (so no parser at all) makes no sense, because the array parser needs to return something and so would need its own “empty array” error. It shouldn’t be possible to use an empty array, but it is, because Parse is implemented for arrays using const generics. However, it’s VERY hard to write such code: since the compiler can’t infer anything from an empty array alone, a user would REALLY need to force it. This will be removed when we can do more with const generics, and the removal will NOT be considered a breaking change at any point.
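The empty-array corner case can be reproduced with a small sketch in plain Rust. The names ByteParser, first_of, letter_a and letter_b are hypothetical, and the element type is fixed to a function pointer, so unlike binator's generic implementation the compiler has no trouble inferring the type of an empty array here.

```rust
/// Hypothetical parser type for this sketch: a plain function pointer.
type ByteParser = fn(&[u8]) -> Option<(u8, &[u8])>;

/// Try each parser in order and return the first success.
fn first_of<const N: usize>(parsers: [ByteParser; N], input: &[u8]) -> Option<(u8, &[u8])> {
    for parser in parsers {
        if let Some(found) = parser(input) {
            return Some(found);
        }
    }
    // With N == 0 the loop never runs, so an empty array can only ever
    // produce this result, whatever the input is.
    None
}

fn letter_a(input: &[u8]) -> Option<(u8, &[u8])> {
    match input {
        [b'a', rest @ ..] => Some((b'a', rest)),
        _ => None,
    }
}

fn letter_b(input: &[u8]) -> Option<(u8, &[u8])> {
    match input {
        [b'b', rest @ ..] => Some((b'b', rest)),
        _ => None,
    }
}
```

Because the implementation is written once for every N, nothing rules out N == 0 at compile time: first_of::<0>([], input) compiles and always fails.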

§Performance

While performance is not the primary goal, it’s still a goal. For now, preliminary testing shows it’s similar to nom. So if your goal is peak performance, maybe binator is not for you, but if your goal is “fast enough”, binator should be OK. Benchmarks would be welcome; there is already a json parser crate for binator.

§License

This project chose the Zlib license because it’s almost like MIT, but it’s more flexible about including the license in binaries, and it also covers the sharing of modifications. It also constrains forking: one must not upload a copy of this to crates.io without clearly stating that it’s a fork and not the original.

§Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed as above (Zlib license), without any additional terms or conditions. Big contributors will eventually be added to the author list.

Binator Contributors

§Grammar

I’m clearly not a native English speaker, so I will accept PRs that make the documentation clearer. However, I don’t want small corrections like “US vs UK” spelling, and I don’t want PRs that just remove the space before “!” or “?”, because I’m French and I like it that way. I want PRs that respect the original author who wrote the sentence, but if you add a new sentence, use your own style. In summary, I will accept any PR that adds clarity, but not grammar zealot PRs.

§How binator works

Binator defines traits that structure your parser. For something to be considered a Parser by binator, it must implement the Parse trait, which is used every time you run a Parser. This trait has only one method, Parse::parse, which takes a Stream as parameter. A Stream can be anything that implements Streaming; for example, binator implements it for &'a [u8]. Most of the time a Parser will indirectly use Streaming::split_first to get a Streaming::Item from the Stream.

When a Parser is done with the input, it returns Parsed. This enumeration represents the result of a Parser, and it implements core::ops::Try, so you can use ? on the result of a Parser. A Parser can return Parsed::Success, Parsed::Failure or Parsed::Error. Success contains a Token, which is what the Parser produced from the Stream, and a Stream that contains the input not used by the Parser. Failure means the Parser didn’t recognize the input; it’s not a fatal error at all, and it’s perfectly normal for a combinator parser to return Failure. Error is a fatal error, like an error produced by the Stream or by a Parser.

Both Failure and Error contain a Context. A Context is something that implements Contexting; it’s the way binator accumulates Failures, like a container of Failures. If a Parser needs to return a Context, it can use Contexting::new, which requires an Atom. An Atom can be anything a Parser wants; for example, binator defines crate::base::FloatAtom. Contexting requires that the Context implement core::ops::Add and core::ops::BitOr, which means that if you already called another Parser that returned a Context, you can add your own Atom and build a more precise Context for the final user. Most binator combinators already do this for you. With all of this, you know most of how binator works.
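The moving parts described above can be sketched on stable Rust as follows. SimpleParsed, SimpleParse, byte_a and twice are hypothetical names, the context is reduced to a &'static str, and explicit match is used in place of the nightly ? support; treating empty input as an Error is a choice made by this sketch only to show the third variant.

```rust
/// Simplified stand-in for the three possible outcomes of a parser.
#[derive(Debug, PartialEq)]
enum SimpleParsed<Token, Stream, Context> {
    Success { token: Token, stream: Stream },
    Failure(Context),
    Error(Context),
}

/// Simplified stand-in for the parser trait, with its single method.
trait SimpleParse<Stream, Context> {
    type Token;
    fn parse(&mut self, stream: Stream) -> SimpleParsed<Self::Token, Stream, Context>;
}

/// Blanket implementation: any `FnMut` with a matching signature is a parser.
impl<F, Token, Stream, Context> SimpleParse<Stream, Context> for F
where
    F: FnMut(Stream) -> SimpleParsed<Token, Stream, Context>,
{
    type Token = Token;

    fn parse(&mut self, stream: Stream) -> SimpleParsed<Token, Stream, Context> {
        self(stream)
    }
}

/// A plain function used as a parser: accept the byte b'a'.
fn byte_a(stream: &[u8]) -> SimpleParsed<u8, &[u8], &'static str> {
    match stream {
        [b'a', rest @ ..] => SimpleParsed::Success { token: b'a', stream: rest },
        // Sketch-only choice: empty input is reported as a fatal error.
        [] => SimpleParsed::Error("unexpected end of stream"),
        _ => SimpleParsed::Failure("expected 'a'"),
    }
}

/// A tiny combinator: run the same parser twice, stopping at the first
/// outcome that is not a success.
fn twice<P, S, C>(mut parser: P, stream: S) -> SimpleParsed<(P::Token, P::Token), S, C>
where
    P: SimpleParse<S, C>,
{
    let (first, stream) = match parser.parse(stream) {
        SimpleParsed::Success { token, stream } => (token, stream),
        SimpleParsed::Failure(context) => return SimpleParsed::Failure(context),
        SimpleParsed::Error(context) => return SimpleParsed::Error(context),
    };
    match parser.parse(stream) {
        SimpleParsed::Success { token, stream } => SimpleParsed::Success {
            token: (first, token),
            stream,
        },
        SimpleParsed::Failure(context) => SimpleParsed::Failure(context),
        SimpleParsed::Error(context) => SimpleParsed::Error(context),
    }
}
```

For example, twice(byte_a, &b"aab"[..]) succeeds with the token (b'a', b'a') and leaves b"b" in the stream, while twice(byte_a, &b"ab"[..]) returns a Failure because the second byte is not b'a'. The two early returns in twice are the boilerplate that ? removes when the result type implements core::ops::Try.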

§Terminology

§Stream

A structure that produces Items when asked.

§Parser

Something that checks that the Items produced by a Stream are correct.

§Context

A structure that manages the Failures and Errors generated by Parsers.

§Token

What a Parser returns on Success.

§Atom

A structure that contains information about a Failure or Error from a Parser.

§Element

Something, generally an enumeration, that contains all the different kinds of Atom.

§Parsed

An enumeration that indicates the result of a Parser.

§Parse

A trait that every Parser implements; it is how a Parser is run.

§Failure

Indicates that a Parser didn’t validate the input.

§Success

Indicates that a Parser validated the input.

§Error

Indicates that a Parser encountered an unrecoverable error.

§Streaming

A trait that a Stream implements to do its job.

§Item

An item produced by a Stream, generally just a u8.

§Span

A delimited part of the Stream.

§Contexting

A trait that every Context implements, used to accumulate the failures of Parsers.

Modules§

  • Contains the basic combinators that you start from to build a parser. For example, if you want the ASCII char ‘i’, you start with is(b'i'). The same goes for a number like “42” (uint_radix) or a number in binary form (u16_be).
  • Contains structures that hold the failures of your parser. You can ignore them, use a stack, or even keep a full tree of all the failures that your parsers generated.
  • Contains structures that can be used as a Stream.
  • Contains combinators that you can use to control loops, validate data and more. For example, if you want as many i as possible: is(b'i').fold_bounds(.., || (), Acc::acc). Once you get used to it, fold_bounds does everything you need.
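To give a feel for what a bounded fold does, here is a sketch in plain Rust that consumes matching bytes while a count stays inside a range. The function fold_bounds_sketch and its parameters are hypothetical and operate directly on a byte slice; binator's fold_bounds is a method on parsers and its signature is not reproduced here.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical sketch: consume bytes accepted by `accept`, folding them into
/// an accumulator, while the number of matches stays within `bounds`.
fn fold_bounds_sketch<B, Acc>(
    input: &[u8],
    bounds: B,
    accept: impl Fn(u8) -> bool,
    init: impl FnOnce() -> Acc,
    mut fold: impl FnMut(Acc, u8) -> Acc,
) -> Option<(Acc, &[u8])>
where
    B: RangeBounds<usize>,
{
    // Largest number of items we may consume, if the range has an upper bound.
    let max = match bounds.end_bound() {
        Bound::Included(&n) => Some(n),
        Bound::Excluded(&n) => Some(n.saturating_sub(1)),
        Bound::Unbounded => None,
    };
    let mut acc = init();
    let mut rest = input;
    let mut count = 0;
    while max.map_or(true, |m| count < m) {
        match rest {
            [byte, tail @ ..] if accept(*byte) => {
                acc = fold(acc, *byte);
                rest = tail;
                count += 1;
            }
            // First byte that does not match (or end of input): stop looping.
            _ => break,
        }
    }
    // Fewer matches than the lower bound makes the whole fold fail.
    if bounds.contains(&count) {
        Some((acc, rest))
    } else {
        None
    }
}
```

Passing .. as the bounds means "as many as possible, including zero", 1.. means "at least one", and ..=2 stops after two matches even if more are available.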

Structs§

  • Represents a standalone Success from a Parsed result.

Enums§

  • Core context used to implement context for basic types like u8.
  • Parsed represents the result of a parse().
  • This is like Parsed, but Success doesn’t contain the stream.
  • Represents the result of a split.

Traits§

  • Contexting is a trait used to report failures and errors. The idea is to have a tree of contexts that helps the final user understand the error. It can also help with debugging.
  • Parse is a trait that all parsers should implement. There is a blanket implementation for types that implement FnMut with a signature matching parse(). This means you can quickly use a function as a Parser.
  • This is a utility trait.
  • This trait must be implemented by every struct that wants to be a stream for binator.