Crate yap_streaming
This crate builds on the interfaces from yap
to allow simple parsing of streams.
Why
There already exist many crates that intend to help with parsing.
Of these, nom, winnow, chumsky, and combine support parsing streams of values.
nom:
- No obvious way to signal the end of a stream to a parser.
- The user of the library has to implement a streaming parser noticeably differently from a non-streaming parser.
- Parsing occurs on chunks. Parsing dynamically sized chunks can require re-parsing the chunk from scratch and redoing work.
winnow:
- Parsing occurs on chunks. Parsing dynamically sized chunks can require re-parsing the chunk from scratch and redoing work.
chumsky is not designed for speed.
combine is complicated.
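To illustrate the re-parsing cost noted above for chunk-based parsing, here is a minimal std-only sketch. It does not use nom or winnow; `ChunkResult`, `parse_number_chunk`, and `parse_with_retries` are hypothetical names chosen for illustration. A chunk parser that runs out of input can only report that it needs more, and the caller then retries from the start of the enlarged buffer.

```rust
/// Hypothetical result of a chunk-based parser: either a value plus the
/// number of bytes consumed, or a request for more input.
#[derive(Debug, PartialEq)]
enum ChunkResult {
    Done { value: u32, consumed: usize },
    Incomplete,
}

/// Parses a run of ASCII digits terminated by any non-digit byte.
/// If the chunk ends before a terminator is seen, the number might continue
/// in the next chunk, so the parser cannot commit to a value.
fn parse_number_chunk(chunk: &[u8]) -> ChunkResult {
    let mut value = 0u32;
    for (i, &b) in chunk.iter().enumerate() {
        if b.is_ascii_digit() {
            value = value * 10 + u32::from(b - b'0');
        } else {
            return ChunkResult::Done { value, consumed: i };
        }
    }
    ChunkResult::Incomplete
}

/// Feeds chunks to the parser, growing the buffer until it succeeds.
/// Returns the parsed value and how many bytes were examined in total,
/// counting the repeated work of each retry.
fn parse_with_retries(chunks: &[&[u8]]) -> Option<(u32, usize)> {
    let mut buf = Vec::new();
    let mut bytes_examined = 0;
    for chunk in chunks {
        buf.extend_from_slice(chunk);
        match parse_number_chunk(&buf) {
            ChunkResult::Done { value, consumed } => {
                // The terminator byte was examined as well.
                bytes_examined += consumed + 1;
                return Some((value, bytes_examined));
            }
            // The whole buffer was examined and will be examined again.
            ChunkResult::Incomplete => bytes_examined += buf.len(),
        }
    }
    None
}
```

With the three chunks `12`, `34`, `5;` the sketch examines 12 bytes to parse 6 bytes of input, because each retry starts over at the beginning of the buffer.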
This crate allows using an already written yap parser by simply changing the initial tokens declaration.
use std::{
    fs::File,
    io::{self, BufReader, Read},
};
use yap_streaming::{
    // Allows you to use `.into_tokens()` on strings and slices,
    // to get an instance of `Tokens`:
    IntoTokens,
    // Allows you to get an instance of `Tokens` that supports streams:
    StrStreamTokens,
    // This trait has all of the parsing methods on it:
    Tokens,
};
// Write parser
// =========================================
#[derive(PartialEq, Debug)]
enum Op {
    Plus,
    Minus,
    Multiply,
}
#[derive(PartialEq, Debug)]
enum OpOrDigit {
    Op(Op),
    Digit(u32),
}
// The `Tokens` trait builds on `Iterator`, so we get a `next` method.
fn parse_op(t: &mut impl Tokens<Item = char>) -> Option<Op> {
    let loc = t.location();
    match t.next()? {
        '-' => Some(Op::Minus),
        '+' => Some(Op::Plus),
        'x' => Some(Op::Multiply),
        _ => {
            t.set_location(loc);
            None
        }
    }
}
// We also get other useful functions..
fn parse_digits(t: &mut impl Tokens<Item = char>) -> Option<u32> {
    t.take_while(|c| c.is_digit(10)).parse::<u32, String>().ok()
}
fn parse_all(t: &mut impl Tokens<Item = char>) -> impl Tokens<Item = OpOrDigit> + '_ {
    // As well as combinator functions like `sep_by_all` and `surrounded_by`..
    t.sep_by_all(
        |t| {
            t.surrounded_by(
                |t| parse_digits(t).map(OpOrDigit::Digit),
                |t| {
                    t.skip_while(|c| c.is_ascii_whitespace());
                },
            )
        },
        |t| parse_op(t).map(OpOrDigit::Op),
    )
}
// Now we've parsed our input into OpOrDigits, let's calculate the result..
fn eval(t: &mut impl Tokens<Item = char>) -> u32 {
    let op_or_digit = parse_all(t);
    let mut current_op = Op::Plus;
    let mut current_digit = 0;
    for d in op_or_digit.into_iter() {
        match d {
            OpOrDigit::Op(op) => current_op = op,
            OpOrDigit::Digit(n) => match current_op {
                Op::Plus => current_digit += n,
                Op::Minus => current_digit -= n,
                Op::Multiply => current_digit *= n,
            },
        }
    }
    current_digit
}
// Use parser
// =========================================
// Get our input and convert into something implementing `Tokens`
let mut tokens = "10 + 2 x 12-4,foobar".into_tokens();
// Parse
assert_eq!(eval(&mut tokens), 140);
// Instead of parsing an in-memory buffer we can use `yap_streaming` to parse a stream.
// While we could [`std::io::Read::read_to_end()`] here, what if the file was too large
// to fit in memory? What if we were parsing from a network socket?
let mut io_err = None;
let file_chars = BufReader::new(File::open("examples/opOrDigit.txt").expect("open file"))
    .bytes()
    .map_while(|x| {
        match x {
            Ok(x) => {
                if x.is_ascii() {
                    Some(x as char)
                } else {
                    io_err = Some(io::ErrorKind::InvalidData.into());
                    // Don't parse any further if non-ascii input.
                    // This simple example parser only makes sense with ascii values.
                    None
                }
            }
            Err(e) => {
                io_err = Some(e);
                // Don't parse any further if io error.
                // Alternatively could panic, retry the byte,
                // or include as an error variant and parse Result<char, ParseError> instead.
                None
            }
        }
    });
// Convert to something implementing `Tokens`.
// If parsing a stream not of `char` use [`yap_streaming::StreamTokens`] instead.
let mut tokens = StrStreamTokens::new(file_chars);
// Parse
assert_eq!(eval(&mut tokens), 140);
// Check that parse encountered no io errors.
assert!(io_err.is_none());
Structs
- StrStreamTokens: StrStreamTokens is like StreamTokens but optimized for more efficient usage of Tokens::parse() and related methods when wrapping Iterator<Item = char>.
- StreamTokens: Enables parsing a stream of values from a Fused iterator that can’t itself be cloned. In order to be able to rewind the iterator, it must save values since the oldest not-Dropped StreamTokensLocation into Buf.
- StreamTokensLocation: This implements TokenLocation and stores the location. It also marks the Iterator::Items produced since it was created so that they are stored for when the corresponding StreamTokens is reset.
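The buffering described for StreamTokens can be pictured with a small std-only sketch. This is not the crate's implementation: `Rewindable`, `checkpoint`, `rewind`, and `release` are hypothetical names, only a single checkpoint is supported, and an explicit `release` call stands in for dropping a StreamTokensLocation. Items are saved only while a checkpoint is outstanding, which is what lets a non-clonable iterator be rewound.

```rust
use std::collections::VecDeque;
use std::iter::Fuse;

/// Wraps an iterator and records items while a checkpoint is active.
struct Rewindable<I: Iterator> {
    inner: Fuse<I>,
    /// Items read since the checkpoint, plus any not yet replayed.
    buf: VecDeque<I::Item>,
    /// Index into `buf` of the next item to hand out while recording.
    cursor: usize,
    /// Whether a checkpoint is currently outstanding.
    recording: bool,
}

impl<I: Iterator> Rewindable<I>
where
    I::Item: Clone,
{
    fn new(iter: I) -> Self {
        Rewindable { inner: iter.fuse(), buf: VecDeque::new(), cursor: 0, recording: false }
    }

    /// Forgets buffered items that lie before the current position.
    fn discard_replayed(&mut self) {
        for _ in 0..self.cursor {
            self.buf.pop_front();
        }
        self.cursor = 0;
    }

    /// Starts recording from the current position.
    fn checkpoint(&mut self) {
        self.discard_replayed();
        self.recording = true;
    }

    /// Moves back to the position of the checkpoint.
    fn rewind(&mut self) {
        self.cursor = 0;
    }

    /// Stops recording; items before the current position are freed.
    fn release(&mut self) {
        self.discard_replayed();
        self.recording = false;
    }
}

impl<I: Iterator> Iterator for Rewindable<I>
where
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.recording {
            // Pull from the stream only when the buffer has been fully replayed.
            if self.cursor == self.buf.len() {
                let item = self.inner.next()?;
                self.buf.push_back(item);
            }
            let item = self.buf[self.cursor].clone();
            self.cursor += 1;
            Some(item)
        } else {
            // Hand out leftover buffered items first, then read the stream directly.
            self.buf.pop_front().or_else(|| self.inner.next())
        }
    }
}
```

A typical use would be to call `checkpoint` before attempting a parse, `rewind` if the attempt fails, and `release` once the position no longer needs to be revisited, so the buffer does not grow with the length of the stream.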
Traits
- IntoTokens: A trait that is implemented by anything which can be converted into an object implementing the Tokens trait.
- TokenLocation: Calling Tokens::location() returns an object that implements this trait.
- Tokens: The tokens trait is an extension of the Iterator trait, and adds a bunch of useful methods for parsing tokens from the underlying iterable type.
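As a rough picture of how a trait can extend Iterator with parsing helpers, here is a std-only sketch. `MiniTokens`, `SliceTokens`, and `eat` are hypothetical names and far simpler than the real Tokens trait; only `location` and `set_location` echo the method names used in `parse_op` earlier on this page.

```rust
/// A toy extension of `Iterator`: implementors supply two methods and
/// receive `eat` as a provided method.
trait MiniTokens: Iterator {
    type Location;

    /// Returns a marker for the current position.
    fn location(&self) -> Self::Location;

    /// Rewinds to a previously obtained position.
    fn set_location(&mut self, loc: Self::Location);

    /// Consumes the next item only if it equals `expected`;
    /// otherwise the position is left unchanged.
    fn eat(&mut self, expected: Self::Item) -> bool
    where
        Self::Item: PartialEq,
    {
        let start = self.location();
        match self.next() {
            Some(item) if item == expected => true,
            _ => {
                self.set_location(start);
                false
            }
        }
    }
}

/// An in-memory token source: a slice plus an index into it.
struct SliceTokens<'a> {
    items: &'a [char],
    pos: usize,
}

impl<'a> Iterator for SliceTokens<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let item = self.items.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}

impl<'a> MiniTokens for SliceTokens<'a> {
    type Location = usize;

    fn location(&self) -> usize {
        self.pos
    }

    fn set_location(&mut self, loc: usize) {
        self.pos = loc;
    }
}
```

Because `eat` is written once against `location`, `set_location`, and `next`, any other implementor (for example a streaming source) would get the same helper without extra code, which is the design idea the trait description points at.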