Chapter 2: Tokens and Tags
The simplest useful parser you can write is one which matches tokens.
Tokens
Stream provides some core operations to help with parsing. For example, to process a
single token, you can do:
use winnow::stream::Stream;
use winnow::error::ParserError;
use winnow::error::ErrorKind;
use winnow::error::ErrMode;
use winnow::PResult;
use winnow::Parser;

fn parse_prefix(input: &mut &str) -> PResult<char> {
    // Take the next token, failing if the input is empty.
    let c = input.next_token().ok_or_else(|| {
        ErrMode::from_error_kind(input, ErrorKind::Token)
    })?;
    // Reject any token other than the expected one.
    if c != '0' {
        return Err(ErrMode::from_error_kind(input, ErrorKind::Verify));
    }
    Ok(c)
}

fn main() {
    let mut input = "0x1a2b Hello";
    let output = parse_prefix.parse_next(&mut input).unwrap();
    assert_eq!(input, "x1a2b Hello");
    assert_eq!(output, '0');
    assert!(parse_prefix.parse_next(&mut "d").is_err());
}

any and Parser::verify are Parser building blocks on top of Stream:
use winnow::PResult;
use winnow::Parser;
use winnow::token::any;

fn parse_prefix(input: &mut &str) -> PResult<char> {
    any.verify(|c| *c == '0').parse_next(input)
}

Matching a single token literal is common enough that Parser is implemented for char.
use winnow::PResult;
use winnow::Parser;

fn parse_prefix(input: &mut &str) -> PResult<char> {
    '0'.parse_next(input)
}

Tags
Stream also supports processing slices of tokens:
use winnow::stream::Stream;
use winnow::error::ParserError;
use winnow::error::ErrorKind;
use winnow::error::ErrMode;
use winnow::PResult;
use winnow::Parser;

fn parse_prefix<'s>(input: &mut &'s str) -> PResult<&'s str> {
    let expected = "0x";
    // Fail early if there is not enough input left to hold the expected slice.
    if input.len() < expected.len() {
        return Err(ErrMode::from_error_kind(input, ErrorKind::Slice));
    }
    // Consume the slice, then check that it matches.
    let actual = input.next_slice(expected.len());
    if actual != expected {
        return Err(ErrMode::from_error_kind(input, ErrorKind::Verify));
    }
    Ok(actual)
}

fn main() {
    let mut input = "0x1a2b Hello";
    let output = parse_prefix.parse_next(&mut input).unwrap();
    assert_eq!(input, "1a2b Hello");
    assert_eq!(output, "0x");
    assert!(parse_prefix.parse_next(&mut "0o123").is_err());
}

Again, matching a literal is common enough that Parser is implemented for &str:
use winnow::PResult;
use winnow::Parser;

fn parse_prefix<'s>(input: &mut &'s str) -> PResult<&'s str> {
    "0x".parse_next(input)
}

In winnow, we call this type of parser a tag. See token for additional individual and token-slice parsers.
Character Classes
Selecting a single char or a tag is fairly limited. Sometimes, you will want to select one of several
chars of a specific class, like digits. For this, we use the one_of parser:
use winnow::PResult;
use winnow::Parser;
use winnow::token::one_of;

fn parse_digits(input: &mut &str) -> PResult<char> {
    // Accept exactly one hexadecimal digit.
    one_of(('0'..='9', 'a'..='f', 'A'..='F')).parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";
    let output = parse_digits.parse_next(&mut input).unwrap();
    assert_eq!(input, "a2b Hello");
    assert_eq!(output, '1');
    assert!(parse_digits.parse_next(&mut "Z").is_err());
}

Aside:
one_of might look straightforward: a function returning a value that implements Parser. Let’s look at it more closely as it's used above (resolving all generic parameters):

pub fn one_of<'i>(
    list: &'static [char]
) -> impl Parser<&'i str, char, InputError<&'i str>> {
    // ...
}

If you have not programmed in a language where functions are values, the type signature of the one_of function might be a surprise. The function one_of returns a function. The function it returns is a Parser, taking a &str and returning a PResult. This is a common pattern in winnow for configurable or stateful parsers.
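To make the "function returning a function" pattern concrete, here is a minimal sketch that uses only the standard library. The name one_of_sketch and its Option-based return type are assumptions made for illustration; this is not how winnow implements one_of, only the same shape: a configuration step that returns a closure which later consumes input.

```rust
/// Hypothetical stand-in for a configurable parser (not winnow's `one_of`):
/// returns a closure that accepts one char from `allowed` and advances the
/// input past it.
fn one_of_sketch(allowed: &'static [char]) -> impl Fn(&mut &str) -> Option<char> {
    move |input: &mut &str| {
        // Copy the inner `&str` so we can slice it and write back to `*input`.
        let s = *input;
        let c = s.chars().next()?;
        if allowed.contains(&c) {
            // Advance past the matched char (which may span several bytes).
            *input = &s[c.len_utf8()..];
            Some(c)
        } else {
            // Leave the input untouched on failure.
            None
        }
    }
}
```

Calling one_of_sketch(&['a', 'b']) does no parsing by itself; it only captures the allowed set. The returned closure is what gets applied to the input, possibly many times.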
Some character classes are common enough that a named parser is provided, such as:
- line_ending: Recognizes an end of line (both \n and \r\n)
- newline: Matches a newline character \n
- tab: Matches a tab character \t
You can then capture sequences of these characters with parsers like take_while.
use winnow::PResult;
use winnow::Parser;
use winnow::token::take_while;

fn parse_digits<'s>(input: &mut &'s str) -> PResult<&'s str> {
    // `1..` requires at least one matching character.
    take_while(1.., ('0'..='9', 'a'..='f', 'A'..='F')).parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";
    let output = parse_digits.parse_next(&mut input).unwrap();
    assert_eq!(input, " Hello");
    assert_eq!(output, "1a2b");
    assert!(parse_digits.parse_next(&mut "Z").is_err());
}

We could simplify this further by using one of the built-in character classes, hex_digit1:
use winnow::PResult;
use winnow::Parser;
use winnow::ascii::hex_digit1;

fn parse_digits<'s>(input: &mut &'s str) -> PResult<&'s str> {
    hex_digit1.parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";
    let output = parse_digits.parse_next(&mut input).unwrap();
    assert_eq!(input, " Hello");
    assert_eq!(output, "1a2b");
    assert!(parse_digits.parse_next(&mut "Z").is_err());
}

See ascii for more text-based parsers.
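To see what parsers like take_while(1.., ...) and hex_digit1 do with the input, the following sketch reproduces the observable behavior with only the standard library. The helper names take_while1_sketch and parse_hex_digits are hypothetical, and Option stands in for PResult; winnow's actual implementation works through the Stream trait and differs from this.

```rust
/// Hypothetical helper (not part of winnow): take the longest non-empty
/// prefix of `input` whose chars all satisfy `class`, advancing `input`.
fn take_while1_sketch<'s>(
    input: &mut &'s str,
    class: impl Fn(char) -> bool,
) -> Option<&'s str> {
    let s: &'s str = *input;
    // Byte offset of the first char that is NOT in the class,
    // or the full length if every char matches.
    let end = s
        .char_indices()
        .find(|&(_, c)| !class(c))
        .map(|(i, _)| i)
        .unwrap_or(s.len());
    if end == 0 {
        // Mirrors the `1..` lower bound: at least one char must match.
        return None;
    }
    let (matched, rest) = s.split_at(end);
    // On success, the input is advanced past the matched prefix.
    *input = rest;
    Some(matched)
}

/// Hex digits, using the same character class as the examples above.
fn parse_hex_digits<'s>(input: &mut &'s str) -> Option<&'s str> {
    take_while1_sketch(input, |c| c.is_ascii_hexdigit())
}
```

As in the winnow examples, a successful call returns the matched slice and leaves the remainder in the input, while a failed call leaves the input unchanged.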