§Chapter 2: Tokens and Tags
The simplest useful parser you can write is one that matches tokens.
In our case the input is a &str, so tokens are chars.
§Tokens
Stream provides some core operations to help with parsing. For example, to process a
single token, you can do:
```rust
use winnow::error::ParserError;
use winnow::stream::Stream;
use winnow::Parser;
use winnow::Result;

fn parse_prefix(input: &mut &str) -> Result<char> {
    // Take the next token (a char), failing if the input is empty.
    let c = input.next_token().ok_or_else(|| {
        ParserError::from_input(input)
    })?;
    // Fail unless that token is '0'.
    if c != '0' {
        return Err(ParserError::from_input(input));
    }
    Ok(c)
}

fn main() {
    let mut input = "0x1a2b Hello";

    let output = parse_prefix.parse_next(&mut input).unwrap();

    assert_eq!(input, "x1a2b Hello");
    assert_eq!(output, '0');

    assert!(parse_prefix.parse_next(&mut "d").is_err());
}
```

This extraction of a token is encapsulated in the any parser:
```rust
use winnow::error::ParserError;
use winnow::token::any;
use winnow::Parser;
use winnow::Result;

fn parse_prefix(input: &mut &str) -> Result<char> {
    // `any` takes the next token, failing on empty input.
    let c = any
        .parse_next(input)?;
    if c != '0' {
        return Err(ParserError::from_input(input));
    }
    Ok(c)
}
```

Using the higher-level any parser gives parse_prefix access to the helpers on the Parser trait,
such as Parser::verify, which fails a parse if a condition isn't met, just like our check above:
```rust
use winnow::token::any;
use winnow::Parser;
use winnow::Result;

fn parse_prefix(input: &mut &str) -> Result<char> {
    // `verify` rejects the token unless the predicate returns true.
    let c = any
        .verify(|c| *c == '0')
        .parse_next(input)?;
    Ok(c)
}
```

Matching a single token literal is common enough that Parser is implemented for
the char type, encapsulating both any and Parser::verify:
```rust
use winnow::Parser;
use winnow::Result;

fn parse_prefix(input: &mut &str) -> Result<char> {
    // A `char` is itself a parser that matches exactly that token.
    let c = '0'.parse_next(input)?;
    Ok(c)
}
```

§Tags
Stream also supports processing slices of tokens:
```rust
use winnow::error::ParserError;
use winnow::stream::Stream;
use winnow::Parser;
use winnow::Result;

fn parse_prefix<'s>(input: &mut &'s str) -> Result<&'s str> {
    let expected = "0x";
    // Fail early if there isn't enough input left to compare.
    if input.len() < expected.len() {
        return Err(ParserError::from_input(input));
    }
    // Take `expected.len()` bytes off the front of the input...
    let actual = input.next_slice(expected.len());
    // ...and fail unless they spell the expected literal.
    if actual != expected {
        return Err(ParserError::from_input(input));
    }
    Ok(actual)
}

fn main() {
    let mut input = "0x1a2b Hello";

    let output = parse_prefix.parse_next(&mut input).unwrap();

    assert_eq!(input, "1a2b Hello");
    assert_eq!(output, "0x");

    assert!(parse_prefix.parse_next(&mut "0o123").is_err());
}
```

Matching a string literal at the current input position is encapsulated in the literal parser:
```rust
use winnow::token::literal;
use winnow::Parser;
use winnow::Result;

fn parse_prefix<'s>(input: &mut &'s str) -> Result<&'s str> {
    let expected = "0x";
    let actual = literal(expected).parse_next(input)?;
    Ok(actual)
}
```

As with a single token, matching a string literal is common enough that Parser is implemented for the &str type:
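```rust
use winnow::Parser;
use winnow::Result;

fn parse_prefix<'s>(input: &mut &'s str) -> Result<&'s str> {
    // A `&str` is itself a parser that matches exactly that literal.
    let actual = "0x".parse_next(input)?;
    Ok(actual)
}
```

See the token module for additional single-token and token-slice parsers.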
§Character Classes
Selecting a single char or a literal is fairly limited. Often you will want to select one of several
chars from a specific class, like hexadecimal digits. For this, we use the one_of parser:
```rust
use winnow::token::one_of;
use winnow::Parser;
use winnow::Result;

fn parse_digits(input: &mut &str) -> Result<char> {
    // Accept one char that falls in any of the three ranges (a hex digit).
    one_of(('0'..='9', 'a'..='f', 'A'..='F')).parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";

    let output = parse_digits.parse_next(&mut input).unwrap();

    assert_eq!(input, "a2b Hello");
    assert_eq!(output, '1');

    assert!(parse_digits.parse_next(&mut "Z").is_err());
}
```

Aside: one_of might look straightforward: a function returning a value that implements Parser. Let's look more closely at how it is used above (resolving all generic parameters):

```rust
pub fn one_of<'i>(
    set: (RangeInclusive<char>, RangeInclusive<char>, RangeInclusive<char>),
) -> impl Parser<&'i str, char, ContextError> {
    // ...
}
```

If you have not programmed in a language where functions are values, the type signature of the
one_of function might be a surprise. The function one_of returns a parser, and that parser behaves like a function: it takes the input (a &mut &str) and returns a Result. This is a common pattern in winnow for configurable or stateful parsers.
Some character classes are common enough that a named parser is provided, for example:

- line_ending: Recognizes an end of line (both \n and \r\n)
- newline: Matches a newline character \n
- tab: Matches a tab character \t
You can then capture a run of characters from a class with parsers like take_while:

```rust
use winnow::token::take_while;
use winnow::Parser;
use winnow::Result;

fn parse_digits<'s>(input: &mut &'s str) -> Result<&'s str> {
    // Take one or more (`1..`) consecutive hex digits.
    take_while(1.., ('0'..='9', 'a'..='f', 'A'..='F')).parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";

    let output = parse_digits.parse_next(&mut input).unwrap();

    assert_eq!(input, " Hello");
    assert_eq!(output, "1a2b");

    assert!(parse_digits.parse_next(&mut "Z").is_err());
}
```

We could simplify this further by using one of the built-in character classes, hex_digit1:
```rust
use winnow::ascii::hex_digit1;
use winnow::Parser;
use winnow::Result;

fn parse_digits<'s>(input: &mut &'s str) -> Result<&'s str> {
    hex_digit1.parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";

    let output = parse_digits.parse_next(&mut input).unwrap();

    assert_eq!(input, " Hello");
    assert_eq!(output, "1a2b");

    assert!(parse_digits.parse_next(&mut "Z").is_err());
}
```

See the ascii module for more text-based parsers.