Module winnow::_tutorial::chapter_2

Available on crate feature unstable-doc only.

Chapter 2: Tokens and Tags

The simplest useful parser you can write is one which matches tokens.

Tokens

Stream provides some core operations to help with parsing. For example, to process a single token, you can do:

use winnow::PResult;
use winnow::Parser;
use winnow::stream::Stream;
use winnow::error::ParserError;
use winnow::error::ErrorKind;
use winnow::error::ErrMode;

fn parse_prefix(input: &mut &str) -> PResult<char> {
    // Take the next token (a `char` for `&str`), failing at end of input.
    let c = input.next_token().ok_or_else(|| {
        ErrMode::from_error_kind(input, ErrorKind::Token)
    })?;
    // Reject any token other than the expected prefix character.
    if c != '0' {
        return Err(ErrMode::from_error_kind(input, ErrorKind::Verify));
    }
    Ok(c)
}

fn main()  {
    let mut input = "0x1a2b Hello";

    let output = parse_prefix.parse_next(&mut input).unwrap();

    assert_eq!(input, "x1a2b Hello");
    assert_eq!(output, '0');

    assert!(parse_prefix.parse_next(&mut "d").is_err());
}

any and Parser::verify are Parser building blocks on top of Stream:

use winnow::PResult;
use winnow::Parser;
use winnow::token::any;

fn parse_prefix(input: &mut &str) -> PResult<char> {
    any.verify(|c| *c == '0').parse_next(input)
}

Matching a single token literal is common enough that Parser is implemented for char:

use winnow::PResult;
use winnow::Parser;

fn parse_prefix(input: &mut &str) -> PResult<char> {
    '0'.parse_next(input)
}

Tags

Stream also supports processing slices of tokens:

use winnow::PResult;
use winnow::Parser;
use winnow::stream::Stream;
use winnow::error::ParserError;
use winnow::error::ErrorKind;
use winnow::error::ErrMode;

fn parse_prefix<'s>(input: &mut &'s str) -> PResult<&'s str> {
    let expected = "0x";
    // Fail early if there is not enough input left to hold the prefix.
    if input.len() < expected.len() {
        return Err(ErrMode::from_error_kind(input, ErrorKind::Slice));
    }
    // Consume exactly as many bytes as the expected prefix is long.
    let actual = input.next_slice(expected.len());
    if actual != expected {
        return Err(ErrMode::from_error_kind(input, ErrorKind::Verify));
    }
    Ok(actual)
}

fn main()  {
    let mut input = "0x1a2b Hello";

    let output = parse_prefix.parse_next(&mut input).unwrap();
    assert_eq!(input, "1a2b Hello");
    assert_eq!(output, "0x");

    assert!(parse_prefix.parse_next(&mut "0o123").is_err());
}

Again, matching a literal is common enough that Parser is implemented for &str:

use winnow::PResult;
use winnow::Parser;

fn parse_prefix<'s>(input: &mut &'s str) -> PResult<&'s str> {
    "0x".parse_next(input)
}

In winnow, we call this type of parser a tag. See token for additional individual and token-slice parsers.
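
To make the mechanics of a tag concrete, here is a minimal sketch using only the standard library. It is not winnow's implementation: match_tag is a hypothetical helper, and Option stands in for winnow's richer error type. It only illustrates the convention used throughout this chapter, where a parser receives &mut &str, returns the matched slice, and advances the input on success.

```rust
/// Hypothetical helper, not part of winnow: match `expected` at the start of
/// `input`, advancing `input` past it on success.
fn match_tag<'s>(input: &mut &'s str, expected: &str) -> Option<&'s str> {
    // Copy out the inner `&'s str` so the results below keep the full lifetime.
    let original: &'s str = *input;
    // `strip_prefix` yields the remainder only if `original` starts with `expected`.
    let rest = original.strip_prefix(expected)?;
    // The matched slice is the part of the original input before the remainder.
    let matched = &original[..expected.len()];
    // Advance the caller's view of the input; on failure it is left untouched.
    *input = rest;
    Some(matched)
}
```

Called on "0x1a2b Hello" with "0x", this sketch returns Some("0x") and leaves "1a2b Hello" in the input; called on "0o123" it returns None and leaves the input unchanged.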

Character Classes

Selecting a single char or a tag is fairly limited. Sometimes, you will want to select one of several chars of a specific class, like digits. For this, we use the one_of parser:

use winnow::PResult;
use winnow::Parser;
use winnow::token::one_of;

fn parse_digits(input: &mut &str) -> PResult<char> {
    one_of(('0'..='9', 'a'..='f', 'A'..='F')).parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";

    let output = parse_digits.parse_next(&mut input).unwrap();
    assert_eq!(input, "a2b Hello");
    assert_eq!(output, '1');

    assert!(parse_digits.parse_next(&mut "Z").is_err());
}

Aside: one_of might look straightforward: a function returning a value that implements Parser. Let's look more closely at one way it can be instantiated, with all generic parameters resolved for a &str input and a list of chars:

pub fn one_of<'i>(
    list: &'static [char]
) -> impl Parser<&'i str, char, InputError<&'i str>> {
    // ...
}

If you have not programmed in a language where functions are values, the type signature of one_of might be a surprise: one_of returns a function. The function it returns is a Parser, taking a &str input and returning a PResult. This is a common pattern in winnow for configurable or stateful parsers.
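
The "function returning a function" pattern described in the aside can be sketched with the standard library alone. This is not how winnow defines one_of: one_of_sketch is a hypothetical name, and the returned closure uses Option instead of winnow's error types. The point is only that the outer function takes configuration (the accepted chars) and hands back the value that does the parsing.

```rust
/// Hypothetical stand-in for `one_of`, not winnow's implementation:
/// takes the accepted chars as configuration and returns a parser closure.
fn one_of_sketch(list: &'static [char]) -> impl FnMut(&mut &str) -> Option<char> {
    move |input: &mut &str| {
        // Copy out the inner `&str` so slicing it does not borrow `input`.
        let original: &str = *input;
        // Look at the first char; an empty input is a failure.
        let c = original.chars().next()?;
        if list.contains(&c) {
            // Consume the char by advancing past its UTF-8 bytes.
            *input = &original[c.len_utf8()..];
            Some(c)
        } else {
            // Leave the input untouched when the char is not in the list.
            None
        }
    }
}
```

Calling one_of_sketch(&['0', '1', '2']) builds the parser once; the returned closure can then be applied to as many inputs as needed, which is what makes the pattern convenient for configurable parsers.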

Some character classes are common enough that a named parser is provided, such as:

  • line_ending: Recognizes an end of line (either \n or \r\n)
  • newline: Matches a newline character \n
  • tab: Matches a tab character \t
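
As a rough illustration of what a named class like line_ending stands for, the following standard-library sketch accepts either form of line ending. It is an assumption-laden approximation, not winnow's code: line_ending_sketch is a hypothetical name and Option replaces winnow's error type.

```rust
/// Hypothetical helper, not part of winnow: accept "\n" or "\r\n" at the
/// start of `input` and advance past it.
fn line_ending_sketch<'s>(input: &mut &'s str) -> Option<&'s str> {
    let original: &'s str = *input;
    // Neither form is a prefix of the other, so the order of the checks is arbitrary.
    for ending in ["\r\n", "\n"] {
        if let Some(rest) = original.strip_prefix(ending) {
            *input = rest;
            // Return the matched line ending as a slice of the original input.
            return Some(&original[..ending.len()]);
        }
    }
    None
}
```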

You can then capture sequences of these characters with parsers like take_while.

use winnow::PResult;
use winnow::Parser;
use winnow::token::take_while;

fn parse_digits<'s>(input: &mut &'s str) -> PResult<&'s str> {
    take_while(1.., ('0'..='9', 'a'..='f', 'A'..='F')).parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";

    let output = parse_digits.parse_next(&mut input).unwrap();
    assert_eq!(input, " Hello");
    assert_eq!(output, "1a2b");

    assert!(parse_digits.parse_next(&mut "Z").is_err());
}
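
To see what the 1.. lower bound means in the take_while example above, here is a standard-library sketch of the same behavior. It is not winnow's implementation; take_while1_sketch is a hypothetical helper that returns Option rather than a PResult. It takes the longest run of leading chars accepted by a predicate and fails when that run is empty.

```rust
/// Hypothetical helper, not part of winnow: take one or more leading chars
/// accepted by `pred`, leaving the rest in `input`.
fn take_while1_sketch<'s>(
    input: &mut &'s str,
    pred: impl Fn(char) -> bool,
) -> Option<&'s str> {
    let original: &'s str = *input;
    // Byte offset of the first char that does not match, or the whole input.
    let end = original
        .char_indices()
        .find(|&(_, c)| !pred(c))
        .map(|(i, _)| i)
        .unwrap_or(original.len());
    // The lower bound of one: zero matching chars is a failure.
    if end == 0 {
        return None;
    }
    let (matched, rest) = original.split_at(end);
    *input = rest;
    Some(matched)
}
```

With the predicate |c| c.is_ascii_hexdigit(), the input "1a2b Hello" yields Some("1a2b") and leaves " Hello", while "Z" yields None because no char matches.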

We could simplify this further by using one of the built-in character classes, hex_digit1:

use winnow::PResult;
use winnow::Parser;
use winnow::ascii::hex_digit1;

fn parse_digits<'s>(input: &mut &'s str) -> PResult<&'s str> {
    hex_digit1.parse_next(input)
}

fn main() {
    let mut input = "1a2b Hello";

    let output = parse_digits.parse_next(&mut input).unwrap();
    assert_eq!(input, " Hello");
    assert_eq!(output, "1a2b");

    assert!(parse_digits.parse_next(&mut "Z").is_err());
}

See ascii for more text-based parsers.

Re-exports