Type Definition fundu::Delimiter

pub type Delimiter = fn(_: u8) -> bool;

An ASCII delimiter defined as a closure.
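Since the alias is a plain function pointer rather than a boxed closure, a Delimiter cannot carry captured state. The sketch below chooses between non-capturing closures at runtime. It copies the alias locally so it compiles without the fundu crate, and `delimiter_for` is a hypothetical helper, not part of fundu's API.

```rust
// Local copy of the alias shown above, so this sketch compiles without fundu.
type Delimiter = fn(u8) -> bool;

// Hypothetical helper: pick a delimiter at runtime. Both closures capture
// nothing, so they coerce to the function pointer type. A closure that captured
// a local variable (for example a user-supplied separator byte) would not.
fn delimiter_for(allow_comma: bool) -> Delimiter {
    let whitespace: Delimiter = |byte| byte.is_ascii_whitespace();
    let whitespace_or_comma: Delimiter = |byte| byte == b',' || byte.is_ascii_whitespace();
    if allow_comma {
        whitespace_or_comma
    } else {
        whitespace
    }
}

fn main() {
    let delimiter = delimiter_for(true);
    assert!(delimiter(b','));
    assert!(delimiter(b' '));
    assert!(!delimiter_for(false)(b','));
}
```

Function pointers are `Copy`, so the returned delimiter can be passed around freely.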

Delimiter is currently a type alias for a function pointer that takes a u8 byte and returns a bool; both plain functions and closures that capture nothing can be used as a Delimiter. Most likely, the Delimiter is used to define some kind of whitespace, but definitions of whitespace differ, so a closure provides the most flexible definition of a delimiter. For example, this is the definition of whitespace from Rust's u8::is_ascii_whitespace:

Checks if the value is an ASCII whitespace character: U+0020 SPACE, U+0009 HORIZONTAL TAB,
U+000A LINE FEED, U+000C FORM FEED, or U+000D CARRIAGE RETURN.

Rust uses the WhatWG Infra Standard’s definition of ASCII whitespace. There are several other
definitions in wide use. For instance, the POSIX locale includes U+000B VERTICAL TAB as well
as all the above characters, but—from the very same specification—the default rule for “field
splitting” in the Bourne shell considers only SPACE, HORIZONTAL TAB, and LINE FEED as
whitespace.
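Each of the three definitions quoted above can be written as a Delimiter. In this minimal sketch the function names and the `count_delimiters` helper are hypothetical, and the alias is copied locally so it compiles on its own. The POSIX variant adds U+000B VERTICAL TAB to u8::is_ascii_whitespace, and the shell variant matches only SPACE, HORIZONTAL TAB and LINE FEED. Named functions coerce to Delimiter just like non-capturing closures.

```rust
// Local copy of the alias shown above, so this sketch compiles without fundu.
type Delimiter = fn(u8) -> bool;

// WhatWG Infra definition, as implemented by u8::is_ascii_whitespace.
fn whatwg_whitespace(byte: u8) -> bool {
    byte.is_ascii_whitespace()
}

// POSIX locale: the WhatWG set plus U+000B VERTICAL TAB.
fn posix_whitespace(byte: u8) -> bool {
    byte.is_ascii_whitespace() || byte == 0x0B
}

// Bourne shell default field splitting: SPACE, HORIZONTAL TAB and LINE FEED only.
fn shell_field_whitespace(byte: u8) -> bool {
    matches!(byte, b' ' | b'\t' | b'\n')
}

// Hypothetical helper: count how many bytes of `input` the delimiter accepts.
fn count_delimiters(input: &[u8], delimiter: Delimiter) -> usize {
    input.iter().filter(|&&byte| delimiter(byte)).count()
}

fn main() {
    // SPACE, HORIZONTAL TAB, LINE FEED, VERTICAL TAB, FORM FEED, CARRIAGE RETURN
    let sample = b" \t\n\x0B\x0C\r";
    assert_eq!(count_delimiters(sample, whatwg_whitespace), 5); // all except VT
    assert_eq!(count_delimiters(sample, posix_whitespace), 6); // all six
    assert_eq!(count_delimiters(sample, shell_field_whitespace), 3); // SP, HT, LF
}
```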

Problems

The delimiter takes a u8 as input, but matching any non-ASCII byte (0x80 to 0xFF) may lead to serious problems if the input string contains multi-byte UTF-8 characters. In UTF-8, every byte of a multi-byte character lies in that range, so such a delimiter can split a character in the middle. This is always worth considering, especially if the input for the parser comes from an untrusted source. As a general rule of thumb, don't match any byte within the 0x80 to 0xFF range.
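The following sketch illustrates the problem. It splits a string's bytes wherever a delimiter matches and checks whether every piece is still valid UTF-8. The helper `pieces_are_valid_utf8` is hypothetical and only demonstrates the effect; it is not how fundu parses input. The character 'ü' is encoded as the two bytes 0xC3 0xBC, so a delimiter matching 0xBC cuts it in half.

```rust
// Local copy of the alias shown above, so this sketch compiles without fundu.
type Delimiter = fn(u8) -> bool;

// Hypothetical helper: split `input` at every byte the delimiter matches and
// report whether all resulting pieces are still valid UTF-8.
fn pieces_are_valid_utf8(input: &str, delimiter: Delimiter) -> bool {
    input
        .as_bytes()
        .split(|&byte| delimiter(byte))
        .all(|piece| std::str::from_utf8(piece).is_ok())
}

fn main() {
    // 'ü' (U+00FC) is encoded in UTF-8 as the two bytes 0xC3 0xBC.
    let input = "aü b";
    assert_eq!(input.as_bytes(), b"a\xC3\xBC b");

    // Safe: ASCII whitespace never occurs inside a multi-byte character.
    let ascii_only: Delimiter = |byte| byte.is_ascii_whitespace();
    assert!(pieces_are_valid_utf8(input, ascii_only));

    // Unsafe: 0xBC is a continuation byte, so splitting on it breaks 'ü' apart.
    let non_ascii: Delimiter = |byte| byte == 0xBC;
    assert!(!pieces_are_valid_utf8(input, non_ascii));
}
```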

Examples

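use fundu::Delimiter;

// Applies the given delimiter to a single byte.
fn is_delimiter(delimiter: Delimiter, byte: u8) -> bool {
    delimiter(byte)
}

// A custom delimiter accepting only space, line feed and horizontal tab.
assert!(is_delimiter(
    |byte| matches!(byte, b' ' | b'\n' | b'\t'),
    b' '
));
// Carriage return is not part of that custom set...
assert!(!is_delimiter(
    |byte| matches!(byte, b' ' | b'\n' | b'\t'),
    b'\r'
));
// ...but it is whitespace according to u8::is_ascii_whitespace.
assert!(is_delimiter(|byte| byte.is_ascii_whitespace(), b'\r'));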