Crate uwl


A stream of chars for building tools such as a lexer. It makes the step of iterating between characters considerably easier and provides certain utilities for simplifying the code. Respects both ASCII and Unicode.

Example: lexing identifiers, numbers, and some punctuation marks:

use uwl::StringStream;
use uwl::StrExt;

#[derive(Debug, PartialEq)]
enum TokenKind {
    Ident,
    Number,
    Question,
    Exclamation,
    Comma,
    Point,

    // An invalid token
    Illegal,
}

#[derive(Debug, PartialEq)]
struct Token<'a> {
    kind: TokenKind,
    lit: &'a str,
}

impl<'a> Token<'a> {
    fn new(kind: TokenKind, lit: &'a str) -> Self {
        Token {
            kind,
            lit,
        }
    }
}

fn lex<'a>(stream: &mut StringStream<'a>) -> Option<Token<'a>> {
    if stream.at_end() {
        return None;
    }

    Some(match stream.current()? {
        // Ignore whitespace.
        s if s.is_whitespace() => {
            stream.next()?;
            return lex(stream);
        },
        // Identifiers: consume a run of alphabetic characters.
        s if s.is_alphabetic() => Token::new(TokenKind::Ident, stream.take_while(|s| s.is_alphabetic())),
        // Numbers: consume a run of numeric characters.
        s if s.is_numeric() => Token::new(TokenKind::Number, stream.take_while(|s| s.is_numeric())),
        // Punctuation: consume exactly one character.
        "?" => Token::new(TokenKind::Question, stream.next()?),
        "!" => Token::new(TokenKind::Exclamation, stream.next()?),
        "," => Token::new(TokenKind::Comma, stream.next()?),
        "." => Token::new(TokenKind::Point, stream.next()?),
        // Anything else is consumed as an illegal token.
        _ => Token::new(TokenKind::Illegal, stream.next()?),
    })
}

fn main() {
    let mut stream = StringStream::new("Hello, world! ...world? Hello?");

    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Ident, "Hello"));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Comma, ","));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Ident, "world"));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Exclamation, "!"));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Point, "."));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Point, "."));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Point, "."));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Ident, "world"));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Question, "?"));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Ident, "Hello"));
    assert_eq!(lex(&mut stream).unwrap(), Token::new(TokenKind::Question, "?"));

    // Reached the end
    assert_eq!(lex(&mut stream), None);
}

Structs

StringStream: A stream of chars. Handles both ASCII and Unicode.
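
To give a rough idea of what a Unicode-aware stream of chars involves, here is a minimal sketch built only on the standard library. `MiniStream` is a hypothetical type written for illustration; its method names mirror those used in the example above, but the bodies are assumptions and not uwl's actual implementation. The key point it shows is that the stream tracks a byte offset and always advances by `char::len_utf8`, so returned slices never split a multi-byte character.

```rust
/// Hypothetical stand-in for a stream of chars (not uwl's implementation).
pub struct MiniStream<'a> {
    src: &'a str,
    /// Byte offset into `src`; always kept on a char boundary.
    offset: usize,
}

impl<'a> MiniStream<'a> {
    pub fn new(src: &'a str) -> Self {
        MiniStream { src, offset: 0 }
    }

    /// True once every character has been consumed.
    pub fn at_end(&self) -> bool {
        self.offset >= self.src.len()
    }

    /// The current character as a one-character slice, without advancing.
    pub fn current(&self) -> Option<&'a str> {
        let src: &'a str = self.src;
        let c = src[self.offset..].chars().next()?;
        Some(&src[self.offset..self.offset + c.len_utf8()])
    }

    /// Return the current character and advance past it.
    pub fn next(&mut self) -> Option<&'a str> {
        let s = self.current()?;
        self.offset += s.len();
        Some(s)
    }

    /// Consume characters while `pred` holds and return the consumed slice.
    pub fn take_while(&mut self, pred: impl Fn(char) -> bool) -> &'a str {
        let src: &'a str = self.src;
        let start = self.offset;
        while let Some(c) = src[self.offset..].chars().next() {
            if !pred(c) {
                break;
            }
            self.offset += c.len_utf8();
        }
        &src[start..self.offset]
    }
}
```

In this sketch the predicate receives a `char`, whereas the example above passes closures over `&str`; that difference is a simplification made here to keep the sketch free of any extension trait.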

Traits

StrExt: Brings over some is_* methods from char to &str. See char’s docs for details.
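
As an illustration of the extension-trait pattern this describes, the sketch below defines a hypothetical trait, `CharChecks`, that forwards three of char's is_* methods to string slices. The trait name, the choice to require every character to pass, and the treatment of the empty string are all assumptions made for this sketch; uwl's own trait may differ in each of these.

```rust
/// Hypothetical extension trait (not uwl's actual trait).
pub trait CharChecks {
    fn is_alphabetic(&self) -> bool;
    fn is_numeric(&self) -> bool;
    fn is_whitespace(&self) -> bool;
}

impl CharChecks for str {
    // Assumption: a string passes a check only if it is non-empty
    // and every one of its characters passes the matching char method.
    fn is_alphabetic(&self) -> bool {
        !self.is_empty() && self.chars().all(char::is_alphabetic)
    }

    fn is_numeric(&self) -> bool {
        !self.is_empty() && self.chars().all(char::is_numeric)
    }

    fn is_whitespace(&self) -> bool {
        !self.is_empty() && self.chars().all(char::is_whitespace)
    }
}
```

With such a trait in scope, a match guard like `s if s.is_alphabetic()` can be written directly on a `&str` holding a single character, which is the style the lexer example relies on.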