Crate alkale


Alkale is a library focused on streamlining the production of hand-written lexers.

A lexer, generally speaking, is a function that converts source code into a FinalizedLexerResult.

A typical lexer function will look something like this.

use alkale::{SourceCodeScanner, LexerResult, FinalizedLexerResult};

enum MyTokenData {
    // ...
}

fn lexer(source: &str) -> FinalizedLexerResult<MyTokenData> {
    // This will serve as an interface into the code for processing.
    let scanner = SourceCodeScanner::new(source);

    // This serves as a collection of our produced tokens and notifications.
    let mut result = LexerResult::new();

    while scanner.has_next() {
        // Main body goes here, processing the scanner into
        // tokens and notifications to be passed into the result.
    }

    // Finalize the result and return it.
    result.finalize()
}

SourceCodeScanner provides many methods for consuming the source code in various ways; see its documentation for details. Whichever methods you use, valid data from the source code should be converted into Tokens and invalid data into Notifications, and both should be reported to the LexerResult. These four datatypes (SourceCodeScanner, LexerResult, Token, and Notification), together with Span (used to create Tokens), form the backbone of every lexer.

§Features

Alkale has a single feature, common. This feature is enabled by default and adds many helper methods to SourceCodeScanner for tasks such as parsing numbers, strings, and identifiers.

§Example

Here is an example of a simple lexer that tokenizes the words in a program, ignores whitespace, and reports an error notification for every other character.

use alkale::{
    format_notification, notification::NotificationSeverity, token::Token, FinalizedLexerResult,
    LexerResult, SourceCodeScanner,
};

type Word<'a> = &'a str;

fn lexer(source: &str) -> FinalizedLexerResult<Word<'_>> {
    let scanner = SourceCodeScanner::new(source);
    let mut result = LexerResult::new();

    while scanner.has_next() {
        // Try to parse out a word.
        if let Some(identifier) = scanner.try_consume_standard_identifier() {
            // We found a word, push it and restart the loop.
            result.push_token(Token::from_spanned(identifier));
            continue;
        }

        // No word was found, consume one character.
        if let Some(char) = scanner.next_span() {
            // If this character wasn't whitespace (i.e. illegal char) then
            // report a notification.
            if !char.is_whitespace() {
                format_notification!("Unrecognized character '{}'", char.data)
                    .span(char.span)
                    .severity(NotificationSeverity::Error)
                    .report(&mut result);
            }
        }
    }

    result.finalize()
}

This example gives a basic overview of what the main loop of a lexer should look like: check for a pattern and, if it is found, parse it into a token and restart the loop. If the pattern isn't found, move on to the next pattern, and so on, until you reach a base case.
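To show how the loop extends to more than one pattern, here is a minimal sketch that uses only the standard library and none of alkale's API. The names Tok, Diagnostic, prefix_len, and lex are hypothetical stand-ins invented for this illustration; they only loosely mirror the roles of Token, Notification, and the lexer function above. Spans are modeled as plain byte ranges.

```rust
use std::ops::Range;

/// Hypothetical token type for this sketch (not an alkale type).
#[derive(Debug, PartialEq)]
enum Tok<'a> {
    Word(&'a str),
    Number(u64),
}

/// Hypothetical diagnostic: a message plus the byte range it refers to.
#[derive(Debug, PartialEq)]
struct Diagnostic {
    message: String,
    span: Range<usize>,
}

/// Byte length of the longest prefix of `rest` whose chars all satisfy `pred`.
fn prefix_len(rest: &str, pred: impl Fn(char) -> bool) -> usize {
    rest.find(|c: char| !pred(c)).unwrap_or(rest.len())
}

fn lex(source: &str) -> (Vec<(Tok<'_>, Range<usize>)>, Vec<Diagnostic>) {
    let mut tokens = Vec::new();
    let mut diagnostics = Vec::new();
    let mut pos = 0; // Byte offset; always kept on a char boundary.

    while pos < source.len() {
        let rest = &source[pos..];

        // Pattern 1: a word made of ASCII letters and underscores.
        let len = prefix_len(rest, |c| c.is_ascii_alphabetic() || c == '_');
        if len > 0 {
            tokens.push((Tok::Word(&rest[..len]), pos..pos + len));
            pos += len;
            continue; // Found a token, restart the loop.
        }

        // Pattern 2: a run of ASCII digits.
        let len = prefix_len(rest, |c| c.is_ascii_digit());
        if len > 0 {
            match rest[..len].parse::<u64>() {
                Ok(n) => tokens.push((Tok::Number(n), pos..pos + len)),
                // Valid digits, but the value does not fit in a u64.
                Err(_) => diagnostics.push(Diagnostic {
                    message: format!("Number '{}' is too large", &rest[..len]),
                    span: pos..pos + len,
                }),
            }
            pos += len;
            continue;
        }

        // Base case: no pattern matched, so consume exactly one character.
        let c = rest.chars().next().expect("rest is non-empty inside the loop");
        let len = c.len_utf8();
        if !c.is_whitespace() {
            diagnostics.push(Diagnostic {
                message: format!("Unrecognized character '{}'", c),
                span: pos..pos + len,
            });
        }
        pos += len;
    }

    (tokens, diagnostics)
}
```

Every branch advances pos by at least one byte, so the loop always terminates. Ordering the patterns from most to least specific and ending with a one-character base case is the design choice that guarantees this.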

Modules§

common (requires the common feature)
This module implements many helpful “built-in” methods to SourceCodeScanner.
notification
Module that contains standard Notification type and related data.
span
Module containing types for span information. Notably Span.
token
Module for Tokens and related types.

Macros§

format_notification
Works the same as the format macro, but passes its result into NotificationBuilder::new. This may produce more succinct code than passing the format macro into the constructor.
map_double_char_tokens (requires the common feature)
Used to map one-to-two-char patterns into tokens and automatically append them to a LexerResult.
map_single_char_token (requires the common feature)
Used to map very simple, single-char patterns into tokens and automatically append them to a LexerResult.

Structs§

FinalizedLexerResult
The final result of a lexer. This is returned by a LexerResult’s finalize method.
LexerResult
Used to accumulate Notifications and Tokens during lexing.
SourceCodeScanner
An iterator-like interface into source code. This type is not thread-safe.