Alkale is a library focused on streamlining the production of hand-written lexers.
A lexer, generally speaking, is a function that converts source code into
a FinalizedLexerResult.
A typical lexer function will look something like this:

```rust
use alkale::{SourceCodeScanner, LexerResult, FinalizedLexerResult};

enum MyTokenData {
    // ...
}

fn lexer(source: &str) -> FinalizedLexerResult<MyTokenData> {
    // This will serve as an interface into the code for processing.
    let scanner = SourceCodeScanner::new(source);
    // This serves as a collection of our produced tokens and notifications.
    let mut result = LexerResult::new();

    while scanner.has_next() {
        // Main body goes here, processing the scanner into
        // tokens and notifications to be passed into the result.
    }

    // Finalize the result and return it.
    result.finalize()
}
```

Many methods exist on SourceCodeScanner to consume the source code in various ways; see its documentation for more details. Regardless, valid data from the source code should be converted into Tokens, and invalid data into Notifications, both of which are reported to the LexerResult. These four types, along with Span (used to create Tokens), form the backbone of nearly every lexer.
§Features
Alkale has a single feature, common. This feature is enabled by default and
adds a large number of helper methods to SourceCodeScanner for tasks
such as number parsing, strings, and identifiers.
§Example
Here is an example of a simple lexer that tokenizes the words in a program, ignoring whitespace and reporting an error for everything else.
```rust
use alkale::{
    format_notification, notification::NotificationSeverity, token::Token, FinalizedLexerResult,
    LexerResult, SourceCodeScanner,
};

type Word<'a> = &'a str;

fn lexer(source: &str) -> FinalizedLexerResult<Word<'_>> {
    let scanner = SourceCodeScanner::new(source);
    let mut result = LexerResult::new();

    while scanner.has_next() {
        // Try to parse out a word.
        if let Some(identifier) = scanner.try_consume_standard_identifier() {
            // We found a word, push it and restart the loop.
            result.push_token(Token::from_spanned(identifier));
            continue;
        }

        // No word was found, consume one character.
        if let Some(char) = scanner.next_span() {
            // If this character wasn't whitespace (i.e. an illegal character),
            // report a notification.
            if !char.is_whitespace() {
                format_notification!("Unrecognized character '{}'", char.data)
                    .span(char.span)
                    .severity(NotificationSeverity::Error)
                    .report(&mut result);
            }
        }
    }

    result.finalize()
}
```

This example should give a basic overview of what the main loop of a lexer looks like: check for a pattern; if it matches, parse it into a token and restart the loop; if it doesn't, fall through to the next pattern until you reach a base case.
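The same loop structure can be illustrated without alkale at all. The following dependency-free sketch implements an equivalent word lexer using plain string slicing; the Token enum and helper logic here are illustrative only and are not part of alkale's API.

```rust
// Illustrative, crate-free version of the loop described above:
// check for a pattern; on a match, emit a token and restart the loop;
// otherwise consume one character as the base case.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Error(char),
}

fn lexer(source: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = source;
    while !rest.is_empty() {
        // Pattern 1: a word (a run of alphabetic characters).
        let word_len: usize = rest
            .chars()
            .take_while(|c| c.is_alphabetic())
            .map(char::len_utf8)
            .sum();
        if word_len > 0 {
            tokens.push(Token::Word(&rest[..word_len]));
            rest = &rest[word_len..];
            continue;
        }
        // Base case: consume one character; anything that isn't
        // whitespace is reported as an error.
        let c = rest.chars().next().unwrap();
        if !c.is_whitespace() {
            tokens.push(Token::Error(c));
        }
        rest = &rest[c.len_utf8()..];
    }
    tokens
}

fn main() {
    // Words are tokenized, whitespace is skipped, '!' is an error.
    println!("{:?}", lexer("hello wor!d"));
}
```

Alkale's scanner and result types replace the manual slice bookkeeping above, but the control flow of the main loop is the same.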
Modules§
- common (common feature) - This module implements many helpful "built-in" methods on SourceCodeScanner.
- notification - Module that contains the standard Notification type and related data.
- span - Module containing types for span information, notably Span.
- token - Module for Tokens and related types.
Macros§
- format_notification - Works the same as the format macro, but passes its result into NotificationBuilder::new. This may produce more succinct code than passing the format macro into the constructor.
- map_double_char_tokens (common feature) - Used to map one-to-two-char patterns into tokens and automatically append them to a LexerResult.
- map_single_char_token (common feature) - Used to map very simple, single-char patterns into tokens and automatically append them to a LexerResult.
Structs§
- FinalizedLexerResult - The final result of a lexer. This is returned by a LexerResult's finalize method.
- LexerResult - Used to accumulate Notifications and Tokens during lexing.
- SourceCodeScanner - An iterator-like interface into source code. This type is not thread-safe.