pattern-lexer-0.1.0 has been yanked.
Lexer
My personal implementation of a lexer.
Principles
The lexer is plugin-based. It is neither a parser nor a compiler: it only splits input into tokens.
Tokens
There are 8 predefined token kinds (the examples are illustrative, not mandatory):
| TokenKind | Explanation | Examples |
|---|---|---|
| KEYWORD | Reserved words | if return ... |
| DELIMITER | Paired delimiter symbols | () [] {} ... |
| PUNCTUATION | Punctuation symbols | ; . ... |
| OPERATOR | Symbols that operate on arguments | + - = ... |
| COMMENT | Line or block comments | // /* ... */ ... |
| WHITESPACE | Non-printable characters | - |
| LITERAL | Numerical, logical, textual values | 1 true "true" ... |
| IDENTIFIER | Names assigned in a program | x temp PRINT ... |
Every token kind except IDENTIFIER should be constructed with a name that distinguishes tokens of the same kind, for example OPERATOR("PLUS") and OPERATOR("EQUAL").
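One way to model this naming rule in Rust, the language of the example later in this section, is an enum whose variants carry a name, with `IDENTIFIER` as the only variant without one. The `TokenKind` and `Token` definitions below are an assumption modeled on the debug output shown in that example, not the crate's actual source; storing names as `&'static str` is likewise a simplification.

```rust
// Hypothetical model of the token kinds in the table above; the crate's
// real definitions may differ.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind {
    KEYWORD(&'static str),
    DELIMITER(&'static str),
    PUNCTUATION(&'static str),
    OPERATOR(&'static str),
    COMMENT(&'static str),
    WHITESPACE(&'static str),
    LITERAL(&'static str),
    // The only kind without a distinguishing name.
    IDENTIFIER,
}

/// A matched piece of input together with its kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub value: String,
}

fn main() {
    let plus = Token { kind: TokenKind::OPERATOR("PLUS"), value: "+".to_string() };
    let equal = Token { kind: TokenKind::OPERATOR("EQUAL"), value: "=".to_string() };
    let id = Token { kind: TokenKind::IDENTIFIER, value: "x_4".to_string() };
    // Same kind, different names: the kinds compare unequal.
    assert_ne!(plus.kind, equal.kind);
    println!("{:?}", plus); // Token { kind: OPERATOR("PLUS"), value: "+" }
    println!("{:?}", id); // Token { kind: IDENTIFIER, value: "x_4" }
}
```

Deriving `Debug` is what makes the printed form match the `Token { kind: ..., value: ... }` layout used in the example output.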
Each TokenKind can be associated with one or more Patterns through a Tokenizer; when one of those patterns matches the input string, the Tokenizer produces a Token.
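The relationship between kinds, patterns, tokenizers and tokens can be sketched in Rust as follows, assuming a tokenizer tries each of its patterns at the start of the input and keeps the longest match. `Pattern` (with its `Exact` and `While` variants), `Tokenizer::new` and `match_start` are hypothetical names chosen for illustration; the crate's real `Pattern` type may support other forms, such as regular expressions.

```rust
// Hypothetical types for illustration only; the crate's real API may differ.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind {
    OPERATOR(&'static str),
    LITERAL(&'static str),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub value: String,
}

/// A pattern reports how many bytes it matches at the start of the input.
pub enum Pattern {
    /// Matches one exact string, e.g. "=" or ":=".
    Exact(&'static str),
    /// Matches the longest prefix whose chars all satisfy the predicate.
    While(fn(char) -> bool),
}

impl Pattern {
    fn match_len(&self, input: &str) -> usize {
        match self {
            Pattern::Exact(s) => {
                if input.starts_with(*s) { s.len() } else { 0 }
            }
            Pattern::While(pred) => input
                .char_indices()
                .find(|&(_, c)| !pred(c))
                .map_or(input.len(), |(i, _)| i),
        }
    }
}

/// Hypothetical tokenizer: one TokenKind paired with one or more Patterns.
pub struct Tokenizer {
    kind: TokenKind,
    patterns: Vec<Pattern>,
}

impl Tokenizer {
    pub fn new(kind: TokenKind, patterns: Vec<Pattern>) -> Self {
        Tokenizer { kind, patterns }
    }

    /// Tries every pattern at the start of `input` and keeps the longest match.
    pub fn match_start(&self, input: &str) -> Option<Token> {
        let len = self.patterns.iter().map(|p| p.match_len(input)).max().unwrap_or(0);
        if len == 0 {
            return None;
        }
        Some(Token { kind: self.kind.clone(), value: input[..len].to_string() })
    }
}

fn main() {
    // One kind, two alternative patterns.
    let assign = Tokenizer::new(
        TokenKind::OPERATOR("ASSIGN"),
        vec![Pattern::Exact("="), Pattern::Exact(":=")],
    );
    let number = Tokenizer::new(
        TokenKind::LITERAL("NUMBER"),
        vec![Pattern::While(|c| c.is_ascii_digit() || c == '.')],
    );
    println!("{:?}", assign.match_start(":= 4")); // Some(Token { kind: OPERATOR("ASSIGN"), value: ":=" })
    println!("{:?}", number.match_start("0.5 + 1")); // Some(Token { kind: LITERAL("NUMBER"), value: "0.5" })
    println!("{:?}", number.match_start("x_4")); // None
}
```

Keeping the longest match among a tokenizer's patterns is one reasonable choice; trying patterns in order and taking the first hit would be another.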
Lexer
A Lexer should be constructed with a LexerBuilder, which wraps several Tokenizers.
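A builder that collects tokenizers and turns them into a lexer might look like the minimal Rust sketch below. It is not the crate's implementation: `Lexer::builder`, `LexerBuilder::extend`, `build` and `tokenize` reuse the names from the example, but their signatures are assumptions, and the helpers `symbol`, `run` and `identifier` are hypothetical stand-ins for real patterns (no regex crate is used). At each position the lexer keeps the longest match across all tokenizers, with earlier tokenizers winning ties; this maximal-munch rule is a common strategy, not necessarily the one the crate uses.

```rust
// Hypothetical sketch: names echo the README, but this is not the crate's API.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind {
    OPERATOR(&'static str),
    LITERAL(&'static str),
    WHITESPACE(&'static str),
    IDENTIFIER,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub value: String,
}

/// A kind plus a matcher giving the byte length matched at the start of the input (0 = none).
pub struct Tokenizer {
    kind: TokenKind,
    matcher: Box<dyn Fn(&str) -> usize>,
}

/// Byte length of the longest prefix of `s` whose chars all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.char_indices().find(|&(_, c)| !pred(c)).map_or(s.len(), |(i, _)| i)
}

/// Hypothetical helper: matches exactly one character, e.g. '+'.
pub fn symbol(kind: TokenKind, c: char) -> Tokenizer {
    let matcher = move |s: &str| if s.starts_with(c) { c.len_utf8() } else { 0 };
    Tokenizer { kind, matcher: Box::new(matcher) }
}

/// Hypothetical helper: matches the longest run of chars satisfying `pred`.
pub fn run(kind: TokenKind, pred: fn(char) -> bool) -> Tokenizer {
    Tokenizer { kind, matcher: Box::new(move |s: &str| prefix_len(s, pred)) }
}

/// Hypothetical helper: a letter or '_' followed by letters, digits or '_'.
pub fn identifier() -> Tokenizer {
    let matcher = |s: &str| match s.chars().next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            prefix_len(s, |ch| ch.is_alphanumeric() || ch == '_')
        }
        _ => 0,
    };
    Tokenizer { kind: TokenKind::IDENTIFIER, matcher: Box::new(matcher) }
}

/// Collects tokenizers, then turns them into a Lexer.
#[derive(Default)]
pub struct LexerBuilder { tokenizers: Vec<Tokenizer> }

impl LexerBuilder {
    pub fn extend(mut self, tokenizers: Vec<Tokenizer>) -> Self {
        self.tokenizers.extend(tokenizers);
        self
    }
    pub fn build(self) -> Lexer {
        Lexer { tokenizers: self.tokenizers }
    }
}

pub struct Lexer { tokenizers: Vec<Tokenizer> }

impl Lexer {
    pub fn builder() -> LexerBuilder {
        LexerBuilder::default()
    }
    /// At each position the longest match wins; earlier tokenizers win ties.
    pub fn tokenize(&self, input: &str) -> Result<Vec<Token>, String> {
        let (mut tokens, mut rest) = (Vec::new(), input);
        while !rest.is_empty() {
            let mut best: Option<(usize, &Tokenizer)> = None;
            for t in &self.tokenizers {
                let len = (t.matcher)(rest);
                if len > best.map_or(0, |(l, _)| l) {
                    best = Some((len, t));
                }
            }
            let (len, t) = best.ok_or_else(|| format!("no tokenizer matches {:?}", rest))?;
            tokens.push(Token { kind: t.kind.clone(), value: rest[..len].to_string() });
            rest = &rest[len..];
        }
        Ok(tokens)
    }
}

fn main() -> Result<(), String> {
    let ops = [("PLUS", '+'), ("MINUS", '-'), ("STAR", '*'), ("SLASH", '/'), ("EQUAL", '=')];
    let lexer = Lexer::builder()
        .extend(ops.iter().map(|&(name, c)| symbol(TokenKind::OPERATOR(name), c)).collect())
        .extend(vec![
            run(TokenKind::LITERAL("NUMBER"), |c| c.is_ascii_digit() || c == '.'),
            identifier(),
            symbol(TokenKind::WHITESPACE("SPACE"), ' '),
        ])
        .build();
    println!("{:?}", lexer.tokenize("x_4 = 2 + 2 = 4 * 0.5")?);
    Ok(())
}
```

Running `main` splits `x_4 = 2 + 2 = 4 * 0.5` into the same 17 tokens as the example output. The number rule here is deliberately loose (it would also accept `1.2.3`), and any character no tokenizer recognizes makes `tokenize` return an error.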
Examples
Simple maths Lexer
```rust
// Operator tokenizers
let plus = new;
let minus = new;
let star = new;
let slash = new;
let equal = new;
// Numeric literals
let number = new;
// Identifiers, matched with a regular expression
let id_regex = new.unwrap;
let id = new;
// Whitespace
let whitespace = new;
// Combine all tokenizers into one lexer
let lexer = builder
    .extend
    .build;
lexer.tokenize("x_4 = 2 + 2 = 4 * 0.5")?;
/* [Token { kind: IDENTIFIER, value: "x_4" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: OPERATOR("EQUAL"), value: "=" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: LITERAL("NUMBER"), value: "2" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: OPERATOR("PLUS"), value: "+" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: LITERAL("NUMBER"), value: "2" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: OPERATOR("EQUAL"), value: "=" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: LITERAL("NUMBER"), value: "4" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: OPERATOR("STAR"), value: "*" },
Token { kind: WHITESPACE("SPACE"), value: " " },
Token { kind: LITERAL("NUMBER"), value: "0.5" }] */
```