klex (kujira-lexer)
A simple lexer (tokenizer) generator for Rust.
English | 日本語はこちら
Overview
klex generates Rust lexer code from a single definition file. You describe token patterns with regular expressions, and it outputs Rust source that includes a Token struct and a Lexer struct.
Installation
From crates.io
Or add to your Cargo.toml:
[]
= "0.1.2"
From source
Usage
As a library
use ;
use fs;
// Read input file
let input = read_to_string.expect;
// Parse the input
let spec = parse_spec.expect;
// Generate Rust code
let output = generate_lexer;
// Write output
write.expect;
Command line tool
Input file format
An input file consists of three sections separated by %%:
(Rust code here – e.g. use statements)
%%
(Rules here – token patterns written as regular expressions)
%%
(Rust code here – e.g. main function or tests)
Writing rules
Write one rule per line in the following form:
<pattern> -> <TOKEN_NAME>
Supported pattern formats:
'c'- Single character literal"string"- String literal[0-9]+- Character range with quantifier[abc]+- Character set with quantifier/regex/- Regular expression pattern( pattern1 | pattern2 )- Choice between patterns\+- Escaped special characters (\+,\*,\n,\t, etc.)?- Any single character?+- One or more any characters
Examples:
[0-9]+ -> NUMBER
[a-zA-Z_][a-zA-Z0-9_]* -> IDENTIFIER
\+ -> PLUS
\- -> MINUS
\n -> NEWLINE
\t -> TAB
? -> ANY_CHAR
?+ -> ANY_CHAR_PLUS
"hello" -> HELLO
/[0-9]+\.[0-9]+/ -> FLOAT
Generated Token struct
The generated lexer produces tokens with the following shape:
Advanced Features
Escaped Characters
klex supports escaped special characters:
\+ -> PLUS_ESCAPED # Matches literal '+'
\* -> MULTIPLY # Matches literal '*'
\n -> NEWLINE # Matches newline character
\t -> TAB # Matches tab character
Wildcard Patterns
Use wildcard patterns for flexible matching:
? -> ANY_CHAR # Matches any single character
?+ -> ANY_CHAR_PLUS # Matches one or more characters (i.e., captures to the end)
Context-Dependent Rules
Rules can depend on the previous token:
%IDENTIFIER [0-9]+ -> INDEXED_NUMBER # Only after IDENTIFIER
Action Code
Execute custom Rust code when a pattern matches:
"debug" -> { println!("Debug mode!"); None }
Examples
See tests/*.klex files for definition examples.
Generate a lexer
Use the generated lexer
The generated file exports a Lexer struct and related constants:
let input = "123 + abc".to_string;
let mut lexer = new;
while let Some = lexer.next_token
Tests
Run all tests:
License
MIT License