klex 0.1.1

klex (kujira-lexer)

A simple lexer (tokenizer) generator for Rust.

English | 日本語 (Japanese version)

Overview

klex generates Rust lexer code from a single definition file. You describe token patterns with regular expressions, and it outputs Rust source that includes a Token struct and a Lexer struct.

Installation

From crates.io

cargo add klex

From source

git clone https://github.com/kujirahand/klex
cd klex
cargo build --release

Usage

As a library

use klex::{generate_lexer, parse_spec};
use std::fs;

fn main() {
    // Read the lexer definition file
    let input = fs::read_to_string("example.klex").expect("Failed to read input file");

    // Parse the definition into a spec
    let spec = parse_spec(&input).expect("Failed to parse input");

    // Generate Rust source code for the lexer
    let output = generate_lexer(&spec, "example.klex");

    // Write the generated code to a file
    fs::write("output.rs", output).expect("Failed to write output");
}

Command line tool

cargo run -- <INPUT_FILE> [OUTPUT_FILE]

Input file format

An input file consists of three sections separated by %%:

(Rust code here – e.g. use statements)
%%
(Rules here – token patterns written as regular expressions)
%%
(Rust code here – e.g. main function or tests)
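For instance, a minimal definition file might look like this (the rules mirror the examples below; the surrounding Rust code sections are ordinary Rust source):

```
// Rust code section (e.g. use statements)
%%
[0-9]+ -> NUMBER
[a-zA-Z_][a-zA-Z0-9_]* -> IDENTIFIER
\+ -> PLUS
%%
// Rust code section (e.g. a main function or tests)
```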

Writing rules

Write one rule per line in the following form:

<pattern> -> <TOKEN_NAME>

Supported pattern formats:

  • 'c' - Single character literal
  • "string" - String literal
  • [0-9]+ - Character range with quantifier
  • [abc]+ - Character set with quantifier
  • /regex/ - Regular expression pattern
  • ( pattern1 | pattern2 ) - Choice between patterns
  • \+ - Escaped special characters (\+, \*, \n, \t, etc.)
  • ? - Any single character
  • ?+ - One or more of any character

Examples:

[0-9]+ -> NUMBER
[a-zA-Z_][a-zA-Z0-9_]* -> IDENTIFIER
\+ -> PLUS
\- -> MINUS
\n -> NEWLINE
\t -> TAB
? -> ANY_CHAR
?+ -> ANY_CHAR_PLUS
"hello" -> HELLO
/[0-9]+\.[0-9]+/ -> FLOAT

Generated Token struct

The generated lexer produces tokens with the following shape:

struct Token {
    kind: u32,      // token kind (defined as constants)
    value: String,  // matched text
    row: usize,     // 1-based line number
    col: usize,     // 1-based column number
    length: usize,  // token length
    indent: usize,  // indentation width at line start (spaces)
    tag: isize,     // custom tag (defaults to 0)
}
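To illustrate how tokens of this shape might be produced, here is a hand-written sketch of a tiny scanner. This is not klex's generated code: the `KIND_NUMBER`/`KIND_WORD` constants and the `scan_line` function are hypothetical, chosen only to show the fields being filled in.

```rust
// Hypothetical sketch -- not actual klex output. Shows how a scanner
// could populate tokens in the shape described above.
#[derive(Debug)]
pub struct Token {
    pub kind: u32,      // token kind (constants below are illustrative)
    pub value: String,  // matched text
    pub row: usize,     // 1-based line number
    pub col: usize,     // 1-based column number
    pub length: usize,  // token length in characters
    pub indent: usize,  // indentation width at line start (spaces)
    pub tag: isize,     // custom tag (defaults to 0)
}

pub const KIND_NUMBER: u32 = 1;
pub const KIND_WORD: u32 = 2;

// Scan a single line into NUMBER and WORD tokens, skipping spaces.
pub fn scan_line(line: &str, row: usize) -> Vec<Token> {
    let indent = line.len() - line.trim_start_matches(' ').len();
    let chars: Vec<char> = line.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == ' ' {
            i += 1;
            continue;
        }
        let start = i;
        let kind = if chars[i].is_ascii_digit() {
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            KIND_NUMBER
        } else {
            while i < chars.len() && !chars[i].is_whitespace() {
                i += 1;
            }
            KIND_WORD
        };
        tokens.push(Token {
            kind,
            value: chars[start..i].iter().collect(),
            row,
            col: start + 1, // 1-based column
            length: i - start,
            indent,
            tag: 0,
        });
    }
    tokens
}

fn main() {
    for t in scan_line("  123 abc", 1) {
        println!("{} {:?} row={} col={}", t.kind, t.value, t.row, t.col);
    }
}
```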

Advanced Features

Escaped Characters

klex supports escaped special characters:

\+ -> PLUS_ESCAPED    # Matches literal '+'
\* -> MULTIPLY        # Matches literal '*'
\n -> NEWLINE         # Matches newline character
\t -> TAB             # Matches tab character

Wildcard Patterns

Use wildcard patterns for flexible matching:

? -> ANY_CHAR         # Matches any single character
?+ -> ANY_CHAR_PLUS   # Matches one or more characters

Context-Dependent Rules

Rules can depend on the previous token:

%IDENTIFIER [0-9]+ -> INDEXED_NUMBER   # Only after IDENTIFIER

Action Code

Execute custom Rust code when a pattern matches:

"debug" -> { println!("Debug mode!"); None }

Examples

See example.klex for a minimal definition file.

Generate a lexer

cargo run -- example.klex generated_lexer.rs

Use the generated lexer

The generated file exports a Lexer struct and related constants:

let input = "123 + abc".to_string();
let mut lexer = Lexer::new(input);

while let Some(token) = lexer.next_token() {
    println!("{:?}", token);
}

Tests

Run all tests:

cargo test

Test files include:

  • tests/example.klex - Basic lexer example
  • tests/test_context.klex - Context-dependent rules
  • tests/test_new_patterns.klex - Various pattern types
  • tests/test_escaped_chars.klex - Escaped character patterns
  • tests/test_any_chars.klex - Wildcard patterns

License

MIT License