klex 0.1.0


klex (kujira-lexer)

A simple lexer (tokenizer) generator for Rust.

English | Japanese (日本語)

Overview

klex generates Rust lexer code from a single definition file. You describe token patterns with regular expressions, and it outputs Rust source that includes a Token struct and a Lexer struct.

Installation

From crates.io

cargo add klex

From source

git clone https://github.com/kujirahand/klex
cd klex
cargo build --release

Usage

As a library

use klex::{generate_lexer, parse_spec};
use std::fs;

// Read input file
let input = fs::read_to_string("example.klex").expect("Failed to read input file");

// Parse the input
let spec = parse_spec(&input).expect("Failed to parse input");

// Generate Rust code
let output = generate_lexer(&spec, "example.klex");

// Write output
fs::write("output.rs", output).expect("Failed to write output");

Command line tool

cargo run -- <INPUT_FILE> [OUTPUT_FILE]

Input file format

An input file consists of three sections separated by %%:

(Rust code here – e.g. use statements)
%%
(Rules here – token patterns written as regular expressions)
%%
(Rust code here – e.g. main function or tests)
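Putting the three sections together, a hypothetical minimal definition file might look like the sketch below (the rule syntax is described in the next section; the specific rules and driver code are illustrative, not taken from example.klex):

```
use std::env;
%%
[0-9]+ -> NUMBER
\+ -> PLUS
%%
fn main() {
    // driver code that uses the generated Lexer
}
```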

Writing rules

Write one rule per line in the following form:

<regex pattern> -> <TOKEN_NAME>

Examples:

[0-9]+ -> NUMBER
[a-zA-Z_][a-zA-Z0-9_]* -> IDENTIFIER
\+ -> PLUS
\- -> MINUS

Generated Token struct

The generated lexer produces tokens with the following shape:

struct Token {
    kind: u32,      // token kind (defined as constants)
    value: String,  // matched text
    row: usize,     // 1-based line number
    col: usize,     // 1-based column number
    length: usize,  // token length
    indent: usize,  // indentation width at line start (spaces)
    tag: isize,     // custom tag (defaults to 0)
}
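For illustration, here is a minimal self-contained sketch of constructing and inspecting a token of this shape. The struct mirrors the fields listed above; the NUMBER constant is a hypothetical stand-in for the kind constants the generator emits:

```rust
// Hypothetical stand-in for a generated token-kind constant.
const NUMBER: u32 = 0;

#[derive(Debug)]
struct Token {
    kind: u32,      // token kind (defined as constants)
    value: String,  // matched text
    row: usize,     // 1-based line number
    col: usize,     // 1-based column number
    length: usize,  // token length
    indent: usize,  // indentation width at line start (spaces)
    tag: isize,     // custom tag (defaults to 0)
}

// Build a token as the generated lexer might produce it for the input "123".
fn make_sample_token() -> Token {
    Token {
        kind: NUMBER,
        value: "123".to_string(),
        row: 1,
        col: 1,
        length: 3,
        indent: 0,
        tag: 0,
    }
}

fn main() {
    let token = make_sample_token();
    // The matched text and the recorded length agree.
    assert_eq!(token.value.len(), token.length);
    println!("{:?}", token);
}
```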

Examples

See example.klex for a minimal definition file.

Generate a lexer

cargo run -- example.klex generated_lexer.rs

Use the generated lexer

The generated file exports a Lexer struct and related constants:

let input = "123 + abc".to_string();
let mut lexer = Lexer::new(input);

while let Some(token) = lexer.next_token() {
    println!("{:?}", token);
}

Tests

cargo test

License

MIT License