malk-lexer 0.1.1

A simple unicode lexer

malk-lexer

A unicode lexer for use as a first-pass when writing a parser.

The main function exported by this library is lex, which takes a &str and a table of valid symbols and converts the input into a token tree.

The kinds of token recognized by the lexer are:

  • Idents: A string starting with an XID_Start character followed by a sequence of XID_Continue characters.
  • Whitespace: Any sequence of whitespace characters.
  • Brackets: Any opening bracket character, its corresponding closing bracket, and the tokens in between, returned as a sub-tree.
  • Symbols: Any string that appears in the symbol table provided to lex.
  • Strings: A string enclosed in either " or ', which may contain escaped characters.
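
To make the token kinds above concrete, here is a minimal, self-contained sketch of a lexer that produces the same five kinds of token. It is not this crate's API: Token, closer, prefix_len, lex_until and lex_sketch are hypothetical names invented for illustration. It also makes simplifying assumptions: char::is_alphabetic and char::is_alphanumeric stand in for XID_Start and XID_Continue, only the ASCII brackets ( [ { are recognized, an escape simply keeps the character after the backslash, and identifiers are tried before symbols.

```rust
// Hypothetical token type for illustration; not the crate's own definition.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Whitespace(String),
    Bracket(char, Vec<Token>), // opening bracket plus the sub-tree inside it
    Symbol(String),
    Str(String),
}

// Closing bracket for an opening one (ASCII only in this sketch).
fn closer(open: char) -> Option<char> {
    match open {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    }
}

// Byte length of the longest prefix of `s` whose chars all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.chars().take_while(|&c| pred(c)).map(char::len_utf8).sum()
}

// Lexes from byte offset `pos` until the closing bracket `end` (or end of input).
fn lex_until(src: &str, pos: &mut usize, symbols: &[&str], end: Option<char>) -> Result<Vec<Token>, String> {
    let mut out = Vec::new();
    loop {
        let rest = &src[*pos..];
        let c = match rest.chars().next() {
            Some(c) => c,
            None => break,
        };
        if Some(c) == end {
            *pos += c.len_utf8();
            return Ok(out);
        }
        if let Some(close) = closer(c) {
            // Recurse so the bracket's contents become a sub-tree.
            *pos += c.len_utf8();
            let inner = lex_until(src, pos, symbols, Some(close))?;
            out.push(Token::Bracket(c, inner));
        } else if c.is_whitespace() {
            let len = prefix_len(rest, |ch| ch.is_whitespace());
            out.push(Token::Whitespace(rest[..len].to_string()));
            *pos += len;
        } else if c.is_alphabetic() {
            // Approximation of XID_Start followed by XID_Continue*.
            let first = c.len_utf8();
            let len = first + prefix_len(&rest[first..], |ch| ch.is_alphanumeric() || ch == '_');
            out.push(Token::Ident(rest[..len].to_string()));
            *pos += len;
        } else if c == '"' || c == '\'' {
            let mut text = String::new();
            let mut closed = None;
            let mut chars = rest[1..].char_indices();
            while let Some((i, ch)) = chars.next() {
                if ch == '\\' {
                    // Simplified escape: keep the next character literally.
                    match chars.next() {
                        Some((_, esc)) => text.push(esc),
                        None => break,
                    }
                } else if ch == c {
                    closed = Some(i + 1); // bytes consumed after the opening quote
                    break;
                } else {
                    text.push(ch);
                }
            }
            match closed {
                Some(n) => {
                    out.push(Token::Str(text));
                    *pos += 1 + n;
                }
                None => return Err(format!("unterminated string starting at byte {}", *pos)),
            }
        } else if let Some(sym) = symbols
            .iter()
            .filter(|s| !s.is_empty() && rest.starts_with(**s))
            .max_by_key(|s| s.len())
        {
            // Prefer the longest symbol that matches at this position.
            out.push(Token::Symbol(sym.to_string()));
            *pos += sym.len();
        } else {
            return Err(format!("unexpected character {:?} at byte {}", c, *pos));
        }
    }
    match end {
        Some(close) => Err(format!("missing closing {:?}", close)),
        None => Ok(out),
    }
}

// Entry point of the sketch: lex the whole input into a token tree.
fn lex_sketch(src: &str, symbols: &[&str]) -> Result<Vec<Token>, String> {
    let mut pos = 0;
    lex_until(src, &mut pos, symbols, None)
}
```

With this sketch, calling lex_sketch on the input foo (bar + "hi") with the symbol table ["+"] yields three top-level tokens: an Ident, a Whitespace, and a Bracket whose sub-tree holds the tokens between the parentheses. A faithful implementation would use real XID_Start and XID_Continue tables rather than the standard-library approximations used here.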

Patches welcome!