Rustlr allows the use of any lexical analyzer (tokenizer) that implements the Tokenizer trait. However, a basic tokenizer, StrTokenizer, is provided and suffices for many examples. This tokenizer is not maximally efficient (it is not single-pass), as it uses regular expressions.

The main contents of this module are TerminalToken, Tokenizer, RawToken, StrTokenizer and LexSource. For backwards compatibility with Rustlr version 0.1, Lexer, Lextoken and charlexer are retained, for now.

Structs

LexSource: Structure to hold the contents of a source (such as the contents of a file). A StrTokenizer can be created from such a struct. It reads the contents of a file using std::fs::read_to_string and stores them locally.
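
A minimal sketch of the intended workflow, assuming a LexSource::new constructor that takes a file path, a StrTokenizer::from_source constructor, and a next_token method; the import paths and exact signatures should be checked against the item documentation:

```rust
use rustlr::{LexSource, StrTokenizer}; // paths assumed; the items may also live in a lexer submodule

fn main() -> std::io::Result<()> {
    // LexSource reads the file with std::fs::read_to_string and owns the text,
    // so the zero-copy StrTokenizer can borrow string slices from it.
    let source = LexSource::new("input.txt")?;             // assumed constructor
    let mut scanner = StrTokenizer::from_source(&source);  // assumed constructor
    // next_token (name assumed) yields raw tokens along with position information.
    while let Some(token) = scanner.next_token() {
        println!("{:?}", token);
    }
    Ok(())
}
```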

Lextoken: This structure is deprecated by TerminalToken. The structure is expected to be returned by the lexical analyzer (Lexer objects). Furthermore, the .sym field of a Lextoken must match the name of a terminal symbol specified in the grammar.

StrTokenizer: General-purpose, zero-copy lexical analyzer that produces RawTokens from an &str. This tokenizer uses regex, although not for everything: for example, to allow string literals to contain escaped quotation marks, a direct loop is implemented. The tokenizer gives the option of returning newlines, whitespace (with count) and comments as special tokens. It recognizes multi-line string literals, both multi-line and single-line comments, and returns the starting line and column positions of each token.
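
The following sketch builds a tokenizer directly from a string slice; the from_str constructor, the next_token method and the option field keep_newline are assumptions to verify against the struct's documentation:

```rust
use rustlr::StrTokenizer; // path assumed

fn count_tokens(input: &str) -> usize {
    let mut scanner = StrTokenizer::from_str(input); // assumed constructor
    // Option field name assumed: by default the tokenizer skips newlines,
    // whitespace and comments rather than returning them as tokens.
    scanner.keep_newline = true;
    let mut count = 0;
    while let Some(_token) = scanner.next_token() { // assumed method
        count += 1;
    }
    count
}
```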

TerminalToken: This is the token type required by Rustlr while parsing. A TerminalToken must correspond to a terminal symbol of the grammar being parsed. The sym field of the struct must correspond to the name of the terminal as defined by the grammar, and the value must be of type AT, which is the abstract syntax type (absyntype) of the grammar. It also includes the starting line and column positions of the token. These tokens are generated by implementing Tokenizer::nextsym.

charlexer: This struct is deprecated by charscanner. It is compatible with Lexer and Lextoken, which are also deprecated.

charscanner: This is a sample Tokenizer implementation designed to return every character in a string as a separate token; it is used in small grammars for testing and illustration purposes. It is assumed that the characters read are defined as terminal symbols in the grammar. This replaces charlexer using Tokenizer and RawToken.

Enums

RawToken: Raw token type produced by StrTokenizer. TerminalTokens must be created from RawTokens (in the Tokenizer::nextsym function) once the grammar’s terminal symbols and abstract syntax type are known.
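
A hedged sketch of such a conversion for an imagined grammar whose abstract syntax type is i64 and whose terminals include num and id; the variant names (Num, Alphanum, Symbol) and the TerminalToken::new constructor are assumptions to check against the actual definitions:

```rust
use rustlr::{RawToken, TerminalToken}; // paths assumed

// Hypothetical conversion, as might be done inside Tokenizer::nextsym,
// for a grammar with absyntype i64 and terminals "num", "id" and the operator symbols.
fn convert<'t>(raw: RawToken<'t>, line: usize, column: usize) -> TerminalToken<'t, i64> {
    match raw {
        RawToken::Num(n) => TerminalToken::new("num", n, line, column),      // numeric literal
        RawToken::Alphanum(_s) => TerminalToken::new("id", 0, line, column), // identifier
        RawToken::Symbol(s) => TerminalToken::new(s, 0, line, column),       // "+", "*", ...
        _ => TerminalToken::new("EOF", 0, line, column),                     // placeholder fallback
    }
}
```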

Traits

Lexer: This trait is deprecated by Tokenizer and is only retained for compatibility.

Tokenizer: This is the trait that represents an abstract lexical scanner for any grammar. Any tokenizer must be adapted to implement this trait. The default implementations of functions such as Tokenizer::linenum do not return correct values and should be replaced: they are only given defaults for easy compatibility with prototypes that may not have their own implementations. This trait replaces the Lexer trait used in earlier versions of Rustlr.
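
As an illustration only, the sketch below adapts a trivial per-character scanner (in the spirit of charscanner) to the trait; the trait's generic parameters, the nextsym and linenum signatures, and the TerminalToken::new constructor are assumptions to verify against the trait documentation:

```rust
use rustlr::{TerminalToken, Tokenizer}; // paths assumed

// Hypothetical scanner that returns each non-whitespace character of the input
// as a separate terminal symbol.
struct CharScanner<'t> {
    input: &'t str,
    pos: usize,  // byte position into input
    line: usize, // current line number, starting at 1
}

impl<'t> Tokenizer<'t, ()> for CharScanner<'t> {
    // nextsym produces the next TerminalToken, or None at end of input.
    fn nextsym(&mut self) -> Option<TerminalToken<'t, ()>> {
        let input = self.input;
        while let Some(c) = input[self.pos..].chars().next() {
            let start = self.pos;
            self.pos += c.len_utf8();
            if c == '\n' { self.line += 1; continue; }
            if c.is_whitespace() { continue; }
            // The terminal name is the character itself, borrowed from the input.
            let sym = &input[start..self.pos];
            return Some(TerminalToken::new(sym, (), self.line, 0)); // constructor assumed; column tracking omitted
        }
        None
    }
    // Replace the default so that error messages report a meaningful line number.
    fn linenum(&self) -> usize { self.line }
}
```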