pub trait Tokenizer<'t, AT: Default> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, AT>>;

    fn linenum(&self) -> usize { ... }
    fn column(&self) -> usize { ... }
    fn position(&self) -> usize { ... }
    fn current_line(&self) -> &str { ... }
    fn get_line(&self, i: usize) -> Option<&str> { ... }
    fn get_slice(&self, start: usize, end: usize) -> &str { ... }
    fn source(&self) -> &str { ... }
    fn transform_wildcard(
        &self,
        t: TerminalToken<'t, AT>
    ) -> TerminalToken<'t, AT> { ... }
    fn next_tt(&mut self) -> TerminalToken<'t, AT> { ... }
}

This is the trait that represents an abstract lexical scanner for any grammar. Any tokenizer must be adapted to implement this trait. The default implementations of functions such as Tokenizer::linenum do not return correct values and should be replaced: they are provided only for easy compatibility with prototypes that do not yet have their own implementations. This trait replaced LexToken, which was used in earlier versions of Rustlr.
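A minimal sketch of an implementor, under stated assumptions. To stay self-contained it does not import Rustlr: `TerminalToken` and the cut-down `Tokenizer` below are hypothetical stand-ins (the field names `sym`, `value`, `line`, `column` are assumptions), and `WordLexer` is an illustrative name. Only `nextsym` has to be written; `linenum` is overridden because its placeholder default would always report 0.

```rust
/// Hypothetical stand-in for Rustlr's TerminalToken (field names assumed).
#[derive(Debug)]
pub struct TerminalToken<'t, AT: Default> {
    pub sym: &'t str,
    pub value: AT,
    pub line: usize,
    pub column: usize,
}

/// Cut-down stand-in for the Tokenizer trait: one required method and one
/// provided method whose default (0) is only a placeholder.
pub trait Tokenizer<'t, AT: Default> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, AT>>;
    fn linenum(&self) -> usize {
        0
    }
}

/// Hypothetical tokenizer: every whitespace-separated word becomes a
/// "Word" token whose semantic value is the word itself.
pub struct WordLexer<'t> {
    /// (line, column, word) triples in reverse order, so `pop` yields them in order.
    pending: Vec<(usize, usize, &'t str)>,
    /// Line of the most recently returned token (0 before the first call).
    line: usize,
}

impl<'t> WordLexer<'t> {
    pub fn new(src: &'t str) -> Self {
        let mut pending = Vec::new();
        for (i, text) in src.lines().enumerate() {
            for word in text.split_whitespace() {
                // `word` borrows from `text`, so the pointer difference is its byte offset.
                let col = word.as_ptr() as usize - text.as_ptr() as usize;
                pending.push((i + 1, col + 1, word)); // 1-based line and column
            }
        }
        pending.reverse();
        WordLexer { pending, line: 0 }
    }
}

impl<'t> Tokenizer<'t, &'t str> for WordLexer<'t> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, &'t str>> {
        // `?` turns an empty queue into None, the end-of-stream signal.
        let (line, column, word) = self.pending.pop()?;
        self.line = line;
        Some(TerminalToken { sym: "Word", value: word, line, column })
    }

    // Replace the placeholder default so error messages get real line numbers.
    fn linenum(&self) -> usize {
        self.line
    }
}
```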

Required Methods

fn nextsym(&mut self) -> Option<TerminalToken<'t, AT>>

Retrieves the next TerminalToken, or None at end-of-stream.
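Because end-of-stream is reported as None, a consumer can drain a tokenizer with a `while let` loop. The sketch below uses hypothetical stand-ins for `TerminalToken` and `Tokenizer` (not Rustlr's actual definitions) and an illustrative `SplitLexer` that treats each whitespace-separated word as one symbol.

```rust
/// Hypothetical stand-in for Rustlr's TerminalToken (field names assumed).
pub struct TerminalToken<'t, AT: Default> {
    pub sym: &'t str,
    pub value: AT,
    pub line: usize,
    pub column: usize,
}

/// Stand-in trait reduced to the single required method.
pub trait Tokenizer<'t, AT: Default> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, AT>>;
}

/// Illustrative tokenizer: each whitespace-separated word is its own symbol.
pub struct SplitLexer<'t> {
    pub words: std::str::SplitWhitespace<'t>,
}

impl<'t> Tokenizer<'t, ()> for SplitLexer<'t> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, ()>> {
        // SplitWhitespace yields None once the input is exhausted,
        // which is exactly the end-of-stream signal nextsym must give.
        let word = self.words.next()?;
        // Position tracking is omitted in this sketch, so line/column stay 0.
        Some(TerminalToken { sym: word, value: (), line: 0, column: 0 })
    }
}

/// Collects every symbol up to end-of-stream.
pub fn collect_syms<'t, AT: Default, L: Tokenizer<'t, AT>>(lexer: &mut L) -> Vec<&'t str> {
    let mut syms = Vec::new();
    while let Some(tok) = lexer.nextsym() {
        syms.push(tok.sym);
    }
    syms
}
```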

Provided Methods

fn linenum(&self) -> usize

Returns the current line number. The default implementation returns 0.

fn column(&self) -> usize

Returns the current column (character position) on the current line. The default implementation returns 0.

fn position(&self) -> usize

Returns the absolute character position of the tokenizer. The default implementation returns 0.

fn current_line(&self) -> &str

Returns the current line being tokenized. The default implementation returns the empty string.
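Real implementations of the position methods above need actual bookkeeping. One common approach, sketched here under assumptions, is to keep a byte offset into the input and derive line, column, and current line from it. `locate` is a hypothetical helper, not part of Rustlr; it treats `\n` as the line terminator (dropping a trailing `\r` from the returned line), numbers lines and columns from 1, and counts columns in characters, while `position` would simply be the byte offset itself.

```rust
/// Hypothetical helper: maps a byte offset `pos` in `src` to
/// (line, column, text of that line). Lines and columns are 1-based and
/// columns count characters. `pos` must lie on a char boundary.
pub fn locate(src: &str, pos: usize) -> (usize, usize, &str) {
    let pos = pos.min(src.len());
    let before = &src[..pos];
    // The current line starts just after the last '\n' before `pos`...
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    // ...and ends at the next '\n' at or after `pos` (or at end of input).
    let line_end = src[pos..].find('\n').map_or(src.len(), |i| pos + i);
    let line = before.matches('\n').count() + 1;
    let column = before[line_start..].chars().count() + 1;
    let text = src[line_start..line_end].trim_end_matches('\r');
    (line, column, text)
}
```

In a tokenizer, `linenum`, `column`, and `current_line` could each call `locate(self.src, self.pos)`. Rescanning from the start is O(n) per call, so a production lexer would more likely update line and column counters incrementally as it consumes characters.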

fn get_line(&self, i: usize) -> Option<&str>

Retrieves the i-th line of the raw input, if line index i is valid. This function should be called only after the tokenizer has finished scanning and tokenizing the entire input, for example when generating diagnostic messages while evaluating the AST after parsing. The default implementation returns None.
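As an illustration of the post-parse use described above, the sketch below looks up a line and prints a caret under the offending column. Both functions are hypothetical. In particular, the original does not say whether `i` is 0- or 1-based; this sketch assumes 1-based numbering so that it matches line numbers stored in tokens.

```rust
/// Hypothetical get_line body: `i` is a 1-based line number (assumption),
/// so 0 and anything past the last line yield None.
pub fn get_line(src: &str, i: usize) -> Option<&str> {
    i.checked_sub(1).and_then(|k| src.lines().nth(k))
}

/// Formats an error for (line, column) with a caret under the column.
/// Columns are assumed to be 1-based character counts.
pub fn diagnostic(src: &str, line: usize, column: usize, msg: &str) -> Option<String> {
    let text = get_line(src, line)?;
    let caret = " ".repeat(column.saturating_sub(1)) + "^";
    Some(format!("line {line}: {msg}\n{text}\n{caret}"))
}
```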

fn get_slice(&self, start: usize, end: usize) -> &str

Retrieves the source string slice at the indicated indices, or the empty string if the indices are invalid. The default implementation returns the empty string.
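An override can lean on the standard library's `str::get`, which already returns `None` for every kind of invalid range. This sketch assumes the indices are byte offsets (the original does not say) and is written as a free function over the input rather than as a trait method.

```rust
/// Hypothetical get_slice body over the full input `src`.
pub fn get_slice(src: &str, start: usize, end: usize) -> &str {
    // str::get returns None when start > end, end > len, or either index
    // falls inside a multi-byte character; all of those map to "".
    src.get(start..end).unwrap_or("")
}
```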

fn source(&self) -> &str

Retrieves the source (such as a filename or URL) of the tokenizer. The default implementation returns the empty string.

fn transform_wildcard(&self, t: TerminalToken<'t, AT>) -> TerminalToken<'t, AT>

For internal use only, unless a tokenizer other than StrTokenizer is used. This is a callback function from the parser and can be implemented only when the grammar and token types are known. It transforms a token into a token representing the wildcard “_”, with a semantic value indicating its position in the text. The default implementation returns the same TerminalToken unchanged. This function is automatically overridden by the generated lexer when using the -genlex option.
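A minimal sketch of a hand-written override, assuming the semantic value type has a variant that can record a position. `Value`, `NullLexer`, and the choice to store the token's (line, column) pair are all hypothetical; with -genlex, the generated lexer supplies the real override.

```rust
/// Hypothetical stand-in for Rustlr's TerminalToken (field names assumed).
#[derive(Debug)]
pub struct TerminalToken<'t, AT: Default> {
    pub sym: &'t str,
    pub value: AT,
    pub line: usize,
    pub column: usize,
}

/// Hypothetical semantic value type with a variant for wildcard positions.
#[derive(Debug, Clone, PartialEq, Default)]
pub enum Value {
    #[default]
    Empty,
    Num(i64),
    /// (line, column) where a wildcard was matched.
    Pos(usize, usize),
}

pub trait Tokenizer<'t, AT: Default> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, AT>>;
    /// Provided default: hand the token back unchanged.
    fn transform_wildcard(&self, t: TerminalToken<'t, AT>) -> TerminalToken<'t, AT> {
        t
    }
}

/// Illustrative lexer; scanning is omitted, only the override matters here.
pub struct NullLexer;

impl<'t> Tokenizer<'t, Value> for NullLexer {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, Value>> {
        None
    }

    fn transform_wildcard(&self, t: TerminalToken<'t, Value>) -> TerminalToken<'t, Value> {
        // Rename the symbol to "_" and record where the token was found;
        // line and column are copied over unchanged.
        let value = Value::Pos(t.line, t.column);
        TerminalToken { sym: "_", value, ..t }
    }
}
```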

fn next_tt(&mut self) -> TerminalToken<'t, AT>

Returns the next TerminalToken. This provided function calls nextsym, but at end of stream it returns a TerminalToken with sym=“EOF” and value=AT::default(). This is the only provided function that should not be re-implemented.
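The behavior described above amounts to wrapping nextsym. The sketch shows it as a provided method on a stand-in trait; the stand-in types and `VecLexer` are hypothetical, and taking the EOF token's line and column from `linenum` and `column` is an assumption, since the original only specifies `sym` and `value`.

```rust
/// Hypothetical stand-in for Rustlr's TerminalToken (field names assumed).
pub struct TerminalToken<'t, AT: Default> {
    pub sym: &'t str,
    pub value: AT,
    pub line: usize,
    pub column: usize,
}

pub trait Tokenizer<'t, AT: Default> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, AT>>;
    fn linenum(&self) -> usize {
        0
    }
    fn column(&self) -> usize {
        0
    }
    /// Never returns None: end-of-stream becomes an "EOF" token whose
    /// value is AT::default().
    fn next_tt(&mut self) -> TerminalToken<'t, AT> {
        match self.nextsym() {
            Some(tok) => tok,
            None => TerminalToken {
                sym: "EOF",
                value: AT::default(),
                line: self.linenum(),
                column: self.column(),
            },
        }
    }
}

/// Illustrative tokenizer that replays a prepared list of tokens.
pub struct VecLexer<'t> {
    /// Tokens in reverse order, so `pop` yields them first-to-last.
    pub toks: Vec<TerminalToken<'t, i64>>,
}

impl<'t> Tokenizer<'t, i64> for VecLexer<'t> {
    fn nextsym(&mut self) -> Option<TerminalToken<'t, i64>> {
        self.toks.pop()
    }
}
```

Because next_tt never returns None, the parser can treat "EOF" as an ordinary terminal, and calling it again after the end keeps producing EOF tokens.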

Implementors

The source code of this implementation of the Tokenizer trait also serves as an illustration of how the trait should be implemented.