Crate sas_lexer


§SAS Lexer

A lexer for the SAS programming language.

§Usage

use sas_lexer::{lex_program, LexResult, TokenIdx};

let source = "data mydata; set mydataset; run;";

let LexResult { buffer, .. } = lex_program(&source).unwrap();

let tokens: Vec<TokenIdx> = buffer.iter_tokens().collect();

for token in tokens {
    println!("{:?}", buffer.get_token_raw_text(token, &source));
}

§Features

  • macro_sep: Enables a special virtual MacroSep token that is emitted between open code and macro statements when there is no “natural” separator, or when a semicolon is missing between two macro statements (a coding error). A downstream parser can use it as a reliable terminating token for dynamic open code and thus avoid lookaheads. Dynamic means that the statement contains macro statements, e.g. data %if cond %then %do; t1 %end; %else %do; t2 %end;;
  • serde: Enables serialization and deserialization of the ResolvedTokenInfo struct using the serde library. For an example of usage, see the Python bindings crate sas-lexer-py.
  • opti_stats: Enables some additional statistics during lexing, used for performance tuning. Not intended for general use.
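
These optional features are enabled the usual Cargo way. A minimal sketch (the crate name `sas-lexer` and the wildcard version below are placeholders; pin a real version from crates.io):

```toml
[dependencies]
# Enable only the features you need; all are off by default.
sas-lexer = { version = "*", features = ["macro_sep", "serde"] }
```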

§License

Licensed under the Affero GPL v3 license.

Modules§

error

Structs§

LexResult
Result of lexing
ResolvedTokenInfo
A struct with all token information usable without the TokenizedBuffer
TokenIdx
A token index, used to get actual token data via the tokenized buffer.
TokenInfo
A struct to hold information about the tokens in the tokenized buffer.
TokenizedBuffer
A special structure produced by the lexer that stores the full information about lexed tokens and lines. A struct of arrays, used to optimize memory usage and cache locality.
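
To illustrate the struct-of-arrays idea behind TokenizedBuffer, here is a minimal, hypothetical sketch (field names and offsets are invented for illustration; the real buffer stores much more per-token data): instead of one Vec of token structs, each attribute lives in its own parallel array, and a TokenIdx indexes all of them.

```rust
/// Index into the parallel arrays below (mirrors the crate's TokenIdx idea).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TokenIdx(u32);

/// Struct-of-arrays sketch: one entry per token in each Vec.
struct TokenizedBufferSketch {
    token_starts: Vec<u32>, // byte offset where each token begins
    token_ends: Vec<u32>,   // byte offset one past each token's end
}

impl TokenizedBufferSketch {
    /// Iterate over all token indices, in lexing order.
    fn iter_tokens(&self) -> impl Iterator<Item = TokenIdx> + '_ {
        (0..self.token_starts.len() as u32).map(TokenIdx)
    }

    /// Slice the raw token text back out of the original source.
    fn get_token_raw_text<'a>(&self, idx: TokenIdx, source: &'a str) -> &'a str {
        let i = idx.0 as usize;
        &source[self.token_starts[i] as usize..self.token_ends[i] as usize]
    }
}

fn main() {
    // "data" = 0..4, "mydata" = 5..11, ";" = 11..12
    let source = "data mydata;";
    let buf = TokenizedBufferSketch {
        token_starts: vec![0, 5, 11],
        token_ends: vec![4, 11, 12],
    };
    for t in buf.iter_tokens() {
        println!("{}", buf.get_token_raw_text(t, source));
    }
}
```

Keeping each attribute in its own contiguous array means a pass that only needs, say, token spans touches one cache-friendly Vec rather than striding over full token structs.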

Enums§

Payload
Enum representing various types of extra data associated with a token.
TokenChannel
Token channel.
TokenType
What you expect - the token types.

Functions§

lex_program
Lex the source code of an entire program.

Type Aliases§

TokenInfoIter