This crate can be used to parse Python source code into an Abstract Syntax Tree (AST).
§Overview:
The process by which source code is parsed into an AST can be broken down into two general stages: lexical analysis and parsing.
During lexical analysis, the source code is converted into a stream of lexical tokens that represent the smallest meaningful units of the language. For example, the source code `print("Hello world")` would roughly be converted into the following stream of tokens:

```text
Name("print"), LeftParen, String("Hello world"), RightParen
```
These tokens are then consumed by the parser, which matches them against a set of grammar rules to verify that the source code is syntactically valid and to construct an AST that represents the source code.
During parsing, the parser consumes the tokens generated by the lexer and constructs a tree representation of the source code. The tree is made up of nodes that represent the different syntactic constructs of the language. If the source code is syntactically invalid, parsing fails and an error is returned. After a successful parse, the AST can be used to perform further analysis on the source code. Continuing with the example above, the AST generated by the parser would roughly look something like this:
```text
node: Expr {
    value: {
        node: Call {
            func: {
                node: Name {
                    id: "print",
                    ctx: Load,
                },
            },
            args: [
                node: Constant {
                    value: Str("Hello world"),
                    kind: None,
                },
            ],
            keywords: [],
        },
    },
}
```
Note: The Tokens/ASTs shown above are not the exact tokens/ASTs generated by the parser.
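Continuing the example, the parsed tree can also be inspected programmatically. The following is a minimal sketch using the `ast` re-export and the `Parse` trait described below; it assumes the `rustpython_ast` node types (`Stmt::Expr`, `Expr::Call`) re-exported by this crate:

```rust
use rustpython_parser::{Parse, ast};

// Parse the snippet and check that the first statement is an expression
// statement whose value is a call, matching the tree sketched above.
let program = ast::Suite::parse(r#"print("Hello world")"#, "<embedded>")
    .expect("source is syntactically valid");
if let ast::Stmt::Expr(expr_stmt) = &program[0] {
    assert!(matches!(*expr_stmt.value, ast::Expr::Call(_)));
}
```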
§Source code layout:
The functionality of this crate is split into several modules:
- token: This module contains the definition of the tokens that are generated by the lexer.
- lexer: This module contains the lexer and is responsible for generating the tokens.
- parser: This module contains an interface to the parser and is responsible for generating the AST.
  - Functions and strings have special parsing requirements that are handled in additional files.
- mode: This module contains the definition of the different modes that the parser can be in (see the sketch below).
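As a rough illustration of how the modes differ, here is a minimal sketch using the top-level `parse` function described under Functions below; which inputs are accepted depends on the grammar entry point the `Mode` selects:

```rust
use rustpython_parser::{parse, Mode};

// Module mode accepts a sequence of statements...
assert!(parse("x = 1\ny = 2", Mode::Module, "<embedded>").is_ok());
// ...while Expression mode accepts exactly one expression,
assert!(parse("1 + 2", Mode::Expression, "<embedded>").is_ok());
// so a statement is rejected in Expression mode.
assert!(parse("x = 1", Mode::Expression, "<embedded>").is_err());
```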
§Examples
For example, to get a stream of tokens from a given string, one could do this:
```rust
use rustpython_parser::{lexer::lex, Mode};

let python_source = r#"
def is_odd(i):
    return bool(i & 1)
"#;
let mut tokens = lex(python_source, Mode::Module);
assert!(tokens.all(|t| t.is_ok()));
```
These tokens can be directly fed into the parser to generate an AST:
```rust
use rustpython_parser::{lexer::lex, Mode, parse_tokens};

let python_source = r#"
def is_odd(i):
    return bool(i & 1)
"#;
let tokens = lex(python_source, Mode::Module);
let ast = parse_tokens(tokens, Mode::Module, "<embedded>");
assert!(ast.is_ok());
```
Alternatively, you can use one of the other `parse_*` functions to parse a string directly, without using a specific mode or tokenizing the source beforehand:
```rust
use rustpython_parser::{Parse, ast};

let python_source = r#"
def is_odd(i):
    return bool(i & 1)
"#;
let ast = ast::Suite::parse(python_source, "<embedded>");
assert!(ast.is_ok());
```
Re-exports§
pub use rustpython_ast as ast;
Modules§
- lexer: This module takes care of lexing Python source text.
- source_code
- text_size: Newtypes for working with text sizes/ranges in a more type-safe manner.
Enums§
- FStringErrorType: Represents the different types of errors that can occur during parsing of an f-string.
- Mode: The mode argument specifies in what way code must be parsed.
- ParseErrorType: Represents the different types of errors that can occur during parsing.
- StringKind: The kind of string literal, as described in the String and Bytes literals section of the Python reference.
- Tok: The set of tokens into which Python source code can be tokenized.
Traits§
- Parse: Parse a Python code string into the implementor's type.
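The examples above use `ast::Suite::parse`; assuming `Parse` is also implemented for other node types such as `ast::Expr` (which the deprecated `parse_expression` function suggests), a single expression can be parsed directly:

```rust
use rustpython_parser::{Parse, ast};

// Assumption: ast::Expr implements Parse, so an expression can be
// parsed without choosing a Mode or lexing first.
let expr = ast::Expr::parse("1 + 2 * 3", "<embedded>");
assert!(expr.is_ok());
```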
Functions§
- parse: Parse the given Python source code using the specified Mode.
- parse_expression (Deprecated): Parses a single Python expression.
- parse_expression_starts_at (Deprecated): Parses a Python expression from a given location.
- parse_program (Deprecated): Parse a full Python program, usually consisting of multiple lines.
- parse_starts_at: Parse the given Python source code using the specified Mode and Location.
- parse_tokens: Parse an iterator of LexResults using the specified Mode.
Type Aliases§
- ParseError: Represents errors that occur during parsing and are returned by the `parse_*` functions.
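As a rough sketch of how these errors surface, the following assumes `ParseError` implements `Debug` (the exact message format is not guaranteed):

```rust
use rustpython_parser::{Parse, ast};

// Syntactically invalid input: parsing returns Err(ParseError).
let result = ast::Suite::parse("def broken(:", "<embedded>");
match result {
    Ok(_) => unreachable!("this source is not valid Python"),
    Err(err) => println!("parse failed: {err:?}"),
}
```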