//! This crate can be used to parse Python source code into an Abstract
//! Syntax Tree.
//!
//! ## Overview
//!
//! The process by which source code is parsed into an AST can be broken down
//! into two general stages: [lexical analysis] and [parsing].
//!
//! During lexical analysis, the source code is converted into a stream of lexical
//! tokens that represent the smallest meaningful units of the language. For example,
//! the source code `print("Hello world")` would _roughly_ be converted into the following
//! stream of tokens:
//!
//! ```text
//! Name("print"), LeftParen, String("Hello world"), RightParen
//! ```
//!
//! These tokens are then consumed by the `ruff_python_parser`, which matches them against a set of
//! grammar rules to verify that the source code is syntactically valid and to construct
//! an AST that represents the source code.
//!
//! The resulting tree is made up of nodes that represent the different syntactic constructs
//! of the language. If the source code is syntactically invalid, parsing fails and an error
//! is returned. After a successful parse, the AST can be used to perform further analysis
//! on the source code. Continuing with the example above, the AST generated by the
//! `ruff_python_parser` would _roughly_ look something like this:
//!
//! ```text
//! node: Expr {
//!     value: {
//!         node: Call {
//!             func: {
//!                 node: Name {
//!                     id: "print",
//!                     ctx: Load,
//!                 },
//!             },
//!             args: [
//!                 node: Constant {
//!                     value: Str("Hello world"),
//!                     kind: None,
//!                 },
//!             ],
//!             keywords: [],
//!         },
//!     },
//! },
//! ```
//!
//! **Note:** The tokens and ASTs shown above are not the exact ones generated by the `ruff_python_parser`.
//! Refer to the [playground](https://play.ruff.rs) for the correct representation.
//!
//! ## Source code layout
//!
//! The functionality of this crate is split into several modules:
//!
//! - token: This module contains the definition of the tokens that are generated by the lexer.
//! - [lexer]: This module contains the lexer and is responsible for generating the tokens.
//! - parser: This module contains an interface to the [`Parsed`] output and is responsible for generating the AST.
//! - [`Mode`]: This enum, defined in the crate root rather than in a module of its own, lists the
//!   different modes that the `ruff_python_parser` can be in.
//!
//! [lexical analysis]: https://en.wikipedia.org/wiki/Lexical_analysis
//! [parsing]: https://en.wikipedia.org/wiki/Parsing
//! [lexer]: crate::lexer

pub use crate::error::{
    InterpolatedStringErrorType, LexicalErrorType, ParseError, ParseErrorType,
    UnsupportedSyntaxError, UnsupportedSyntaxErrorKind,
};
pub use crate::parser::ParseOptions;

use crate::parser::Parser;

use ruff_python_ast::token::Tokens;
use ruff_python_ast::{
    Expr, Mod, ModExpression, ModModule, PySourceType, StringFlags, StringLiteral, Suite,
};
use ruff_text_size::{Ranged, TextRange};

mod error;
pub mod lexer;
mod parser;
pub mod semantic_errors;
mod string;
mod token;
mod token_set;
mod token_source;
pub mod typing;

/// Parses a full Python module, usually consisting of multiple lines.
///
/// This is a convenience function that can be used to parse a full Python program without having to
/// specify the [`Mode`] or the location. It is probably what you want to use most of the time.
///
/// # Example
///
/// For example, parsing a simple function definition and a call to that function:
///
/// ```
/// use ruff_python_parser::parse_module;
///
/// let source = r#"
/// def foo():
///     return 42
///
/// print(foo())
/// "#;
///
/// let module = parse_module(source);
/// assert!(module.is_ok());
/// ```
pub fn parse_module(source: &str) -> Result<Parsed<ModModule>, ParseError> {
    Parser::new(source, ParseOptions::from(Mode::Module))
        .parse()
        .try_into_module()
        .unwrap()
        .into_result()
}

/// Parses a single Python expression.
///
/// This convenience function can be used to parse a single expression without having to
/// specify the [`Mode`] or the location.
///
/// # Example
///
/// For example, parsing a single expression denoting the addition of two numbers:
///
/// ```
/// use ruff_python_parser::parse_expression;
///
/// let expr = parse_expression("1 + 2");
/// assert!(expr.is_ok());
/// ```
pub fn parse_expression(source: &str) -> Result<Parsed<ModExpression>, ParseError> {
    Parser::new(source, ParseOptions::from(Mode::Expression))
        .parse()
        .try_into_expression()
        .unwrap()
        .into_result()
}

/// Parses a Python expression for the given range in the source.
///
/// This function lets you specify the range of the expression in the source code. Otherwise,
/// it behaves exactly like [`parse_expression`].
///
/// # Example
///
/// Parsing one of the numeric literals that is part of an addition expression:
///
/// ```
/// use ruff_python_parser::parse_expression_range;
/// # use ruff_text_size::{TextRange, TextSize};
///
/// let parsed = parse_expression_range("11 + 22 + 33", TextRange::new(TextSize::new(5), TextSize::new(7)));
/// assert!(parsed.is_ok());
/// ```
pub fn parse_expression_range(
    source: &str,
    range: TextRange,
) -> Result<Parsed<ModExpression>, ParseError> {
    let source = &source[..range.end().to_usize()];
    Parser::new_starts_at(source, range.start(), ParseOptions::from(Mode::Expression))
        .parse()
        .try_into_expression()
        .unwrap()
        .into_result()
}

/// Parses a Python expression as if it is parenthesized.
///
/// It behaves similarly to [`parse_expression_range`] but allows what would be valid within
/// parentheses.
///
/// # Example
///
/// Parsing an expression that would be valid within parentheses:
///
/// ```
/// use ruff_python_parser::parse_parenthesized_expression_range;
/// # use ruff_text_size::{TextRange, TextSize};
///
/// let parsed = parse_parenthesized_expression_range("'''\n int | str'''", TextRange::new(TextSize::new(3), TextSize::new(14)));
/// assert!(parsed.is_ok());
/// ```
pub fn parse_parenthesized_expression_range(
    source: &str,
    range: TextRange,
) -> Result<Parsed<ModExpression>, ParseError> {
    let source = &source[..range.end().to_usize()];
    let parsed = Parser::new_starts_at(
        source,
        range.start(),
        ParseOptions::from(Mode::ParenthesizedExpression),
    )
    .parse();
    parsed.try_into_expression().unwrap().into_result()
}

/// Parses a Python expression from a string annotation.
///
/// Triple-quoted annotations are parsed with [`parse_parenthesized_expression_range`]; all
/// other annotations are parsed with [`parse_expression_range`].
///
/// # Example
///
/// Parsing a string annotation:
///
/// ```
/// use ruff_python_parser::parse_string_annotation;
/// use ruff_python_ast::{StringLiteral, StringLiteralFlags, AtomicNodeIndex};
/// use ruff_text_size::{TextRange, TextSize};
///
/// let string = StringLiteral {
///     value: "'''\n int | str'''".to_string().into_boxed_str(),
///     flags: StringLiteralFlags::empty(),
///     range: TextRange::new(TextSize::new(0), TextSize::new(16)),
///     node_index: AtomicNodeIndex::NONE
/// };
/// let parsed = parse_string_annotation("'''\n int | str'''", &string);
/// assert!(parsed.is_err());
/// ```
pub fn parse_string_annotation(
    source: &str,
    string: &StringLiteral,
) -> Result<Parsed<ModExpression>, ParseError> {
    let range = string
        .range()
        .add_start(string.flags.opener_len())
        .sub_end(string.flags.closer_len());
    let source = &source[..range.end().to_usize()];
    if string.flags.is_triple_quoted() {
        parse_parenthesized_expression_range(source, range)
    } else {
        parse_expression_range(source, range)
    }
}

/// Parses the given Python source code using the specified [`ParseOptions`].
///
/// This function is the most general function to parse Python code. Based on the [`Mode`] supplied
/// via the [`ParseOptions`], it can be used to parse a single expression, a full Python program,
/// a parenthesized expression or a Python program containing IPython escape commands.
///
/// # Example
///
/// If we want to parse a simple expression, we can use the [`Mode::Expression`] mode during
/// parsing:
///
/// ```
/// use ruff_python_parser::{parse, Mode, ParseOptions};
///
/// let parsed = parse("1 + 2", ParseOptions::from(Mode::Expression));
/// assert!(parsed.is_ok());
/// ```
///
/// Alternatively, we can parse a full Python program consisting of multiple lines:
///
/// ```
/// use ruff_python_parser::{parse, Mode, ParseOptions};
///
/// let source = r#"
/// class Greeter:
///
///     def greet(self):
///         print("Hello, world!")
/// "#;
/// let parsed = parse(source, ParseOptions::from(Mode::Module));
/// assert!(parsed.is_ok());
/// ```
///
/// Additionally, we can parse a Python program containing IPython escapes:
///
/// ```
/// use ruff_python_parser::{parse, Mode, ParseOptions};
///
/// let source = r#"
/// %timeit 1 + 2
/// ?str.replace
/// !ls
/// "#;
/// let parsed = parse(source, ParseOptions::from(Mode::Ipython));
/// assert!(parsed.is_ok());
/// ```
pub fn parse(source: &str, options: ParseOptions) -> Result<Parsed<Mod>, ParseError> {
    parse_unchecked(source, options).into_result()
}

/// Parses the given Python source code using the specified [`ParseOptions`].
///
/// This is the same as the [`parse`] function except that it doesn't check for any [`ParseError`]
/// and returns the [`Parsed`] as is.
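///
/// # Example
///
/// A minimal sketch, assuming the parser recovers from the unclosed parenthesis and records at
/// least one [`ParseError`] instead of stopping at the first problem:
///
/// ```
/// use ruff_python_parser::{parse_unchecked, Mode, ParseOptions};
///
/// // The source is deliberately invalid: the parenthesis is never closed.
/// let parsed = parse_unchecked("x = (1 +", ParseOptions::from(Mode::Module));
/// assert!(parsed.has_invalid_syntax());
/// assert!(!parsed.errors().is_empty());
/// ```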
pub fn parse_unchecked(source: &str, options: ParseOptions) -> Parsed<Mod> {
    Parser::new(source, options).parse()
}

/// Parses the given Python source code using the specified [`PySourceType`].
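///
/// # Example
///
/// A minimal sketch, assuming `PySourceType::Python` is the source type used for a regular
/// Python file:
///
/// ```
/// use ruff_python_ast::PySourceType;
/// use ruff_python_parser::parse_unchecked_source;
///
/// let parsed = parse_unchecked_source("x = 1\n", PySourceType::Python);
/// assert!(parsed.has_valid_syntax());
/// // The module body holds the single assignment statement.
/// assert!(!parsed.suite().is_empty());
/// ```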
pub fn parse_unchecked_source(source: &str, source_type: PySourceType) -> Parsed<ModModule> {
    // SAFETY: Safe because `PySourceType` always parses to a `ModModule`
    Parser::new(source, ParseOptions::from(source_type))
        .parse()
        .try_into_module()
        .unwrap()
}

/// Represents the parsed source code.
#[derive(Debug, PartialEq, Clone, get_size2::GetSize)]
pub struct Parsed<T> {
    syntax: T,
    tokens: Tokens,
    errors: Vec<ParseError>,
    unsupported_syntax_errors: Vec<UnsupportedSyntaxError>,
}

impl<T> Parsed<T> {
    /// Returns the syntax node represented by this parsed output.
    pub fn syntax(&self) -> &T {
        &self.syntax
    }

    /// Returns all the tokens for the parsed output.
    pub fn tokens(&self) -> &Tokens {
        &self.tokens
    }

    /// Returns a list of syntax errors found during parsing.
    pub fn errors(&self) -> &[ParseError] {
        &self.errors
    }

    /// Returns a list of version-related syntax errors found during parsing.
    pub fn unsupported_syntax_errors(&self) -> &[UnsupportedSyntaxError] {
        &self.unsupported_syntax_errors
    }

    /// Consumes the [`Parsed`] output and returns the contained syntax node.
    pub fn into_syntax(self) -> T {
        self.syntax
    }

    /// Consumes the [`Parsed`] output and returns a list of syntax errors found during parsing.
    pub fn into_errors(self) -> Vec<ParseError> {
        self.errors
    }

    /// Returns `true` if the parsed source code is valid, i.e., it has no [`ParseError`]s.
    ///
    /// Note that this does not include version-related [`UnsupportedSyntaxError`]s.
    ///
    /// See [`Parsed::has_no_syntax_errors`] for a version that takes these into account.
    pub fn has_valid_syntax(&self) -> bool {
        self.errors.is_empty()
    }

    /// Returns `true` if the parsed source code is invalid, i.e., it has [`ParseError`]s.
    ///
    /// Note that this does not include version-related [`UnsupportedSyntaxError`]s.
    ///
    /// See [`Parsed::has_syntax_errors`] for a version that takes these into account.
    pub fn has_invalid_syntax(&self) -> bool {
        !self.has_valid_syntax()
    }

    /// Returns `true` if the parsed source code does not contain any [`ParseError`]s *or*
    /// [`UnsupportedSyntaxError`]s.
    ///
    /// See [`Parsed::has_valid_syntax`] for a version specific to [`ParseError`]s.
    pub fn has_no_syntax_errors(&self) -> bool {
        self.has_valid_syntax() && self.unsupported_syntax_errors.is_empty()
    }

    /// Returns `true` if the parsed source code contains any [`ParseError`]s *or*
    /// [`UnsupportedSyntaxError`]s.
    ///
    /// See [`Parsed::has_invalid_syntax`] for a version specific to [`ParseError`]s.
    pub fn has_syntax_errors(&self) -> bool {
        !self.has_no_syntax_errors()
    }

    /// Returns the [`Parsed`] output as a [`Result`], returning [`Ok`] if it has no syntax errors,
    /// or [`Err`] containing all the [`ParseError`]s encountered.
    ///
    /// Note that any [`unsupported_syntax_errors`](Parsed::unsupported_syntax_errors) will not
    /// cause [`Err`] to be returned.
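    ///
    /// # Example
    ///
    /// A minimal sketch, assuming the incomplete expression `1 +` is reported as a syntax
    /// error. The [`Err`] variant borrows the same slice that [`Parsed::errors`] returns:
    ///
    /// ```
    /// use ruff_python_parser::{parse_unchecked, Mode, ParseOptions};
    ///
    /// let parsed = parse_unchecked("1 +", ParseOptions::from(Mode::Expression));
    /// assert!(parsed.as_result().is_err());
    /// if let Err(errors) = parsed.as_result() {
    ///     // `errors` is the full slice of errors, not only the first one.
    ///     assert_eq!(errors.len(), parsed.errors().len());
    /// }
    /// ```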
    pub fn as_result(&self) -> Result<&Parsed<T>, &[ParseError]> {
        if self.has_valid_syntax() {
            Ok(self)
        } else {
            Err(&self.errors)
        }
    }

    /// Consumes the [`Parsed`] output and returns a [`Result`] which is [`Ok`] if it has no syntax
    /// errors, or [`Err`] containing the first [`ParseError`] encountered.
    ///
    /// Note that any [`unsupported_syntax_errors`](Parsed::unsupported_syntax_errors) will not
    /// cause [`Err`] to be returned.
    pub(crate) fn into_result(self) -> Result<Parsed<T>, ParseError> {
        if self.has_valid_syntax() {
            Ok(self)
        } else {
            Err(self.into_errors().into_iter().next().unwrap())
        }
    }
}

impl Parsed<Mod> {
    /// Attempts to convert the [`Parsed<Mod>`] into a [`Parsed<ModModule>`].
    ///
    /// This method checks if the `syntax` field of the output is a [`Mod::Module`]. If it is, the
    /// method returns [`Some(Parsed<ModModule>)`] with the contained module. Otherwise, it
    /// returns [`None`].
    ///
    /// [`Some(Parsed<ModModule>)`]: Some
    pub fn try_into_module(self) -> Option<Parsed<ModModule>> {
        match self.syntax {
            Mod::Module(module) => Some(Parsed {
                syntax: module,
                tokens: self.tokens,
                errors: self.errors,
                unsupported_syntax_errors: self.unsupported_syntax_errors,
            }),
            Mod::Expression(_) => None,
        }
    }

    /// Attempts to convert the [`Parsed<Mod>`] into a [`Parsed<ModExpression>`].
    ///
    /// This method checks if the `syntax` field of the output is a [`Mod::Expression`]. If it is,
    /// the method returns [`Some(Parsed<ModExpression>)`] with the contained expression.
    /// Otherwise, it returns [`None`].
    ///
    /// [`Some(Parsed<ModExpression>)`]: Some
    pub fn try_into_expression(self) -> Option<Parsed<ModExpression>> {
        match self.syntax {
            Mod::Module(_) => None,
            Mod::Expression(expression) => Some(Parsed {
                syntax: expression,
                tokens: self.tokens,
                errors: self.errors,
                unsupported_syntax_errors: self.unsupported_syntax_errors,
            }),
        }
    }
}

impl Parsed<ModModule> {
    /// Returns the module body contained in this parsed output as a [`Suite`].
    pub fn suite(&self) -> &Suite {
        &self.syntax.body
    }

    /// Consumes the [`Parsed`] output and returns the module body as a [`Suite`].
    pub fn into_suite(self) -> Suite {
        self.syntax.body
    }
}

impl Parsed<ModExpression> {
    /// Returns the expression contained in this parsed output.
    pub fn expr(&self) -> &Expr {
        &self.syntax.body
    }

    /// Returns a mutable reference to the expression contained in this parsed output.
    pub fn expr_mut(&mut self) -> &mut Expr {
        &mut self.syntax.body
    }

    /// Consumes the [`Parsed`] output and returns the contained [`Expr`].
    pub fn into_expr(self) -> Expr {
        *self.syntax.body
    }
}

/// Controls the different modes by which a source file can be parsed.
///
/// The mode argument specifies in what way code must be parsed.
#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
pub enum Mode {
    /// The code consists of a sequence of statements.
    Module,

    /// The code consists of a single expression.
    Expression,

    /// The code consists of a single expression and is parsed as if it is parenthesized. The
    /// parentheses themselves aren't required. This allows for valid multiline expressions
    /// without the need for parentheses and is specifically useful for parsing string
    /// annotations.
    ParenthesizedExpression,

    /// The code consists of a sequence of statements which can include the
    /// escape commands that are part of IPython syntax.
    ///
    /// ## Supported escape commands:
    ///
    /// - [Magic command system] which is limited to [line magics] and can start
    ///   with `%` or `%%`.
    /// - [Dynamic object information] which can start with `?` or `??`.
    /// - [System shell access] which can start with `!` or `!!`.
    /// - [Automatic parentheses and quotes] which can start with `/`, `;`, or `,`.
    ///
    /// [Magic command system]: https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system
    /// [line magics]: https://ipython.readthedocs.io/en/stable/interactive/magics.html#line-magics
    /// [Dynamic object information]: https://ipython.readthedocs.io/en/stable/interactive/reference.html#dynamic-object-information
    /// [System shell access]: https://ipython.readthedocs.io/en/stable/interactive/reference.html#system-shell-access
    /// [Automatic parentheses and quotes]: https://ipython.readthedocs.io/en/stable/interactive/reference.html#automatic-parentheses-and-quotes
    Ipython,
}

impl std::str::FromStr for Mode {
    type Err = ModeParseError;
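
    /// Converts a mode name into a [`Mode`].
    ///
    /// # Example
    ///
    /// A short sketch of the accepted names, read off the `match` arms of this function. Any
    /// other string, such as the hypothetical `"interactive"` used here, yields a
    /// [`ModeParseError`]:
    ///
    /// ```
    /// use ruff_python_parser::Mode;
    ///
    /// assert_eq!("exec".parse::<Mode>().unwrap(), Mode::Module);
    /// assert_eq!("single".parse::<Mode>().unwrap(), Mode::Module);
    /// assert_eq!("eval".parse::<Mode>().unwrap(), Mode::Expression);
    /// assert_eq!("ipython".parse::<Mode>().unwrap(), Mode::Ipython);
    /// // Unknown names are rejected.
    /// assert!("interactive".parse::<Mode>().is_err());
    /// ```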
    fn from_str(s: &str) -> Result<Self, ModeParseError> {
        match s {
            "exec" | "single" => Ok(Mode::Module),
            "eval" => Ok(Mode::Expression),
            "ipython" => Ok(Mode::Ipython),
            _ => Err(ModeParseError),
        }
    }
}

/// A type that can be represented as a [`Mode`].
pub trait AsMode {
    fn as_mode(&self) -> Mode;
}

impl AsMode for PySourceType {
    fn as_mode(&self) -> Mode {
        match self {
            PySourceType::Python | PySourceType::Stub => Mode::Module,
            PySourceType::Ipynb => Mode::Ipython,
        }
    }
}

/// Returned when a given mode is not valid.
#[derive(Debug)]
pub struct ModeParseError;

impl std::fmt::Display for ModeParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, r#"mode must be "exec", "eval", "ipython", or "single""#)
    }
}