Type Alias ruby_prism_sys::pm_parser_t

pub type pm_parser_t = pm_parser;

The parser used to parse Ruby source.

Aliased Type§

struct pm_parser_t {
    pub lex_state: u32,
    pub enclosure_nesting: i32,
    pub lambda_enclosure_nesting: i32,
    pub brace_nesting: i32,
    pub do_loop_stack: u32,
    pub accepts_block_stack: u32,
    pub lex_modes: pm_parser__bindgen_ty_1,
    pub start: *const u8,
    pub end: *const u8,
    pub previous: pm_token_t,
    pub current: pm_token_t,
    pub next_start: *const u8,
    pub heredoc_end: *const u8,
    pub comment_list: pm_list_t,
    pub magic_comment_list: pm_list_t,
    pub data_loc: pm_location_t,
    pub warning_list: pm_list_t,
    pub error_list: pm_list_t,
    pub current_scope: *mut pm_scope,
    pub current_context: *mut pm_context_node,
    pub encoding: *const pm_encoding_t,
    pub encoding_changed_callback: Option<unsafe extern "C" fn(_: *mut pm_parser)>,
    pub encoding_comment_start: *const u8,
    pub lex_callback: *mut pm_lex_callback_t,
    pub filepath: pm_string_t,
    pub constant_pool: pm_constant_pool_t,
    pub newline_list: pm_newline_list_t,
    pub integer_base: u16,
    pub current_string: pm_string_t,
    pub start_line: i32,
    pub explicit_encoding: *const pm_encoding_t,
    pub current_param_name: u32,
    pub version: u32,
    pub command_start: bool,
    pub recovering: bool,
    pub encoding_changed: bool,
    pub pattern_matching_newlines: bool,
    pub in_keyword_arg: bool,
    pub semantic_token_seen: bool,
    pub frozen_string_literal: bool,
}

Fields§

§lex_state: u32

The current state of the lexer.

§enclosure_nesting: i32

Tracks the current nesting of (), [], and {}.

§lambda_enclosure_nesting: i32

Used to temporarily track the nesting of enclosures to determine if a { is the beginning of a lambda following the parameters of a lambda.

§brace_nesting: i32

Used to track the nesting of braces to ensure we get the correct value when we are interpolating blocks with braces.

§do_loop_stack: u32

The stack used to determine if a do keyword belongs to the predicate of a while, until, or for loop.

§accepts_block_stack: u32

The stack used to determine if a do keyword belongs to the beginning of a block.
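
Both stacks are plain `u32` values, which suggests a stack of one-bit flags packed into an integer. The following is a minimal sketch of that idea, assuming the newest entry lives in the lowest bit. `StateStack` and its methods are hypothetical names used for illustration and are not part of `ruby_prism_sys`.

```rust
/// A stack of booleans packed into a `u32` (hypothetical helper type).
/// The top of the stack is the lowest bit, so at most 32 entries fit.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct StateStack(pub u32);

impl StateStack {
    /// Pushes a flag by shifting the existing entries up one bit.
    pub fn push(&mut self, value: bool) {
        self.0 = (self.0 << 1) | u32::from(value);
    }

    /// Pops the most recent flag by shifting it out.
    pub fn pop(&mut self) {
        self.0 >>= 1;
    }

    /// Reports the flag currently on top of the stack.
    pub fn top(&self) -> bool {
        self.0 & 1 == 1
    }
}
```

With this layout, entering a loop predicate would push `true`, entering a nested enclosure would push `false`, and a `do` keyword would consult `top()` to decide which construct it belongs to.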

§lex_modes: pm_parser__bindgen_ty_1

§start: *const u8

The pointer to the start of the source.

§end: *const u8

The pointer to the end of the source.
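
Together, `start` and `end` delimit the source buffer. The sketch below shows one way to turn such a pointer pair back into a byte slice, assuming `end` points one byte past the last source byte. `source_bytes` is a hypothetical helper, not a function provided by this crate.

```rust
/// Rebuilds the byte slice bounded by `start` and `end` (hypothetical helper).
///
/// # Safety
/// Both pointers must come from the same live allocation with `start <= end`,
/// and the bytes must remain valid and unmodified for the lifetime `'a`.
pub unsafe fn source_bytes<'a>(start: *const u8, end: *const u8) -> &'a [u8] {
    // SAFETY: the caller upholds the contract documented above.
    unsafe {
        let len = end.offset_from(start) as usize;
        std::slice::from_raw_parts(start, len)
    }
}
```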

§previous: pm_token_t

The previous token we were considering.

§current: pm_token_t

The current token we’re considering.

§next_start: *const u8

This is a special field set on the parser when we need the parser to jump to a specific location when lexing the next token, as opposed to just using the end of the previous token. Normally this is NULL.

§heredoc_end: *const u8

This field indicates the end of a heredoc whose identifier was found on the current line. If another heredoc is found on the same line, then this will be moved forward to the end of that heredoc. If no heredocs are found on a line then this is NULL.

§comment_list: pm_list_t

The list of comments that have been found while parsing.

§magic_comment_list: pm_list_t

The list of magic comments that have been found while parsing.

§data_loc: pm_location_t

An optional location that represents the location of the __END__ marker and the rest of the content of the file. This content is loaded into the DATA constant when the file being parsed is the main file being executed.

§warning_list: pm_list_t

The list of warnings that have been found while parsing.

§error_list: pm_list_t

The list of errors that have been found while parsing.

§current_scope: *mut pm_scope

The current local scope.

§current_context: *mut pm_context_node

The current parsing context.

§encoding: *const pm_encoding_t

The encoding functions for the current file are attached to the parser as it’s parsing so that they can change with a magic comment.

§encoding_changed_callback: Option<unsafe extern "C" fn(_: *mut pm_parser)>

When the encoding that is being used to parse the source is changed by prism, we provide the ability here to call out to a user-defined function.
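
The field type is an optional C-ABI function pointer. The sketch below illustrates how such a callback might be defined and invoked, using a stand-in `Parser` struct rather than the real `pm_parser`. The counter, the callback name, and `notify_encoding_changed` are all hypothetical and exist only for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the real `pm_parser` struct, for illustration only.
#[repr(C)]
pub struct Parser {
    pub encoding_changed_callback: Option<unsafe extern "C" fn(*mut Parser)>,
}

/// Counts how many times the callback has fired.
pub static ENCODING_CHANGES: AtomicUsize = AtomicUsize::new(0);

/// A user-defined callback with the C ABI expected by the field.
pub unsafe extern "C" fn on_encoding_changed(_parser: *mut Parser) {
    ENCODING_CHANGES.fetch_add(1, Ordering::SeqCst);
}

/// Calls the callback if one is installed (hypothetical; in practice the
/// C library performs this call itself when the encoding changes).
pub fn notify_encoding_changed(parser: &mut Parser) {
    if let Some(callback) = parser.encoding_changed_callback {
        // SAFETY: the pointer is derived from a live mutable reference.
        unsafe { callback(parser as *mut Parser) };
    }
}
```

Because the field is an `Option`, leaving it as `None` corresponds to a NULL function pointer on the C side, and no call is made.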

§encoding_comment_start: *const u8

This pointer indicates where a comment must start if it is to be considered an encoding comment.

§lex_callback: *mut pm_lex_callback_t

This is an optional callback that can be attached to the parser that will be called whenever a new token is lexed by the parser.

§filepath: pm_string_t

This is the path of the file being parsed. We use the filepath when constructing SourceFileNodes.

§constant_pool: pm_constant_pool_t

This constant pool keeps all of the constants defined throughout the file so that we can reference them later.
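
A constant pool of this kind is commonly implemented as an interner that maps each distinct name to a small integer id. The sketch below is a simplified model of that technique, not the layout of `pm_constant_pool_t`. It assumes ids start at 1 so that 0 can stand for "no constant", and the `ConstantPool` type shown here is hypothetical.

```rust
use std::collections::HashMap;

/// A simplified interner (hypothetical model, not `pm_constant_pool_t`).
#[derive(Default)]
pub struct ConstantPool {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl ConstantPool {
    /// Returns the id for `name`, inserting it on first sight.
    /// Ids start at 1 so that 0 can mean "unset" (an assumption).
    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        self.names.push(name.to_owned());
        let id = self.names.len() as u32;
        self.ids.insert(name.to_owned(), id);
        id
    }

    /// Looks a name up by id; returns `None` for 0 or unknown ids.
    pub fn name(&self, id: u32) -> Option<&str> {
        let index = (id as usize).checked_sub(1)?;
        self.names.get(index).map(String::as_str)
    }
}
```

Interning lets later stages compare and store names as integers, which is consistent with fields such as `current_param_name` being a `u32`.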

§newline_list: pm_newline_list_t

This is the list of newline offsets in the source file.

§integer_base: u16

We want to add a flag to integer nodes that indicates their base. We only want to parse these once, but we don’t have space on the token itself to communicate this information. So we store it here and pass it through when we find tokens that we need it for.
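
To illustrate what determining the base once might look like, the sketch below classifies a Ruby integer literal by its prefix. The `IntegerBase` enum and `integer_base` function are hypothetical. The real field is a `u16` holding flag bits, and the flag values are not documented on this page.

```rust
/// The base of an integer literal (hypothetical stand-in for the flag bits).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IntegerBase {
    Binary,
    Octal,
    Decimal,
    Hexadecimal,
}

/// Classifies a Ruby integer literal by its prefix.
pub fn integer_base(literal: &str) -> IntegerBase {
    let bytes = literal.as_bytes();
    if bytes.len() >= 2 && bytes[0] == b'0' {
        match bytes[1] {
            b'b' | b'B' => return IntegerBase::Binary,
            // Both `0o17` and a bare leading zero such as `017` are octal.
            b'o' | b'O' | b'0'..=b'7' => return IntegerBase::Octal,
            b'd' | b'D' => return IntegerBase::Decimal,
            b'x' | b'X' => return IntegerBase::Hexadecimal,
            _ => {}
        }
    }
    IntegerBase::Decimal
}
```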

§current_string: pm_string_t

This string is used to pass information from the lexer to the parser. It is particularly necessary because of escape sequences.

§start_line: i32

The line number at the start of the parse. This will be used to offset the line numbers of all of the locations.
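
The newline offsets and `start_line` combine to turn a byte offset into a line and column. The sketch below shows one way to do that with a binary search. It assumes the list holds the byte offset at which each line begins, starting with 0, and `line_column` is a hypothetical helper rather than a function exported by this crate.

```rust
/// Maps a byte offset to a `(line, column)` pair (hypothetical helper).
///
/// `line_starts` holds the byte offset at which each line begins, in
/// ascending order, and must begin with 0. `start_line` is the number
/// reported for the first line.
pub fn line_column(line_starts: &[usize], start_line: i32, offset: usize) -> (i32, usize) {
    // `partition_point` counts the line starts that are <= `offset`;
    // the count is at least 1 because the first entry is 0.
    let index = line_starts.partition_point(|&start| start <= offset) - 1;
    (start_line + index as i32, offset - line_starts[index])
}
```

For the source `"a\nbc\nd"` the line starts are `[0, 2, 5]`, so byte offset 3 (the `c`) falls on the second line at column 1. A `start_line` of 10 would shift the reported line to 11.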

§explicit_encoding: *const pm_encoding_t

When a string-like expression is being lexed, any byte or escape sequence that resolves to a value whose top bit is set (i.e., >= 0x80) will explicitly set the encoding to the same encoding as the source. Alternatively, if a unicode escape sequence is used (e.g., \u{80}) that resolves to a value whose top bit is set, then the encoding will be explicitly set to UTF-8.

The next time this happens, if the encoding that is about to become the explicitly set encoding does not match the previously set explicit encoding, a mixed encoding error will be emitted.

When the expression is finished being lexed, the explicit encoding controls the encoding of the expression. For the most part this means that the expression will either be encoded in the source encoding or UTF-8. This holds for all encodings except US-ASCII. If the source is US-ASCII and an explicit encoding was set that was not UTF-8, then the expression will be encoded as ASCII-8BIT.

Note that if the expression is a list, different elements within the same list can have different encodings, so this will get reset between each element. Furthermore all of this only applies to lists that support interpolation, because otherwise escapes that could change the encoding are ignored.

At first glance, it may make more sense for this to live on the lexer mode, but we need it here to communicate back to the parser for character literals that do not push a new lexer mode.
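
The rules above can be modelled as a small state machine. The sketch below is a simplified illustration, not the prism implementation. `Encoding`, `ExplicitEncoding`, and `MixedEncodingError` are hypothetical types, and the fallback to the source encoding when nothing was set explicitly is an assumption.

```rust
/// A few encodings, enough to exercise the rules (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    UsAscii,
    Utf8,
    Ascii8Bit,
    EucJp,
}

#[derive(Debug, PartialEq, Eq)]
pub struct MixedEncodingError;

/// Tracks the explicit encoding of one string-like expression.
pub struct ExplicitEncoding {
    source: Encoding,
    explicit: Option<Encoding>,
}

impl ExplicitEncoding {
    pub fn new(source: Encoding) -> Self {
        Self { source, explicit: None }
    }

    /// A byte or escape >= 0x80 pins the expression to the source encoding.
    pub fn saw_high_byte(&mut self) -> Result<(), MixedEncodingError> {
        let source = self.source;
        self.set(source)
    }

    /// A unicode escape >= 0x80 pins the expression to UTF-8.
    pub fn saw_high_unicode_escape(&mut self) -> Result<(), MixedEncodingError> {
        self.set(Encoding::Utf8)
    }

    fn set(&mut self, encoding: Encoding) -> Result<(), MixedEncodingError> {
        match self.explicit {
            // A different encoding was already pinned: mixed encodings.
            Some(previous) if previous != encoding => Err(MixedEncodingError),
            _ => {
                self.explicit = Some(encoding);
                Ok(())
            }
        }
    }

    /// Resolves the finished expression's encoding and resets the state,
    /// as happens between the elements of a list.
    pub fn finish(&mut self) -> Encoding {
        match self.explicit.take() {
            Some(Encoding::Utf8) => Encoding::Utf8,
            // US-ASCII source with a non-UTF-8 explicit encoding.
            Some(_) if self.source == Encoding::UsAscii => Encoding::Ascii8Bit,
            Some(explicit) => explicit,
            // Assumption: with nothing pinned, use the source encoding.
            None => self.source,
        }
    }
}
```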

§current_param_name: u32

The name ID of the parameter whose default value is currently being parsed.

§version: u32

The version of prism that we should use to parse.

§command_start: bool

Whether or not we’re at the beginning of a command.

§recovering: bool

Whether or not we’re currently recovering from a syntax error.

§encoding_changed: bool

Whether or not the encoding has been changed by a magic comment. We use this to provide a fast path for the lexer instead of going through the function pointer.

§pattern_matching_newlines: bool

This flag indicates that we are currently parsing a pattern matching expression and impacts the calculation of newlines.

§in_keyword_arg: bool

This flag indicates that we are currently parsing a keyword argument.

§semantic_token_seen: bool

Whether or not the parser has seen a token that has semantic meaning (i.e., a token that is not a comment or whitespace).

§frozen_string_literal: bool

Whether or not we have found a frozen_string_literal magic comment with a true value.