Struct Parser

pub struct Parser<I, B> { /* private fields */ }

A parser for the shell language. It will parse shell commands from a stream of shell Tokens, and pass them to an AST builder.

The parser implements the IntoIterator trait so that it can behave like a stream of parsed shell commands. Converting the parser into an Iterator and calling next() on the result will yield a complete shell command, or an error should one arise.

§Building

To construct a parser you need a stream of Tokens and a Builder which will receive data from the parser and assemble an AST. This library provides both a default Token lexer and an AST Builder.

use conch_parser::ast::builder::{Builder, RcBuilder};
use conch_parser::lexer::Lexer;
use conch_parser::parse::Parser;

let source = "echo hello world";
let lexer = Lexer::new(source.chars());
let mut parser = Parser::with_builder(lexer, RcBuilder::new());
assert!(parser.complete_command().unwrap().is_some());

If you want to use a parser with the default AST builder implementation you can also use the DefaultParser type alias for a simpler setup.

use conch_parser::lexer::Lexer;
use conch_parser::parse::DefaultParser;

let source = "echo hello world";
let lexer = Lexer::new(source.chars());
let mut parser = DefaultParser::new(lexer);
assert!(parser.complete_command().unwrap().is_some());

§Token lexing

Lexer implementations are free to yield tokens in whatever manner they wish. However, there are a few considerations a lexer should take into account.

First, the lexer should consolidate consecutive tokens such as Token::Name, Token::Literal, and Token::Whitespace as densely as possible, e.g. Literal(foobar) is preferred over [Literal(foo), Literal(bar)]. Although such splitting of tokens will not cause problems while parsing most shell commands, certain situations require the parser to look ahead some fixed number of tokens so it can avoid backtracking. When the tokens are consolidated the parser can look ahead deterministically. If a lexer implementation chooses not to use this strategy, the parser may fail to parse certain inputs normally considered valid.
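The consolidation described above can be sketched as a post-processing pass over a token stream. This is a minimal illustration under stated assumptions: the `Tok` enum and the `consolidate` function are hypothetical stand-ins and are not part of this library.

```rust
/// A toy token type for illustration only; it is NOT this crate's `Token`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Literal(String),
    Whitespace(String),
    Pipe,
}

/// Merges each run of adjacent `Literal` tokens (and each run of adjacent
/// `Whitespace` tokens) into a single token.
fn consolidate<I: IntoIterator<Item = Tok>>(tokens: I) -> Vec<Tok> {
    let mut out: Vec<Tok> = Vec::new();
    for tok in tokens {
        // Try to append to the previous token when both are the same kind.
        let merged = match (out.last_mut(), &tok) {
            (Some(Tok::Literal(prev)), Tok::Literal(next)) => {
                prev.push_str(next);
                true
            }
            (Some(Tok::Whitespace(prev)), Tok::Whitespace(next)) => {
                prev.push_str(next);
                true
            }
            _ => false,
        };
        if !merged {
            out.push(tok);
        }
    }
    out
}
```

With this pass, the toy sequence [Literal(foo), Literal(bar)] becomes [Literal(foobar)], so a consumer that looks ahead a fixed number of tokens sees the whole literal at once.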

Second, the lexer can influence how token escaping is handled by the parser. The backslash token, \, is used to escape, or make literal, any token which may or may not have a special meaning. Since the parser operates on tokens and not characters, the escaping of multi-character tokens is affected by how the lexer yields them. For example, the source \<< is normally considered by shells as [Literal(<), Less]. If this behavior is desired, the lexer should yield the tokens [Backslash, Less, Less] for that source. Otherwise, if the lexer yields the tokens [Backslash, DLess], the parser will treat the source as if it were [Literal(<<)]. The lexer’s behavior need not be consistent between different multi-character tokens, as long as it is aware of the implications.
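To make the effect of token granularity on escaping concrete, here is a small sketch of token-level escaping. The `Tok` enum, its `text` method, and `apply_escapes` are hypothetical and only model the behavior described above; they are not the parser's actual implementation.

```rust
/// Toy tokens for illustration only; NOT this crate's `Token`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Backslash,
    Less,
    DLess,
    Literal(String),
}

impl Tok {
    /// The source text this toy token stands for.
    fn text(&self) -> String {
        match self {
            Tok::Backslash => "\\".to_string(),
            Tok::Less => "<".to_string(),
            Tok::DLess => "<<".to_string(),
            Tok::Literal(s) => s.clone(),
        }
    }
}

/// Turns each `Backslash` plus the single token that follows it into a
/// literal, leaving all other tokens untouched.
fn apply_escapes(tokens: Vec<Tok>) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut iter = tokens.into_iter();
    while let Some(tok) = iter.next() {
        match tok {
            Tok::Backslash => match iter.next() {
                // The escape applies to exactly one token, whatever its width.
                Some(escaped) => out.push(Tok::Literal(escaped.text())),
                // Toy choice: a trailing backslash is kept as-is.
                None => out.push(Tok::Backslash),
            },
            other => out.push(other),
        }
    }
    out
}
```

Because the escape covers one token rather than one character, [Backslash, Less, Less] yields [Literal(<), Less], while [Backslash, DLess] yields [Literal(<<)].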

Implementations§

impl<I: Iterator<Item = Token>, B: Builder + Default> Parser<I, B>

pub fn new<T>(iter: T) -> Parser<I, B>
where T: IntoIterator<Item = Token, IntoIter = I>,

Creates a new Parser from a Token iterator or collection.

impl<I: Iterator<Item = Token>, B: Builder> Parser<I, B>

pub fn with_builder(iter: I, builder: B) -> Self

Creates a new Parser from a Token iterator and provided AST builder.

pub fn pos(&self) -> SourcePos

Returns the parser’s current position in the source.

pub fn complete_command(&mut self) -> ParseResult<Option<B::Command>, B::Error>

Parses a single complete command.

For example, foo && bar; baz will yield two complete commands: And(foo, bar), and Simple(baz).

pub fn and_or_list(&mut self) -> ParseResult<B::CommandList, B::Error>

Parses compound AND/OR commands.

Commands are left associative. For example, foo || bar && baz parses to And(Or(foo, bar), baz).
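Left associativity corresponds to a left fold over the operator/command pairs. The sketch below uses a hypothetical toy AST (`Cmd`, `Op`, `fold_and_or`), not the builder types of this library.

```rust
/// Toy AST for illustration only; NOT the AST produced by this crate.
#[derive(Debug, PartialEq)]
enum Cmd {
    Simple(String),
    And(Box<Cmd>, Box<Cmd>),
    Or(Box<Cmd>, Box<Cmd>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    And,
    Or,
}

/// Folds `first op1 c1 op2 c2 ...` from the left, so that earlier
/// operators end up nested deeper in the resulting tree.
fn fold_and_or(first: Cmd, rest: Vec<(Op, Cmd)>) -> Cmd {
    rest.into_iter().fold(first, |lhs, (op, rhs)| match op {
        Op::And => Cmd::And(Box::new(lhs), Box::new(rhs)),
        Op::Or => Cmd::Or(Box::new(lhs), Box::new(rhs)),
    })
}
```

Feeding it foo followed by (Or, bar) and (And, baz) produces And(Or(foo, bar), baz), matching the example above.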

pub fn pipeline(&mut self) -> ParseResult<B::ListableCommand, B::Error>

Parses either a single command or a pipeline of commands.

For example [!] foo | bar.

pub fn command(&mut self) -> ParseResult<B::PipeableCommand, B::Error>

Parses any compound or individual command.

pub fn simple_command(&mut self) -> ParseResult<B::PipeableCommand, B::Error>

Tries to parse a simple command, e.g. cmd arg1 arg2 >redirect.

A valid command is expected to have at least an executable name, or a single variable assignment or redirection. Otherwise an error will be returned.

pub fn redirect_list(&mut self) -> ParseResult<Vec<B::Redirect>, B::Error>

Parses a continuous list of redirections and will error if any words that are not valid file descriptors are found. This is mainly used for parsing redirection lists after a compound command like while or if.

pub fn redirect(&mut self) -> ParseResult<Option<Result<B::Redirect, B::Word>>, B::Error>

Parses a redirection token and any source file descriptor and path/destination descriptor as appropriate, e.g. >out, 1>& 2, or 2>&-.

Since the source descriptor can be any arbitrarily complicated word, it is difficult to reliably peek ahead to check whether a valid redirection exists without consuming anything. Thus this method may return a simple word if no redirection is found.

Unless a parse error occurs, the return value will be an optional redirect or word if either is found. In other words, Ok(Some(Ok(redirect))) will result if a redirect is found, Ok(Some(Err(word))) if a word is found, or Ok(None) if neither is found.
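The nested return shape can be handled with a single match. The sketch below is a stand-in only: `ToyResult` replaces the builder's redirect, word, and error types with plain strings, and `describe` is a hypothetical helper.

```rust
/// Stand-in for the nested return type of `redirect`, using plain strings
/// in place of the builder's redirect, word, and error types (an assumption
/// made to keep the sketch self-contained).
type ToyResult = Result<Option<Result<String, String>>, String>;

/// Shows the four cases a caller needs to distinguish.
fn describe(result: ToyResult) -> String {
    match result {
        // A redirection was parsed.
        Ok(Some(Ok(redirect))) => format!("redirect: {}", redirect),
        // No redirection, but a word was consumed and is handed back.
        Ok(Some(Err(word))) => format!("word: {}", word),
        // Neither a redirection nor a word was present.
        Ok(None) => "nothing found".to_string(),
        // Parsing itself failed.
        Err(e) => format!("parse error: {}", e),
    }
}
```

The inner Result is not an error channel here: its Err arm carries the word that was consumed while looking for a redirection.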

pub fn redirect_heredoc(&mut self, src_fd: Option<u16>) -> ParseResult<B::Redirect, B::Error>

Parses a heredoc redirection and the heredoc’s body.

This method will look ahead after the next unquoted/unescaped newline to capture the heredoc’s body, and will consume tokens until the appropriate delimiter is found on a line by itself. If the delimiter is unquoted, the heredoc’s body will be expanded for parameters and other special words. Otherwise, the heredoc’s body will be treated as a literal.

The heredoc delimiter need not be a valid word (e.g. parameter substitution rules within ${ } need not apply), although it is expected to be balanced like a regular word. In other words, all single/double quotes, backticks, ${ }, $( ), and ( ) must be balanced.

Note: if the delimiter is quoted, this method will look for an UNQUOTED version in the body. For example <<"EOF" will cause the parser to look until \nEOF is found. Thus it is possible to create a heredoc that can only be delimited by the end of the stream, e.g. if a newline is embedded within the delimiter. Any backticks that appear in the delimiter are also expected to appear in the delimiter at the end of the heredoc body, as are any embedded backslashes (unless the backslashes are followed by a \, $, or `).

Note: this method expects that the caller provide a potential file descriptor for redirection.
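The delimiter handling can be illustrated on plain strings. This is a simplified sketch, not the parser's algorithm: `unquote` and `heredoc_body` are hypothetical helpers that only handle a delimiter wrapped in one pair of quotes, and they treat the end of input as the end of the body.

```rust
/// Strips one pair of surrounding single or double quotes, if present.
/// (Toy simplification: real delimiters may be quoted in other ways.)
fn unquote(delim: &str) -> &str {
    let quoted = delim.len() >= 2
        && ((delim.starts_with('"') && delim.ends_with('"'))
            || (delim.starts_with('\'') && delim.ends_with('\'')));
    if quoted {
        &delim[1..delim.len() - 1]
    } else {
        delim
    }
}

/// Collects the lines of `following` up to (not including) the first line
/// equal to the UNQUOTED delimiter. Also reports whether the body should be
/// expanded, which is the case only when the delimiter was unquoted.
fn heredoc_body<'a>(delim: &str, following: &'a str) -> (Vec<&'a str>, bool) {
    let bare = unquote(delim);
    let expand = bare == delim;
    let mut body = Vec::new();
    for line in following.lines() {
        if line == bare {
            break;
        }
        body.push(line);
    }
    (body, expand)
}
```

For the delimiter "EOF" written with double quotes, the search looks for a bare EOF line and the body is left unexpanded. If no such line exists, this toy version returns everything up to the end of input.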

pub fn word(&mut self) -> ParseResult<Option<B::Word>, B::Error>

Parses a whitespace delimited chunk of text, honoring space quoting rules, and skipping leading and trailing whitespace.

Since a large number of possible tokens can constitute a word (such as literals, parameters, variables, etc.), the caller may not know for sure whether to expect a word. Thus an optional result is returned in the event that no word exists.

Note that an error can still arise if partial tokens are present (e.g. malformed parameter).

pub fn word_preserve_trailing_whitespace(&mut self) -> ParseResult<Option<B::Word>, B::Error>

Identical to Parser::word() but preserves trailing whitespace after the word.

pub fn backticked_command_substitution(&mut self) -> ParseResult<B::Word, B::Error>

Parses a command substitution in the form `cmd`.

Any backslashes that are immediately followed by \, $, or ` are removed before the contents inside the original backticks are recursively parsed as a command.
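The backslash removal rule can be sketched on a plain string. `unescape_backticked` is a hypothetical helper written for illustration; the parser itself works on tokens rather than characters.

```rust
/// Removes each backslash that is immediately followed by `\`, `$`, or a
/// backtick, keeping the character that followed it. All other backslashes
/// are left in place.
fn unescape_backticked(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next == '\\' || next == '$' || next == '`' {
                    // Drop the backslash and keep the escaped character.
                    out.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}
```

Under this rule, \$HOME inside backticks becomes $HOME before the inner command is parsed, while a sequence such as \n keeps its backslash.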

pub fn parameter(&mut self) -> ParseResult<B::Word, B::Error>

Parses parameters such as $$, $1, $foo, etc., or parameter substitutions such as $(cmd), ${param-word}, etc.

A leading $ is not necessarily followed by a valid parameter, in which case the $ should be treated as a literal. Thus this method returns a Word, which captures both the case where a literal is parsed and the case where a parameter is parsed.

pub fn do_group(&mut self) -> ParseResult<CommandGroup<B::Command>, B::Error>

Parses any number of sequential commands between the do and done reserved words. Each of the reserved words must be a literal token, and cannot be quoted or concatenated.

pub fn brace_group(&mut self) -> ParseResult<CommandGroup<B::Command>, B::Error>

Parses any number of sequential commands between balanced { and } reserved words. Each of the reserved words must be a literal token, and cannot be quoted.

pub fn subshell(&mut self) -> ParseResult<CommandGroup<B::Command>, B::Error>

Parses any number of sequential commands between balanced ( and ).

It is considered an error if no commands are present inside the subshell.

pub fn compound_command(&mut self) -> ParseResult<B::CompoundCommand, B::Error>

Parses compound commands like for, case, if, while, until, brace groups, or subshells, including any redirection lists to be applied to them.

pub fn loop_command(&mut self) -> ParseResult<(LoopKind, GuardBodyPairGroup<B::Command>), B::Error>

Parses loop commands like while and until but does not parse any redirections that may follow.

Since they are compound commands (and can have redirections applied to the entire loop), this method returns the relevant parts of the loop command without constructing an AST node, so that the caller can do so with redirections.

pub fn if_command(&mut self) -> ParseResult<IfFragments<B::Command>, B::Error>

Parses a single if command but does not parse any redirections that may follow.

Since if is a compound command (and can have redirections applied to it), this method returns the relevant parts of the if command without constructing an AST node, so that the caller can do so with redirections.

pub fn for_command(&mut self) -> ParseResult<ForFragments<B::Word, B::Command>, B::Error>

Parses a single for command but does not parse any redirections that may follow.

Since for is a compound command (and can have redirections applied to it), this method returns the relevant parts of the for command without constructing an AST node, so that the caller can do so with redirections.

pub fn case_command(&mut self) -> ParseResult<CaseFragments<B::Word, B::Command>, B::Error>

Parses a single case command but does not parse any redirections that may follow.

Since case is a compound command (and can have redirections applied to it), this method returns the relevant parts of the case command without constructing an AST node, so that the caller can do so with redirections.

pub fn maybe_function_declaration(&mut self) -> ParseResult<Option<B::PipeableCommand>, B::Error>

Parses a single function declaration if present. If no function is present, nothing is consumed from the token stream.

pub fn function_declaration(&mut self) -> ParseResult<B::PipeableCommand, B::Error>

Parses a single function declaration.

A function declaration must either begin with the function reserved word, or the name of the function must be followed by (). Whitespace is allowed between the name and (, and between ( and ).

pub fn skip_whitespace(&mut self)

Skips over any encountered whitespace but preserves newlines.

pub fn linebreak(&mut self) -> Vec<Newline>

Parses zero or more Token::Newlines, skipping whitespace but capturing comments.

pub fn newline(&mut self) -> Option<Newline>

Tries to parse a Token::Newline (or a comment) after skipping whitespace.

pub fn peek_reserved_token<'a>(&mut self, tokens: &'a [Token]) -> Option<&'a Token>

Checks that one of the specified tokens appears as a reserved word.

The token must be followed by a token which delimits a word when it is unquoted/unescaped.

If a reserved word is found, the token which it matches will be returned in case the caller cares which specific reserved word was found.

pub fn peek_reserved_word<'a>(&mut self, words: &'a [&str]) -> Option<&'a str>

Checks that one of the specified strings appears as a reserved word.

The word must appear as a single token, unquoted and unescaped, and must be followed by a token which delimits a word when it is unquoted/unescaped. The reserved word may appear as a Token::Name or a Token::Literal.

If a reserved word is found, the string which it matches will be returned in case the caller cares which specific reserved word was found.
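The rule that a reserved word must be a single token followed by a word delimiter can be sketched over a token slice. Everything here is hypothetical: the `Tok` enum, `delimits_word`, and `peek_reserved` are toy stand-ins, and the choice to treat end of input as a delimiter is an assumption of this sketch.

```rust
/// Toy tokens for illustration only; NOT this crate's `Token`.
#[derive(Debug, PartialEq)]
enum Tok {
    Name(String),
    Literal(String),
    Whitespace,
    Semi,
}

/// Toy rule: anything that is not a `Name` or `Literal` ends a word.
/// Assumption of this sketch: end of input also ends a word.
fn delimits_word(tok: Option<&Tok>) -> bool {
    !matches!(tok, Some(Tok::Name(_)) | Some(Tok::Literal(_)))
}

/// Returns the matching reserved word if the first token is exactly one of
/// `words` and the token after it delimits a word. Nothing is consumed.
fn peek_reserved<'a>(tokens: &[Tok], words: &[&'a str]) -> Option<&'a str> {
    let text = match tokens.first() {
        Some(Tok::Name(s)) | Some(Tok::Literal(s)) => s.as_str(),
        _ => return None,
    };
    if !delimits_word(tokens.get(1)) {
        return None;
    }
    words.iter().copied().find(|w| *w == text)
}
```

In this model [Name(done), Semi] matches the reserved word done, while [Name(done), Literal(x)] does not, since the word continues past the first token.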

pub fn reserved_token(&mut self, tokens: &[Token]) -> ParseResult<Token, B::Error>

Checks that one of the specified tokens appears as a reserved word and consumes it, returning the token it matched in case the caller cares which specific reserved word was found.

pub fn reserved_word<'a>(&mut self, words: &'a [&str]) -> Result<&'a str, ()>

Checks that one of the specified strings appears as a reserved word and consumes it, returning the string it matched in case the caller cares which specific reserved word was found.

pub fn command_group(&mut self, cfg: CommandGroupDelimiters<'_, '_, '_>) -> ParseResult<CommandGroup<B::Command>, B::Error>

Parses commands until a configured delimiter (or EOF) is reached, without consuming the token or reserved word.

Any reserved word/token must appear after a complete command separator (e.g. ;, &, or a newline), otherwise it will be parsed as part of the command.

It is considered an error if no commands are present.
