Struct conch_parser::parse::Parser

pub struct Parser<I, B> { /* fields omitted */ }

A parser for the shell language. It will parse shell commands from a stream of shell Tokens, and pass them to an AST builder.

The parser implements the IntoIterator trait so that it can behave like a stream of parsed shell commands. Converting the parser into an Iterator and calling next() on the result will yield a complete shell command, or an error should one arise.

Building

To construct a parser you need a stream of Tokens and a Builder which will receive data from the parser and assemble an AST. This library provides both a default Token lexer and an AST Builder.

use conch_parser::ast::builder::{Builder, RcBuilder};
use conch_parser::lexer::Lexer;
use conch_parser::parse::Parser;

let source = "echo hello world";
let lexer = Lexer::new(source.chars());
let mut parser = Parser::with_builder(lexer, RcBuilder::new());
assert!(parser.complete_command().unwrap().is_some());

If you want to use a parser with the default AST builder implementation you can also use the DefaultParser type alias for a simpler setup.

use conch_parser::lexer::Lexer;
use conch_parser::parse::DefaultParser;

let source = "echo hello world";
let lexer = Lexer::new(source.chars());
let mut parser = DefaultParser::new(lexer);
assert!(parser.complete_command().unwrap().is_some());

Token lexing

Lexer implementations are free to yield tokens in whatever manner they wish; however, there are a few considerations a lexer should take into account.

First, the lexer should consolidate consecutive tokens such as Token::Name, Token::Literal, and Token::Whitespace as densely as possible, e.g. Literal(foobar) is preferred over [Literal(foo), Literal(bar)]. Although such splitting of tokens will not cause problems while parsing most shell commands, certain situations require the parser to look ahead some fixed number of tokens so it can avoid backtracking. When the tokens are consolidated, the parser can look ahead deterministically. If a lexer implementation chooses not to use this strategy, the parser may fail to parse certain inputs normally considered valid.
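As a rough illustration of the consolidation described above, the sketch below merges adjacent tokens of the same kind. The Tok enum and the consolidate function are hypothetical stand-ins written for this example; they are not part of conch_parser, whose own Token type has many more variants.

```rust
// Hypothetical token type for illustration only; NOT conch_parser's `Token`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Literal(String),
    Whitespace(String),
    Pipe,
}

/// Merges runs of adjacent `Literal` tokens (and runs of adjacent
/// `Whitespace` tokens) into a single token each.
fn consolidate<I: IntoIterator<Item = Tok>>(tokens: I) -> Vec<Tok> {
    let mut out: Vec<Tok> = Vec::new();
    for tok in tokens {
        // Try to append to the previously emitted token when the kinds match.
        let merged = match (out.last_mut(), &tok) {
            (Some(Tok::Literal(prev)), Tok::Literal(next)) => {
                prev.push_str(next);
                true
            }
            (Some(Tok::Whitespace(prev)), Tok::Whitespace(next)) => {
                prev.push_str(next);
                true
            }
            _ => false,
        };
        if !merged {
            out.push(tok);
        }
    }
    out
}
```

With this sketch, a stream of Literal(foo), Literal(bar) collapses into a single Literal(foobar), which is the shape the parser's fixed look-ahead is described as expecting.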

Second, the lexer can influence how token escaping is handled by the parser. The backslash token, \, is used to escape, or make literal, any token which may or may not have a special meaning. Since the parser operates on tokens and not characters, the escaping of multi-character tokens is affected by how the lexer yields them. For example, the source \<< is normally considered by shells as [Literal(<), Less]. If this behavior is desired, the lexer should yield the tokens [Backslash, Less, Less] for that source. Otherwise, if the lexer yields the tokens [Backslash, DLess], the parser will treat the source as if it were [Literal(<<)]. The lexer's behavior need not be consistent between different multi-char tokens, as long as it is aware of the implications.
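The effect of token-level escaping can be sketched with a small model. RawTok, text, and apply_escapes below are hypothetical names invented for this example and do not reflect how the parser is actually implemented; the only point is that a backslash makes exactly one following token literal, whatever that token spans.

```rust
// Hypothetical token type for illustration only; NOT conch_parser's `Token`.
#[derive(Debug, Clone, PartialEq)]
enum RawTok {
    Backslash,
    Less,
    DLess,
    Literal(String),
}

/// Returns the source text a token stands for.
fn text(tok: &RawTok) -> String {
    match tok {
        RawTok::Backslash => "\\".to_string(),
        RawTok::Less => "<".to_string(),
        RawTok::DLess => "<<".to_string(),
        RawTok::Literal(s) => s.clone(),
    }
}

/// Turns each `Backslash` plus the ONE token after it into a literal.
fn apply_escapes(tokens: Vec<RawTok>) -> Vec<RawTok> {
    let mut out = Vec::new();
    let mut iter = tokens.into_iter();
    while let Some(tok) = iter.next() {
        match tok {
            RawTok::Backslash => match iter.next() {
                Some(escaped) => out.push(RawTok::Literal(text(&escaped))),
                // Assumption for this sketch: a trailing backslash stays literal.
                None => out.push(RawTok::Literal("\\".to_string())),
            },
            other => out.push(other),
        }
    }
    out
}
```

Under this model, [Backslash, Less, Less] becomes [Literal(<), Less], while [Backslash, DLess] becomes [Literal(<<)], matching the two outcomes described above.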

Methods

impl<I: Iterator<Item = Token>, B: Builder + Default> Parser<I, B>

Creates a new Parser from a Token iterator or collection.

impl<I: Iterator<Item = Token>, B: Builder> Parser<I, B>

Creates a new Parser from a Token iterator and provided AST builder.

Returns the parser's current position in the source.

Parses a single complete command.

For example, foo && bar; baz will yield two complete commands: And(foo, bar), and Simple(baz).

Parses compound AND/OR commands.

Commands are left associative. For example, foo || bar && baz parses to And(Or(foo, bar), baz).
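Left associativity can be pictured as a left fold over the operators in source order. The AndOr and Op types and the fold_left function below are hypothetical and exist only for this sketch; the real AST nodes are produced by whichever Builder the parser was given.

```rust
// Hypothetical AST for illustration only; NOT conch_parser's AST types.
#[derive(Debug, Clone, PartialEq)]
enum AndOr {
    Cmd(String),
    And(Box<AndOr>, Box<AndOr>),
    Or(Box<AndOr>, Box<AndOr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    And,
    Or,
}

/// Folds `first (op cmd)*` from left to right, so that earlier operators
/// end up nested deeper in the resulting tree.
fn fold_left(first: &str, rest: &[(Op, &str)]) -> AndOr {
    rest.iter()
        .fold(AndOr::Cmd(first.to_string()), |lhs, &(op, rhs)| {
            let rhs = Box::new(AndOr::Cmd(rhs.to_string()));
            match op {
                Op::And => AndOr::And(Box::new(lhs), rhs),
                Op::Or => AndOr::Or(Box::new(lhs), rhs),
            }
        })
}
```

Calling fold_left("foo", &[(Op::Or, "bar"), (Op::And, "baz")]) builds And(Or(foo, bar), baz): neither operator takes precedence over the other, and grouping follows source order.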

Parses either a single command or a pipeline of commands.

For example, [!] foo | bar.

Parses any compound or individual command.

Tries to parse a simple command, e.g. cmd arg1 arg2 >redirect.

A valid command is expected to have at least an executable name, or a single variable assignment or redirection. Otherwise an error will be returned.

Parses a contiguous list of redirections, returning an error if any word that is not a valid file descriptor is found. This is mainly used for parsing redirection lists after a compound command like while or if.

Parses a redirection token and any source file descriptor and path/destination descriptor as appropriate, e.g. >out, 1>& 2, or 2>&-.

Since the source descriptor can be any arbitrarily complicated word, it is difficult to reliably peek ahead and determine whether a valid redirection exists without consuming anything. Thus this method may return a simple word if no redirection is found.

Thus, unless a parse error occurs, the return value will be an optional redirect or word if either is found. In other words, Ok(Some(Ok(redirect))) will result if a redirect is found, Ok(Some(Err(word))) if a word is found, or Ok(None) if neither is found.
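A caller would typically unpack this nested return value with a single match. The Redirect, Word, and ParseError structs below are hypothetical placeholders so that the sketch compiles on its own; in the real crate the redirect and word types come from the Builder in use, and the error type from the parser.

```rust
// Hypothetical stand-ins for illustration only; the real types come from
// the parser and its builder.
#[derive(Debug, PartialEq)]
struct Redirect(String);
#[derive(Debug, PartialEq)]
struct Word(String);
#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Describes each of the four possible outcomes of a redirect-or-word parse.
fn describe(result: Result<Option<Result<Redirect, Word>>, ParseError>) -> String {
    match result {
        // A redirection was recognized.
        Ok(Some(Ok(Redirect(r)))) => format!("redirect: {}", r),
        // No redirection, but the consumed tokens formed a plain word.
        Ok(Some(Err(Word(w)))) => format!("word: {}", w),
        // Neither a redirect nor a word was present.
        Ok(None) => "nothing found".to_string(),
        // The input was malformed.
        Err(ParseError(e)) => format!("parse error: {}", e),
    }
}
```

Note that the inner Err here is not a failure: it carries the word that was consumed while looking for a redirection, so the caller does not lose it.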

Parses a heredoc redirection and the heredoc's body.

This method will look ahead after the next unquoted/unescaped newline to capture the heredoc's body, and will stop consuming tokens once the appropriate delimiter is found on a line by itself. If the delimiter is unquoted, the heredoc's body will be expanded for parameters and other special words. Otherwise, the heredoc's body will be treated as a literal.

The heredoc delimiter need not be a valid word (e.g. parameter substitution rules within ${ } need not apply), although it is expected to be balanced like a regular word. In other words, all single/double quotes, backticks, ${ }, $( ), and ( ) must be balanced.

Note: if the delimiter is quoted, this method will look for an UNQUOTED version in the body. For example <<"EOF" will cause the parser to look until \nEOF is found. Thus it is possible to create a heredoc that can only be delimited by the end of the stream, e.g. if a newline is embedded within the delimiter. Any backticks that appear in the delimiter are expected to appear at the end of the delimiter of the heredoc body, as well as any embedded backslashes (unless the backslashes are followed by a \, $, or `).
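The line-by-line search for the delimiter can be sketched on plain strings. The split_heredoc function below is a hypothetical helper written for this example: it works on characters rather than tokens, takes the already unquoted delimiter, and performs no expansion, so it only models the "delimiter on a line by itself" rule.

```rust
/// Splits `input` (the text following the heredoc's first newline) into the
/// heredoc body and the remaining source. The second element is `None` when
/// the delimiter never appears on a line by itself, i.e. the body runs to
/// the end of the input.
fn split_heredoc<'a>(input: &'a str, delim: &str) -> (String, Option<&'a str>) {
    let mut body = String::new();
    let mut rest = input;
    while !rest.is_empty() {
        // Take one line, without its trailing newline.
        let (line, after, had_newline) = match rest.find('\n') {
            Some(i) => (&rest[..i], &rest[i + 1..], true),
            None => (rest, "", false),
        };
        // Only an exact, whole-line match terminates the body.
        if line == delim {
            return (body, Some(after));
        }
        body.push_str(line);
        if had_newline {
            body.push('\n');
        }
        rest = after;
    }
    (body, None)
}
```

In this sketch a line such as EOF x does not end the body, because only a whole-line match counts. A delimiter containing an embedded newline can never equal a single line, so the body extends to the end of the input, which mirrors the end-of-stream case mentioned above.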

Note: this method expects that the caller provide a potential file descriptor for redirection.

Parses a whitespace delimited chunk of text, honoring space quoting rules, and skipping leading and trailing whitespace.

Since a large number of possible tokens can constitute a word (such as literals, parameters, variables, etc.), the caller may not know for sure whether to expect a word. An optional result is therefore returned in case no word exists.

Note that an error can still arise if partial tokens are present (e.g. malformed parameter).

Identical to Parser::word() but preserves trailing whitespace after the word.

Parses a command substitution in the form `cmd`.

Any backslashes that are immediately followed by \, $, or ` are removed before the contents inside the original backticks are recursively parsed as a command.
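The backslash removal step can be sketched as a small string transformation. The unescape_backticked function below is a hypothetical helper for this example and operates on characters, whereas the parser itself works on tokens; the recursive parse of the result is not shown.

```rust
/// Removes each backslash that is immediately followed by `\`, `$`, or a
/// backtick, keeping the escaped character. Other backslashes are untouched.
fn unescape_backticked(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next == '\\' || next == '$' || next == '`' {
                    // Drop the backslash and keep the escaped character.
                    out.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}
```

A backslash before any other character, such as the one in \n, is left in place, so it is still present when the contents are parsed as a command.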

Parses parameters such as $$, $1, $foo, etc., or parameter substitutions such as $(cmd), ${param-word}, etc.

A leading $ may not be followed by a valid parameter, in which case the $ is treated as a literal. Thus this method returns a Word, which covers both the case where a literal is parsed and the case where a parameter is parsed.

Parses any number of sequential commands between the do and done reserved words. Each of the reserved words must be a literal token, and cannot be quoted or concatenated.

Parses any number of sequential commands between balanced { and } reserved words. Each of the reserved words must be a literal token, and cannot be quoted.

Parses any number of sequential commands between balanced ( and ).

It is considered an error if no commands are present inside the subshell.

Parses compound commands like for, case, if, while, until, brace groups, or subshells, including any redirection lists to be applied to them.

Parses loop commands like while and until but does not parse any redirections that may follow.

Since they are compound commands (and can have redirections applied to the entire loop), this method returns the relevant parts of the loop command without constructing an AST node, so that the caller can do so with redirections.

Parses a single if command but does not parse any redirections that may follow.

Since if is a compound command (and can have redirections applied to it), this method returns the relevant parts of the if command without constructing an AST node, so that the caller can do so with redirections.

Parses a single for command but does not parse any redirections that may follow.

Since for is a compound command (and can have redirections applied to it), this method returns the relevant parts of the for command without constructing an AST node, so that the caller can do so with redirections.

Parses a single case command but does not parse any redirections that may follow.

Since case is a compound command (and can have redirections applied to it), this method returns the relevant parts of the case command without constructing an AST node, so that the caller can do so with redirections.

Parses a single function declaration if present. If no function is present, nothing is consumed from the token stream.

Parses a single function declaration.

A function declaration must either begin with the function reserved word, or the name of the function must be followed by (). Whitespace is allowed between the name and (, and between ( and ).

Skips over any encountered whitespace but preserves newlines.

Parses zero or more Token::Newlines, skipping whitespace but capturing comments.

Tries to parse a Token::Newline (or a comment) after skipping whitespace.

Checks that one of the specified tokens appears as a reserved word.

The token must be followed by a token which delimits a word when it is unquoted/unescaped.

If a reserved word is found, the token which it matches will be returned in case the caller cares which specific reserved word was found.

Checks that one of the specified strings appears as a reserved word.

The word must appear as a single token, unquoted and unescaped, and must be followed by a token which delimits a word when it is unquoted/unescaped. The reserved word may appear as a Token::Name or a Token::Literal.

If a reserved word is found, the string which it matches will be returned in case the caller cares which specific reserved word was found.

Checks that one of the specified tokens appears as a reserved word and consumes it, returning the token it matched in case the caller cares which specific reserved word was found.

Checks that one of the specified strings appears as a reserved word and consumes it, returning the string it matched in case the caller cares which specific reserved word was found.

Parses commands until a configured delimiter (or EOF) is reached, without consuming the token or reserved word.

Any reserved word/token must appear after a complete command separator (e.g. ;, &, or a newline), otherwise it will be parsed as part of the command.

It is considered an error if no commands are present.

Parses the body of any arbitrary arithmetic expression, e.g. x + $y << 5. The caller is responsible for parsing the external $(( )) tokens.

Trait Implementations

impl<I, B> IntoIterator for Parser<I, B> where
    I: Iterator<Item = Token>,
    B: Builder

Which kind of iterator are we turning this into?

The type of the elements being iterated over.

Creates an iterator from a value.

impl<I: Debug, B: Debug> Debug for Parser<I, B>

Formats the value using the given formatter.