Struct conch_parser::parse::Parser
pub struct Parser<I, B> { /* fields omitted */ }
A parser for the shell language. It will parse shell commands from a stream of shell Tokens, and pass them to an AST builder.
The parser implements the IntoIterator trait so that it can behave like a stream of parsed shell commands. Converting the parser into an Iterator and calling next() on the result will yield a complete shell command, or an error should one arise.
Building
To construct a parser you need a stream of Tokens and a Builder which will receive data from the parser and assemble an AST. This library provides both a default Token lexer, as well as an AST Builder.
use conch_parser::ast::builder::{Builder, RcBuilder};
use conch_parser::lexer::Lexer;
use conch_parser::parse::Parser;

let source = "echo hello world";
let lexer = Lexer::new(source.chars());
let mut parser = Parser::with_builder(lexer, RcBuilder::new());
assert!(parser.complete_command().unwrap().is_some());
If you want to use a parser with the default AST builder implementation you can also use the DefaultParser type alias for a simpler setup.
use conch_parser::lexer::Lexer;
use conch_parser::parse::DefaultParser;

let source = "echo hello world";
let lexer = Lexer::new(source.chars());
let mut parser = DefaultParser::new(lexer);
assert!(parser.complete_command().unwrap().is_some());
Token lexing
Lexer implementations are free to yield tokens in whatever manner they wish; however, there are a few considerations the lexer should take into account.
First, the lexer should consolidate consecutive tokens such as Token::Name, Token::Literal, and Token::Whitespace as densely as possible, e.g. Literal(foobar) is preferred over [Literal(foo), Literal(bar)]. Although such splitting of tokens will not cause problems while parsing most shell commands, certain situations require the parser to look ahead some fixed number of tokens so it can avoid backtracking. When the tokens are consolidated the parser can look ahead deterministically. If a lexer implementation chooses not to use this strategy, the parser may fail to parse certain inputs normally considered valid.
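The consolidation described above can be sketched as a post-processing pass over a token stream. This is only an illustration: the Tok enum and the consolidate function are hypothetical stand-ins, not part of conch_parser, and the real Token enum has many more variants.

```rust
/// Hypothetical, simplified token type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Literal(String),
    Whitespace(String),
    Pipe,
}

/// Merges runs of adjacent `Literal` tokens (and runs of adjacent
/// `Whitespace` tokens) into a single token each.
fn consolidate(tokens: Vec<Tok>) -> Vec<Tok> {
    let mut out: Vec<Tok> = Vec::with_capacity(tokens.len());
    for tok in tokens {
        // Try to append the new token's text to the previous token
        // when both are of the same mergeable kind.
        let merged = match (out.last_mut(), &tok) {
            (Some(Tok::Literal(prev)), Tok::Literal(next)) => {
                prev.push_str(next);
                true
            }
            (Some(Tok::Whitespace(prev)), Tok::Whitespace(next)) => {
                prev.push_str(next);
                true
            }
            _ => false,
        };
        if !merged {
            out.push(tok);
        }
    }
    out
}
```

With this sketch, [Literal(foo), Literal(bar)] collapses to [Literal(foobar)], while tokens of other kinds are passed through unchanged.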
Second, the lexer can influence how token escaping is handled by the parser. The backslash token, \, is used to escape, or make literal, any token which may or may not have a special meaning. Since the parser operates on tokens and not characters, the escaping of multi-character tokens is affected by how the lexer yields them. For example, the source \<< is normally considered by shells as [Literal(<), Less]. If this behavior is desired, the lexer should yield the tokens [Backslash, Less, Less] for that source. Otherwise, if the lexer yields the tokens [Backslash, DLess], the parser will treat the source as if it were [Literal(<<)]. The lexer's behavior need not be consistent between different multi-char tokens, as long as it is aware of the implications.
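To make the token-level escaping rule concrete, here is a minimal sketch that models it outside the library. The Tok enum and apply_escapes function are hypothetical and only mimic the behavior described above; the handling of a trailing backslash is an assumption made for the sketch.

```rust
/// Hypothetical, simplified token type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Backslash,
    Less,
    DLess,
    Literal(String),
}

impl Tok {
    /// Source text that the token stands for.
    fn as_str(&self) -> &str {
        match self {
            Tok::Backslash => "\\",
            Tok::Less => "<",
            Tok::DLess => "<<",
            Tok::Literal(s) => s.as_str(),
        }
    }
}

/// Replaces each `Backslash` and the single token after it with one literal.
/// Escaping works per token, so the result depends on how the input was lexed.
fn apply_escapes(tokens: &[Tok]) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut iter = tokens.iter();
    while let Some(tok) = iter.next() {
        match tok {
            Tok::Backslash => match iter.next() {
                Some(escaped) => out.push(Tok::Literal(escaped.as_str().to_string())),
                // Assumption for this sketch: a trailing backslash stays literal.
                None => out.push(Tok::Literal("\\".to_string())),
            },
            other => out.push(other.clone()),
        }
    }
    out
}
```

Feeding [Backslash, Less, Less] through this sketch gives [Literal(<), Less], whereas [Backslash, DLess] gives [Literal(<<)], matching the two outcomes described in the text.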
Methods
impl<I: Iterator<Item = Token>, B: Builder + Default> Parser<I, B>
fn new<T>(iter: T) -> Parser<I, B> where
T: IntoIterator<Item = Token, IntoIter = I>,
Creates a new Parser from a Token iterator or collection.
impl<I: Iterator<Item = Token>, B: Builder> Parser<I, B>
fn with_builder(iter: I, builder: B) -> Self
Creates a new Parser from a Token iterator and provided AST builder.
fn pos(&self) -> SourcePos
Returns the parser's current position in the source.
fn complete_command(&mut self) -> ParseResult<Option<B::Command>, B::Error>
Parses a single complete command.
For example, foo && bar; baz will yield two complete commands: And(foo, bar), and Simple(baz).
fn and_or_list(&mut self) -> ParseResult<B::CommandList, B::Error>
Parses compound AND/OR commands.
Commands are left associative. For example foo || bar && baz parses to And(Or(foo, bar), baz).
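Left associativity can be illustrated with a small fold over a flat list of operators and commands. The AndOr and Op types and the fold_left function are hypothetical and exist only for this sketch; the actual node types are determined by the Builder in use.

```rust
/// Hypothetical AST for AND/OR lists, used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum AndOr {
    Cmd(String),
    And(Box<AndOr>, Box<AndOr>),
    Or(Box<AndOr>, Box<AndOr>),
}

/// Hypothetical operator type: `&&` or `||`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    And,
    Or,
}

/// Builds a left-associative tree: each new operator wraps everything
/// parsed so far as its left-hand side.
fn fold_left(first: &str, rest: &[(Op, &str)]) -> AndOr {
    rest.iter()
        .fold(AndOr::Cmd(first.to_string()), |lhs, (op, cmd)| {
            let rhs = Box::new(AndOr::Cmd(cmd.to_string()));
            match op {
                Op::And => AndOr::And(Box::new(lhs), rhs),
                Op::Or => AndOr::Or(Box::new(lhs), rhs),
            }
        })
}
```

Calling fold_left("foo", &[(Op::Or, "bar"), (Op::And, "baz")]) produces And(Or(foo, bar), baz), the same shape as the example above.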
fn pipeline(&mut self) -> ParseResult<B::ListableCommand, B::Error>
Parses either a single command or a pipeline of commands.
For example [!] foo | bar.
fn command(&mut self) -> ParseResult<B::PipeableCommand, B::Error>
Parses any compound or individual command.
fn simple_command(&mut self) -> ParseResult<B::PipeableCommand, B::Error>
Tries to parse a simple command, e.g. cmd arg1 arg2 >redirect.
A valid command is expected to have at least an executable name, or a single variable assignment or redirection. Otherwise an error will be returned.
fn redirect_list(&mut self) -> ParseResult<Vec<B::Redirect>, B::Error>
Parses a continuous list of redirections and will error if any words that are not valid file descriptors are found. Essentially used for parsing redirection lists after a compound command like while or if.
fn redirect(
&mut self
) -> ParseResult<Option<Result<B::Redirect, B::Word>>, B::Error>
Parses a redirection token and any source file descriptor and path/destination descriptor as appropriate, e.g. >out, 1>& 2, or 2>&-.
Since the source descriptor can be any arbitrarily complicated word, it is difficult to reliably peek forward and tell whether a valid redirection exists without consuming anything. Thus this method may return a simple word if no redirection is found.
Consequently, unless a parse error occurs, the return value will be an optional redirect or word if either is found. In other words, Ok(Some(Ok(redirect))) will result if a redirect is found, Ok(Some(Err(word))) if a word is found, or Ok(None) if neither is found.
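The nested return shape is easier to read when every case is matched explicitly. The sketch below uses hypothetical Redirect, Word, and ParseError types and a hypothetical RedirectResult alias in place of the builder's associated types and the library's ParseResult, so it shows the pattern only, not the real signatures.

```rust
// Hypothetical stand-ins for B::Redirect, B::Word, and the parse error type.
#[derive(Debug, PartialEq)]
struct Redirect(String);
#[derive(Debug, PartialEq)]
struct Word(String);
#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Mirrors the shape of the return type: an outer parse result, an optional
/// value, and an inner result that distinguishes a redirect from a plain word.
type RedirectResult = Result<Option<Result<Redirect, Word>>, ParseError>;

/// Describes which of the four possible outcomes was returned.
fn describe(result: RedirectResult) -> String {
    match result {
        Ok(Some(Ok(Redirect(r)))) => format!("redirect: {}", r),
        Ok(Some(Err(Word(w)))) => format!("word: {}", w),
        Ok(None) => "nothing found".to_string(),
        Err(ParseError(e)) => format!("parse error: {}", e),
    }
}
```

Note that the inner Err here is not a failure: it carries the word that was consumed while looking for a redirection.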
fn redirect_heredoc(
&mut self,
src_fd: Option<u16>
) -> ParseResult<B::Redirect, B::Error>
Parses a heredoc redirection and the heredoc's body.
This method will look ahead after the next unquoted/unescaped newline to capture the heredoc's body, and will keep consuming tokens until the appropriate delimiter is found on a line by itself. If the delimiter is unquoted, the heredoc's body will be expanded for parameters and other special words. Otherwise, the heredoc's body will be treated as a literal.
The heredoc delimiter need not be a valid word (e.g. parameter substitution rules within ${ } need not apply), although it is expected to be balanced like a regular word. In other words, all single/double quotes, backticks, ${ }, $( ), and ( ) must be balanced.
Note: if the delimiter is quoted, this method will look for an UNQUOTED version in the body. For example <<"EOF" will cause the parser to look until \nEOF is found. Thus it is possible to create a heredoc that can only be delimited by the end of the stream, e.g. if a newline is embedded within the delimiter. Any backticks that appear in the delimiter are expected to appear in the delimiter at the end of the heredoc body, as are any embedded backslashes (unless the backslashes are followed by a \, $, or `).
Note: this method expects that the caller provide a potential file descriptor for redirection.
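The delimiter rules can be approximated on plain strings. The following is a rough sketch, not the parser's implementation: unquote and heredoc_body are hypothetical helpers, they handle only a delimiter wrapped entirely in one pair of quotes, and they ignore backticks, backslashes, and partially quoted delimiters.

```rust
/// Strips one surrounding pair of single or double quotes, and reports
/// whether the delimiter was quoted. Simplification: partial quoting such
/// as E"O"F is not handled.
fn unquote(delim: &str) -> (String, bool) {
    let quoted = delim.len() >= 2
        && ((delim.starts_with('"') && delim.ends_with('"'))
            || (delim.starts_with('\'') && delim.ends_with('\'')));
    if quoted {
        (delim[1..delim.len() - 1].to_string(), true)
    } else {
        (delim.to_string(), false)
    }
}

/// Collects the lines following the heredoc operator's newline until a line
/// equal to the UNQUOTED delimiter is found (or the input ends). Returns the
/// body and whether it should be expanded (true only for unquoted delimiters).
fn heredoc_body(delim: &str, after_newline: &str) -> (String, bool) {
    let (needle, quoted) = unquote(delim);
    let mut body = String::new();
    for line in after_newline.lines() {
        if line == needle {
            break;
        }
        body.push_str(line);
        body.push('\n');
    }
    (body, !quoted)
}
```

For a delimiter written as "EOF", the sketch searches for a bare EOF line and marks the body as literal; for an unquoted EOF it marks the body for expansion.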
fn word(&mut self) -> ParseResult<Option<B::Word>, B::Error>
Parses a whitespace delimited chunk of text, honoring space quoting rules, and skipping leading and trailing whitespace.
Since there are a large number of possible tokens that constitute a word (such as literals, parameters, variables, etc.), the caller may not know for sure whether to expect a word; thus an optional result is returned in the event that no word exists.
Note that an error can still arise if partial tokens are present (e.g. malformed parameter).
fn word_preserve_trailing_whitespace(
&mut self
) -> ParseResult<Option<B::Word>, B::Error>
Identical to Parser::word() but preserves trailing whitespace after the word.
fn backticked_command_substitution(&mut self) -> ParseResult<B::Word, B::Error>
Parses a command substitution in the form `cmd`.
Any backslashes that are immediately followed by \, $, or ` are removed before the contents inside the original backticks are recursively parsed as a command.
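The backslash removal step can be sketched on a plain string. The function name strip_backtick_escapes is hypothetical, and the real parser works on tokens rather than characters, so treat this as an approximation of the rule rather than the library's code.

```rust
/// Removes each backslash that is immediately followed by `\`, `$`, or a
/// backtick, keeping the escaped character. Other backslashes are kept.
fn strip_backtick_escapes(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if matches!(next, '\\' | '$' | '`') {
                    // Drop the backslash and keep the character it escaped.
                    out.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}
```

The resulting string is what would then be parsed recursively as a command.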
fn parameter(&mut self) -> ParseResult<B::Word, B::Error>
Parses parameters such as $$, $1, $foo, etc., or parameter substitutions such as $(cmd), ${param-word}, etc.
It is possible that a leading $ is not followed by a valid parameter, in which case the $ should be treated as a literal. Thus this method returns a Word, which will capture both cases where a literal or parameter is parsed.
fn do_group(&mut self) -> ParseResult<CommandGroup<B::Command>, B::Error>
Parses any number of sequential commands between the do and done reserved words. Each of the reserved words must be a literal token, and cannot be quoted or concatenated.
fn brace_group(&mut self) -> ParseResult<CommandGroup<B::Command>, B::Error>
Parses any number of sequential commands between balanced { and } reserved words. Each of the reserved words must be a literal token, and cannot be quoted.
fn subshell(&mut self) -> ParseResult<CommandGroup<B::Command>, B::Error>
Parses any number of sequential commands between balanced ( and ).
It is considered an error if no commands are present inside the subshell.
fn compound_command(&mut self) -> ParseResult<B::CompoundCommand, B::Error>
Parses compound commands like for, case, if, while, until, brace groups, or subshells, including any redirection lists to be applied to them.
fn loop_command(
&mut self
) -> ParseResult<(LoopKind, GuardBodyPairGroup<B::Command>), B::Error>
Parses loop commands like while and until but does not parse any redirections that may follow.
Since they are compound commands (and can have redirections applied to the entire loop) this method returns the relevant parts of the loop command, without constructing an AST node, so that the caller can do so with redirections.
fn if_command(&mut self) -> ParseResult<IfFragments<B::Command>, B::Error>
Parses a single if command but does not parse any redirections that may follow.
Since if is a compound command (and can have redirections applied to it) this method returns the relevant parts of the if command, without constructing an AST node, so that the caller can do so with redirections.
fn for_command(
&mut self
) -> ParseResult<ForFragments<B::Word, B::Command>, B::Error>
Parses a single for command but does not parse any redirections that may follow.
Since for is a compound command (and can have redirections applied to it) this method returns the relevant parts of the for command, without constructing an AST node, so that the caller can do so with redirections.
fn case_command(
&mut self
) -> ParseResult<CaseFragments<B::Word, B::Command>, B::Error>
Parses a single case command but does not parse any redirections that may follow.
Since case is a compound command (and can have redirections applied to it) this method returns the relevant parts of the case command, without constructing an AST node, so that the caller can do so with redirections.
fn maybe_function_declaration(
&mut self
) -> ParseResult<Option<B::PipeableCommand>, B::Error>
Parses a single function declaration if present. If no function is present, nothing is consumed from the token stream.
fn function_declaration(&mut self) -> ParseResult<B::PipeableCommand, B::Error>
Parses a single function declaration.
A function declaration must either begin with the function reserved word, or the name of the function must be followed by (). Whitespace is allowed between the name and (, and whitespace is allowed between ( and ).
fn skip_whitespace(&mut self)
Skips over any encountered whitespace but preserves newlines.
fn linebreak(&mut self) -> Vec<Newline>
Parses zero or more Token::Newlines, skipping whitespace but capturing comments.
fn newline(&mut self) -> Option<Newline>
Tries to parse a Token::Newline (or a comment) after skipping whitespace.
fn peek_reserved_token<'a>(&mut self, tokens: &'a [Token]) -> Option<&'a Token>
Checks that one of the specified tokens appears as a reserved word.
The token must be followed by a token which delimits a word when it is unquoted/unescaped.
If a reserved word is found, the token which it matches will be returned in case the caller cares which specific reserved word was found.
fn peek_reserved_word<'a>(&mut self, words: &'a [&str]) -> Option<&'a str>
Checks that one of the specified strings appears as a reserved word.
The word must appear as a single token, unquoted and unescaped, and must be followed by a token which delimits a word when it is unquoted/unescaped. The reserved word may appear as a Token::Name or a Token::Literal.
If a reserved word is found, the string which it matches will be returned in case the caller cares which specific reserved word was found.
fn reserved_token(&mut self, tokens: &[Token]) -> ParseResult<Token, B::Error>
Checks that one of the specified tokens appears as a reserved word and consumes it, returning the token it matched in case the caller cares which specific reserved word was found.
fn reserved_word<'a>(&mut self, words: &'a [&str]) -> Result<&'a str, ()>
Checks that one of the specified strings appears as a reserved word and consumes it, returning the string it matched in case the caller cares which specific reserved word was found.
fn command_group(
&mut self,
cfg: CommandGroupDelimiters
) -> ParseResult<CommandGroup<B::Command>, B::Error>
Parses commands until a configured delimiter (or EOF) is reached, without consuming the token or reserved word.
Any reserved word/token must appear after a complete command separator (e.g. ;, &, or a newline), otherwise it will be parsed as part of the command.
It is considered an error if no commands are present.
fn arithmetic_substitution(
&mut self
) -> ParseResult<DefaultArithmetic, B::Error>
Parses the body of any arbitrary arithmetic expression, e.g. x + $y << 5.
The caller is responsible for parsing the external $(( )) tokens.
Trait Implementations
impl<I, B> IntoIterator for Parser<I, B> where
I: Iterator<Item = Token>,
B: Builder,
type IntoIter = ParserIterator<I, B>
Which kind of iterator are we turning this into?
type Item = <Self::IntoIter as Iterator>::Item
The type of the elements being iterated over.
fn into_iter(self) -> Self::IntoIter
Creates an iterator from a value.