pub struct Lexer<'a> { /* private fields */ }
Lexical analyzer.
A lexer reads lines using an input function and parses the characters into tokens. It has an internal buffer containing the characters that have been read and the position (or the index) of the character that is to be parsed next.
Lexer has primitive functions such as peek_char that provide access to the character at the current position. Derived functions such as skip_blanks_and_comment depend on those primitives to parse more complex structures in the source code.
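The split between primitives and derived functions can be illustrated with a small standalone sketch. ToyLexer below is a hypothetical, synchronous model written for this illustration only; it is not the crate's Lexer, which reads input asynchronously and tracks source locations. The assumption that only space and tab count as blanks is also made up for the example.

```rust
/// Hypothetical, simplified stand-in for the lexer's buffer and position.
struct ToyLexer {
    /// Characters that have been read so far.
    buffer: Vec<char>,
    /// Index of the character to be parsed next.
    index: usize,
}

impl ToyLexer {
    fn new(source: &str) -> Self {
        ToyLexer { buffer: source.chars().collect(), index: 0 }
    }

    /// Primitive: looks at the character at the current position.
    fn peek_char(&self) -> Option<char> {
        self.buffer.get(self.index).copied()
    }

    /// Primitive: advances past the character at the current position.
    fn consume_char(&mut self) {
        assert!(self.index < self.buffer.len(), "nothing to consume");
        self.index += 1;
    }

    /// Derived: built only from the two primitives above.
    /// Assumption: a blank is a space or a tab.
    fn skip_blanks(&mut self) {
        while matches!(self.peek_char(), Some(' ' | '\t')) {
            self.consume_char();
        }
    }
}
```

Because skip_blanks touches the buffer only through peek_char and consume_char, any behavior added to the primitives (such as line continuation handling) would automatically apply to the derived function as well.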
Implementations
Creates a new lexer that reads using the given input function.
Creates a new lexer with a fixed source code.
Disables line continuation recognition onward.
By default, peek_char silently skips line continuation sequences. When line continuation is disabled, however, peek_char returns characters literally.
Call enable_line_continuation to switch line continuation recognition on.
This function will panic if line continuation has already been disabled.
Re-enables line continuation.
You can pass the PlainLexer returned from disable_line_continuation to this function to re-enable line continuation. That is equivalent to dropping the PlainLexer instance, but the code will be more descriptive.
Peeks the next character.
If the end of input is reached, Ok(None) is returned. On error, Err(_) is returned.
If line continuation recognition is enabled, combinations of a backslash and a newline are silently skipped before returning the next character.
Call disable_line_continuation to switch off line continuation recognition.
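The effect of line continuation recognition on peeking can be sketched as follows. peek_with_continuation is a hypothetical free function over a plain character slice, written for this illustration; it is not part of the crate, and the real peek_char is asynchronous and returns a Result.

```rust
/// Hypothetical helper: returns the index and value of the next character in
/// `buffer` at or after `index`. When `line_continuation` is true, every
/// backslash-newline pair in front of that character is skipped first.
fn peek_with_continuation(
    buffer: &[char],
    mut index: usize,
    line_continuation: bool,
) -> Option<(usize, char)> {
    if line_continuation {
        // Skip as many consecutive backslash-newline pairs as there are.
        while buffer.get(index) == Some(&'\\') && buffer.get(index + 1) == Some(&'\n') {
            index += 2;
        }
    }
    // `None` models the end of input.
    buffer.get(index).map(|&c| (index, c))
}
```

With recognition disabled, the same call returns the backslash itself, which mirrors the "returns characters literally" behavior described for disable_line_continuation.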
Returns the location of the next character.
If there are no more characters (that is, at the end of input), this function returns an imaginary location: the one the next character would have if it existed.
This function requires a mutable reference to self since it may need to read the next line if it has not been read yet.
Consumes the next character.
This function must be called after peek_char has successfully returned the character. Consuming a character that has not yet been peeked results in a panic.
Returns the position of the next character, counted from zero.
futures_executor::block_on(async {
    let mut lexer = Lexer::from_memory("abc", Source::Unknown);
    assert_eq!(lexer.index(), 0);
    let _ = lexer.peek_char().await;
    assert_eq!(lexer.index(), 0);
    lexer.consume_char();
    assert_eq!(lexer.index(), 1);
})
Moves the current position back to the given index so that characters that have been consumed can be read again.
The given index must not be larger than the current index; otherwise, this function panics.
futures_executor::block_on(async {
    let mut lexer = Lexer::from_memory("abc", Source::Unknown);
    let saved_index = lexer.index();
    assert_eq!(lexer.peek_char().await, Ok(Some('a')));
    lexer.consume_char();
    assert_eq!(lexer.peek_char().await, Ok(Some('b')));
    lexer.rewind(saved_index);
    assert_eq!(lexer.peek_char().await, Ok(Some('a')));
})
Checks if there is any character that has been read from the input source but not yet consumed.
Clears the internal buffer of the lexer.
Locations returned from location share a single code instance that is also retained by the lexer. The code grows as the lexer reads more input. To prevent the code from getting too large, you can call this function, which replaces the retained code with a new empty one. The new code's start_line_number will be incremented by the number of lines in the previous one.
Clears an end-of-input or error status so that the lexer can resume parsing.
This function is useful only in an interactive shell, where the user can continue entering commands even after they send an end-of-input or are interrupted by a syntax error.
pub async fn consume_char_if<F>(&mut self, f: F) -> Result<Option<&SourceChar>> where
F: FnMut(char) -> bool,
Peeks the next character and, if the given decider function returns true for it, advances the position.
Returns the consumed character if the function returned true. Returns Ok(None) if it returned false or there are no more characters.
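The peek-then-conditionally-advance behavior can be modeled with a short sketch. toy_consume_char_if is a hypothetical synchronous function over a character slice, written only for this illustration; the real method is asynchronous, wraps its result in Result, and returns a &SourceChar rather than a char.

```rust
/// Hypothetical model: peeks the character at `*index` and advances `*index`
/// only if the decider `f` returns true for it.
fn toy_consume_char_if<F>(buffer: &[char], index: &mut usize, mut f: F) -> Option<char>
where
    F: FnMut(char) -> bool,
{
    // End of input: nothing to peek, so nothing is consumed.
    let c = *buffer.get(*index)?;
    if f(c) {
        *index += 1;
        Some(c)
    } else {
        // The decider rejected the character; the position is unchanged.
        None
    }
}
```

Note that a None result alone does not tell the caller whether the decider rejected the character or the input ended, which matches the two cases described above.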
pub fn source_string<I>(&self, i: I) -> String where
I: SliceIndex<[SourceChar], Output = [SourceChar]>,
Extracts a string from the source code.
This function returns the source code string for the range specified by the argument. The range must specify a valid index. If the index points to a character that has not yet been read, this function will panic.
Panics
If the argument index is out of bounds, i.e., pointing to an unread character.
Performs alias substitution right before the current position.
This function must be called just after a word has been parsed that matches the name of the argument alias. This function does not check that there is a matching word before the current position. The characters from the begin index up to the current position are silently replaced with the alias value.
The resulting part of the code will be characters with a Source::Alias origin.
After the substitution, the position will be set before the replaced string.
Panics
If the replaced part is empty, i.e., begin >= self.index().
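The buffer manipulation described for alias substitution can be sketched on a plain character vector. toy_substitute_alias and its parameters are hypothetical names for this illustration; the real function works on source characters carrying a Source::Alias origin, which this sketch leaves out entirely.

```rust
/// Hypothetical model: replaces `buffer[begin..*index]` with the characters
/// of `value` and moves the position back to `begin`, so that the replacement
/// is what gets parsed next.
fn toy_substitute_alias(buffer: &mut Vec<char>, index: &mut usize, begin: usize, value: &str) {
    // Mirrors the documented panic condition: the replaced part must not be empty.
    assert!(begin < *index, "the replaced part must not be empty");
    // `splice` removes the old range and inserts the new characters in place.
    let _replaced: Vec<char> = buffer.splice(begin..*index, value.chars()).collect();
    // The position is set before the replaced string.
    *index = begin;
}
```

For example, with the buffer holding "ll -a", the position at 2, and begin at 0, substituting "ls -l" leaves the buffer as "ls -l -a" with the position back at 0.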
Tests if the given index is after the replacement string of alias substitution that ends with a blank.
Panics
If index is larger than the currently read index.
Parses an optional compound list that is the content of a command substitution.
This function consumes characters until a token that cannot be the beginning of an and-or list is found and returns the string that was consumed.
Like Lexer::inner_program, but returns the future in a pinned box.
Parses an arithmetic expansion.
The initial $ must have been consumed before calling this function. In this function, the next two characters are examined to see if they begin an arithmetic expansion. If the characters are ((, then the arithmetic expansion is parsed, in which case this function consumes up to the closing )) (inclusive). Otherwise, no characters are consumed and the return value is Ok(Err(location)).
The location parameter should be the location of the initial $. It is used to construct the result, but this function does not check if it actually is a location of $.
Parses a command substitution of the form $(...).
The initial $ must have been consumed before calling this function. In this function, the next character is examined to see if it begins a command substitution. If it is (, the following characters are parsed as commands to find a matching ), which will be consumed before this function returns. Otherwise, no characters are consumed and the return value is Ok(None).
opening_location should be the location of the initial $. It is used to construct the result, but this function does not check if it actually is a location of $.
Reads a line literally.
This function recognizes no quotes or expansions. Starting from the current position, the line is read up to (but not including) the terminating newline.
Parses the content of a here-document.
Skips a character if the given function returns true for it.
Returns Ok(true) if the character was skipped, Ok(false) if the function returned false, and Err(_) if an error occurred.
skip_if is a simpler version of consume_char_if.
Skips blank characters until reaching a non-blank.
Skips a comment, if any.
A comment ends just before a newline. The newline is not part of the comment.
This function does not recognize line continuation inside the comment.
Skips blank characters and a comment, if any.
This function is the same as skip_blanks followed by skip_comment.
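The composition of the two skipping steps can be sketched with standalone functions over a character slice. The toy_ functions below are hypothetical and synchronous; they assume that a blank is a space or a tab and that a comment starts with #, and they ignore errors and line continuation, unlike the real methods.

```rust
/// Hypothetical model of skipping blanks (assumed here: space and tab).
fn toy_skip_blanks(buffer: &[char], index: &mut usize) {
    while matches!(buffer.get(*index).copied(), Some(' ' | '\t')) {
        *index += 1;
    }
}

/// Hypothetical model of skipping a comment, if any. The comment ends just
/// before a newline, so the newline itself is left unconsumed.
fn toy_skip_comment(buffer: &[char], index: &mut usize) {
    if buffer.get(*index) != Some(&'#') {
        return; // No comment at the current position.
    }
    while let Some(&c) = buffer.get(*index) {
        if c == '\n' {
            break;
        }
        *index += 1;
    }
}

/// The combined function is simply the first step followed by the second.
fn toy_skip_blanks_and_comment(buffer: &[char], index: &mut usize) {
    toy_skip_blanks(buffer, index);
    toy_skip_comment(buffer, index);
}
```

After the combined call on a line such as " \t# note" followed by a newline, the position rests on the newline, ready for the caller to handle the line end.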
Parses a parameter expansion that is not enclosed in braces.
The initial $ must have been consumed before calling this function. This function checks if the next character is a valid POSIXly-portable parameter name. If so, the name is consumed and returned. Otherwise, no characters are consumed and the return value is Ok(Err(location)).
The location parameter should be the location of the initial $. It is used to construct the result, but this function does not check if it actually is a location of $.
Parses a text, i.e., a (possibly empty) sequence of TextUnits.
is_delimiter tests if an unquoted character is a delimiter. When is_delimiter returns true, the parser stops parsing and returns the text up to the delimiter.
is_escapable tests if a backslash can escape a character. When the parser finds an unquoted backslash, the next character is passed to is_escapable. If is_escapable returns true, the backslash is treated as a valid escape (TextUnit::Backslashed). Otherwise, it is a literal (TextUnit::Literal).
is_escapable also affects escaping of double-quotes inside backquotes. See text_unit for details. Note that this function calls text_unit with WordContext::Text.
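How the two predicates interact can be sketched with a reduced model. ToyUnit and toy_text are hypothetical names for this illustration; the sketch handles only literals and backslash escapes and leaves out expansions, backquotes, and the asynchronous, fallible interface of the real parser. It also assumes that a backslash is examined before is_delimiter is consulted.

```rust
/// Hypothetical, reduced counterpart of a text unit.
#[derive(Debug, PartialEq)]
enum ToyUnit {
    Literal(char),
    Backslashed(char),
}

/// Hypothetical model: collects units until an unescaped delimiter is found.
fn toy_text<D, E>(source: &[char], mut is_delimiter: D, mut is_escapable: E) -> Vec<ToyUnit>
where
    D: FnMut(char) -> bool,
    E: FnMut(char) -> bool,
{
    let mut units = Vec::new();
    let mut i = 0;
    while let Some(&c) = source.get(i) {
        if c == '\\' {
            if let Some(&next) = source.get(i + 1) {
                if is_escapable(next) {
                    // Valid escape: the backslash and the next character form one unit.
                    units.push(ToyUnit::Backslashed(next));
                    i += 2;
                    continue;
                }
            }
            // Not a valid escape: the backslash itself is a literal.
            units.push(ToyUnit::Literal('\\'));
            i += 1;
        } else if is_delimiter(c) {
            break; // Stop before the delimiter without consuming it.
        } else {
            units.push(ToyUnit::Literal(c));
            i += 1;
        }
    }
    units
}
```

With ; as the delimiter and $ as the only escapable character, an escaped $ becomes a Backslashed unit, while a backslash before any other character stays a literal backslash followed by that character.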
Parses a text that may contain nested parentheses.
This function works similarly to text. However, if an unquoted ( is found in the text, all text units are parsed up to the next matching unquoted ). Inside the parentheses, the is_delimiter function is ignored and all non-special characters are parsed as literal word units. After finding the ), this function continues parsing to find a delimiter (as per is_delimiter) or another parenthesis.
Nested parentheses are supported: the number of (s and )s must match. In other words, the final delimiter is recognized only outside the outermost parentheses.
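The rule that a delimiter counts only outside the outermost parentheses can be sketched with a depth counter. find_delimiter_outside_parens is a hypothetical function for this illustration; it ignores quoting and escapes and only reports where the scan would stop, whereas the real function builds text units.

```rust
/// Hypothetical model: returns the index of the first delimiter found outside
/// all parentheses, or `text.len()` if there is none.
fn find_delimiter_outside_parens<F>(text: &[char], mut is_delimiter: F) -> usize
where
    F: FnMut(char) -> bool,
{
    let mut depth = 0usize; // Number of currently open parentheses.
    for (i, &c) in text.iter().enumerate() {
        match c {
            '(' => depth += 1,
            ')' if depth > 0 => depth -= 1,
            // `is_delimiter` is consulted only at depth zero.
            _ if depth == 0 && is_delimiter(c) => return i,
            _ => {}
        }
    }
    text.len()
}
```

In the string a(b;(c;))d;e with ; as the delimiter, the first two semicolons sit inside parentheses and are skipped, so the scan stops at the third one.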
Parses a token.
If there are no more tokens to parse, the result is a token with an empty word and the EndOfInput token identifier.