pub struct Lexer<'a> { /* private fields */ }
Lexical analyzer
A lexer reads lines using an input function and parses the characters into tokens. It has an internal buffer containing the characters that have been read and the position (or the index) of the character that is to be parsed next.
Lexer has primitive functions such as peek_char that provide access to the character at the current position. Derived functions such as skip_blanks_and_comment depend on those primitives to parse more complex structures in the source code. Usually, the lexer is used by a parser to read the source code and produce a syntax tree, so you don’t need to call these functions directly.
To construct a lexer, you can use the Lexer::new function with an input object. You can also use the Lexer::config function to create a configuration that allows you to customize the settings before creating a lexer.
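use yash_syntax::input::Memory;
use yash_syntax::parser::{lex::Lexer, Parser};
use yash_syntax::source::Source;
// Customize the settings before creating the lexer.
let mut config = Lexer::config();
config.start_line_number = 10.try_into().unwrap();
config.source = Some(Source::CommandString.into());
// Create the lexer from the configuration and an in-memory input.
let mut lexer = config.input(Box::new(Memory::new("echo hello\n")));
let mut parser = Parser::new(&mut lexer);
_ = parser.command_line();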
Implementations§
impl<'a> Lexer<'a>
pub fn config() -> Config
Creates a new configuration with default settings.
This is a synonym for Config::new. You can modify the settings and then create a lexer with the input method.
pub fn new(input: Box<dyn InputObject + 'a>) -> Lexer<'a>
Creates a new lexer that reads using the given input object.
This is a convenience function that creates a lexer with the given input object and the default configuration. To customize the configuration, use the config function.
This function is best used for testing or for simple cases where you don’t need to customize the lexer. For practical use, it is recommended to use the config function to create a configuration and provide it with supplementary information, especially source, before creating a lexer.
pub fn with_code(code: &'a str) -> Lexer<'a>
Creates a new lexer with a fixed source code.
This is a convenience function that creates a lexer that reads from a string using Memory with the default configuration.
This function is best used for testing or for simple cases where you don’t need to customize the lexer. For practical use, it is recommended to use the config function to create a configuration and provide it with supplementary information, especially source, before creating a lexer.
pub fn from_memory<S: Into<Rc<Source>>>(code: &'a str, source: S) -> Lexer<'a>
Creates a new lexer with a fixed source code.
This is a convenience function that creates a lexer that reads from a string using Memory with the specified source, starting from line number 1.
This function is soft-deprecated. Use with_code instead if the source is Unknown. Otherwise, use config to set the source and input to create a lexer, which is more descriptive.
pub fn disable_line_continuation<'b>(&'b mut self) -> PlainLexer<'b, 'a>
Disables line continuation recognition from this point onward.
By default, peek_char silently skips line continuation sequences. When line continuation is disabled, however, peek_char returns characters literally.
Call enable_line_continuation to switch line continuation recognition back on.
This function will panic if line continuation has already been disabled.
pub fn enable_line_continuation<'b>(_: PlainLexer<'a, 'b>)
Re-enables line continuation.
You can pass the PlainLexer returned from disable_line_continuation to this function to re-enable line continuation. That is equivalent to dropping the PlainLexer instance, but the code will be more descriptive.
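The disable/enable pair described above behaves like a guard object. The following is a minimal standalone sketch of that pattern, not the yash-syntax implementation: ToyLexer and ToyPlainLexer are hypothetical stand-ins that only track a single flag.

```rust
/// Hypothetical stand-in for the lexer: it only tracks whether line
/// continuation recognition is currently on.
struct ToyLexer {
    line_continuation: bool,
}

/// Hypothetical stand-in for PlainLexer: a guard that mutably borrows the lexer.
struct ToyPlainLexer<'b> {
    lexer: &'b mut ToyLexer,
}

impl ToyLexer {
    /// Switches recognition off and hands out a guard.
    fn disable_line_continuation(&mut self) -> ToyPlainLexer<'_> {
        assert!(self.line_continuation, "line continuation already disabled");
        self.line_continuation = false;
        ToyPlainLexer { lexer: self }
    }

    /// Consumes the guard; dropping it does the actual work.
    fn enable_line_continuation(_: ToyPlainLexer<'_>) {}
}

impl Drop for ToyPlainLexer<'_> {
    fn drop(&mut self) {
        // Recognition is switched back on when the guard goes away.
        self.lexer.line_continuation = true;
    }
}

/// Returns the flag as seen while the guard is alive and after it is gone.
fn demo_guard() -> (bool, bool) {
    let mut lexer = ToyLexer { line_continuation: true };
    let plain = lexer.disable_line_continuation();
    let while_disabled = plain.lexer.line_continuation;
    ToyLexer::enable_line_continuation(plain);
    (while_disabled, lexer.line_continuation)
}
```

Because the guard holds a mutable borrow, the toy lexer cannot be used directly until the guard has been dropped or passed back.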
pub async fn peek_char(&mut self) -> Result<Option<char>>
Peeks the next character.
If the end of input is reached, Ok(None) is returned. On error, Err(_) is returned.
If line continuation recognition is enabled, combinations of a backslash and a newline are silently skipped before returning the next character. Call disable_line_continuation to switch off line continuation recognition.
This function requires a mutable reference to self since it may need to read the next line.
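The effect of line continuation recognition on peeking can be pictured with a small standalone model. This is a sketch only: peek_at is a hypothetical helper over a plain character slice, and it is synchronous, whereas the real peek_char is async and may read more input.

```rust
/// Hypothetical model: returns the character visible at `index`, together
/// with the index at which it was found.
/// When `line_continuation` is true, backslash-newline pairs are skipped first.
fn peek_at(chars: &[char], mut index: usize, line_continuation: bool) -> Option<(usize, char)> {
    if line_continuation {
        // Skip every backslash that is immediately followed by a newline.
        while chars.get(index) == Some(&'\\') && chars.get(index + 1) == Some(&'\n') {
            index += 2;
        }
    }
    chars.get(index).map(|&c| (index, c))
}

/// Peeks the same input with recognition enabled and then disabled.
fn demo_peek() -> (Option<(usize, char)>, Option<(usize, char)>) {
    let chars: Vec<char> = "\\\nfoo".chars().collect();
    (peek_at(&chars, 0, true), peek_at(&chars, 0, false))
}
```

With recognition enabled, the model sees the f of foo; with it disabled, it sees the backslash itself.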
pub async fn location(&mut self) -> Result<&Location>
Returns the location of the next character.
If there are no more characters (that is, the end of input has been reached), an imaginary location is returned: the one a character would have if it existed at that position.
This function requires a mutable reference to self since it needs to peek the next character.
pub fn consume_char(&mut self)
Consumes the next character.
This function must be called after peek_char has successfully returned the character. Consuming a character that has not yet been peeked results in a panic.
pub fn index(&self) -> usize
Returns the position of the next character, counted from zero.
let mut lexer = Lexer::with_code("abc");
assert_eq!(lexer.index(), 0);
let _ = lexer.peek_char().await;
assert_eq!(lexer.index(), 0);
lexer.consume_char();
assert_eq!(lexer.index(), 1);
pub fn rewind(&mut self, index: usize)
Moves the current position back to the given index so that characters that have been consumed can be read again.
The given index must not be larger than the current index; otherwise, this function panics.
let mut lexer = Lexer::with_code("abc");
let saved_index = lexer.index();
assert_eq!(lexer.peek_char().await, Ok(Some('a')));
lexer.consume_char();
assert_eq!(lexer.peek_char().await, Ok(Some('b')));
lexer.rewind(saved_index);
assert_eq!(lexer.peek_char().await, Ok(Some('a')));
pub fn pending(&self) -> bool
Checks if there is any character that has been read from the input source but not yet consumed.
pub fn flush(&mut self)
Clears the internal buffer of the lexer.
Locations returned from location share a single code instance that is also retained by the lexer. The code grows as the lexer reads more input. To prevent the code from getting too large, you can call this function, which replaces the retained code with a new empty one. The new code’s start_line_number will be incremented by the number of lines in the previous one.
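The line-number bookkeeping described above can be sketched with a toy model. ToyCode and its fields are hypothetical, and counting newline characters as "lines" is an assumption of this sketch rather than a statement about the actual implementation.

```rust
/// Hypothetical model of the code fragment retained by the lexer.
struct ToyCode {
    start_line_number: u64,
    value: String,
}

/// Replaces the fragment with an empty one whose first line number follows
/// the lines already read (assumed here to be the number of newlines).
fn flush(code: &mut ToyCode) {
    let lines_read = code.value.chars().filter(|&c| c == '\n').count() as u64;
    code.start_line_number += lines_read;
    code.value.clear();
}

/// Flushes a fragment holding two complete lines that started at line 1.
fn demo_flush() -> (u64, bool) {
    let mut code = ToyCode {
        start_line_number: 1,
        value: "echo a\necho b\n".to_string(),
    };
    flush(&mut code);
    (code.start_line_number, code.value.is_empty())
}
```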
pub fn reset(&mut self)
Clears an end-of-input or error status so that the lexer can resume parsing.
This function is useful only in an interactive shell, where the user can continue entering commands even after they send an end-of-input or are interrupted by a syntax error.
pub async fn consume_char_if<F>(&mut self, f: F) -> Result<Option<&SourceChar>>
Peeks the next character and, if the given decider function returns true for it, advances the position.
Returns the consumed character if the function returned true. Returns Ok(None) if it returned false or there are no more characters.
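A typical use of a conditional consumer is to collect a run of characters that satisfy a predicate. The sketch below models this on a hypothetical ToyCursor type. It is synchronous and returns Option<char> rather than Result<Option<&SourceChar>>, so it illustrates the control flow only.

```rust
/// Hypothetical cursor over an in-memory character buffer.
struct ToyCursor {
    chars: Vec<char>,
    index: usize,
}

impl ToyCursor {
    /// Advances past the next character only if `f` accepts it.
    fn consume_char_if<F: FnMut(char) -> bool>(&mut self, mut f: F) -> Option<char> {
        // `?` returns None when there are no more characters.
        let c = *self.chars.get(self.index)?;
        if f(c) {
            self.index += 1;
            Some(c)
        } else {
            None
        }
    }
}

/// Collects leading ASCII digits and reports where the cursor stopped.
fn demo_consume_digits() -> (String, usize) {
    let mut cursor = ToyCursor { chars: "42x".chars().collect(), index: 0 };
    let mut digits = String::new();
    while let Some(c) = cursor.consume_char_if(|c| c.is_ascii_digit()) {
        digits.push(c);
    }
    (digits, cursor.index)
}
```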
pub fn source_string(&self, range: Range<usize>) -> String
Extracts a string from the source code range.
This function returns the source code string for the range specified by the argument. The range must specify valid indices. If an index points to a character that has not yet been read, this function will panic.
§Panics
If the argument index is out of bounds, i.e., pointing to an unread character.
pub fn location_range(&self, range: Range<usize>) -> Location
Returns a location for a given range of the source code.
All the characters in the range must have been consumed. If the range refers to an unconsumed character, this function will panic.
If the characters are from more than one Code fragment, the location will only cover the initial portion of the range sharing the same Code.
§Panics
This function will panic if the range refers to an unconsumed character.
If the start index of the range is the end of input, it must have been peeked and the range must be empty, or the function will panic.
pub fn substitute_alias(&mut self, begin: usize, alias: &Rc<Alias>)
Performs alias substitution right before the current position.
This function must be called just after a word has been parsed that matches the name of the argument alias. This function does not check that there is a matching word before the current position. The characters starting from the begin index up to the current position are silently replaced with the alias value. The resulting part of the code will be characters with a Source::Alias origin.
After the substitution, the position will be set before the replaced string.
§Panics
If the replaced part is empty, i.e., begin >= self.index().
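The buffer manipulation described above can be sketched on a plain character vector. The substitute function and its parameters are hypothetical. The sketch ignores the Source::Alias origin tracking and only shows the replacement and the repositioning.

```rust
/// Hypothetical model: replaces the characters in `begin..*index` with
/// `replacement` and moves `*index` back to `begin`.
fn substitute(chars: &mut Vec<char>, index: &mut usize, begin: usize, replacement: &str) {
    // Mirrors the documented panic condition: the replaced part must not be empty.
    assert!(begin < *index, "the replaced part must not be empty");
    let _removed: Vec<char> = chars.splice(begin..*index, replacement.chars()).collect();
    *index = begin;
}

/// Substitutes a made-up alias `ll` with `ls -l` right after the word was parsed.
fn demo_substitute() -> (String, usize) {
    let mut chars: Vec<char> = "ll -a".chars().collect();
    let mut index = 2; // the word "ll" has just been parsed
    substitute(&mut chars, &mut index, 0, "ls -l");
    (chars.into_iter().collect(), index)
}
```

Since the position ends up before the replacement, the next parse step reads the substituted text as if it had been part of the input.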
pub fn is_after_blank_ending_alias(&self, index: usize) -> bool
Tests if the given index is after the replacement string of alias substitution that ends with a blank.
§Panics
If index is larger than the currently read index.
pub async fn inner_program(&mut self) -> Result<String>
Parses an optional compound list that is the content of a command substitution.
This function consumes characters until a token that cannot be the beginning of an and-or list is found and returns the string that was consumed.
pub fn inner_program_boxed(&mut self) -> Pin<Box<dyn Future<Output = Result<String>> + '_>>
Like Lexer::inner_program, but returns the future in a pinning box.
impl Lexer<'_>
pub async fn arithmetic_expansion(&mut self, start_index: usize) -> Result<Option<TextUnit>>
Parses an arithmetic expansion.
The initial $ must have been consumed before calling this function. In this function, the next two characters are examined to see if they begin an arithmetic expansion. If the characters are ((, then the arithmetic expansion is parsed, in which case this function consumes up to the closing )) (inclusive). Otherwise, no characters are consumed and the return value is Ok(None).
The start_index parameter should be the index for the initial $. It is used to construct the result, but this function does not check if it actually points to the $.
impl Lexer<'_>
pub async fn command_substitution(&mut self, start_index: usize) -> Result<Option<TextUnit>>
Parses a command substitution of the form $(...).
The initial $ must have been consumed before calling this function. In this function, the next character is examined to see if it begins a command substitution. If it is (, the following characters are parsed as commands to find a matching ), which will be consumed before this function returns. Otherwise, no characters are consumed and the return value is Ok(None).
The start_index parameter should be the index for the initial $. It is used to construct the result, but this function does not check if it actually points to the $.
impl Lexer<'_>
pub async fn escape_unit(&mut self) -> Result<Option<EscapeUnit>>
Parses an escape unit.
This function tests if the next character begins an escape sequence and returns the sequence if it does. If the next character is not an escape sequence, it is returned as EscapeUnit::Literal. If there is no next character, the function returns Ok(None). It returns an error if an invalid escape sequence is found.
This function should be called in a context where line continuations are disabled, so that backslash-newline pairs are not removed before they are parsed as escape sequences.
pub async fn escaped_string<F>(&mut self, is_delimiter: F) -> Result<EscapedString>
Parses an escaped string.
The is_delimiter function is called with each character in the string to determine if it is a delimiter. If is_delimiter returns true, the character is not consumed and the function returns the string up to that point. Otherwise, the character is consumed and the function continues.
The string may contain escape sequences as defined in EscapeUnit.
Escaped strings typically appear as the content of dollar-single-quotes, so is_delimiter is usually |c| c == '\''.
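The delimiter handling described above can be sketched as a simple scanning loop. This is a toy model: the free function escaped_string below is hypothetical, and it simplifies escapes to "a backslash makes the next character literal", which is far less than the set of sequences defined by EscapeUnit.

```rust
/// Hypothetical model: reads from `chars` starting at `*index` until
/// `is_delimiter` accepts a character, which is left unconsumed.
/// A backslash makes the following character literal (a simplification).
fn escaped_string<F: Fn(char) -> bool>(chars: &[char], index: &mut usize, is_delimiter: F) -> String {
    let mut result = String::new();
    while let Some(&c) = chars.get(*index) {
        if is_delimiter(c) {
            break; // the delimiter is not consumed
        }
        *index += 1;
        if c == '\\' {
            // Simplified escape: take the next character literally, if any.
            if let Some(&next) = chars.get(*index) {
                result.push(next);
                *index += 1;
            }
        } else {
            result.push(c);
        }
    }
    result
}

/// Scans `a\'b'c` using the single quote as the delimiter.
fn demo_escaped_string() -> (String, usize) {
    let chars: Vec<char> = "a\\'b'c".chars().collect();
    let mut index = 0;
    let s = escaped_string(&chars, &mut index, |c| c == '\'');
    (s, index)
}
```

The escaped quote is kept as content, while the unescaped quote stops the scan and stays unconsumed at the returned index.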
impl Lexer<'_>
pub async fn line(&mut self) -> Result<String>
Reads a line literally.
This function recognizes no quotes or expansions. Starting from the current position, the line is read up to (but not including) the terminating newline.
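Reading a line literally amounts to collecting characters until a newline is seen. The sketch below shows this on a character slice. read_line is a hypothetical helper, and leaving the newline itself unconsumed is an assumption of this sketch.

```rust
/// Hypothetical model: collects characters from `*index` up to, but not
/// including, the next newline. Quotes and expansions get no special treatment.
fn read_line(chars: &[char], index: &mut usize) -> String {
    let mut line = String::new();
    while let Some(&c) = chars.get(*index) {
        if c == '\n' {
            break; // assumed: the newline is left for the caller
        }
        line.push(c);
        *index += 1;
    }
    line
}

/// Reads the first line of a two-line input that contains quotes and a `$`.
fn demo_read_line() -> (String, usize) {
    let chars: Vec<char> = "echo \"$x\"\nnext".chars().collect();
    let mut index = 0;
    let line = read_line(&chars, &mut index);
    (line, index)
}
```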
pub async fn here_doc_content(&mut self, here_doc: &HereDoc) -> Result<()>
Parses the content of a here-document.
This function reads here-document content corresponding to the here-document operator represented by the argument and fills here_doc.content with the results. The argument does not have to be mutable because here_doc.content is a RefCell. Note that this function will panic if here_doc.content has been borrowed, and that this function keeps a borrow from here_doc.content until the returned future resolves to the final result.
In case of an error, partial results may be left in here_doc.content.
impl Lexer<'_>
pub async fn skip_if<F>(&mut self, f: F) -> Result<bool>
Skips a character if the given function returns true for it.
Returns Ok(true) if the character was skipped, Ok(false) if the function returned false, and Err(_) if an error occurred.
skip_if is a simpler version of consume_char_if.
pub async fn skip_blanks(&mut self) -> Result<()>
Skips blank characters until reaching a non-blank.
pub async fn skip_comment(&mut self) -> Result<()>
Skips a comment, if any.
A comment ends just before a newline. The newline is not part of the comment.
This function does not recognize line continuation inside the comment.
pub async fn skip_blanks_and_comment(&mut self) -> Result<()>
Skips blank characters and a comment, if any.
This function is the same as skip_blanks followed by skip_comment.
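The composition of the two skipping steps can be sketched on a character slice. The three functions below are hypothetical, synchronous stand-ins. Treating only space and tab as blanks and # as the comment starter are assumptions of this sketch.

```rust
/// Hypothetical model: advances past spaces and tabs.
fn skip_blanks(chars: &[char], index: &mut usize) {
    while matches!(chars.get(*index), Some(' ' | '\t')) {
        *index += 1;
    }
}

/// Hypothetical model: if a comment starts here, advances to just before
/// the newline that ends it.
fn skip_comment(chars: &[char], index: &mut usize) {
    if chars.get(*index) == Some(&'#') {
        while matches!(chars.get(*index), Some(&c) if c != '\n') {
            *index += 1;
        }
    }
}

/// The derived function is simply the two primitives in sequence.
fn skip_blanks_and_comment(chars: &[char], index: &mut usize) {
    skip_blanks(chars, index);
    skip_comment(chars, index);
}

/// Skips leading blanks and a comment; the result is the index of the newline.
fn demo_skip() -> usize {
    let chars: Vec<char> = "  \t# note\necho".chars().collect();
    let mut index = 0;
    skip_blanks_and_comment(&chars, &mut index);
    index
}
```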
impl Lexer<'_>
pub async fn raw_param(&mut self, start_index: usize) -> Result<Option<TextUnit>>
Parses a parameter expansion that is not enclosed in braces.
The initial $ must have been consumed before calling this function. This function checks if the next character is a valid POSIXly-portable parameter name. If so, the name is consumed and returned. Otherwise, no characters are consumed and the return value is Ok(None).
The start_index parameter should be the index for the initial $. It is used to construct the result, but this function does not check if it actually points to the $.
impl Lexer<'_>
pub async fn text<F, G>(&mut self, is_delimiter: F, is_escapable: G) -> Result<Text>
Parses a text, i.e., a (possibly empty) sequence of TextUnits.
is_delimiter tests if an unquoted character is a delimiter. When is_delimiter returns true, the parser stops parsing and returns the text up to the delimiter.
is_escapable tests if a backslash can escape a character. When the parser finds an unquoted backslash, the next character is passed to is_escapable. If is_escapable returns true, the backslash is treated as a valid escape (TextUnit::Backslashed). Otherwise, it is a literal (TextUnit::Literal).
is_escapable also affects escaping of double-quotes inside backquotes. See text_unit for details. Note that this function calls text_unit with WordContext::Text.
pub async fn text_with_parentheses<F, G>(&mut self, is_delimiter: F, is_escapable: G) -> Result<Text>
Parses a text that may contain nested parentheses.
This function works similarly to text. However, if an unquoted ( is found in the text, all text units are parsed up to the next matching unquoted ). Inside the parentheses, the is_delimiter function is ignored and all non-special characters are parsed as literal word units. After finding the ), this function continues parsing to find a delimiter (as per is_delimiter) or another parenthesis.
Nested parentheses are supported: the number of (s and )s must match. In other words, the final delimiter is recognized only outside outermost parentheses.
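The rule that the delimiter is recognized only outside the outermost parentheses can be illustrated with a depth counter. find_delimiter below is a hypothetical helper that ignores quoting, escapes, and expansions entirely; it shows only the nesting logic.

```rust
/// Hypothetical model: returns the index of the first character accepted by
/// `is_delimiter` that lies outside all parentheses, or `chars.len()` if
/// there is none.
fn find_delimiter<F: Fn(char) -> bool>(chars: &[char], is_delimiter: F) -> usize {
    let mut depth = 0usize;
    for (i, &c) in chars.iter().enumerate() {
        match c {
            '(' => depth += 1,
            ')' if depth > 0 => depth -= 1,
            // Delimiters count only when no parenthesis is open.
            _ if depth == 0 && is_delimiter(c) => return i,
            _ => {}
        }
    }
    chars.len()
}

/// Finds the first `;` that is not nested inside parentheses.
fn demo_find_delimiter() -> usize {
    let chars: Vec<char> = "a(b;c(d;))e;f".chars().collect();
    find_delimiter(&chars, |c| c == ';')
}
```

The two semicolons inside the parentheses are skipped, and the scan stops at the one that follows e.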
Trait Implementations§
Auto Trait Implementations§
impl<'a> Freeze for Lexer<'a>
impl<'a> !RefUnwindSafe for Lexer<'a>
impl<'a> !Send for Lexer<'a>
impl<'a> !Sync for Lexer<'a>
impl<'a> Unpin for Lexer<'a>
impl<'a> !UnwindSafe for Lexer<'a>
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.