Struct Lexer

pub struct Lexer<'a> { /* private fields */ }

Lexical analyzer

A lexer reads lines using an input function and parses the characters into tokens. It has an internal buffer containing the characters that have been read and the position (or the index) of the character that is to be parsed next.

Lexer has primitive functions such as peek_char that provide access to the character at the current position. Derived functions such as skip_blanks_and_comment depend on those primitives to parse more complex structures in the source code. Usually, the lexer is used by a parser to read the source code and produce a syntax tree, so you don’t need to call these functions directly.

To construct a lexer, you can use the Lexer::new function with an input object. You can also use the Lexer::config function to create a configuration that allows you to customize the settings before creating a lexer.

let mut config = Lexer::config();
config.start_line_number = 10.try_into().unwrap();
config.source = Some(Source::CommandString.into());
let mut lexer = config.input(Box::new(Memory::new("echo hello\n")));
let mut parser = Parser::new(&mut lexer);
_ = parser.command_line();

Implementations

impl<'a> Lexer<'a>

pub fn config() -> Config

Creates a new configuration with default settings.

This is a synonym for Config::new. You can modify the settings and then create a lexer with the input method.

pub fn new(input: Box<dyn InputObject + 'a>) -> Lexer<'a>

Creates a new lexer that reads using the given input object.

This is a convenience function that creates a lexer with the given input object and the default configuration. To customize the configuration, use the config function.

This function is best used for testing or for simple cases where you don’t need to customize the lexer. For practical use, it is recommended to use the config function to create a configuration and provide it with supplementary information, especially source, before creating a lexer.

pub fn with_code(code: &'a str) -> Lexer<'a>

Creates a new lexer with a fixed source code.

This is a convenience function that creates a lexer that reads from a string using Memory with the default configuration.

This function is best used for testing or for simple cases where you don’t need to customize the lexer. For practical use, it is recommended to use the config function to create a configuration and provide it with supplementary information, especially source, before creating a lexer.

pub fn from_memory<S: Into<Rc<Source>>>(code: &'a str, source: S) -> Lexer<'a>

Creates a new lexer with a fixed source code.

This is a convenience function that creates a lexer that reads from a string using Memory with the specified source starting from line number 1.

This function is soft-deprecated. Use with_code instead if the source is Unknown. Otherwise, use config to set the source and input to create a lexer, which is more descriptive.

pub fn disable_line_continuation<'b>(&'b mut self) -> PlainLexer<'b, 'a>

Disables line continuation recognition onward.

By default, peek_char silently skips line continuation sequences. When line continuation is disabled, however, peek_char returns characters literally.

Call enable_line_continuation to switch line continuation recognition back on.

This function will panic if line continuation has already been disabled.

pub fn enable_line_continuation<'b>(_: PlainLexer<'a, 'b>)

Re-enables line continuation.

You can pass the PlainLexer returned from disable_line_continuation to this function to re-enable line continuation. That is equivalent to dropping the PlainLexer instance, but the code will be more descriptive.
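
The guard relationship between disable_line_continuation and enable_line_continuation can be sketched with a drop guard. This is a minimal toy model, not the crate's implementation: ToyLexer, ToyPlainLexer, and the line_continuation_enabled field are hypothetical names invented for illustration.

```rust
/// Hypothetical stand-in for the lexer; only the flag matters here.
pub struct ToyLexer {
    pub line_continuation_enabled: bool,
}

/// Hypothetical guard playing the role of `PlainLexer`.
pub struct ToyPlainLexer<'b> {
    lexer: &'b mut ToyLexer,
}

impl ToyLexer {
    pub fn new() -> Self {
        ToyLexer { line_continuation_enabled: true }
    }

    /// Switches recognition off and returns a guard that borrows the lexer.
    /// Panics if recognition is already off.
    pub fn disable_line_continuation(&mut self) -> ToyPlainLexer<'_> {
        assert!(
            self.line_continuation_enabled,
            "line continuation already disabled"
        );
        self.line_continuation_enabled = false;
        ToyPlainLexer { lexer: self }
    }

    /// Takes the guard by value; dropping it is what re-enables recognition.
    pub fn enable_line_continuation(guard: ToyPlainLexer<'_>) {
        drop(guard);
    }
}

impl ToyPlainLexer<'_> {
    /// Reports the current state of the flag through the guard.
    pub fn is_enabled(&self) -> bool {
        self.lexer.line_continuation_enabled
    }
}

impl Drop for ToyPlainLexer<'_> {
    fn drop(&mut self) {
        // Recognition comes back on whenever the guard goes away.
        self.lexer.line_continuation_enabled = true;
    }
}
```

In this sketch, passing the guard to enable_line_continuation and simply letting it go out of scope have the same effect; the explicit call only makes the intent visible at the call site.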

pub async fn peek_char(&mut self) -> Result<Option<char>>

Peeks the next character.

If the end of input is reached, Ok(None) is returned. On error, Err(_) is returned.

If line continuation recognition is enabled, combinations of a backslash and a newline are silently skipped before returning the next character. Call disable_line_continuation to switch off line continuation recognition.

This function requires a mutable reference to self since it may need to read the next line from the input.
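
The effect of line continuation recognition on peeking can be illustrated with a small synchronous function. This is only a sketch over an in-memory slice: the free function peek_char below, its parameters, and its tuple return value are assumptions made for illustration and do not mirror the async method's signature.

```rust
/// Toy model: returns the index of the next significant character together
/// with that character, or `None` at the end of the buffer.
///
/// When `line_continuation` is true, every backslash-newline pair found at
/// the current position is skipped first.
pub fn peek_char(
    buffer: &[char],
    mut index: usize,
    line_continuation: bool,
) -> (usize, Option<char>) {
    if line_continuation {
        while buffer.get(index) == Some(&'\\') && buffer.get(index + 1) == Some(&'\n') {
            index += 2;
        }
    }
    (index, buffer.get(index).copied())
}
```

With recognition switched off, the same call returns the backslash itself, which is the behavior a caller needs when it has to interpret the backslash on its own.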

pub async fn location(&mut self) -> Result<&Location>

Returns the location of the next character.

If there are no more characters (that is, the end of input has been reached), the function returns the imaginary location that a character would have if one existed at that position.

This function requires a mutable reference to self since it needs to peek the next character.

pub fn consume_char(&mut self)

Consumes the next character.

This function must be called after peek_char has successfully returned the character. Consuming a character that has not yet been peeked results in a panic.
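
The peek-then-consume contract follows naturally from a buffer that is filled lazily. The sketch below is a hypothetical, synchronous toy (ToyLexer and its fields are invented names, and the input is modeled as an iterator of lines rather than an input object): peeking is the only operation that reads input, so consuming an unpeeked character finds nothing in the buffer.

```rust
/// Toy lexer with a lazily filled character buffer.
pub struct ToyLexer<I: Iterator<Item = String>> {
    lines: I,
    buffer: Vec<char>,
    index: usize,
}

impl<I: Iterator<Item = String>> ToyLexer<I> {
    pub fn new(lines: I) -> Self {
        ToyLexer { lines, buffer: Vec::new(), index: 0 }
    }

    /// Returns the character at the current position, reading more lines
    /// into the buffer if the position is past what has been read so far.
    pub fn peek_char(&mut self) -> Option<char> {
        while self.index >= self.buffer.len() {
            let line = self.lines.next()?; // end of input
            self.buffer.extend(line.chars());
        }
        Some(self.buffer[self.index])
    }

    /// Advances the position. The character must already be in the buffer,
    /// which is only the case after a successful `peek_char`.
    pub fn consume_char(&mut self) {
        assert!(
            self.index < self.buffer.len(),
            "a character must be peeked before it is consumed"
        );
        self.index += 1;
    }

    pub fn index(&self) -> usize {
        self.index
    }
}
```

This also shows why peeking needs mutable access in such a design: it may have to pull another line from the input.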

pub fn index(&self) -> usize

Returns the position of the next character, counted from zero.

let mut lexer = Lexer::with_code("abc");
assert_eq!(lexer.index(), 0);
let _ = lexer.peek_char().await;
assert_eq!(lexer.index(), 0);
lexer.consume_char();
assert_eq!(lexer.index(), 1);

pub fn rewind(&mut self, index: usize)

Moves the current position back to the given index so that characters that have been consumed can be read again.

The given index must not be larger than the current index; otherwise, this function panics.

let mut lexer = Lexer::with_code("abc");
let saved_index = lexer.index();
assert_eq!(lexer.peek_char().await, Ok(Some('a')));
lexer.consume_char();
assert_eq!(lexer.peek_char().await, Ok(Some('b')));
lexer.rewind(saved_index);
assert_eq!(lexer.peek_char().await, Ok(Some('a')));
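
A common use of index and rewind is speculative parsing: remember the position, try to match something, and restore the position on failure. The following is a synchronous toy model under stated assumptions; ToyLexer and try_keyword are hypothetical and are not part of the documented API.

```rust
/// Toy lexer over a fully buffered string.
pub struct ToyLexer {
    buffer: Vec<char>,
    index: usize,
}

impl ToyLexer {
    pub fn with_code(code: &str) -> Self {
        ToyLexer { buffer: code.chars().collect(), index: 0 }
    }

    pub fn index(&self) -> usize {
        self.index
    }

    pub fn peek_char(&self) -> Option<char> {
        self.buffer.get(self.index).copied()
    }

    pub fn consume_char(&mut self) {
        assert!(self.index < self.buffer.len());
        self.index += 1;
    }

    /// Moves the position back; moving forward is a caller error.
    pub fn rewind(&mut self, index: usize) {
        assert!(index <= self.index, "cannot rewind forward");
        self.index = index;
    }

    /// Hypothetical helper: consumes `keyword` if the upcoming characters
    /// spell it out; otherwise restores the saved position.
    pub fn try_keyword(&mut self, keyword: &str) -> bool {
        let saved_index = self.index();
        for expected in keyword.chars() {
            if self.peek_char() == Some(expected) {
                self.consume_char();
            } else {
                self.rewind(saved_index);
                return false;
            }
        }
        true
    }
}
```

A failed attempt leaves the position exactly where it started, so the caller can try another alternative from the same index.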

pub fn pending(&self) -> bool

Checks if there is any character that has been read from the input source but not yet consumed.

pub fn flush(&mut self)

Clears the internal buffer of the lexer.

Locations returned from location share a single code instance that is also retained by the lexer. The code grows longer as the lexer reads more input. To prevent the code from getting too large, you can call this function, which replaces the retained code with a new empty one. The new code’s start_line_number is the previous code’s start_line_number plus the number of lines in the previous code.
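
The line-number bookkeeping described above can be sketched as follows. ToyCode and ToyLexer are hypothetical simplifications (a plain String instead of the crate's code type, and line counting by newline characters is an assumption); the point is only that old locations keep the old shared code while the lexer starts a fresh one.

```rust
use std::rc::Rc;

/// Toy stand-in for the shared code object.
pub struct ToyCode {
    pub start_line_number: usize,
    pub value: String,
}

/// Toy lexer that retains a reference-counted code instance.
pub struct ToyLexer {
    pub code: Rc<ToyCode>,
}

impl ToyLexer {
    /// Replaces the retained code with an empty one whose starting line
    /// number continues where the previous code ended.
    pub fn flush(&mut self) {
        // Assumption: each newline character terminates one line.
        let lines = self.code.value.matches('\n').count();
        self.code = Rc::new(ToyCode {
            start_line_number: self.code.start_line_number + lines,
            value: String::new(),
        });
    }
}
```

Anything that cloned the Rc before the flush still sees the old text, which is why flushing does not invalidate previously returned locations in this model.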

pub fn reset(&mut self)

Clears an end-of-input or error status so that the lexer can resume parsing.

This function is useful only in an interactive shell, where the user can continue entering commands even after they send an end-of-input or are interrupted by a syntax error.

pub async fn consume_char_if<F>(&mut self, f: F) -> Result<Option<&SourceChar>>
where F: FnMut(char) -> bool,

Peeks the next character and, if the given decider function returns true for it, advances the position.

Returns the consumed character if the function returned true. Returns Ok(None) if it returned false or there are no more characters.

pub fn source_string(&self, range: Range<usize>) -> String

Extracts a string from the source code range.

This function returns the source code string for the range specified by the argument. The range must specify valid indices. If an index points to a character that has not yet been read, this function panics.

Panics

If the argument index is out of bounds, i.e., pointing to an unread character.

pub fn location_range(&self, range: Range<usize>) -> Location

Returns a location for a given range of the source code.

All the characters in the range must have been consumed. If the range refers to an unconsumed character, this function panics.

If the characters are from more than one Code fragment, the location will only cover the initial portion of the range sharing the same Code.

Panics

This function will panic if the range refers to an unconsumed character.

If the start index of the range is the end of input, it must have been peeked and the range must be empty, or the function will panic.

pub fn substitute_alias(&mut self, begin: usize, alias: &Rc<Alias>)

Performs alias substitution right before the current position.

This function must be called just after a word has been parsed that matches the name of the argument alias. No check is done in this function that there is a matching word before the current position. The characters starting from the begin index up to the current position are silently replaced with the alias value.

The resulting part of code will be characters with a Source::Alias origin.

After the substitution, the position will be set before the replaced string.

Panics

If the replaced part is empty, i.e., begin >= self.index().
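
The buffer manipulation performed by alias substitution can be modeled on a vector of characters tagged with their origin. This is a hedged sketch: ToyLexer, the Origin enum, and the replacement passed as a plain &str are illustrative assumptions, whereas the real function takes an Rc<Alias> and records a Source::Alias origin.

```rust
/// Where a character in the toy buffer came from.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Origin {
    Original,
    Alias,
}

/// Toy lexer whose buffer remembers the origin of each character.
pub struct ToyLexer {
    pub chars: Vec<(char, Origin)>,
    pub index: usize,
}

impl ToyLexer {
    pub fn with_code(code: &str) -> Self {
        ToyLexer {
            chars: code.chars().map(|c| (c, Origin::Original)).collect(),
            index: 0,
        }
    }

    /// Replaces the characters in `begin..self.index` with `replacement`
    /// and moves the position back to `begin`.
    pub fn substitute_alias(&mut self, begin: usize, replacement: &str) {
        let end = self.index;
        assert!(begin < end, "the replaced part must not be empty");
        let new_chars = replacement.chars().map(|c| (c, Origin::Alias));
        // Dropping the iterator returned by `splice` completes the replacement.
        drop(self.chars.splice(begin..end, new_chars));
        self.index = begin;
    }

    /// Returns the characters from the current position onward.
    pub fn remaining(&self) -> String {
        self.chars[self.index..].iter().map(|&(c, _)| c).collect()
    }
}
```

Because the position ends up before the replacement, the next parse attempt reads the alias value as if it had been typed in place of the original word.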

pub fn is_after_blank_ending_alias(&self, index: usize) -> bool

Tests if the given index is after the replacement string of alias substitution that ends with a blank.

Panics

If index is larger than the currently read index.

pub async fn inner_program(&mut self) -> Result<String>

Parses an optional compound list that is the content of a command substitution.

This function consumes characters until a token that cannot be the beginning of an and-or list is found and returns the string that was consumed.

pub fn inner_program_boxed( &mut self, ) -> Pin<Box<dyn Future<Output = Result<String>> + '_>>

Like Lexer::inner_program, but returns the future in a pinning box.

impl Lexer<'_>

pub async fn arithmetic_expansion( &mut self, start_index: usize, ) -> Result<Option<TextUnit>>

Parses an arithmetic expansion.

The initial $ must have been consumed before calling this function. In this function, the next two characters are examined to see if they begin an arithmetic expansion. If the characters are ((, then the arithmetic expansion is parsed, in which case this function consumes up to the closing )) (inclusive). Otherwise, no characters are consumed and the return value is Ok(None).

The start_index parameter should be the index for the initial $. It is used to construct the result, but this function does not check if it actually points to the $.
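
The "look at the next two characters, then consume up to the closing ))" behavior can be sketched on a character slice. This is a deliberately simplified toy: the free function below, its tuple return value, and the plain parenthesis counting are assumptions for illustration. It ignores quoting, nested expansions, and error reporting, all of which a real shell lexer has to handle.

```rust
/// Toy model: `index` points just after the initial `$`.
///
/// Returns the text between `((` and the matching `))` together with the
/// index just past the closing `))`, or `None` if the input at `index`
/// does not form such an expansion. Nothing is "consumed" on `None`
/// because the function never mutates a position.
pub fn arithmetic_expansion(chars: &[char], index: usize) -> Option<(String, usize)> {
    if chars.get(index) != Some(&'(') || chars.get(index + 1) != Some(&'(') {
        return None;
    }
    let start = index + 2;
    let mut depth = 0usize; // parentheses opened inside the expression
    let mut i = start;
    while let Some(&c) = chars.get(i) {
        match c {
            '(' => depth += 1,
            ')' if depth > 0 => depth -= 1,
            ')' => {
                // An unnested `)` must be the first half of the closing `))`.
                return if chars.get(i + 1) == Some(&')') {
                    Some((chars[start..i].iter().collect(), i + 2))
                } else {
                    None
                };
            }
            _ => {}
        }
        i += 1;
    }
    None // ran out of input before finding `))`
}
```

The caller is expected to have already consumed the $ and to fall back to other expansion forms when None comes back.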

impl Lexer<'_>

pub async fn command_substitution( &mut self, start_index: usize, ) -> Result<Option<TextUnit>>

Parses a command substitution of the form $(...).

The initial $ must have been consumed before calling this function. In this function, the next character is examined to see if it begins a command substitution. If it is (, the following characters are parsed as commands to find a matching ), which will be consumed before this function returns. Otherwise, no characters are consumed and the return value is Ok(None).

The start_index parameter should be the index for the initial $. It is used to construct the result, but this function does not check if it actually points to the $.

impl Lexer<'_>

pub async fn escape_unit(&mut self) -> Result<Option<EscapeUnit>>

Parses an escape unit.

This function tests if the next character begins an escape sequence and, if so, returns the escape sequence. Otherwise, the character is returned as EscapeUnit::Literal. If there is no next character, the function returns Ok(None). It returns an error if an invalid escape sequence is found.

This function should be called in a context where line continuations are disabled, so that backslash-newline pairs are not removed before they are parsed as escape sequences.

pub async fn escaped_string<F>( &mut self, is_delimiter: F, ) -> Result<EscapedString>
where F: FnMut(char) -> bool,

Parses an escaped string.

The is_delimiter function is called with each character in the string to determine if it is a delimiter. If is_delimiter returns true, the character is not consumed and the function returns the string up to that point. Otherwise, the character is consumed and the function continues.

The string may contain escape sequences as defined in EscapeUnit.

Escaped strings typically appear as the content of dollar-single-quotes, so is_delimiter is usually |c| c == '\''.
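
How is_delimiter interacts with escape sequences can be shown with a small synchronous model. The function below is hypothetical and supports only four escapes (\n, \t, \\ and \'), which is an assumption made to keep the sketch short; the set of escapes the crate accepts is defined by EscapeUnit.

```rust
/// Toy model: decodes characters starting at `index` until `is_delimiter`
/// returns true for an unescaped character or the input ends.
///
/// Returns the decoded string and the index of the delimiter (or the end).
pub fn escaped_string<F>(
    chars: &[char],
    mut index: usize,
    mut is_delimiter: F,
) -> Result<(String, usize), String>
where
    F: FnMut(char) -> bool,
{
    let mut result = String::new();
    while let Some(&c) = chars.get(index) {
        if is_delimiter(c) {
            break; // the delimiter itself is not consumed
        }
        index += 1;
        if c != '\\' {
            result.push(c);
            continue;
        }
        // The character after a backslash is never tested as a delimiter.
        let escaped = match chars.get(index) {
            Some('n') => '\n',
            Some('t') => '\t',
            Some('\\') => '\\',
            Some('\'') => '\'',
            other => return Err(format!("invalid escape sequence: {:?}", other)),
        };
        index += 1;
        result.push(escaped);
    }
    Ok((result, index))
}
```

With |c| c == '\'' as the delimiter test, an escaped quote becomes part of the string while the first unescaped quote ends it.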

impl Lexer<'_>

pub async fn line(&mut self) -> Result<String>

Reads a line literally.

This function recognizes no quotes or expansions. Starting from the current position, the line is read up to (but not including) the terminating newline.
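
As a minimal illustration of "up to but not including the terminating newline", here is a toy function over a character slice. The name line is reused only for readability; the slice-and-index signature is an assumption and differs from the async method.

```rust
/// Toy model: returns the characters from `index` up to (but not including)
/// the next newline, and the index where reading stopped.
pub fn line(chars: &[char], index: usize) -> (String, usize) {
    let rest = &chars[index..];
    // Stop at the newline if there is one, otherwise at the end of input.
    let len = rest.iter().position(|&c| c == '\n').unwrap_or(rest.len());
    (rest[..len].iter().collect(), index + len)
}
```

Quotes and dollar signs pass through untouched, and the returned index still points at the newline, so the caller decides whether to consume it.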

pub async fn here_doc_content(&mut self, here_doc: &HereDoc) -> Result<()>

Parses the content of a here-document.

This function reads here-document content corresponding to the here-document operator represented by the argument and fills here_doc.content with the results. The argument does not have to be mutable because here_doc.content is a RefCell. Note that this function will panic if here_doc.content has been borrowed, and that this function keeps a borrow from here_doc.content until the returned future resolves to the final result.

In case of an error, partial results may be left in here_doc.content.
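
The interior mutability pattern described here can be sketched with std's RefCell. Everything in the block is a toy under stated assumptions: ToyHereDoc, the line iterator, the exact-match delimiter test, and the String error type are invented for illustration and are not the crate's types.

```rust
use std::cell::RefCell;

/// Toy here-document operator: a delimiter plus content filled in later.
pub struct ToyHereDoc {
    pub delimiter: String,
    pub content: RefCell<String>,
}

/// Reads lines until one equals the delimiter, appending them to
/// `here_doc.content` through a shared reference.
pub fn here_doc_content<'a, I>(lines: &mut I, here_doc: &ToyHereDoc) -> Result<(), String>
where
    I: Iterator<Item = &'a str>,
{
    // `borrow_mut` panics if the content is already borrowed elsewhere.
    // The borrow is held until this function returns.
    let mut content = here_doc.content.borrow_mut();
    for line in lines {
        if line == here_doc.delimiter {
            return Ok(());
        }
        content.push_str(line);
        content.push('\n');
    }
    // On error, the lines read so far stay in `here_doc.content`.
    Err("unclosed here-document".to_string())
}
```

The error path shows the caveat from the text: a failed read can leave partial content behind, so callers should not treat the content as meaningful unless the call succeeded.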

impl Lexer<'_>

pub async fn skip_if<F>(&mut self, f: F) -> Result<bool>
where F: FnMut(char) -> bool,

Skips a character if the given function returns true for it.

Returns Ok(true) if the character was skipped, Ok(false) if the function returned false, and Err(_) if an error occurred.

skip_if is a simpler version of consume_char_if.
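
The relationship between the two functions can be sketched as one being a thin wrapper over the other. This is a synchronous toy (ToyLexer is hypothetical, and Option<char> stands in for Result<Option<&SourceChar>>), not a statement about how the crate implements them.

```rust
/// Toy lexer over a fully buffered string.
pub struct ToyLexer {
    buffer: Vec<char>,
    index: usize,
}

impl ToyLexer {
    pub fn with_code(code: &str) -> Self {
        ToyLexer { buffer: code.chars().collect(), index: 0 }
    }

    pub fn index(&self) -> usize {
        self.index
    }

    /// Consumes and returns the next character if `f` accepts it.
    pub fn consume_char_if<F>(&mut self, mut f: F) -> Option<char>
    where
        F: FnMut(char) -> bool,
    {
        let c = *self.buffer.get(self.index)?; // None at end of input
        if f(c) {
            self.index += 1;
            Some(c)
        } else {
            None
        }
    }

    /// Same decision, but the caller only learns whether a skip happened.
    pub fn skip_if<F>(&mut self, f: F) -> bool
    where
        F: FnMut(char) -> bool,
    {
        self.consume_char_if(f).is_some()
    }
}
```

Use the first form when the consumed character is needed, and the second when only the yes-or-no outcome matters.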

pub async fn skip_blanks(&mut self) -> Result<()>

Skips blank characters until reaching a non-blank.

pub async fn skip_comment(&mut self) -> Result<()>

Skips a comment, if any.

A comment ends just before a newline. The newline is not part of the comment.

This function does not recognize line continuation inside the comment.

pub async fn skip_blanks_and_comment(&mut self) -> Result<()>

Skips blank characters and a comment, if any.

This function is the same as skip_blanks followed by skip_comment.
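
The composition of the two skipping steps can be written out on a character slice. The sketch assumes that a blank is a space or a tab and that a comment starts with #; both are simplifications for illustration, and the functions are synchronous stand-ins for the async methods.

```rust
/// Toy model: advances past spaces and tabs.
pub fn skip_blanks(chars: &[char], mut index: usize) -> usize {
    while matches!(chars.get(index), Some(' ' | '\t')) {
        index += 1;
    }
    index
}

/// Toy model: if a comment starts at `index`, advances to the newline that
/// ends it (the newline itself is left unconsumed).
pub fn skip_comment(chars: &[char], mut index: usize) -> usize {
    if chars.get(index) == Some(&'#') {
        // A backslash before the newline is just another comment character.
        while matches!(chars.get(index), Some(&c) if c != '\n') {
            index += 1;
        }
    }
    index
}

/// Blanks first, then an optional comment.
pub fn skip_blanks_and_comment(chars: &[char], index: usize) -> usize {
    skip_comment(chars, skip_blanks(chars, index))
}
```

After the call, the position is either at the newline that ends the comment or at the first character that is neither a blank nor part of a comment.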

impl Lexer<'_>

pub async fn operator(&mut self) -> Result<Option<Token>>

Parses an operator token.

impl Lexer<'_>

pub async fn raw_param( &mut self, start_index: usize, ) -> Result<Option<TextUnit>>

Parses a parameter expansion that is not enclosed in braces.

The initial $ must have been consumed before calling this function. This function checks if the next character is a valid POSIXly-portable parameter name. If so, the name is consumed and returned. Otherwise, no characters are consumed and the return value is Ok(None).

The start_index parameter should be the index for the initial $. It is used to construct the result, but this function does not check if it actually points to the $.

impl Lexer<'_>

pub async fn text<F, G>( &mut self, is_delimiter: F, is_escapable: G, ) -> Result<Text>
where F: FnMut(char) -> bool, G: FnMut(char) -> bool,

Parses a text, i.e., a (possibly empty) sequence of TextUnits.

is_delimiter tests if an unquoted character is a delimiter. When is_delimiter returns true, the parser stops parsing and returns the text up to the delimiter.

is_escapable tests if a backslash can escape a character. When the parser finds an unquoted backslash, the next character is passed to is_escapable. If is_escapable returns true, the backslash is treated as a valid escape (TextUnit::Backslashed). Otherwise, it is a literal (TextUnit::Literal).

is_escapable also affects escaping of double-quotes inside backquotes. See text_unit for details. Note that this function calls text_unit with WordContext::Text.
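
The division of labor between the two callbacks can be sketched with a reduced text unit type. ToyTextUnit has only two variants and the function handles no expansions or backquotes, so this is an illustrative simplification rather than the crate's parser.

```rust
/// Reduced stand-in for `TextUnit` with just the two relevant variants.
#[derive(Debug, PartialEq, Eq)]
pub enum ToyTextUnit {
    Literal(char),
    Backslashed(char),
}

/// Toy model: parses units from `index` until an unquoted delimiter.
///
/// Returns the units and the index of the delimiter (or the end of input).
pub fn text<F, G>(
    chars: &[char],
    mut index: usize,
    mut is_delimiter: F,
    mut is_escapable: G,
) -> (Vec<ToyTextUnit>, usize)
where
    F: FnMut(char) -> bool,
    G: FnMut(char) -> bool,
{
    let mut units = Vec::new();
    while let Some(&c) = chars.get(index) {
        if c == '\\' {
            if let Some(&next) = chars.get(index + 1) {
                if is_escapable(next) {
                    // Valid escape: the pair becomes one backslashed unit.
                    units.push(ToyTextUnit::Backslashed(next));
                    index += 2;
                    continue;
                }
            }
            // Not a valid escape: the backslash is an ordinary character.
            units.push(ToyTextUnit::Literal('\\'));
            index += 1;
            continue;
        }
        if is_delimiter(c) {
            break;
        }
        units.push(ToyTextUnit::Literal(c));
        index += 1;
    }
    (units, index)
}
```

For example, a double-quoted context would pass a delimiter test for the closing quote and an escapability test that accepts only the few characters a backslash may quote there.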

pub async fn text_with_parentheses<F, G>( &mut self, is_delimiter: F, is_escapable: G, ) -> Result<Text>
where F: FnMut(char) -> bool, G: FnMut(char) -> bool,

Parses a text that may contain nested parentheses.

This function works similarly to text. However, if an unquoted ( is found in the text, all text units are parsed up to the next matching unquoted ). Inside the parentheses, the is_delimiter function is ignored and all non-special characters are parsed as literal word units. After finding the ), this function continues parsing to find a delimiter (as per is_delimiter) or another parenthesis.

Nested parentheses are supported: the number of (s and )s must match. In other words, the final delimiter is recognized only outside the outermost parentheses.
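
The rule that delimiters count only outside the outermost parentheses comes down to tracking a nesting depth. The function below is a toy sketch: it returns a plain String instead of a Text, omits the is_escapable callback, and ignores quoting, all of which are simplifying assumptions.

```rust
/// Toy model: collects characters from `index` until `is_delimiter` accepts
/// a character that is outside all parentheses.
///
/// Returns the collected text and the index of the delimiter (or the end).
pub fn text_with_parentheses<F>(
    chars: &[char],
    mut index: usize,
    mut is_delimiter: F,
) -> (String, usize)
where
    F: FnMut(char) -> bool,
{
    let mut depth = 0usize; // number of currently open parentheses
    let mut text = String::new();
    while let Some(&c) = chars.get(index) {
        if c == '(' {
            depth += 1;
        } else if c == ')' && depth > 0 {
            depth -= 1;
        } else if depth == 0 && is_delimiter(c) {
            break; // delimiters are honored only at depth zero
        }
        text.push(c);
        index += 1;
    }
    (text, index)
}
```

In the test input f(a:b(c:d)):rest with : as the delimiter, the first three colons are inside parentheses and are kept, and parsing stops at the fourth.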

impl Lexer<'_>

pub async fn token(&mut self) -> Result<Token>

Parses a token.

If there are no more tokens to parse, the result is a token with an empty word and the EndOfInput token identifier.

Trait Implementations

impl<'a> Debug for Lexer<'a>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl<'a> Freeze for Lexer<'a>

impl<'a> !RefUnwindSafe for Lexer<'a>

impl<'a> !Send for Lexer<'a>

impl<'a> !Sync for Lexer<'a>

impl<'a> Unpin for Lexer<'a>

impl<'a> !UnwindSafe for Lexer<'a>

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.