pub struct Input<'a> { /* private fields */ }
A struct representing the input code being lexed.
The Input struct provides methods to read, peek, consume, and skip characters
from the input bytes while keeping track of the current position (line, column, offset).
Implementations§
impl<'a> Input<'a>
pub fn new(source: SourceIdentifier, bytes: &'a [u8]) -> Self
pub fn anchored_at(bytes: &'a [u8], anchor_position: Position) -> Self
Creates a new Input instance representing a byte slice that is
“anchored” at a specific absolute position within a larger source file.
This is useful when lexing a subset (slice) of a source file, as it allows generated tokens to retain accurate absolute positions and spans relative to the original file.
The internal cursor (offset) starts at 0 relative to the bytes slice,
but the absolute position is calculated relative to the anchor_position.
§Arguments
bytes - A byte slice representing the subset of the input code to be lexed.
anchor_position - The absolute Position in the original source file where the provided bytes slice begins.
§Returns
A new Input instance ready to lex the bytes, maintaining positions
relative to anchor_position.
pub const fn source_identifier(&self) -> SourceIdentifier
Returns the source identifier of the input code.
pub const fn current_position(&self) -> Position
Returns the current absolute Position of the lexer within the original source file.
It calculates this by adding the internal offset (progress within the current byte slice)
to the starting_position the Input was initialized with.
pub const fn current_offset(&self) -> usize
Returns the current internal byte offset relative to the start of the input slice.
This indicates how many bytes have been consumed from the current bytes slice.
To get the absolute position in the original source file, use current_position().
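To illustrate how anchoring separates the slice-relative cursor from the absolute position, here is a minimal sketch. It assumes a position can be reduced to a single byte offset and ignores the line and column tracking that Input also performs; AnchoredInput and current_absolute_offset are hypothetical names, not part of this API.

```rust
/// Hypothetical, simplified stand-in for `Input`: it tracks only byte offsets
/// (no lines or columns) to show how an anchor turns slice-relative offsets
/// into absolute ones.
pub struct AnchoredInput<'a> {
    bytes: &'a [u8],
    /// Absolute offset in the original file where `bytes` begins (assumed field).
    anchor: usize,
    /// Cursor relative to the start of `bytes`; always starts at 0.
    offset: usize,
}

impl<'a> AnchoredInput<'a> {
    pub fn anchored_at(bytes: &'a [u8], anchor: usize) -> Self {
        Self { bytes, anchor, offset: 0 }
    }

    /// Advances the cursor by one byte; does nothing at the end of the slice.
    pub fn next(&mut self) {
        if self.offset < self.bytes.len() {
            self.offset += 1;
        }
    }

    /// Slice-relative offset, analogous to `current_offset()`.
    pub fn current_offset(&self) -> usize {
        self.offset
    }

    /// Absolute offset in the original file, analogous to the offset part
    /// of `current_position()`: anchor plus relative progress.
    pub fn current_absolute_offset(&self) -> usize {
        self.anchor + self.offset
    }
}
```

For example, anchoring the slice b"echo 1;" (taken from b"<?php echo 1;" at byte 6) at absolute offset 6 and advancing two bytes gives a relative offset of 2 and an absolute offset of 8.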
pub const fn is_empty(&self) -> bool
Returns true if the input slice is empty (length is zero).
pub const fn len(&self) -> usize
Returns the total length in bytes of the input slice being processed.
pub const fn has_reached_eof(&self) -> bool
Checks if the current position is at the end of the input.
§Returns
true if the current offset is greater than or equal to the input length; false otherwise.
pub fn next(&mut self)
Advances the current position by one character, updating line and column numbers.
Handles different line endings (\n, \r, \r\n) and updates line and column counters accordingly.
If the end of input is reached, no action is taken.
pub fn skip(&mut self, count: usize)
Skips the next count characters, advancing the position accordingly.
Updates line and column numbers as it advances.
§Arguments
count - The number of characters to skip.
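One way to implement the line and column bookkeeping described for next and skip is sketched below. It is an assumption, not the crate's code: it advances one byte (not one Unicode character) per step, uses 0-based line and column counters, and treats \r\n as a single line break by letting the \n do the counting. LineCursor and its position method are hypothetical.

```rust
/// Hypothetical cursor that tracks line and column, illustrating one way to
/// handle `\n`, `\r`, and `\r\n` (assumed behavior, not taken from the crate).
pub struct LineCursor<'a> {
    bytes: &'a [u8],
    offset: usize,
    line: usize,   // 0-based line number
    column: usize, // 0-based column, counted in bytes
}

impl<'a> LineCursor<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, offset: 0, line: 0, column: 0 }
    }

    /// Advances by one byte; does nothing once the end of input is reached.
    pub fn next(&mut self) {
        let Some(&byte) = self.bytes.get(self.offset) else {
            return; // end of input: no action
        };
        match byte {
            b'\n' => {
                self.line += 1;
                self.column = 0;
            }
            // A `\r` directly followed by `\n` leaves the counting to the
            // `\n`, so `\r\n` counts as a single line break.
            b'\r' if self.bytes.get(self.offset + 1) == Some(&b'\n') => {}
            // A lone `\r` is a line break on its own.
            b'\r' => {
                self.line += 1;
                self.column = 0;
            }
            _ => self.column += 1,
        }
        self.offset += 1;
    }

    /// Calls `next` `count` times; calls made after EOF are no-ops.
    pub fn skip(&mut self, count: usize) {
        for _ in 0..count {
            self.next();
        }
    }

    /// Returns (offset, line, column).
    pub fn position(&self) -> (usize, usize, usize) {
        (self.offset, self.line, self.column)
    }
}
```

With this scheme, skipping over b"a\r\n" lands at offset 3, line 1, column 0, and skipping far past the end simply stops at EOF.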
pub fn consume_remaining(&mut self) -> &'a [u8]
Consumes all remaining characters from the current position to the end of input.
Advances the position to EOF.
§Returns
A byte slice containing the remaining characters.
pub fn consume_until(&mut self, search: &[u8], ignore_ascii_case: bool) -> &'a [u8]
Consumes characters until the given byte slice is found.
Advances the position to the start of the search slice if found, or to EOF if not found.
§Arguments
search - The byte slice to search for.
ignore_ascii_case - Whether to ignore ASCII case when comparing characters.
§Returns
A byte slice containing the consumed characters.
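The search behavior of consume_until can be sketched as a free function over a byte slice and a cursor offset. This consume_until is a hypothetical standalone function, not the method itself: it returns the consumed bytes together with the new offset, uses a naive left-to-right scan, and, when search never appears, consumes everything that remains (as consume_remaining does). The case-insensitive comparison uses the standard [u8]::eq_ignore_ascii_case.

```rust
/// Hypothetical sketch of `consume_until` semantics. Returns the bytes from
/// `offset` up to (not including) the first occurrence of `search`, or all
/// remaining bytes if there is none, plus the offset the cursor would move to.
pub fn consume_until<'a>(
    bytes: &'a [u8],
    offset: usize,
    search: &[u8],
    ignore_ascii_case: bool,
) -> (&'a [u8], usize) {
    let rest = &bytes[offset..];

    // Compares the window starting at `i` against `search`.
    let matches_at = |i: usize| {
        let window = &rest[i..i + search.len()];
        if ignore_ascii_case {
            window.eq_ignore_ascii_case(search)
        } else {
            window == search
        }
    };

    // First index where a full-length window fits and matches.
    let found = (0..rest.len()).find(|&i| i + search.len() <= rest.len() && matches_at(i));

    // Stop at the start of the match, or at EOF when there is no match.
    let consumed = found.unwrap_or(rest.len());
    (&rest[..consumed], offset + consumed)
}
```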
pub fn consume_through(&mut self, search: u8) -> &'a [u8]
pub fn consume_whitespaces(&mut self) -> &'a [u8]
Consumes whitespace characters until a non-whitespace character is found.
§Returns
A byte slice containing the consumed whitespace.
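A minimal sketch of the same idea as a free function, returning the whitespace run and the offset just past it. Which bytes the real lexer treats as whitespace is not stated here, so this hypothetical consume_whitespaces assumes u8::is_ascii_whitespace (space, tab, line feed, form feed, and carriage return).

```rust
/// Hypothetical sketch: returns the run of ASCII whitespace starting at
/// `offset` and the offset of the first non-whitespace byte (or EOF).
pub fn consume_whitespaces(bytes: &[u8], offset: usize) -> (&[u8], usize) {
    let rest = &bytes[offset..];
    // Count leading whitespace bytes; stops at the first non-whitespace byte.
    let len = rest.iter().take_while(|b| b.is_ascii_whitespace()).count();
    (&rest[..len], offset + len)
}
```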
pub fn read_at(&self, at: usize) -> &'a u8
Reads a single byte at a specific byte offset within the input slice, without advancing the internal cursor.
This provides direct, low-level access to the underlying byte data.
§Arguments
at - The zero-based byte offset within the input slice (self.bytes) from which to read the byte.
§Returns
A reference to the byte located at the specified offset at.
§Panics
This method panics if the provided at offset is out of bounds
for the input byte slice (i.e., if at >= self.bytes.len()).
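The panic contract follows from plain slice indexing. The sketch below contrasts it with slice::get, which returns None instead of panicking; both read_at and try_read_at here are hypothetical free functions, and try_read_at is not part of this API.

```rust
/// Minimal stand-in for `read_at`: direct indexing, so an out-of-bounds
/// `at` panics, matching the documented contract.
pub fn read_at(bytes: &[u8], at: usize) -> &u8 {
    &bytes[at]
}

/// Hypothetical non-panicking variant (not part of the documented API).
pub fn try_read_at(bytes: &[u8], at: usize) -> Option<&u8> {
    bytes.get(at)
}
```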
pub const fn match_sequence_ignore_whitespace(&self, search: &[u8], ignore_ascii_case: bool) -> Option<usize>
Attempts to match the given byte sequence at the current position, ignoring whitespace in the input.
This method tries to match the provided byte slice search against the input starting
from the current position, optionally ignoring ASCII case. Whitespace characters in the input
are skipped during matching, but their length is included in the returned length.
Importantly, the method does not include any trailing whitespace after the matched sequence in the returned length.
For example, to match the sequence (string), the input could be (string), ( string ), or the same characters separated by any other amount of whitespace,
and this method would return the total length of the input consumed to match (string),
including any whitespace within the matched sequence, but excluding any whitespace after it.
§Arguments
search - The byte slice to match against the input.
ignore_ascii_case - If true, ASCII case is ignored during comparison.
§Returns
Some(length) - If the input matches search (ignoring whitespace within the sequence), the total length of the input consumed to match search, including any skipped whitespace within the matched sequence.
None - If the input does not match search.
§Examples
```rust
use mago_syntax_core::input::Input;
use mago_source::SourceIdentifier;

let source = SourceIdentifier::dummy();

// Given input "( string ) x", starting at offset 0:
let input = Input::new(source.clone(), b"( string ) x");
assert_eq!(input.match_sequence_ignore_whitespace(b"(string)", true), Some(10)); // 10 bytes consumed up to ')'

// Given input "(int)", with no whitespace:
let input = Input::new(source.clone(), b"(int)");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", true), Some(5)); // 5 bytes consumed

// Given input "( InT )abc", ignoring ASCII case:
let input = Input::new(source.clone(), b"( InT )abc");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", true), Some(7)); // 7 bytes consumed up to ')'

// Given input "(integer)", attempting to match "(int)":
let input = Input::new(source.clone(), b"(integer)");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", false), None); // Does not match

// Trailing whitespace after ')':
let input = Input::new(source.clone(), b"(int) x");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", true), Some(5)); // Length up to ')', excludes spaces after ')'
```
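The matching rules described for match_sequence_ignore_whitespace can be sketched as a standalone function over a byte slice. This is a hypothetical re-implementation, not the crate's code: it assumes whitespace means u8::is_ascii_whitespace, that whitespace is skipped only between matched bytes (not before the first one), and that search itself contains no whitespace. Unlike the real method, it is not a const fn.

```rust
/// Hypothetical sketch: returns the number of input bytes consumed to match
/// `search`, skipping whitespace between matched bytes, or `None` on mismatch.
pub fn match_sequence_ignore_whitespace(
    input: &[u8],
    search: &[u8],
    ignore_ascii_case: bool,
) -> Option<usize> {
    let mut pos = 0;
    for (index, &expected) in search.iter().enumerate() {
        // Skip whitespace between matched bytes (assumed: not before the first).
        if index > 0 {
            while pos < input.len() && input[pos].is_ascii_whitespace() {
                pos += 1;
            }
        }
        // Running out of input before `search` is exhausted is a mismatch.
        let actual = *input.get(pos)?;
        let equal = if ignore_ascii_case {
            actual.eq_ignore_ascii_case(&expected)
        } else {
            actual == expected
        };
        if !equal {
            return None;
        }
        pos += 1;
    }
    // `pos` sits just past the last matched byte, so trailing whitespace
    // after the sequence is not counted.
    Some(pos)
}
```

Run against the five inputs used in the examples, this sketch returns Some(10), Some(5), Some(7), None, and Some(5) respectively.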