Struct Input

pub struct Input<'a> { /* private fields */ }

A struct representing the input code being lexed.

The Input struct provides methods to read, peek, consume, and skip characters from the input bytes while keeping track of the current position (line, column, offset).

Implementations

impl<'a> Input<'a>

pub fn new(source: SourceIdentifier, bytes: &'a [u8]) -> Self

Creates a new Input instance from the given source identifier and bytes.

Arguments

source - The SourceIdentifier of the source the bytes come from.
bytes - A byte slice representing the input code to be processed.

Returns

A new Input instance initialized at the beginning of the input.

pub fn anchored_at(bytes: &'a [u8], anchor_position: Position) -> Self

Creates a new Input instance representing a byte slice that is “anchored” at a specific absolute position within a larger source file.

This is useful when lexing a subset (slice) of a source file, as it allows generated tokens to retain accurate absolute positions and spans relative to the original file.

The internal cursor (offset) starts at 0 relative to the bytes slice, but the absolute position is calculated relative to the anchor_position.

Arguments

bytes - A byte slice representing the input code subset to be lexed.
anchor_position - The absolute Position in the original source file where the provided bytes slice begins.

Returns

A new Input instance ready to lex the bytes, maintaining positions relative to anchor_position.

pub const fn source_identifier(&self) -> SourceIdentifier

Returns the source identifier of the input code.

pub const fn current_position(&self) -> Position

Returns the current absolute Position of the lexer within the original source file.

It is calculated by adding the internal offset (progress within the current byte slice) to the starting position the Input was initialized with (for anchored_at, the anchor_position).
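To illustrate the anchoring arithmetic, here is a minimal, std-only sketch that does not use the crate. AnchoredCursor and its Position type are hypothetical stand-ins, and Position is assumed to carry only a byte offset; the real Position may hold more.

```rust
/// Hypothetical stand-in for the crate's Position: only a byte offset (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub offset: usize,
}

/// Hypothetical cursor that indexes its slice from 0 but reports
/// absolute positions relative to the anchor it was created with.
pub struct AnchoredCursor<'a> {
    bytes: &'a [u8],
    offset: usize,    // progress within `bytes`, starts at 0
    anchor: Position, // where `bytes` begins in the original file
}

impl<'a> AnchoredCursor<'a> {
    pub fn anchored_at(bytes: &'a [u8], anchor: Position) -> Self {
        Self { bytes, offset: 0, anchor }
    }

    /// Moves forward by `count` bytes, stopping at the end of the slice
    /// (the clamping is an assumption of this sketch).
    pub fn advance(&mut self, count: usize) {
        self.offset = (self.offset + count).min(self.bytes.len());
    }

    /// Offset relative to the start of the slice.
    pub fn current_offset(&self) -> usize {
        self.offset
    }

    /// Absolute position: anchor offset plus progress within the slice.
    pub fn current_position(&self) -> Position {
        Position { offset: self.anchor.offset + self.offset }
    }
}
```

For example, lexing only the slice "def" taken from byte 4 of "abc def" and advancing two bytes gives a current_offset() of 2 but a current_position() offset of 6, which still points at 'f' in the full file.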

pub const fn current_offset(&self) -> usize

Returns the current internal byte offset relative to the start of the input slice.

This indicates how many bytes have been consumed from the current bytes slice. To get the absolute position in the original source file, use current_position().

pub const fn is_empty(&self) -> bool

Returns true if the input slice is empty (length is zero).

pub const fn len(&self) -> usize

Returns the total length in bytes of the input slice being processed.

pub const fn has_reached_eof(&self) -> bool

Checks if the current position is at the end of the input.

Returns

true if the current offset is greater than or equal to the input length; false otherwise.

pub fn next(&mut self)

Advances the current position by one character, updating line and column numbers.

Handles different line endings (\n, \r, \r\n) and updates line and column counters accordingly.

If the cursor is already at the end of the input, no action is taken.
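A minimal, std-only sketch of one way to implement this line-ending handling, independent of the crate. LineCursor is hypothetical, lines and columns are assumed to be 1-based, and the cursor advances one byte per call: a \r immediately followed by \n leaves the counters unchanged so that the \n records the single line break.

```rust
/// Hypothetical cursor tracking line and column (both 1-based, an assumption).
pub struct LineCursor<'a> {
    bytes: &'a [u8],
    offset: usize,
    line: usize,
    column: usize,
}

impl<'a> LineCursor<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, offset: 0, line: 1, column: 1 }
    }

    /// Advances by one byte, treating `\n`, `\r`, and `\r\n` each as one line break.
    pub fn next(&mut self) {
        // At EOF there is nothing to advance over.
        let Some(&byte) = self.bytes.get(self.offset) else {
            return;
        };

        match byte {
            b'\n' => {
                self.line += 1;
                self.column = 1;
            }
            // In "\r\n", let the following '\n' count the break.
            b'\r' if self.bytes.get(self.offset + 1) == Some(&b'\n') => {}
            // A lone '\r' is a line break on its own.
            b'\r' => {
                self.line += 1;
                self.column = 1;
            }
            _ => self.column += 1,
        }

        self.offset += 1;
    }

    pub fn line_column(&self) -> (usize, usize) {
        (self.line, self.column)
    }

    pub fn offset(&self) -> usize {
        self.offset
    }
}
```

Stepping through "a\r\nb\rc\nd" this way ends on line 4, column 2, with all three line-ending styles each counted once.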

pub fn skip(&mut self, count: usize)

Skips the next count characters, advancing the position accordingly.

Updates line and column numbers as it advances.

Arguments

count - The number of characters to skip.

pub fn consume(&mut self, count: usize) -> &'a [u8]

Consumes the next count characters and returns them as a slice.

Advances the position by count characters.

Arguments

count - The number of characters to consume.

Returns

A byte slice containing the consumed characters.

pub fn consume_remaining(&mut self) -> &'a [u8]

Consumes all remaining characters from the current position to the end of input.

Advances the position to EOF.

Returns

A byte slice containing the remaining characters.

pub fn consume_until(&mut self, search: &[u8], ignore_ascii_case: bool) -> &'a [u8]

Consumes characters until the given byte slice is found.

Advances the position to the start of the search slice if found, or to EOF if not found.

Arguments

search - The byte slice to search for.
ignore_ascii_case - Whether to ignore ASCII case when comparing characters.

Returns

A byte slice containing the consumed characters.
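The search behaviour can be sketched as a standalone function over a byte slice and a cursor offset. This is a hypothetical, std-only illustration, not the crate's implementation: it returns the consumed bytes together with the new offset instead of mutating an Input, and it relies on the standard <[u8]>::eq_ignore_ascii_case for the case-insensitive comparison.

```rust
/// Hypothetical free-function version of `consume_until`.
/// Returns the bytes before the first match of `search` (or all remaining
/// bytes if there is no match) and the offset the cursor would move to.
pub fn consume_until<'a>(
    bytes: &'a [u8],
    offset: usize,
    search: &[u8],
    ignore_ascii_case: bool,
) -> (&'a [u8], usize) {
    let rest = &bytes[offset..];

    // Compares the window of `rest` starting at `i` against `search`.
    let matches_at = |i: usize| {
        let window = &rest[i..i + search.len()];
        if ignore_ascii_case {
            window.eq_ignore_ascii_case(search)
        } else {
            window == search
        }
    };

    // Only start indices where `search` still fits are candidates.
    let found = if search.len() <= rest.len() {
        (0..=rest.len() - search.len()).find(|&i| matches_at(i))
    } else {
        None
    };

    // Stop at the start of the match, or at EOF when there is none.
    let consumed = found.unwrap_or(rest.len());
    (&rest[..consumed], offset + consumed)
}
```

With ignore_ascii_case set, searching "hello END world" for "end" stops before "END"; without it, nothing matches and the whole remainder is consumed.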

pub fn consume_through(&mut self, search: u8) -> &'a [u8]

pub fn consume_whitespaces(&mut self) -> &'a [u8]

Consumes whitespace characters until a non-whitespace character or the end of the input is reached.

Returns

A byte slice containing the consumed whitespace.
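The documentation does not define which bytes count as whitespace. The hypothetical sketch below assumes ASCII whitespace as defined by the standard u8::is_ascii_whitespace (space, \t, \n, \x0C, \r) and returns the consumed run together with the new offset rather than mutating an Input.

```rust
/// Hypothetical sketch of `consume_whitespaces` over a slice and offset.
/// "Whitespace" here means `u8::is_ascii_whitespace` (an assumption).
pub fn consume_whitespaces(bytes: &[u8], offset: usize) -> (&[u8], usize) {
    let rest = &bytes[offset..];
    // Count the leading run of whitespace bytes; stops at the first
    // non-whitespace byte or at the end of the input.
    let len = rest.iter().take_while(|b| b.is_ascii_whitespace()).count();
    (&rest[..len], offset + len)
}
```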

pub fn read(&self, n: usize) -> &'a [u8]

Reads the next n characters without advancing the position.

Arguments

n - The number of characters to read.

Returns

A byte slice containing the next n characters.
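The difference between reading and consuming can be sketched with a hypothetical, std-only PeekCursor that is not part of the crate. The sketch clamps n to the remaining input, which is an assumption: the documentation does not say what read or consume do when fewer than n bytes remain.

```rust
/// Hypothetical cursor contrasting `read` (peek) with `consume` (advance).
pub struct PeekCursor<'a> {
    bytes: &'a [u8],
    offset: usize,
}

impl<'a> PeekCursor<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, offset: 0 }
    }

    /// Returns up to `n` bytes from the cursor without moving it.
    /// Clamping at the end of the input is an assumption of this sketch.
    pub fn read(&self, n: usize) -> &'a [u8] {
        let end = (self.offset + n).min(self.bytes.len());
        &self.bytes[self.offset..end]
    }

    /// Returns the same bytes as `read(n)` and advances past them.
    pub fn consume(&mut self, n: usize) -> &'a [u8] {
        let slice = self.read(n);
        self.offset += slice.len();
        slice
    }
}
```

Calling read repeatedly keeps returning the same bytes; only consume moves the cursor forward.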

pub fn read_at(&self, at: usize) -> &'a u8

Reads a single byte at a specific byte offset within the input slice, without advancing the internal cursor.

This provides direct, low-level access to the underlying byte data.

Arguments

at - The zero-based byte offset within the input slice (self.bytes) from which to read the byte.

Returns

A reference to the byte located at the specified offset at.

Panics

This method panics if the provided at offset is out of bounds for the input byte slice (i.e., if at >= self.bytes.len()).

pub fn is_at(&self, search: &[u8], ignore_ascii_case: bool) -> bool

Checks if the input at the current position matches the given byte slice.

Arguments

search - The byte slice to compare against the input.
ignore_ascii_case - Whether to ignore ASCII case when comparing.

Returns

true if the next bytes match search; false otherwise.
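A hypothetical, std-only sketch of the comparison is_at performs, written as a free function over a slice and an offset rather than a method on Input. Treating a remainder shorter than search as a non-match, instead of panicking, is an assumption of the sketch.

```rust
/// Hypothetical free-function version of `is_at`.
pub fn is_at(bytes: &[u8], offset: usize, search: &[u8], ignore_ascii_case: bool) -> bool {
    // `get` returns None when fewer than `search.len()` bytes remain,
    // so a truncated candidate near EOF yields false (assumption).
    match bytes.get(offset..offset + search.len()) {
        Some(candidate) if ignore_ascii_case => candidate.eq_ignore_ascii_case(search),
        Some(candidate) => candidate == search,
        None => false,
    }
}
```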

pub const fn match_sequence_ignore_whitespace(&self, search: &[u8], ignore_ascii_case: bool) -> Option<usize>

Attempts to match the given byte sequence at the current position, ignoring whitespace in the input.

This method tries to match the provided byte slice search against the input starting from the current position, ignoring ASCII case when ignore_ascii_case is true. Whitespace characters in the input are skipped during matching, but their length is included in the returned length.

Importantly, the returned length does not include any trailing whitespace after the matched sequence.

For example, to match the sequence (string), the input could be (string), ( string ), (  string  ), etc. The method returns the total length of the input consumed to match (string), including any whitespace within the matched sequence but excluding any whitespace after it.

Arguments

search - The byte slice to match against the input.
ignore_ascii_case - If true, ASCII case is ignored during comparison.

Returns

Some(length) - If the input matches search (ignoring whitespace within the sequence), returns the total length of the input consumed to match search, including any skipped whitespace within the matched sequence.
None - If the input does not match search.

Examples

```rust
use mago_syntax_core::input::Input;
use mago_source::SourceIdentifier;

let source = SourceIdentifier::dummy();

// Given input "( string ) x", starting at offset 0:
let input = Input::new(source.clone(), b"( string ) x");
assert_eq!(input.match_sequence_ignore_whitespace(b"(string)", true), Some(10)); // 10 bytes consumed up to ')'

// Given input "(int)", with no whitespace:
let input = Input::new(source.clone(), b"(int)");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", true), Some(5)); // 5 bytes consumed

// Given input "(  InT   )abc", ignoring ASCII case:
let input = Input::new(source.clone(), b"(  InT   )abc");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", true), Some(10)); // 10 bytes consumed up to ')'

// Given input "(integer)", attempting to match "(int)":
let input = Input::new(source.clone(), b"(integer)");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", false), None); // Does not match

// Trailing whitespace after ')':
let input = Input::new(source.clone(), b"(int)   x");
assert_eq!(input.match_sequence_ignore_whitespace(b"(int)", true), Some(5)); // Length up to ')', excludes spaces after ')'
```
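The matching rule can also be sketched outside the crate. The function below is a hypothetical, std-only, non-const illustration operating on a plain byte slice rather than an Input. It assumes whitespace means u8::is_ascii_whitespace and that search contains no whitespace of its own; under those assumptions it returns the same lengths as the examples above.

```rust
/// Hypothetical, non-const sketch of the matching rule described above.
/// Assumes "whitespace" means `u8::is_ascii_whitespace` and that `search`
/// itself contains no whitespace.
pub fn match_sequence_ignore_whitespace(
    input: &[u8],
    search: &[u8],
    ignore_ascii_case: bool,
) -> Option<usize> {
    let mut pos = 0; // number of input bytes consumed so far

    for &expected in search {
        // Skip whitespace in the input before each expected byte.
        while pos < input.len() && input[pos].is_ascii_whitespace() {
            pos += 1;
        }

        // Running out of input before `search` is exhausted means no match.
        let actual = *input.get(pos)?;
        let equal = if ignore_ascii_case {
            actual.eq_ignore_ascii_case(&expected)
        } else {
            actual == expected
        };
        if !equal {
            return None;
        }
        pos += 1;
    }

    // Whitespace after the last matched byte is never skipped, so it is
    // not counted in the returned length.
    Some(pos)
}
```

Skipping whitespace only before each expected byte, never after the last one, is what keeps trailing whitespace out of the returned length.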