pub struct TokenStream<'a> { /* private fields */ }
Token stream that wraps perl-lexer or a pre-lexed token buffer.
Provides three-token lookahead, transparent trivia skipping (in lexer mode), and statement-boundary state management used by the recursive-descent parser.
Implementations§
impl<'a> TokenStream<'a>
pub fn new(input: &'a str) -> TokenStream<'a>
Create a new token stream from source code.
pub fn from_vec(tokens: Vec<Token>) -> TokenStream<'a>
Create a token stream from a pre-lexed token list.
This constructor skips lexing entirely and feeds tokens directly from the
provided Vec. It is intended for the incremental parsing pipeline where
tokens from a prior parse run can be reused for unchanged regions.
§Behaviour differences from TokenStream::new
on_stmt_boundary: clears the lookahead cache only; no lexer mode reset (tokens are already classified).
relex_as_term: clears the lookahead cache only; no re-lexing (token kinds are fixed from the original lex pass).
enter_format_mode: no-op.
§Arguments
tokens — Pre-lexed tokens. An Eof token does not need to be included; the stream synthesises one when the buffer is exhausted.
§Examples
use perl_tokenizer::{Token, TokenKind, TokenStream};
let tokens = vec![
Token::new(TokenKind::My, "my", 0, 2),
Token::new(TokenKind::Eof, "", 2, 2),
];
let mut stream = TokenStream::from_vec(tokens);
assert!(matches!(stream.peek(), Ok(t) if t.kind == TokenKind::My));
pub fn lexer_tokens_to_parser_tokens(tokens: Vec<Token>) -> Vec<Token>
Convert a buffer of raw lexer tokens to parser Tokens, filtering out trivia.
This is a convenience method for the incremental parsing pipeline where the
token cache stores raw lexer tokens (including whitespace and comments) and
needs to convert them to parser tokens before feeding to Self::from_vec.
Trivia token types (whitespace, newlines, comments, EOF) are discarded.
All other token types are converted using the same mapping as the live
TokenStream would apply.
§Examples
use perl_tokenizer::{TokenKind, TokenStream};
use perl_lexer::{PerlLexer, TokenType};
// Collect raw lexer tokens
let mut lexer = PerlLexer::new("my $x = 1;");
let mut raw = Vec::new();
while let Some(t) = lexer.next_token() {
if matches!(t.token_type, TokenType::EOF) { break; }
raw.push(t);
}
// Convert to parser tokens and build a stream
let parser_tokens = TokenStream::lexer_tokens_to_parser_tokens(raw);
let mut stream = TokenStream::from_vec(parser_tokens);
assert!(matches!(stream.peek(), Ok(t) if t.kind == TokenKind::My));
pub fn peek(&mut self) -> Result<&Token, ParseError>
Peek at the next token without consuming it.
pub fn next(&mut self) -> Result<Token, ParseError>
Consume and return the next token.
pub fn peek_second(&mut self) -> Result<&Token, ParseError>
Peek at the second token (two tokens ahead).
pub fn peek_third(&mut self) -> Result<&Token, ParseError>
Peek at the third token (three tokens ahead).
pub fn enter_format_mode(&mut self)
Enter format body parsing mode in the lexer.
No-op when operating in buffered (pre-lexed) mode — the tokens are already fully classified.
pub fn on_stmt_boundary(&mut self)
Called at statement boundaries to reset lexer state and clear cached lookahead.
In buffered mode only the lookahead cache is cleared; no lexer mode reset is performed because the tokens are already fully classified.
pub fn relex_as_term(&mut self)
Re-lex the current peeked token in ExpectTerm mode.
This is needed for context-sensitive constructs like split /regex/
where the / was lexed as division (Slash) but should be a regex
delimiter. Rolls the lexer back to the peeked token’s start position,
switches to ExpectTerm mode, and clears the peek cache so the next
peek() or next() re-lexes it as a regex.
In buffered mode the peek cache is cleared but no re-lexing occurs — token kinds are fixed from the original lex pass.
pub fn invalidate_peek(&mut self)
Invalidate the cached lookahead without changing the lexer mode.
pub fn peek_fresh_kind(&mut self) -> Option<TokenKind>
Convenience method for a one-shot fresh peek.