Struct RegexLexer 

Source
pub struct RegexLexer<'config> { /* private fields */ }

Lexer for regular expressions.

RegexLexer tokenizes regular expression source code into a sequence of tokens for the parser. It handles all regex syntax, including character classes, quantifiers, groups, assertions, and special characters.

§Examples

Basic usage:

use oak_core::{Lexer, LexerCache, LexerState, ParseSession, SourceText};
use oak_regex::{RegexLanguage, RegexLexer};

let language = RegexLanguage::default();
let lexer = RegexLexer::new(&language);
let source = SourceText::new(r"[a-z]+\d{1,3}");
let mut cache = ParseSession::<RegexLanguage>::default();
let output = lexer.lex(&source, &[], &mut cache);

// Output contains tokens for the entire source
assert!(!output.result.unwrap().is_empty());

Tokenizing different regex constructs:

use oak_core::{Lexer, LexerCache, LexerState, ParseSession, SourceText};
use oak_regex::{RegexLanguage, RegexLexer};

let language = RegexLanguage::default();
let lexer = RegexLexer::new(&language);

// Tokenize a complex regular expression
let source = SourceText::new(r"(?:(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,})");
let mut cache = ParseSession::<RegexLanguage>::default();
let output = lexer.lex(&source, &[], &mut cache);

// Verify tokens were generated
assert!(output.result.unwrap().len() > 5);

Implementations§

Source§

impl<'config> RegexLexer<'config>

Source

pub fn new(config: &'config RegexLanguage) -> Self

Creates a new RegexLexer with the given language configuration.

§Arguments
  • config - A RegexLanguage configuration that controls language-specific parsing behavior.

§Examples

use oak_regex::{RegexLanguage, RegexLexer};

let language = RegexLanguage::default();
let lexer = RegexLexer::new(&language);

Source

pub fn whitespace_rules(&self) -> &WhitespaceConfig

Returns the whitespace configuration for the lexer.

This method defines how the lexer should handle whitespace characters. The configuration enables Unicode whitespace support, allowing the lexer to recognize all Unicode whitespace characters, not just ASCII spaces.
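
The difference between Unicode and ASCII-only whitespace recognition can be sketched with the standard library alone. The helper `leading_whitespace_len` below is hypothetical and is not part of oak_regex or WhitespaceConfig; it only illustrates what enabling Unicode whitespace support changes, assuming that "Unicode whitespace" means the Unicode `White_Space` property as tested by `char::is_whitespace`.

```rust
/// Hypothetical helper (not an oak_regex API): returns the byte length of
/// the leading whitespace run in `input`.
///
/// With `unicode == true`, any character with the Unicode `White_Space`
/// property counts; otherwise only ASCII whitespace does.
fn leading_whitespace_len(input: &str, unicode: bool) -> usize {
    input
        .chars()
        .take_while(|c| {
            if unicode {
                c.is_whitespace()
            } else {
                c.is_ascii_whitespace()
            }
        })
        // Sum UTF-8 byte lengths so the result can be used to slice `input`.
        .map(char::len_utf8)
        .sum()
}
```

For an input that starts with a no-break space (U+00A0), the Unicode-aware call consumes it as whitespace, while the ASCII-only call stops immediately and reports a length of zero.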

Source

pub fn comment_rules(&self) -> CommentConfig

Returns the comment configuration for the lexer.

This method defines how the lexer should handle comments in regular expressions. Regular expressions typically use # as a line comment marker, with comments continuing to the end of the line.
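
As a rough illustration of the line-comment convention described above, the following std-only sketch splits one line at the first unescaped `#`. The function `split_line_comment` is hypothetical and does not reflect how CommentConfig or RegexLexer is implemented; it also deliberately ignores `#` inside character classes, which a real lexer would need to handle.

```rust
/// Hypothetical helper (not an oak_regex API): splits `line` at the first
/// `#` that is not preceded by a backslash escape.
///
/// Returns the pattern part and, if a comment marker was found, the comment
/// text that follows it up to the end of the line.
fn split_line_comment(line: &str) -> (&str, Option<&str>) {
    let mut escaped = false;
    for (idx, ch) in line.char_indices() {
        match ch {
            // The previous character was a backslash, so this one is literal.
            _ if escaped => escaped = false,
            '\\' => escaped = true,
            // '#' is one byte long, so the comment text starts at idx + 1.
            '#' => return (&line[..idx], Some(&line[idx + 1..])),
            _ => {}
        }
    }
    (line, None)
}
```

Under these assumptions, a line such as `\d+ # digits` yields the pattern `\d+ ` and the comment ` digits`, while `a\#b` is returned whole because the `#` is escaped.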

Source

pub fn string_rules(&self) -> StringConfig

Returns the string literal configuration for the lexer.

This method defines how the lexer should handle string literals in regular expressions. Regex strings are typically enclosed in double quotes (") and use a backslash (\) as the escape character.
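
The quoting rule above (a double-quoted literal in which a backslash escapes the next character) can be sketched as a small scanner. The function `scan_quoted` is hypothetical and is not taken from oak_regex or StringConfig; it is a minimal illustration that assumes a backslash escapes exactly one following character.

```rust
/// Hypothetical helper (not an oak_regex API): if `input` starts with a
/// double-quoted literal, returns the byte length of that literal including
/// both quotes. Returns `None` if the input does not start with a quote or
/// the literal is never closed.
fn scan_quoted(input: &str) -> Option<usize> {
    let mut chars = input.char_indices();
    // The literal must begin with an opening double quote.
    if chars.next()?.1 != '"' {
        return None;
    }
    let mut escaped = false;
    for (idx, ch) in chars {
        match ch {
            // The previous character was a backslash, so this one is literal.
            _ if escaped => escaped = false,
            '\\' => escaped = true,
            // An unescaped quote closes the literal; '"' is one byte long.
            '"' => return Some(idx + 1),
            _ => {}
        }
    }
    None
}
```

The returned length can be used to slice the literal out of the input, for example `&input[..len]`, leaving the remainder for the next token.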

Source

pub fn char_rules(&self) -> StringConfig

Returns the character literal configuration for the lexer.

This method defines how the lexer should handle character literals in regular expressions. Regex character literals are enclosed in single quotes (') and do not use escape characters in the same way as strings.

Trait Implementations§

Source§

impl<'config> Clone for RegexLexer<'config>

Source§

fn clone(&self) -> RegexLexer<'config>

Returns a duplicate of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl<'config> Debug for RegexLexer<'config>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl<'config> Lexer<RegexLanguage> for RegexLexer<'config>

Source§

fn lex<'a, S: Source + ?Sized>( &self, source: &'a S, _edits: &[TextEdit], cache: &'a mut impl LexerCache<RegexLanguage>, ) -> LexOutput<RegexLanguage>

Tokenizes the given source text into a sequence of tokens.

Auto Trait Implementations§

§

impl<'config> Freeze for RegexLexer<'config>

§

impl<'config> RefUnwindSafe for RegexLexer<'config>

§

impl<'config> Send for RegexLexer<'config>

§

impl<'config> Sync for RegexLexer<'config>

§

impl<'config> Unpin for RegexLexer<'config>

§

impl<'config> UnwindSafe for RegexLexer<'config>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.