pub struct RegexLexer<'config> { /* private fields */ }
Lexer for regular expressions.
RegexLexer is responsible for tokenizing regular expression source code into a series of tokens
that can be used by the parser. It handles all regex syntax including character classes,
quantifiers, groups, assertions, and special characters.
Examples
Basic usage:
use oak_core::{Lexer, LexerCache, LexerState, ParseSession, SourceText};
use oak_regex::{RegexLanguage, RegexLexer};
let language = RegexLanguage::default();
let lexer = RegexLexer::new(&language);
let source = SourceText::new(r"[a-z]+\d{1,3}");
let mut cache = ParseSession::<RegexLanguage>::default();
let output = lexer.lex(&source, &[], &mut cache);
// Output contains tokens for the entire source
assert!(!output.result.unwrap().is_empty());
Tokenizing different regex constructs:
use oak_core::{Lexer, LexerCache, LexerState, ParseSession, SourceText};
use oak_regex::{RegexLanguage, RegexLexer};
let language = RegexLanguage::default();
let lexer = RegexLexer::new(&language);
// Tokenize a complex regular expression
let source = SourceText::new(r"(?:(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,})");
let mut cache = ParseSession::<RegexLanguage>::default();
let output = lexer.lex(&source, &[], &mut cache);
// Verify tokens were generated
assert!(output.result.unwrap().len() > 5);
Implementations
impl<'config> RegexLexer<'config>
pub fn new(config: &'config RegexLanguage) -> Self
pub fn whitespace_rules(&self) -> &WhitespaceConfig
Returns the whitespace configuration for the lexer.
This method defines how the lexer should handle whitespace characters. The configuration enables Unicode whitespace support, allowing the lexer to recognize all Unicode whitespace characters, not just ASCII spaces.
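The difference between Unicode-aware and ASCII-only whitespace handling can be illustrated with the standard library alone. The sketch below is not the oak_regex implementation and does not use WhitespaceConfig; the helper name leading_whitespace_len and its unicode flag are hypothetical, chosen only for illustration.

```rust
/// Hypothetical helper (not part of oak_regex): returns the byte length of the
/// leading whitespace run in `input`.
/// With `unicode == true`, any character with the Unicode White_Space property
/// counts; otherwise only ASCII whitespace is recognized.
fn leading_whitespace_len(input: &str, unicode: bool) -> usize {
    input
        .chars()
        .take_while(|c| {
            if unicode {
                c.is_whitespace()
            } else {
                c.is_ascii_whitespace()
            }
        })
        // Sum UTF-8 byte lengths so the result can be used to slice `input`.
        .map(char::len_utf8)
        .sum()
}
```

For an input such as " \u{00A0}\u{2003}a" (space, no-break space, em space), the Unicode mode consumes all three whitespace characters, while the ASCII mode stops after the first space.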
pub fn comment_rules(&self) -> CommentConfig
Returns the comment configuration for the lexer.
This method defines how the lexer should handle comments in regular expressions.
Regular expressions typically use # as a line comment marker, with comments
continuing to the end of the line.
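As a rough illustration of the line-comment rule described above, the following standalone sketch scans a # comment up to the end of the line. It does not use CommentConfig, and the function name scan_line_comment is hypothetical; how the real lexer treats the line terminator is an assumption here.

```rust
/// Hypothetical helper (not part of oak_regex): if `input` starts with `#`,
/// returns the comment text, including the marker, up to but not including
/// the line terminator. Returns `None` when no comment starts here.
fn scan_line_comment(input: &str) -> Option<&str> {
    if !input.starts_with('#') {
        return None;
    }
    // The comment runs to the next newline, or to the end of input.
    let end = input.find('\n').unwrap_or(input.len());
    // Assumption: a trailing carriage return belongs to the terminator.
    Some(input[..end].trim_end_matches('\r'))
}
```

Given "# digits\n\\d+", this returns the slice "# digits" and leaves the pattern on the next line for subsequent tokens.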
pub fn string_rules(&self) -> StringConfig
Returns the string literal configuration for the lexer.
This method defines how the lexer should handle string literals in regular expressions. Regex strings are typically enclosed in double quotes (") and use a backslash (\) as the escape character.
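To make the quoting rule concrete, here is a minimal standalone sketch of scanning a double-quoted literal with backslash escapes. It is not the oak_regex implementation and does not use StringConfig; the function name scan_string_literal is hypothetical, and treating a backslash as escaping exactly one following character is an assumption.

```rust
/// Hypothetical helper (not part of oak_regex): returns the byte length of a
/// double-quoted literal at the start of `input`, including both quotes, or
/// `None` if no literal starts here or it is unterminated.
fn scan_string_literal(input: &str) -> Option<usize> {
    let mut chars = input.char_indices();
    // The literal must begin with an opening double quote.
    match chars.next() {
        Some((_, '"')) => {}
        _ => return None,
    }
    while let Some((i, c)) = chars.next() {
        match c {
            // Assumption: a backslash escapes exactly one following character,
            // so an escaped quote does not close the literal.
            '\\' => {
                chars.next()?;
            }
            // An unescaped quote closes the literal.
            '"' => return Some(i + c.len_utf8()),
            _ => {}
        }
    }
    None
}
```

The returned length can be used to slice the literal out of the input and continue lexing from the following byte.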
pub fn char_rules(&self) -> StringConfig
Returns the character literal configuration for the lexer.
This method defines how the lexer should handle character literals in regular expressions. Regex character literals are enclosed in single quotes (') and do not use escape characters in the same way as strings.
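The description leaves the exact escape behavior open. One possible reading, sketched below under that assumption, is that a character literal holds exactly one character between single quotes and that a backslash inside it is taken literally. The function name scan_char_literal is hypothetical and the code does not use the crate's StringConfig.

```rust
/// Hypothetical helper (not part of oak_regex): parses a single-quoted
/// character literal that spans the start of `input` and returns its value.
/// Assumption: no escape processing, so `'\'` yields a backslash.
fn scan_char_literal(input: &str) -> Option<char> {
    let mut chars = input.chars();
    // Opening quote.
    if chars.next()? != '\'' {
        return None;
    }
    // Exactly one character of content; an immediate quote means `''`.
    let value = chars.next()?;
    if value == '\'' {
        return None;
    }
    // Closing quote must follow directly.
    (chars.next()? == '\'').then_some(value)
}
```

Under this reading, 'a' yields the character a, while '' and 'ab' are rejected.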
Trait Implementations
impl<'config> Clone for RegexLexer<'config>
fn clone(&self) -> RegexLexer<'config>
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.