Struct rustlr::lexer_interface::StrTokenizer
pub struct StrTokenizer<'t> {
pub keep_whitespace: bool,
pub keep_newline: bool,
pub keep_comment: bool,
pub line_positions: Vec<usize>,
pub specialeof: &'static str,
pub tab_spaces: usize,
pub allow_newline_in_string: bool,
pub priority_symbols: BTreeMap<&'static str, u32>,
/* private fields */
}
General-purpose, zero-copy lexical analyzer that produces RawTokens from a &str. This tokenizer uses regex for most token categories, but not for everything: for example, to allow string literals that contain escaped quotation marks, a direct loop is implemented. The tokenizer gives the option of returning newlines, whitespaces (with count) and comments as special tokens. It recognizes multi-line string literals, multi-line as well as single-line comments, and returns the starting line and column positions of each token.
Example:
let mut scanner = StrTokenizer::from_str("while (1) fork();//run at your own risk");
scanner.set_line_comment("//");
scanner.keep_comment=true;
scanner.add_single(';'); // separates ; from following symbols
while let Some(token) = scanner.next() {
println!("Token,line,column: {:?}",&token);
}
This code produces the following output:
Token,line,column: (Alphanum("while"), 1, 1)
Token,line,column: (Symbol("("), 1, 7)
Token,line,column: (Num(1), 1, 8)
Token,line,column: (Symbol(")"), 1, 9)
Token,line,column: (Alphanum("fork"), 1, 11)
Token,line,column: (Symbol("("), 1, 15)
Token,line,column: (Symbol(")"), 1, 16)
Token,line,column: (Symbol(";"), 1, 17)
Token,line,column: (Verbatim("//run at your own risk"), 1, 18)
Fields
keep_whitespace: bool
flag to toggle whether whitespaces should be returned as Whitespace tokens, default is false.
keep_newline: bool
flag to toggle whether newline characters (‘\n’) are returned as Newline
tokens. Default is false. Note that if this flag is set to true then
newline characters are treated differently from other whitespaces.
For example, when parsing languages like Python, both keep_whitespace
and keep_newline should be set to true. Change this option in a grammar
with lexattribute keep_newline=true.
keep_comment: bool
flag to determine if comments are kept and returned as Verbatim tokens, default is false.
line_positions: Vec<usize>
vector of the starting byte position of each line; position 0 is not used.
specialeof: &'static str
tab_spaces: usize
number of whitespaces to count for each tab (default 6). This can be
changed with a declaration such as lexattribute tab_spaces=8. Do not
set this value to zero.
allow_newline_in_string: bool
allows string literals to contain non-escaped newline characters. Warning: changing the default (false) may reduce the accuracy of error reporting.
priority_symbols: BTreeMap<&'static str, u32>
Multiset of verbatim symbols that have priority over other categories; sorted by string order. The multiset is implemented as a map from strings to counts.
Implementations
impl<'t> StrTokenizer<'t>
pub fn new() -> StrTokenizer<'t>
creates a new tokenizer with defaults; does not set input.
pub fn map<G, FM: FnOnce(&mut StrTokenizer<'t>) -> G>(&mut self, f: FM) -> G
applies a closure to self; can be used together with lexconditional to invoke custom actions
pub fn current_text(&self) -> &'t str
returns the text of the current token, untrimmed
pub fn add_double(&mut self, s: &'t str)
adds a symbol of exactly length two. If the length is not two, the function has no effect. Note that these symbols override all other types except for leading whitespaces and comment markers; e.g. “//” will have precedence over “/” and “==” will have precedence over “=”.
pub fn add_single(&mut self, c: char)
adds a single-character symbol. This symbol type overrides other types except for whitespaces, comments and double-character symbols.
pub fn add_triple(&mut self, s: &'t str)
adds a symbol of exactly length three
pub fn add_priority_symbol(&mut self, s: &'static str)
multiset-adds a verbatim string as a priority symbol: it will be returned as Symbol(s)
pub fn del_priority_symbol(&mut self, s: &'static str)
multiset-removes a verbatim string as a priority symbol
pub fn skip_to(&mut self, target: &'static str)
skips to the last occurrence of the target string, or to the end of input, and returns a RawToken::Skipto token.
pub fn skip_reset(&mut self)
cancels recognition of skip_to (called internally)
pub fn skip_match(&mut self, lbr: &'static str, rbr: &'static str, offset: i32, delimit: &'static str)
StrTokenizer can do a little more than recognize regular expressions. It can detect matching brackets and return the bracket-enclosed text as a RawToken::Skipmatched token. An offset of 1 is recommended, as this call is usually made after an instance of the opening left bracket has been seen as lookahead. The operation maintains a counter, starting at the offset, that increases every time a left bracket is seen and decreases with every right bracket; when the counter reaches 0, the skipped text is returned in a RawToken::Skipmatched token. Searching stops when the delimit string is reached. If delimit is the empty string, the search continues to the end of input.
pub fn add_custom(&mut self, tkind: &'static str, reg_expr: &str)
adds a custom-defined regex, which will correspond to the RawToken::Custom variant. Custom regular expressions should not start with whitespaces and will override all other types. Multiple Custom types are matched in the order in which they were declared in the grammar file.
pub fn set_input(&mut self, inp: &'t str)
sets the input str to be parsed and resets position information. Note: trailing whitespaces are always trimmed from the input.
pub fn set_line_comment(&mut self, cm: &'t str)
sets the symbol that begins a single-line comment. The default is “//”. If this is set to the empty string then no line-comments are recognized.
pub fn set_multiline_comments(&mut self, cm: &'t str)
sets the symbols used to delineate multi-line comments using a whitespace separated string such as “/* */”. These symbols are also the default. Set this to the empty string to disable multi-line comments.
pub fn current_position(&self) -> usize
returns the current absolute byte position of the tokenizer
pub fn previous_position(&self) -> usize
returns the previous absolute byte position of the tokenizer
pub fn get_source(&self) -> &str
returns the source of the tokenizer, such as a URL or filename
pub fn set_source<'u: 't>(&mut self, s: &'u str)
pub fn current_line(&self) -> &str
gets the current line of the source input
pub fn get_line(&self, i: usize) -> Option<&str>
Retrieves the ith line of the raw input, if line index i is valid. This function is intended to be called once the tokenizer has completed its task of scanning and tokenizing the entire input. Otherwise, it may return None if the tokenizer has not yet scanned up to the line indicated. That is, it is intended for error message generation when evaluating the AST post-parsing.
pub fn get_slice(&self, start: usize, end: usize) -> &str
Retrieves the source string slice at the indicated indices; returns the empty string if indices are invalid. The default implementation returns the empty string.
pub fn backtrack(&mut self, offset: usize)
impl<'t> StrTokenizer<'t>
pub fn from_source(ls: &'t LexSource<'t>) -> StrTokenizer<'t>
creates a StrTokenizer from a LexSource structure that contains a string with the contents of the source, and calls StrTokenizer::set_input to reference that string. For example, to create a tokenizer that reads from a file:
let source = LexSource::new(source_path).unwrap();
let mut tokenizer = StrTokenizer::from_source(&source);
pub fn from_str(s: &'t str) -> StrTokenizer<'t>
creates a string tokenizer and sets the input to the given str.
Trait Implementations
impl<'t> Iterator for StrTokenizer<'t>
fn next(&mut self) -> Option<(RawToken<'t>, usize, usize)>