Documentation
This complete lexer/lexical scanner produces over 115 distinct token types from a string or a file path. The output is a Vec of tokens for the user to handle according to their needs. All tokens are included (including whitespace), since deciding how to use them is left to the user.
If you have any questions, comments, or need help, feel free to add a discussion here:
https://github.com/mjehrhart/lexical_scanner/discussions
To see an example of an output, check out the wiki page:
https://github.com/mjehrhart/lexical_scanner/wiki/Example
Setup
Add this dependency to your Cargo.toml:
lexical_scanner = "0.1.17"
Basic Usage
There are two ways to perform a lexical scan: pass in a file path or pass in a string. Passing in a string is mostly used for testing, while passing in a file path is common for everyday work. A lexical scanner can produce thousands and thousands of tokens very quickly, so for real workloads it is best to use a file path.
use lexical_scanner;
That is all there is to do! The lexical scanner returns a Vec of tokens for the user to handle as needed.
To test with a string, all you need to do is pass the text directly:
use lexical_scanner;
Below is a simple way to view the tokens for unit testing:
for (i, token) in token_list.iter().enumerate() {
    println!("{}. {:?}", i, token);
}
output:
0. Word
1. WhiteSpace
2. Word
3. WhiteSpace
4. Floating
5. WhiteSpace
6. Word
7. WhiteSpace
8. Gt
9. WhiteSpace
10. Numeric
11. Semi
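Because whitespace tokens are kept in the output, a common first step is to filter them out before further parsing. Below is a minimal, self-contained sketch; the small Token enum is an illustrative stand-in for the crate's much larger one:

```rust
// Illustrative stand-in for the crate's Token enum.
#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    WhiteSpace,
    Numeric(String),
    Semi,
}

fn main() {
    let token_list = vec![
        Token::Word("x".into()),
        Token::WhiteSpace,
        Token::Numeric("42".into()),
        Token::Semi,
    ];
    // Keep everything except whitespace.
    let significant: Vec<&Token> = token_list
        .iter()
        .filter(|t| !matches!(t, Token::WhiteSpace))
        .collect();
    println!("{} significant tokens", significant.len()); // prints "3 significant tokens"
}
```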
Custom keywords
You can also register your own keyword identifiers, which helps when managing the parsing of the tokens downstream.
use lexical_scanner;
Below is a simple way to view the tokens for unit testing:
for (i, token) in token_list.iter().enumerate() {
    println!("{}. {:?}", i, token);
}
output:
0. Word
1. WhiteSpace
2. Word
3. WhiteSpace
4. Floating
5. WhiteSpace
6. Word
7. WhiteSpace
8. KW_UserDefined
9. WhiteSpace
10. Word
11. WhiteSpace
12. Word
13. WhiteSpace
14. KW_UserDefined
15. WhiteSpace
16. Word
17. WhiteSpace
18. Word
19. WhiteSpace
20. KW_UserDefined
21. WhiteSpace
22. Word
23. WhiteSpace
24. KW_UserDefined
25. WhiteSpace
26. Numeric
27. Semi
Supported Tokens
& => And,
&& => AndAnd,
&= => AndEq,
@ => At,
\ => Backslash,
BitCharacterCode7(String),
BitCharacterCode8(String),
/* => BlockCommentStart(String),
*/ => BlockCommentStop(String),
[ => BracketLeft,
] => BracketRight,
b'H' => Byte(String),
b"Hello" => ByteString(String),
^ => Caret,
^= => CaretEq,
\r\n => CarriageReturn,
Character(String),
: => Colon,
, => Comma,
{ => CurlyBraceLeft,
} => CurlyBraceRight,
$ => Dollar,
. => Dot,
.. => DotDot,
... => DotDotDot,
..= => DotDotEq,
" => DoubleQuote,
= => Eq,
== => EqEq,
>= => Ge,
> => Gt,
=> => FatArrow,
//! => InnerLineDoc(String),
/*! => InnerBlockDoc(String),
<= => Le,
// => LineComment(String),
< => Lt,
- => Minus,
-= => MinusEq,
| => Or,
|= => OrEq,
|| => OrOr,
/** => OuterBlockDoc(String),
/// => OuterLineDoc(String),
\n => Newline,
! => Not,
!= => NotEq,
Null,
3.14 => Floating(String),
314 => Numeric(String),
( => ParenLeft,
) => ParenRight,
:: => PathSep,
% => Percent,
%= => PercentEq,
+ => Plus,
+= => PlusEq,
# => Pound,
? => Question,
-> => RArrow,
r#"Hello"# => RawString(String),
rb#"Hello"# => RawByteString(String),
; => Semi,
<< => Shl,
<<= => ShlEq,
>> => Shr,
>>= => ShrEq,
' => SingleQuote,
/ => Slash,
/= => SlashEq,
* => Star,
*= => StarEq,
Stopped(String), //for debugging
"Hello" => String(String),
\t => Tab,
Undefined,
_ => Underscore,
' ' => WhiteSpace,
Word(String),
KW_As,
KW_Async,
KW_Await,
KW_Break,
KW_Const,
KW_Continue,
KW_Crate,
KW_Dyn,
KW_Else,
KW_Enum,
KW_Extern,
KW_False,
KW_Fn,
KW_For,
KW_If,
KW_Impl,
KW_In,
KW_Let,
KW_Loop,
KW_Match,
KW_Mod,
KW_Move,
KW_Mut,
KW_Pub,
KW_Ref,
KW_Return,
KW_SELF,
KW_Self,
KW_Static,
KW_Struct,
KW_Super,
KW_Trait,
KW_True,
KW_Type,
KW_Union,
KW_Unsafe,
KW_Use,
KW_UserDefined(String),
KW_Where,
KW_While,
crates.io => https://crates.io/crates/lexical_scanner
github.com => https://github.com/mjehrhart/lexical_scanner