lexer_rs/
lib.rs

//a Documentation
#![warn(missing_docs)]
// #![warn(missing_doc_code_examples)]
/*!

# Lexer library

This library provides a generic mechanism for parsing data into
streams of tokens.

This is commonly used in compilers and interpreters for
human-readable languages, to convert a text stream into values that
can then be parsed according to the grammar of that language.

A simple example is a calculator that operates on a stream
of numbers and mathematical symbols; the first step of processing that
the calculator must do is to convert the text stream into abstract
tokens such as 'the number 73' and 'the plus sign'. Once the
calculator has such tokens it can piece them together into a real
expression that it can then evaluate.

## Basic concept

The basic concept of a lexer is to convert a stream of (e.g.) [char]
into a stream of 'Token', where the token type is specific to the
lexer. The lexer starts at the beginning of the text and moves
through it, consuming characters into tokens.

## Lexer implementations

A lexer is not difficult to implement, and there are many alternative
approaches to doing so. A very simple approach for a [String] would be
to have a loop that matches the start of the string with possible
token values (perhaps using a regular expression), and on finding a
match it can 'trim' the front of the String, yield the token, and then
loop again.

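The 'match the front, trim, repeat' loop described above can be
sketched with the standard library alone. This is a minimal
illustration and not this crate's implementation: the `Token` enum
and the `naive_lex` function are hypothetical names, and simple
prefix tests stand in for regular expressions.

```rust
/// Hypothetical token type for a small calculator (not part of this crate).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(u64),
    Plus,
    Minus,
}

/// Naive lexer: match the front of the text, trim it off, and repeat.
///
/// Returns the tokens, or a message holding the remaining text at the
/// point where nothing matched.
pub fn naive_lex(text: &str) -> Result<Vec<Token>, String> {
    let mut rest = text.trim_start();
    let mut tokens = Vec::new();
    while !rest.is_empty() {
        // Length in bytes of the leading run of ASCII digits
        let n = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        let (token, used) = if n > 0 {
            let value = rest[..n].parse::<u64>().map_err(|e| e.to_string())?;
            (Token::Number(value), n)
        } else if rest.starts_with('+') {
            (Token::Plus, 1)
        } else if rest.starts_with('-') {
            (Token::Minus, 1)
        } else {
            return Err(format!("unexpected text: {:?}", rest));
        };
        tokens.push(token);
        // 'Trim' the consumed token and any whitespace that follows it
        rest = rest[used..].trim_start();
    }
    Ok(tokens)
}
```

With this sketch `naive_lex("73 + 4")` yields the number 73, the plus
sign, and the number 4. Note that the error case can only report the
leftover text: the loop keeps no record of where in the original
input it stopped.
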
This library provides an implementation that supports good error
messages when things go wrong; it provides a trait that abstracts
the lexer from the consumer (so that one can get streams of tokens
from a String, a BufRead, etc.); and it provides the infrastructure
for any lexer using a simple mechanism for parsing tokens.

# Positions in files

The crate provides some mechanisms for tracking the position of
parsing within a stream, so that error messages can be appropriately
crafted for the end user.

At a minimum, tracking the position means following the byte offset
within the file; the line number and column number can additionally
be tracked. The [UserPosn] trait provides for this.

As Rust uses UTF-8 encoded strings, not all byte offsets correspond
to actual [char]s in a stream, and the column separation between two
characters is not the difference between their byte offsets. The
[PosnInCharStream] trait adds to the [UserPosn] trait to manage this.

The bare minimum for a lexer handling UTF-8 encoded strings does not
require tracking of lines and columns; only the byte offset tracking
*has* to be used; using a [usize] as the [PosnInCharStream]
implementation provides for this (as the *byte offset* within a [str]).

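The difference between byte offsets and columns can be seen with the
standard library alone. The helper below is a hypothetical sketch
(`column_of_byte_offset` is not part of this crate): it treats a byte
offset as valid only if it lies on a character boundary, and derives
the column by counting characters rather than bytes.

```rust
/// Hypothetical helper (not part of this crate): count the characters
/// that precede `byte_ofs` in `line`, giving a zero-based column.
///
/// Returns `None` if `byte_ofs` is past the end of `line` or does not
/// lie on a character boundary.
pub fn column_of_byte_offset(line: &str, byte_ofs: usize) -> Option<usize> {
    if line.is_char_boundary(byte_ofs) {
        // Safe to slice: the offset is on a boundary and within the line
        Some(line[..byte_ofs].chars().count())
    } else {
        None
    }
}
```

For the text `"\u{e9}+1"`, whose first character occupies two bytes,
byte offset 2 is column 1, while byte offset 1 falls inside a
character and so has no column at all.
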
The [Lexer] trait thus has an associated stream position type (its
'State'): this must be lightweight as it is moved around and copied
frequently, and must be static.

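To illustrate what 'lightweight' means here, the following sketch
shows a position that is just three integers, so it can derive `Copy`
and borrows nothing from the text. The `Posn` type and its `advance`
method are hypothetical and are not the position types exported by
this crate; the zero-based line and column numbering is likewise an
assumption of the sketch.

```rust
/// Hypothetical position type (not this crate's own): a few integers, so
/// it is cheap to copy and holds no borrowed data.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Posn {
    /// Byte offset from the start of the text
    pub byte_ofs: usize,
    /// Zero-based line number
    pub line: usize,
    /// Zero-based column, counted in characters
    pub column: usize,
}

impl Posn {
    /// Return the position just after `ch` has been consumed.
    pub fn advance(self, ch: char) -> Self {
        // The byte offset moves by the encoded length of the character
        let byte_ofs = self.byte_ofs + ch.len_utf8();
        if ch == '\n' {
            Posn { byte_ofs, line: self.line + 1, column: 0 }
        } else {
            Posn { byte_ofs, column: self.column + 1, ..self }
        }
    }
}
```

Folding `Posn::advance` over the characters of a text, starting from
`Posn::default()`, gives the position at the end of that text.
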
# Tokens

The token type that the [Lexer] trait produces from its parsing is
supplied by the client; this is normally a simple enumeration.

The parsing is managed by the [Lexer] with the client providing a
slice of matching functions; each matching function is applied in
turn, and the first that returns `Ok(Some(..))` with a token yields
that token and advances the parsing state. The parsers can generate an
error if they detect a real error in the stream (not just a mismatch
to their token type).

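The 'first match wins' scheme can be sketched as follows. This is an
illustration under stated assumptions and does not use this crate's
types: `Token`, `MatchFn`, `match_plus`, `match_number` and
`parse_one` are all hypothetical names, a plain byte offset stands in
for the parsing state, and a `String` stands in for the error type.

```rust
/// Hypothetical token type for a small calculator (not part of this crate).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    Number(u64),
    Plus,
}

/// Hypothetical matcher signature: given the text and a byte offset,
/// return Ok(Some((next offset, token))) on a match, Ok(None) on a
/// mismatch, or Err for a real error in the stream.
pub type MatchFn = fn(&str, usize) -> Result<Option<(usize, Token)>, String>;

/// Match a single plus sign.
pub fn match_plus(text: &str, ofs: usize) -> Result<Option<(usize, Token)>, String> {
    if text[ofs..].starts_with('+') {
        Ok(Some((ofs + 1, Token::Plus)))
    } else {
        Ok(None)
    }
}

/// Match a run of ASCII digits as an unsigned number.
pub fn match_number(text: &str, ofs: usize) -> Result<Option<(usize, Token)>, String> {
    let n = text[ofs..].bytes().take_while(|b| b.is_ascii_digit()).count();
    if n == 0 {
        return Ok(None); // Not a number: a mismatch, not an error
    }
    match text[ofs..ofs + n].parse::<u64>() {
        Ok(value) => Ok(Some((ofs + n, Token::Number(value)))),
        // Digits that do not fit in a u64 are a real error in the stream
        Err(e) => Err(format!("bad number at byte {}: {}", ofs, e)),
    }
}

/// Apply each matcher in turn; the first Ok(Some(..)) wins, and any
/// error is passed straight back to the caller.
pub fn parse_one(
    text: &str,
    ofs: usize,
    matchers: &[MatchFn],
) -> Result<Option<(usize, Token)>, String> {
    for m in matchers {
        if let Some(found) = m(text, ofs)? {
            return Ok(Some(found));
        }
    }
    Ok(None)
}
```

Given a slice holding `match_plus` and `match_number`, calling
`parse_one("12+3", 0, ..)` returns the number 12 together with the
offset 2 from which the next call should continue. A return of
`Ok(None)` means that no matcher recognised the text at that offset.
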
# Error reporting

With the file position handling used within the [Lexer] it is possible
to display contextual error information: if the whole text is
retained by the [Lexer], then an error can be displayed alongside the
source text with the error point or region highlighted.

Support for this is provided by the [FmtContext] trait, which is
implemented in particular for [LexerOfString].

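The idea of highlighting the error point within retained source text
can be sketched with the standard library. The `show_error` function
below is hypothetical and is not this crate's `FmtContext`
implementation; it assumes that `ofs` is a valid byte offset on a
character boundary, and that each character occupies one display
column.

```rust
/// Hypothetical helper (not this crate's FmtContext): render the line of
/// `text` containing byte offset `ofs`, with a caret under that character.
pub fn show_error(text: &str, ofs: usize, message: &str) -> String {
    // Start of the line containing `ofs`
    let start = text[..ofs].rfind('\n').map_or(0, |i| i + 1);
    // End of that line, excluding its newline
    let end = text[ofs..].find('\n').map_or(text.len(), |i| ofs + i);
    // One-based line number: newlines before the line start, plus one
    let line_number = text[..start].matches('\n').count() + 1;
    // Count characters, not bytes, so that the caret lines up
    let column = text[start..ofs].chars().count();
    format!(
        "error: {} (line {}, column {})\n{}\n{}^",
        message,
        line_number,
        column + 1,
        &text[start..end],
        " ".repeat(column)
    )
}
```

The result is a three-line message: a summary with a one-based line
and column, the offending source line, and a caret placed under the
character at `ofs`.
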
!*/

//a Imports
mod char_stream;
mod fmt_context;
mod lexer;
mod posn_and_span;

//a Exports
pub use char_stream::CharStream;
pub use fmt_context::FmtContext;

pub use posn_and_span::LineColumn;
pub use posn_and_span::StreamCharPos;
pub use posn_and_span::StreamCharSpan;
pub use posn_and_span::{PosnInCharStream, UserPosn};

pub use crate::lexer::LexerOfStr;
pub use crate::lexer::LexerOfString;
pub use crate::lexer::ParserIterator;
pub use crate::lexer::SimpleParseError;
pub use crate::lexer::{BoxDynLexerParseFn, Lexer, LexerError, LexerParseFn, LexerParseResult};