chisel_lexers/
lib.rs

//! # Overview
//! This crate contains the lexical analysis backends used by *chisel*. The architecture is
//! simple: multiple lexers built on a common scanning implementation.
//!
//! A *scanner* consumes characters from an underlying character source and tracks the position
//! at which each character was read. It also provides basic buffering and lookahead/pushback
//! functionality.
//!
//! Input is always assumed to be read linearly, once only, from start to finish.
//!
//! A *lexer* consumes characters from a scanner and attempts to construct *tokens*, which may
//! then be consumed by *parsers* further up the stack.
//!
//! Each lexer defines and produces its own set of distinct tokens, specific to the parsing task
//! at hand. (For example, the JSON lexer produces only JSON-specific tokens.)
//!
//! ## Scanning the input
//! The scanner operates by maintaining a simple internal state:
//!
//! - A current position in the input
//! - An input buffer used to control pushbacks and lookaheads
//! - An accumulation buffer for gathering up characters
//!
//! A lexer pulls characters through the scanner (which adds positional information to each
//! one) and gathers them in the accumulation buffer until it sees something that triggers
//! the construction of a valid token.
//!
//! Once the lexer is ready to consume the content of the accumulation buffer, functions are
//! provided to extract it in a number of formats (e.g. a string or char array) and then to
//! clear the buffer without resetting the rest of the internal scanner state.
//!
//! A simple example of using the scanner is shown below:
//! ```rust
//! use std::io::BufReader;
//! use chisel_decoders::utf8::Utf8Decoder;
//! use chisel_lexers::scanner::Scanner;
//!
//! // construct a new scanner instance, based on a decoded byte source
//! let buffer: &[u8] = "let goodly sin and sunshine in".as_bytes();
//! let mut reader = BufReader::new(buffer);
//! let mut decoder = Utf8Decoder::new(&mut reader);
//! let mut scanner = Scanner::new(&mut decoder);
//!
//! // consume the first character from the scanner...
//! let first = scanner.advance(true);
//! assert!(first.is_ok());
//! assert_eq!(scanner.front().unwrap().ch, 'l');
//! assert_eq!(scanner.front().unwrap().coords.column, 1);
//!
//! // consume a second character
//! assert!(scanner.advance(true).is_ok());
//!
//! // ...and then push it back onto the input buffer
//! scanner.pushback();
//!
//! // after the pushback, the front of the buffer is 'l' again
//! assert_eq!(scanner.front().unwrap().ch, 'l');
//!
//! // advance again - this time the char is taken from the pushback buffer
//! let _ = scanner.advance(true);
//! assert_eq!(scanner.front().unwrap().ch, 'e');
//!
//! // grab the contents of the accumulation buffer as a string
//! let buffer_contents = scanner.buffer_as_string_with_span();
//! assert_eq!(buffer_contents.str, String::from("le"));
//!
//! // empty the accumulation buffer (the scanner position is retained)
//! scanner.clear();
//!
//! // buffer should now be empty
//! assert!(scanner.buffer_as_string_with_span().str.is_empty());
//!
//! // advance yet again
//! assert!(scanner.advance(true).is_ok());
//!
//! // the third character read comes from the 3rd column of the input
//! assert_eq!(scanner.front().unwrap().ch, 't');
//! assert_eq!(scanner.front().unwrap().coords.column, 3);
//! ```
//!
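//! The accumulate-then-extract loop described above can also be sketched without any of this
//! crate's types. What follows is a minimal, self-contained illustration only: `MiniScanner`,
//! `Token` and `lex` are hypothetical names invented for this sketch and are not part of the
//! `chisel_lexers` API. The sketch assumes single-line input, so only a column is tracked.
//!
//! ```rust
//! use std::str::Chars;
//!
//! /// Hypothetical token type: each variant records the column where it starts.
//! #[derive(Debug, PartialEq)]
//! enum Token {
//!     Word(String, usize),
//!     Comma(usize),
//! }
//!
//! /// Hypothetical scanner holding the three pieces of state listed above.
//! struct MiniScanner<'a> {
//!     input: Chars<'a>,
//!     column: usize,                   // current position (1-based once reading starts)
//!     pushed_back: Vec<(char, usize)>, // input buffer used for pushback
//!     accumulated: Vec<(char, usize)>, // accumulation buffer
//! }
//!
//! impl<'a> MiniScanner<'a> {
//!     fn new(text: &'a str) -> Self {
//!         MiniScanner {
//!             input: text.chars(),
//!             column: 0,
//!             pushed_back: Vec::new(),
//!             accumulated: Vec::new(),
//!         }
//!     }
//!
//!     /// Pull the next character into the accumulation buffer, preferring
//!     /// anything that was pushed back earlier. Returns `None` at end of input.
//!     fn advance(&mut self) -> Option<char> {
//!         let item = match self.pushed_back.pop() {
//!             Some(item) => item,
//!             None => {
//!                 let ch = self.input.next()?;
//!                 self.column += 1;
//!                 (ch, self.column)
//!             }
//!         };
//!         self.accumulated.push(item);
//!         Some(item.0)
//!     }
//!
//!     /// Move the most recently accumulated character back onto the input buffer.
//!     fn pushback(&mut self) {
//!         if let Some(item) = self.accumulated.pop() {
//!             self.pushed_back.push(item);
//!         }
//!     }
//!
//!     /// Extract the accumulated text and its start column, then clear the
//!     /// accumulation buffer. The current position is left untouched.
//!     fn take(&mut self) -> (String, usize) {
//!         let start = self.accumulated.first().map_or(self.column, |&(_, col)| col);
//!         let text: String = self.accumulated.iter().map(|&(ch, _)| ch).collect();
//!         self.accumulated.clear();
//!         (text, start)
//!     }
//! }
//!
//! /// Hypothetical lexer: splits the input into words and commas, skipping whitespace.
//! fn lex(text: &str) -> Vec<Token> {
//!     let mut scanner = MiniScanner::new(text);
//!     let mut tokens = Vec::new();
//!     while let Some(ch) = scanner.advance() {
//!         match ch {
//!             ',' => {
//!                 let (_, column) = scanner.take();
//!                 tokens.push(Token::Comma(column));
//!             }
//!             c if c.is_whitespace() => {
//!                 // whitespace is discarded, but still advances the position
//!                 scanner.take();
//!             }
//!             _ => {
//!                 // accumulate until a delimiter (or end of input) ends the word
//!                 while let Some(next) = scanner.advance() {
//!                     if next == ',' || next.is_whitespace() {
//!                         // the delimiter belongs to the next token, so hand it back
//!                         scanner.pushback();
//!                         break;
//!                     }
//!                 }
//!                 let (word, column) = scanner.take();
//!                 tokens.push(Token::Word(word, column));
//!             }
//!         }
//!     }
//!     tokens
//! }
//! ```
//!
//! Under these assumptions, `lex("ab, cd")` yields a word starting at column 1, a comma at
//! column 3 and a word starting at column 5. Clearing the accumulation buffer in `take` never
//! resets the column, which mirrors the behaviour shown in the scanner example above.
//!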
//! ## Lexers
//!
//! In the current release, this crate implements only a single lexer backend:
//!
//! ### JSON Lexer
//!
//! The JSON lexer lives in the [`json`] module and produces JSON-specific tokens only.

pub mod json;
pub mod scanner;