//! # Overview
//! This crate contains the lexical analysis backends used by *chisel*. The architecture is
//! simple: multiple lexers share a common scanning implementation.
//!
//! A *scanner* consumes characters from an underlying source and tracks the position at which
//! each character was read. It also provides basic buffering and lookahead/pushback
//! functionality.
//!
//! Input is always assumed to be read linearly and exactly once, from start to finish.
//!
//! A *lexer* consumes from a scanner, and attempts to construct *tokens* which may be
//! consumed by *parsers* further up the stack.
//!
//! A lexer defines and is capable of producing its own set of distinct tokens
//! specific to the parsing task in hand. (For example, the JSON lexer produces only
//! JSON-specific tokens.)
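//!
//! As a rough illustration of what a lexer-specific token set can look like, the sketch below
//! defines tokens for a hypothetical arithmetic lexer. `ArithToken` and `punctuation` are
//! invented for this example and are not part of this crate:
//! ```rust
//! // Hypothetical token set for a toy arithmetic lexer (an assumption, not a chisel type).
//! #[derive(Debug, Clone, PartialEq)]
//! pub enum ArithToken {
//!     Number(f64),
//!     Plus,
//!     Minus,
//!     LeftParen,
//!     RightParen,
//! }
//!
//! // Map a single punctuation character onto its token, if it has one.
//! pub fn punctuation(ch: char) -> Option<ArithToken> {
//!     match ch {
//!         '+' => Some(ArithToken::Plus),
//!         '-' => Some(ArithToken::Minus),
//!         '(' => Some(ArithToken::LeftParen),
//!         ')' => Some(ArithToken::RightParen),
//!         _ => None,
//!     }
//! }
//! ```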
//!
//! ## Scanning the input
//! The scanner operates by maintaining a simple internal state:
//!
//! - A current position in the input
//! - An input buffer used to control pushbacks and lookaheads
//! - An accumulation buffer for gathering up characters
//!
//! A lexer simply pulls characters through the scanner (which adds positional information to each
//! one) and gathers them up within the accumulation buffer until it sees something that triggers
//! the parse of a valid token.
//!
//! Once the lexer is ready to consume all the content in the accumulation buffer, functions are
//! provided to extract the contents of the buffer in a number of formats (e.g. a string or char
//! array) and to then clear the buffer without resetting all the internal scanner state.
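//!
//! The state and flow described above can be sketched with a self-contained toy model. This is
//! a simplified illustration, not the implementation in this crate: `ToyScanner`, `Positioned`,
//! `take` and `words` are hypothetical names, and only columns (not lines) are tracked:
//! ```rust
//! use std::collections::VecDeque;
//! use std::str::Chars;
//!
//! // A character tagged with the 1-based column it was read from.
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! pub struct Positioned {
//!     pub ch: char,
//!     pub column: usize,
//! }
//!
//! // Toy scanner (hypothetical): a position, a pushback queue and an accumulation buffer.
//! pub struct ToyScanner<'a> {
//!     source: Chars<'a>,
//!     column: usize,
//!     pushed_back: VecDeque<Positioned>,
//!     accumulated: Vec<Positioned>,
//! }
//!
//! impl<'a> ToyScanner<'a> {
//!     pub fn new(input: &'a str) -> Self {
//!         ToyScanner {
//!             source: input.chars(),
//!             column: 0,
//!             pushed_back: VecDeque::new(),
//!             accumulated: Vec::new(),
//!         }
//!     }
//!
//!     // Pull the next character, preferring anything previously pushed back.
//!     pub fn advance(&mut self) -> Option<Positioned> {
//!         let next = match self.pushed_back.pop_front() {
//!             Some(p) => p,
//!             None => {
//!                 let ch = self.source.next()?;
//!                 self.column += 1;
//!                 Positioned { ch, column: self.column }
//!             }
//!         };
//!         self.accumulated.push(next);
//!         Some(next)
//!     }
//!
//!     // Return the most recently accumulated character to the input.
//!     pub fn pushback(&mut self) {
//!         if let Some(p) = self.accumulated.pop() {
//!             self.pushed_back.push_front(p);
//!         }
//!     }
//!
//!     // Extract the accumulated text with its starting column, then empty the buffer.
//!     pub fn take(&mut self) -> Option<(String, usize)> {
//!         let start = self.accumulated.first()?.column;
//!         let text: String = self.accumulated.iter().map(|p| p.ch).collect();
//!         self.accumulated.clear();
//!         Some((text, start))
//!     }
//!
//!     // Empty the accumulation buffer; the position and pushbacks are kept.
//!     pub fn clear(&mut self) {
//!         self.accumulated.clear();
//!     }
//! }
//!
//! // A minimal "lexer": gather whitespace-separated words with their starting columns.
//! pub fn words(input: &str) -> Vec<(String, usize)> {
//!     let mut scanner = ToyScanner::new(input);
//!     let mut out = Vec::new();
//!     while let Some(p) = scanner.advance() {
//!         if p.ch.is_whitespace() {
//!             scanner.pushback(); // the delimiter is not part of the word
//!             out.extend(scanner.take()); // emit the word gathered so far, if any
//!             let _ = scanner.advance(); // re-read the delimiter from the pushback queue
//!             scanner.clear(); // discard it; the column count is unaffected
//!         }
//!     }
//!     out.extend(scanner.take()); // trailing word at end of input
//!     out
//! }
//! ```
//! Under these assumptions, `words("let goodly sin")` yields `("let", 1)`, `("goodly", 5)` and
//! `("sin", 12)`: clearing the buffer between words never disturbs the column count.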
//!
//! A simple example of using the scanner is shown below:
//! ```rust
//! use std::io::BufReader;
//! use chisel_decoders::utf8::Utf8Decoder;
//! use chisel_lexers::scanner::Scanner;
//!
//! // construct a new scanner instance, based on a decoded byte source
//! let buffer: &[u8] = "let goodly sin and sunshine in".as_bytes();
//! let mut reader = BufReader::new(buffer);
//! let mut decoder = Utf8Decoder::new(&mut reader);
//! let mut scanner = Scanner::new(&mut decoder);
//!
//! // consume the first character from the scanner...
//! let first = scanner.advance(true);
//! assert!(first.is_ok());
//! assert_eq!(scanner.front().unwrap().ch, 'l');
//! assert_eq!(scanner.front().unwrap().coords.column, 1);
//!
//! // consume a second character
//! assert!(scanner.advance(true).is_ok());
//!
//! // ...and then pushback onto the buffer
//! scanner.pushback();
//!
//! // front of the buffer should be 'l' again
//! assert_eq!(scanner.front().unwrap().ch, 'l');
//!
//! // advance again - this time char will be taken from the pushback buffer
//! let _ = scanner.advance(true);
//! assert_eq!(scanner.front().unwrap().ch, 'e');
//!
//! // grab the contents of the buffer as a string
//! let buffer_contents = scanner.buffer_as_string_with_span();
//! assert_eq!(buffer_contents.str, String::from("le"));
//!
//! // empty the accumulation buffer (the scanner's position is retained)
//! scanner.clear();
//!
//! // buffer should now be empty
//! assert!(scanner.buffer_as_string_with_span().str.is_empty());
//!
//! // advance yet again
//! assert!(scanner.advance(true).is_ok());
//!
//! // the third character read comes from the 3rd column of the input
//! assert_eq!(scanner.front().unwrap().ch, 't');
//! assert_eq!(scanner.front().unwrap().coords.column, 3);
//! ```
//!
//! ## Lexers
//!
//! In the current release, this crate implements only a single lexer backend:
//!
//! ### JSON Lexer
//!
//!
//!
pub mod json;
pub mod scanner;