Struct ParserCore

pub struct ParserCore { /* private fields */ }

The parser core.

This struct provides all the basic parsing primitives used elsewhere. To use it, make a decoder and pass it, along with a name for the source, to ParserCore::new().

In general you will probably prefer to use Parser instead, which will provide all the functionality of the core, plus additional helper methods.

This struct exists to break a dependency cycle in the architecture.

Implementations

impl ParserCore

pub fn new<D: Decoder + 'static>(name: &str, decoder: D) -> Self

Create a new parser that uses the given decoder as its source of characters. The name is used when creating Loc instances.

pub fn loc(&self) -> Loc

Get the current location in the parse. This returns a console location if the name is the empty string, or a file location otherwise.

pub fn get_column_number(&self) -> usize

Get the current one-based column number. This may be useful when parsing languages in which indentation is significant, but otherwise you will probably prefer to use Self::loc().

pub fn get_line_number(&self) -> usize

Get the current one-based line number. For most uses you will probably find Self::loc() to be more useful.
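
To make the one-based numbering concrete, here is a std-only sketch of how a line and column could be tracked as characters are consumed. It assumes that a newline advances the line and resets the column to 1, and that every other character advances the column by one; the Position type and position_after function are hypothetical, not part of trivet. The sample text is the Lua comment from examples/book_primitives_lua.rs, which that example expects to leave the parser at line 5, column 13.

```rust
/// Hypothetical one-based position tracker (not trivet's implementation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Position {
    line: usize,
    column: usize,
}

impl Position {
    fn start() -> Self {
        Position { line: 1, column: 1 }
    }

    /// Update the position after consuming `ch`.
    /// Assumption: a newline moves to column 1 of the next line.
    fn advance(&mut self, ch: char) {
        if ch == '\n' {
            self.line += 1;
            self.column = 1;
        } else {
            self.column += 1;
        }
    }
}

/// Consume all of `text` and report where the parse ends up.
fn position_after(text: &str) -> Position {
    let mut pos = Position::start();
    for ch in text.chars() {
        pos.advance(ch);
    }
    pos
}

fn main() {
    // Same text as the raw string in examples/book_primitives_lua.rs.
    let text = "\n        --[[\n            I am a long form\n            Lua comment.\n        --]]";
    let pos = position_after(text);
    println!("line {}, column {}", pos.line, pos.column);
}
```

Under these assumptions the four newlines put the parse on line 5, and the twelve characters of the last line leave it at column 13, matching the location asserted in that example.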

pub fn replace_whitespace_test(
    &mut self,
    test: Box<dyn Fn(char) -> bool>,
) -> Box<dyn Fn(char) -> bool>

Define whitespace. This takes a closure that returns true for whitespace and false otherwise. The prior whitespace test is returned.
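
The swap-and-return contract can be sketched with std::mem::replace. WsConfig is a hypothetical stand-in for whatever part of the parser holds the whitespace test, and its default of char::is_whitespace is an assumption; only the method name and signature come from this page.

```rust
/// Hypothetical holder for a whitespace predicate (not trivet's type).
struct WsConfig {
    is_ws: Box<dyn Fn(char) -> bool>,
}

impl WsConfig {
    fn new() -> Self {
        // Assumed default: Rust's Unicode-aware whitespace test.
        WsConfig { is_ws: Box::new(|ch: char| ch.is_whitespace()) }
    }

    /// Install `test` and hand back the previous test.
    fn replace_whitespace_test(
        &mut self,
        test: Box<dyn Fn(char) -> bool>,
    ) -> Box<dyn Fn(char) -> bool> {
        std::mem::replace(&mut self.is_ws, test)
    }
}

fn main() {
    let mut cfg = WsConfig::new();
    // JSON recognizes only these four whitespace characters.
    let old = cfg.replace_whitespace_test(Box::new(|ch: char| [' ', '\n', '\r', '\t'].contains(&ch)));
    // NO-BREAK SPACE is Unicode whitespace but not JSON whitespace.
    println!("new: {}, old: {}", (cfg.is_ws)('\u{00A0}'), old('\u{00A0}'));
    // Put the original test back; the JSON-only test is returned and dropped.
    let _ = cfg.replace_whitespace_test(old);
}
```

Returning the old test lets a caller install a temporary definition of whitespace and restore the previous one afterwards.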

Example from the repository:
examples/book_json_parser.rs (line 139)
pub fn main() {
    let mut parser = trivet::parse_from_stdin();
    parser.parse_comments = false;
    let numpar = parser.borrow_number_parser();
    numpar.settings.permit_binary = false;
    numpar.settings.permit_hexadecimal = false;
    numpar.settings.permit_octal = false;
    numpar.settings.permit_underscores = false;
    numpar.settings.decimal_only_floats = true;
    numpar.settings.permit_plus = false;
    numpar.settings.permit_leading_zero = false;
    numpar.settings.permit_empty_whole = false;
    numpar.settings.permit_empty_fraction = false;
    let strpar = parser.borrow_string_parser();
    strpar.set(trivet::strings::StringStandard::JSON);
    let _ = parser
        .borrow_core()
        .replace_whitespace_test(Box::new(|ch| [' ', '\n', '\r', '\t'].contains(&ch)));
    parser.consume_ws();
    let result = parse_value_ws(&mut parser);
    match result {
        Err(error) => {
            println!("ERROR: {}", error);
            std::process::exit(1);
        }
        Ok(json) => {
            // If there is any trailing stuff that is not whitespace, then this is not a valid
            // JSON file.
            if parser.is_at_eof() {
                // Print the JSON value.
                println!("{:?}", json);
                std::process::exit(0)
            } else {
                println!("Found unexpected trailing characters after JSON value.");
                std::process::exit(1);
            }
        }
    }
}

pub fn is_at_eof(&self) -> bool

Determine if the parser has reached the end of the stream. If this is true, then no further characters are available from this parser.

pub fn peek(&mut self) -> char

Peek at the next character in the stream without consuming it. To keep this as fast as reasonably possible, no end-of-stream check is performed. At the end of the stream you should get null characters (\0), but do not rely on that, since null is also a valid character in a file; check Self::is_at_eof instead.

If this method is invoked too many times without any characters being consumed, then it will panic to indicate that parsing has stalled. See PEEK_LIMIT.
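
The sketch below illustrates why loops should test Self::is_at_eof rather than compare peek() with '\0', and how a stall guard of the kind described above might work. MiniCore is hypothetical, and the limit of 3 is made up for the demonstration; the real value of PEEK_LIMIT is not given here. Having consume reset the counter is an assumption drawn from the phrase "without any characters being consumed".

```rust
/// Hypothetical limit; the real PEEK_LIMIT constant may differ.
const PEEK_LIMIT: usize = 3;

/// Hypothetical character source (not trivet's implementation).
struct MiniCore {
    chars: Vec<char>,
    pos: usize,
    peeks_since_consume: usize,
}

impl MiniCore {
    fn new(text: &str) -> Self {
        MiniCore { chars: text.chars().collect(), pos: 0, peeks_since_consume: 0 }
    }

    fn is_at_eof(&self) -> bool {
        self.pos >= self.chars.len()
    }

    /// Return the next character, or '\0' at end of stream, without consuming it.
    /// Panics if called more than PEEK_LIMIT times in a row with no consume.
    fn peek(&mut self) -> char {
        self.peeks_since_consume += 1;
        if self.peeks_since_consume > PEEK_LIMIT {
            panic!("parsing has stalled");
        }
        self.chars.get(self.pos).copied().unwrap_or('\0')
    }

    /// Consume one character if available; this also resets the stall counter.
    fn consume(&mut self) {
        self.peeks_since_consume = 0;
        if !self.is_at_eof() {
            self.pos += 1;
        }
    }
}

fn main() {
    // The input contains a real null, so '\0' alone cannot signal the end.
    let mut core = MiniCore::new("a\0b");
    let mut seen = Vec::new();
    // Loop on is_at_eof, not on peek() returning '\0'.
    while !core.is_at_eof() {
        seen.push(core.peek());
        core.consume();
    }
    println!("{:?}", seen);
}
```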

pub fn consume(&mut self)

Consume the next character from the stream, if there is one. If not, then do nothing.

If this method is invoked too many times after reaching the end of file, then it will panic to indicate that parsing has stalled. See EOF_LIMIT.
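
A matching sketch for the end-of-file guard: consuming past the end does nothing until a limit is exceeded, at which point it panics. Cursor and the limit value of 2 are hypothetical; the actual EOF_LIMIT value is not stated on this page.

```rust
/// Hypothetical limit; the real EOF_LIMIT constant may differ.
const EOF_LIMIT: usize = 2;

/// Hypothetical cursor (not trivet's implementation).
struct Cursor {
    chars: Vec<char>,
    pos: usize,
    consumes_at_eof: usize,
}

impl Cursor {
    fn new(text: &str) -> Self {
        Cursor { chars: text.chars().collect(), pos: 0, consumes_at_eof: 0 }
    }

    /// Advance one character if there is one. Otherwise do nothing, but panic
    /// once more than EOF_LIMIT consumes have been attempted past the end.
    fn consume(&mut self) {
        if self.pos < self.chars.len() {
            self.pos += 1;
        } else {
            self.consumes_at_eof += 1;
            if self.consumes_at_eof > EOF_LIMIT {
                panic!("parsing has stalled at end of file");
            }
        }
    }
}

fn main() {
    let mut cur = Cursor::new("ab");
    cur.consume();
    cur.consume();
    // Up to EOF_LIMIT extra calls at the end are tolerated and do nothing.
    cur.consume();
    cur.consume();
    println!("pos = {}", cur.pos);
}
```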

pub fn peek_offset(&mut self, n: usize) -> char

Peek at an offset in the stream. That is, peek at a character at a given position. The position index is zero-based, with the next character to read (the result of a simple Self::peek) being at index zero.

If there are not enough characters in the stream, then null (\0) is returned. The distance is limited by the maximum lookahead; attempts to look past it will also return a null.

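Note the distinction between this method and Self::peek_n: peek_offset takes a zero-based position, while peek_n takes a count of characters. Thus peek_n(1) returns (as a one-character String) the character at position zero, the same character returned by peek() and peek_offset(0), whereas peek_offset(1) returns the character after it.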

pub fn peek_n_vec(&mut self, n: usize) -> Vec<char>

Peek at characters in the stream. If there are fewer than n characters in the stream, then fewer are returned. If the stream is exhausted, an empty vector is returned.

If this method is invoked too many times without any characters being consumed, then it will panic to indicate that parsing has stalled. See PEEK_LIMIT.

This method is similar to Self::peek_n, but returns the characters as a Vec<char> rather than constructing a String, which can be more convenient when you need to inspect individual characters.

pub fn peek_n(&mut self, n: usize) -> String

Peek at characters in the stream. If there are fewer than n characters in the stream, then fewer are returned. If the stream is exhausted, an empty string is returned.

If this method is invoked too many times without any characters being consumed, then it will panic to indicate that parsing has stalled. See PEEK_LIMIT.
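
The index-versus-count distinction between peek_offset and peek_n, and the shortened results near the end of the stream, can be shown with free functions over a slice of upcoming characters. These functions and the MAX_LOOKAHEAD value of 8 are hypothetical; only the documented behaviors (zero-based offsets, '\0' when out of range, fewer or no characters near the end) are modeled.

```rust
/// Hypothetical lookahead limit; trivet's maximum lookahead may differ.
const MAX_LOOKAHEAD: usize = 8;

/// Zero-based index into the upcoming characters; '\0' if out of range or past the limit.
fn peek_offset(rest: &[char], n: usize) -> char {
    if n >= MAX_LOOKAHEAD {
        return '\0';
    }
    rest.get(n).copied().unwrap_or('\0')
}

/// Up to `n` upcoming characters (fewer near the end), as a vector.
fn peek_n_vec(rest: &[char], n: usize) -> Vec<char> {
    rest.iter().take(n).copied().collect()
}

/// Up to `n` upcoming characters (fewer near the end), as a String.
fn peek_n(rest: &[char], n: usize) -> String {
    rest.iter().take(n).collect()
}

fn main() {
    let rest: Vec<char> = "xyz".chars().collect();
    // peek_offset takes a zero-based index; peek_n takes a count.
    println!("{} {} {:?}", peek_offset(&rest, 1), peek_n(&rest, 1), peek_n_vec(&rest, 5));
}
```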

pub fn consume_n(&mut self, n: usize)

Consume the given number of characters from the stream. The end of file is not checked while doing this; if there are no characters left to consume, nothing is done.

If this method is invoked too many times after reaching the end of file, it will panic to indicate that parsing has stalled. See EOF_LIMIT.

pub fn peek_chars(&mut self, chars: &[char]) -> bool

Check the next characters in the stream. If they exactly match the given characters, in order, return true; otherwise return false. Nothing is consumed.

pub fn peek_and_consume(&mut self, ch: char) -> bool

Peek at the next character in the stream. If it is the given character, consume it and return true. Otherwise return false.

pub fn peek_and_consume_chars(&mut self, chars: &[char]) -> bool

Check the next characters in the stream and, if they match in order, consume them and return true. Otherwise return false.
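
The difference between checking and consuming, and the all-or-nothing behavior of peek_and_consume_chars, can be sketched as follows. Input is a hypothetical type (its peek_chars takes &self, whereas the method above takes &mut self). The input mirrors the Lua comment case, where a failed match on "--[[" must leave "--" available for the line-comment branch.

```rust
/// Hypothetical input cursor (not trivet's implementation).
struct Input {
    chars: Vec<char>,
    pos: usize,
}

impl Input {
    fn new(text: &str) -> Self {
        Input { chars: text.chars().collect(), pos: 0 }
    }

    /// True if the upcoming characters equal `chars`, in order; consumes nothing.
    fn peek_chars(&self, chars: &[char]) -> bool {
        self.chars[self.pos..].starts_with(chars)
    }

    /// Consume `ch` only if it is next.
    fn peek_and_consume(&mut self, ch: char) -> bool {
        self.peek_and_consume_chars(&[ch])
    }

    /// Consume `chars` only if the whole sequence matches; otherwise consume nothing.
    fn peek_and_consume_chars(&mut self, chars: &[char]) -> bool {
        if self.peek_chars(chars) {
            self.pos += chars.len();
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut inp = Input::new("--x");
    // The long-form opener does not match, so nothing is consumed...
    let long = inp.peek_and_consume_chars(&['-', '-', '[', '[']);
    // ...and the two-character opener is still available.
    let short = inp.peek_and_consume_chars(&['-', '-']);
    println!("{} {} {}", long, short, inp.pos);
}
```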

Example from the repository:
examples/book_primitives_lua.rs (line 14)
fn main() {
    // Text to parse.  Note that the comment ends on line 5 at column 12, with
    // the first non-comment position at column 13.
    let mut parser = trivet::parse_from_string(
        r#"
        --[[
            I am a long form
            Lua comment.
        --]]"#,
    );
    parser.borrow_comment_parser().enable_c = false;
    parser.borrow_comment_parser().enable_cpp = false;
    parser.borrow_comment_parser().custom = Box::new(|parser: &mut trivet::ParserCore| -> bool {
        if parser.peek_and_consume_chars(&['-', '-', '[', '[']) {
            parser.take_until("--]]");
            true
        } else if parser.peek_and_consume_chars(&['-', '-']) {
            parser.take_while(|ch| ch != '\n');
            true
        } else {
            false
        }
    });
    parser.borrow_comment_parser().enable_custom = true;
    parser.consume_ws();
    assert_eq!(parser.loc().to_string(), "<string>:5:13");
    assert!(parser.is_at_eof());
}

pub fn consume_ws_only(&mut self) -> bool

Consume all whitespace starting at the current position. Whitespace here is defined by the Unicode White_Space property.

At the time of writing, the following is the definition of whitespace used.

0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
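
Rust's char::is_whitespace is documented as testing the same Unicode White_Space property, so the table above can be checked against it. This std-only sketch does not call trivet, and whether trivet relies on char::is_whitespace internally is an assumption.

```rust
/// Inclusive code point ranges from the White_Space table above.
const WHITE_SPACE: &[(u32, u32)] = &[
    (0x0009, 0x000D),
    (0x0020, 0x0020),
    (0x0085, 0x0085),
    (0x00A0, 0x00A0),
    (0x1680, 0x1680),
    (0x2000, 0x200A),
    (0x2028, 0x2029),
    (0x202F, 0x202F),
    (0x205F, 0x205F),
    (0x3000, 0x3000),
];

/// True if `ch` falls in one of the tabulated ranges.
fn in_table(ch: char) -> bool {
    let cp = ch as u32;
    WHITE_SPACE.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
}

fn main() {
    // Compare the table with std for every Unicode scalar value.
    let mismatches = (0..=0x10FFFFu32)
        .filter_map(char::from_u32)
        .filter(|&ch| in_table(ch) != ch.is_whitespace())
        .count();
    println!("mismatches: {}", mismatches);
}
```

Characters that merely look blank, such as U+200B ZERO WIDTH SPACE, are not in the table and are not treated as whitespace.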

pub fn take_until(&mut self, token: &str) -> String

Consume characters until an end token is found. The characters consumed are returned without the end token, though the end token is also consumed.
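
A std-only sketch of take_until over a slice of characters, with the position passed explicitly. The function is hypothetical; in particular, what trivet does when the token never appears is not stated above, so consuming and returning the rest of the input in that case is an assumption.

```rust
/// Hypothetical take_until: return the text before the first occurrence of
/// `token` and advance `pos` past the token.
/// Assumption: if the token never appears, the rest of the input is returned.
fn take_until(chars: &[char], pos: &mut usize, token: &str) -> String {
    let tok: Vec<char> = token.chars().collect();
    let start = *pos;
    while *pos < chars.len() {
        if chars[*pos..].starts_with(&tok) {
            let taken: String = chars[start..*pos].iter().collect();
            // The end token is consumed but not returned.
            *pos += tok.len();
            return taken;
        }
        *pos += 1;
    }
    chars[start..].iter().collect()
}

fn main() {
    let chars: Vec<char> = " long\n comment --]]after".chars().collect();
    let mut pos = 0;
    let body = take_until(&chars, &mut pos, "--]]");
    println!("{:?} then {:?}", body, chars[pos..].iter().collect::<String>());
}
```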

Example from the repository: examples/book_primitives_lua.rs (line 15). The full listing appears above, under peek_and_consume_chars.

pub fn take_while<T: Fn(char) -> bool>(&mut self, include: T) -> String

Consume characters so long as the test is true. Return the characters consumed, if any.
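
A hypothetical std-only take_while, shown on the Lua line-comment case: after "--" has been consumed, everything up to (but not including) the newline is taken.

```rust
/// Hypothetical take_while: collect characters while `include` holds,
/// advancing `pos` past them. Stops at the first failing character or at the end.
fn take_while<T: Fn(char) -> bool>(chars: &[char], pos: &mut usize, include: T) -> String {
    let start = *pos;
    while *pos < chars.len() && include(chars[*pos]) {
        *pos += 1;
    }
    chars[start..*pos].iter().collect()
}

fn main() {
    let chars: Vec<char> = "-- note\nx = 1".chars().collect();
    // Start just after the "--" comment opener.
    let mut pos = 2;
    let comment = take_while(&chars, &mut pos, |ch| ch != '\n');
    println!("{:?}, stopped at {:?}", comment, chars[pos]);
}
```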

Example from the repository: examples/book_primitives_lua.rs (line 18). The full listing appears above, under peek_and_consume_chars.

pub fn take_while_unless<T: Fn(char) -> bool, U: Fn(char) -> bool>(
    &mut self,
    include: T,
    exclude: U,
) -> String

Consume characters so long as either include or exclude is true, but return only those characters that satisfy include. The exclude predicate is checked first.
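
The description above admits more than one reading. This hypothetical sketch implements one of them: exclude is tested first, and a character it matches is consumed and dropped; a character that satisfies include is consumed and kept; anything else stops the scan. Treat it as an illustration of that reading, not as trivet's implementation.

```rust
/// Hypothetical take_while_unless under the reading described above.
fn take_while_unless<T, U>(chars: &[char], pos: &mut usize, include: T, exclude: U) -> String
where
    T: Fn(char) -> bool,
    U: Fn(char) -> bool,
{
    let mut kept = String::new();
    while *pos < chars.len() {
        let ch = chars[*pos];
        if exclude(ch) {
            // Checked first: consumed but not returned.
            *pos += 1;
        } else if include(ch) {
            kept.push(ch);
            *pos += 1;
        } else {
            break;
        }
    }
    kept
}

fn main() {
    let chars: Vec<char> = "1_000_000 m".chars().collect();
    let mut pos = 0;
    // Keep digits, silently drop underscore separators.
    let digits = take_while_unless(&chars, &mut pos, |c| c.is_ascii_digit(), |c| c == '_');
    println!("{:?}", digits);
}
```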

pub fn take<S, K>(&mut self, skip: S, stop: K) -> (Vec<char>, Option<char>)
where S: Fn(char) -> bool, K: Fn(char) -> bool,

Consume and return characters. This works as follows.

If the current character satisfies skip, it is consumed but not collected.

Otherwise, if the current character satisfies stop, the parse is stopped and the result is returned.

All other characters (those that match neither skip nor stop) are consumed and collected.

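Note that skip is checked first, then stop. In the following code an underscore also satisfies stop, but because skip is checked first the underscore is skipped and the scan continues to the end of the input.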

use trivet::parse_from_string;
let mut parser = parse_from_string("12_232.14");
assert_eq!(parser.take(
    |ch| ch == '_',
    |ch| ch != '.' && !ch.is_alphanumeric()
), ("12232.14".chars().collect(), None));

Also note that in the following code the stop predicate has no effect: every character it matches is already matched by skip, which is checked first.

use trivet::parse_from_string;
let mut parser = parse_from_string("12_232.14");
assert_eq!(parser.take(
    |ch| ch == '_',
    |ch| ch == '_'
), ("12232.14".chars().collect(), None));

The returned pair contains the collected characters and the character that caused the stop, or None if parsing stopped because the end of the stream was reached. The character that caused the stop is not consumed.
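
The skip/stop rules can be sketched as a single loop. This hypothetical std-only version reproduces the documented behaviors: skip wins when both predicates match, and the stop character is returned but left unconsumed.

```rust
/// Hypothetical take over a char slice: returns the collected characters and
/// the stop character (not consumed), or None if the input ran out.
fn take<S, K>(chars: &[char], pos: &mut usize, skip: S, stop: K) -> (Vec<char>, Option<char>)
where
    S: Fn(char) -> bool,
    K: Fn(char) -> bool,
{
    let mut collected = Vec::new();
    while *pos < chars.len() {
        let ch = chars[*pos];
        if skip(ch) {
            // Checked first: consumed but not collected.
            *pos += 1;
        } else if stop(ch) {
            // Leave the stop character in place for the caller.
            return (collected, Some(ch));
        } else {
            collected.push(ch);
            *pos += 1;
        }
    }
    (collected, None)
}

fn main() {
    let chars: Vec<char> = "12_3 x".chars().collect();
    let mut pos = 0;
    let (digits, stopped_at) = take(&chars, &mut pos, |ch| ch == '_', |ch| ch == ' ');
    println!("{:?} {:?} pos={}", digits, stopped_at, pos);
}
```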

pub fn consume_while<T: Fn(char) -> bool>(&mut self, include: T) -> bool

Consume characters so long as the test is true. Returns true if any characters are consumed.
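
consume_while differs from take_while only in what it returns. Here is a hypothetical std-only sketch, useful when the caller needs to know whether, say, any padding was present but not what it was:

```rust
/// Hypothetical consume_while: advance past characters satisfying `include`
/// and report whether anything was consumed.
fn consume_while<T: Fn(char) -> bool>(chars: &[char], pos: &mut usize, include: T) -> bool {
    let start = *pos;
    while *pos < chars.len() && include(chars[*pos]) {
        *pos += 1;
    }
    *pos > start
}

fn main() {
    let chars: Vec<char> = "   x".chars().collect();
    let mut pos = 0;
    let first = consume_while(&chars, &mut pos, |c| c == ' ');
    // A second call finds nothing left to consume.
    let second = consume_while(&chars, &mut pos, |c| c == ' ');
    println!("{} {} pos={}", first, second, pos);
}
```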

pub fn consume_until(&mut self, token: &str) -> bool

Consume characters until the given end token is found. Returns true if any characters are consumed. The end token is also consumed. This stops at the first occurrence of the end token; that is, it is not greedy.
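
A hypothetical std-only sketch of the non-greedy behavior: scanning stops at the first occurrence of the token, so later occurrences stay in the input. What happens when the token is absent is not documented above; consuming the rest of the input in that case is an assumption.

```rust
/// Hypothetical consume_until: advance past the first occurrence of `token`.
/// Returns true if any characters were consumed.
/// Assumption: if the token is absent, the rest of the input is consumed.
fn consume_until(chars: &[char], pos: &mut usize, token: &str) -> bool {
    let tok: Vec<char> = token.chars().collect();
    let start = *pos;
    while *pos < chars.len() {
        if chars[*pos..].starts_with(&tok) {
            // The end token itself is consumed as well.
            *pos += tok.len();
            return *pos > start;
        }
        *pos += 1;
    }
    *pos > start
}

fn main() {
    let chars: Vec<char> = "a */ b */ c".chars().collect();
    let mut pos = 0;
    consume_until(&chars, &mut pos, "*/");
    // Only the first "*/" was consumed.
    println!("{:?}", chars[pos..].iter().collect::<String>());
}
```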

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.