## Installation
```toml
[dependencies]
parsable = "0.1"
```
## Example
A basic arithmetic interpreter that only supports positive integers and ignores operator precedence (operations are evaluated from left to right).
```rust
use parsable::{parsable, Parsable, ParseOptions};

#[parsable]
enum Operator {
    Plus = "+",
    Minus = "-",
    Mult = "*",
    Div = "/",
    Mod = "%"
}

#[parsable]
struct NumberLiteral {
    #[parsable(regex=r"\d+")]
    value: String
}

impl NumberLiteral {
    fn process(&self) -> i32 {
        self.value.parse().unwrap()
    }
}

#[parsable]
enum Operand {
    Number(NumberLiteral),
    Wrapped(WrappedOperation)
}

impl Operand {
    fn process(&self) -> i32 {
        match self {
            Operand::Number(number) => number.process(),
            Operand::Wrapped(wrapped) => wrapped.process(),
        }
    }
}

#[parsable]
struct Operation {
    first_operand: Operand,
    other_operands: Vec<(Operator, Operand)>
}

impl Operation {
    fn process(&self) -> i32 {
        let mut result = self.first_operand.process();

        for (operator, operand) in &self.other_operands {
            let value = operand.process();

            result = match operator {
                Operator::Plus => result + value,
                Operator::Minus => result - value,
                Operator::Mult => result * value,
                Operator::Div => result / value,
                Operator::Mod => result % value,
            }
        }

        result
    }
}

#[parsable]
struct WrappedOperation {
    #[parsable(brackets="()")]
    operation: Box<Operation>
}

impl WrappedOperation {
    fn process(&self) -> i32 {
        self.operation.process()
    }
}

fn main() {
    let operation_string = "3 + (4 * 5)".to_string();
    let parse_options = ParseOptions::default();

    match Operation::parse(operation_string, parse_options) {
        Ok(operation) => {
            println!("result: {}", operation.process());
        },
        Err(error) => {
            dbg!(error);
        }
    }
}
```
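The interpreter above applies operators strictly from left to right, so `3 + 4 * 5` would evaluate to `35`, while the parenthesized `3 + (4 * 5)` yields `23`. The following standard-library-only sketch reproduces the fold performed by `Operation::process`; `Op` and `eval_left_to_right` are illustrative names and are not part of the crate.

```rust
/// Illustrative stand-in for the `Operator` enum of the example above.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Op {
    Plus,
    Minus,
    Mult,
    Div,
    Mod,
}

/// Folds the operands from left to right, ignoring operator precedence,
/// in the same way as `Operation::process`.
fn eval_left_to_right(first: i32, rest: &[(Op, i32)]) -> i32 {
    rest.iter().fold(first, |acc, &(op, value)| match op {
        Op::Plus => acc + value,
        Op::Minus => acc - value,
        Op::Mult => acc * value,
        Op::Div => acc / value,
        Op::Mod => acc % value,
    })
}
```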
## The `#[parsable]` macro
Tagging a struct or enum with the `#[parsable]` macro implements the `Parsable` trait for the item, provided that all of its fields also implement the `Parsable` trait.
It can also be applied to a field to tweak the way that field is parsed.
### Struct
- All fields are parsed one after the other. The parsing is only successful if all fields are successfully parsed.
### Enum
- The parsing stops on the first variant that is successfully parsed.
- If a variant contains multiple fields, they are parsed successively and must all be successful for the variant to be matched.
- If a variant contains no field, a string must be specified to indicate how to parse it.
```rust
#[parsable]
enum MyOperation {
    BinaryOperation(NumberLiteral, Operator, NumberLiteral),
    Number(NumberLiteral),
    Zero = "zero"
}

// If the first two variants are swapped, the parsing will never reach the `BinaryOperation` variant.
```
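The ordering rule can be illustrated without the crate. The sketch below assumes a simplified model in which each alternative is a plain string prefix; `first_match` is a hypothetical helper, not a library function. It returns the first alternative that matches, so a shorter alternative listed first hides a longer one.

```rust
/// Returns the first alternative that `input` starts with, trying them in
/// the order they are listed (hypothetical helper, not part of the crate).
fn first_match<'a>(input: &str, alternatives: &[&'a str]) -> Option<&'a str> {
    alternatives
        .iter()
        .copied()
        .find(|alt| input.starts_with(*alt))
}
```

With the alternatives `["+", "+="]`, the input `"+= 1"` matches `"+"` and `"+="` is never tried; listing `"+="` first makes it reachable.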
## Builtin types
### `String`
A string field must be tagged with the `#[parsable(regex="<pattern>")]` or `#[parsable(value="<string>")]` macro option to specify how to parse it.
```rust
// Matches at least one digit
#[parsable]
struct NumberLiteral {
    #[parsable(regex=r"\d+")]
    value: String
}
```
```rust
// Only matches the string "+"
#[parsable]
struct PlusSign {
    #[parsable(value="+")]
    value: String
}
```
### `Option<T>`
Attempts to match `T`. If that fails, the field is set to `None` and the parsing of the field is still considered successful.
```rust
#[parsable]
enum Sign {
    Plus = "+",
    Minus = "-"
}

// Matches a number with an optional sign.
#[parsable]
struct NumberLiteral {
    sign: Option<Sign>,
    #[parsable(regex=r"\d+")]
    value: String
}
```
### `Vec<T>`
Matches as many `T` as possible successively. The following options can be specified:
- `min=X`: the parsing is only valid if at least X items are parsed
- `separator=<string>`: after each item, the parser will attempt to consume the separator. The parsing fails if no separator is found.
```rust
// Matches a non-empty list of numbers separated by a comma
#[parsable]
struct NumberList {
    #[parsable(separator=",", min=1)]
    numbers: Vec<NumberLiteral>
}
```
### Other types
- `()`: matches nothing, is always successful.
- `(T, U)`: matches `T`, then `U`.
- `Box<T>`: matches `T`.
## Running the parser
The `Parsable` trait provides the `parse()` method that takes two arguments:
- `content: String`: the string to parse
- `options: ParseOptions`: parse options

The `ParseOptions` type has the following fields:
- `comment_start: Option<&'static str>`: when the specified pattern is matched, the rest of the line is ignored. Common instances are `"//"` or `"#"`.
- `file_path: Option<String>`: file path of the string being parsed.
- `package_root_path: Option<String>`: root path of package or module containing the file being parsed.

The `file_path` and `package_root_path` fields are only forwarded to the `FileInfo` struct; the library itself never uses them.

Blank characters (spaces, new lines and tabulations) are always ignored during parsing.
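
As a rough illustration of how blank characters and line comments can be skipped, here is a standard-library-only sketch. `skip_blanks_and_comments` is a hypothetical helper written for this README; it is not the crate's implementation and only models the behavior described above.

```rust
/// Returns the first index at or after `index` that is neither a blank
/// character nor part of a line comment (hypothetical helper).
/// `index` is assumed to be a valid position inside `content`.
fn skip_blanks_and_comments(content: &str, mut index: usize, comment_start: Option<&str>) -> usize {
    let bytes = content.as_bytes();

    loop {
        // Skip spaces, tabulations and new lines.
        while index < bytes.len() && matches!(bytes[index], b' ' | b'\t' | b'\n' | b'\r') {
            index += 1;
        }

        match comment_start {
            // A comment starts here: skip to the end of the line, then loop again.
            Some(marker) if !marker.is_empty() && content[index..].starts_with(marker) => {
                while index < bytes.len() && bytes[index] != b'\n' {
                    index += 1;
                }
            }
            // Neither a blank nor a comment: this is where parsing resumes.
            _ => return index,
        }
    }
}
```

With `comment_start` set to `Some("//")`, calling it on `"  // note\n  42"` from index `0` returns `12`, the index of the `4`.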
## FileInfo
The `FileInfo` structure is used across the library. It has the following fields:
- `content: String`: the string being parsed
- `path: String`: the path of the file being parsed, as specified in `ParseOptions`
- `package_root_path: String`: the path of the package containing the file, as specified in `ParseOptions`

It also provides the following methods:
- `get_line_col(index: usize) -> Option<(usize, usize)>`: returns the line and column numbers (starting at 1) associated with the specified character index. This method assumes 1 character per byte and therefore does not work properly when the file contains non-ASCII characters.
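
To make the one-byte-per-character caveat concrete, here is a standard-library-only sketch of a byte-based line and column computation. `line_col` is a hypothetical function, not the crate's code, and returning `None` for an out-of-range index is an assumption.

```rust
/// Converts a byte index into 1-based (line, column) numbers, counting one
/// column per byte. Returns `None` if `index` is past the end of `content`
/// (assumed behavior, for illustration only).
fn line_col(content: &str, index: usize) -> Option<(usize, usize)> {
    if index > content.len() {
        return None;
    }

    let mut line = 1;
    let mut col = 1;

    for &byte in &content.as_bytes()[..index] {
        if byte == b'\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }

    Some((line, col))
}
```

In `"é1"`, the digit sits at byte index `2`, so this approach reports column `3` even though the digit is visually the second character.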
## ItemLocation
Tagging a struct with `#[parsable]` adds a `location` field of type `ItemLocation` with the following fields & methods:
- `file: Rc<FileInfo>`: information on the file containing the item
- `start: usize`: starting index of the item in the file
- `end: usize`: ending index of the item in the file
- `get_start_line_col() -> (usize, usize)`: get the line and column numbers (starting at 1) of the location start

The `Parsable` trait also provides a `location()` method:
- on a structure, it returns its `location` field
- on an enum, it returns the `location()` method of the variant that was matched
- calling `location()` on a variant with no field panics

One way to avoid the panic is to wrap enums that have unit variants in a structure:
```rust
use parsable::{parsable, Parsable, ParseOptions};

#[parsable]
enum Operator {
    Plus = "+",
    Minus = "-",
    Mult = "*",
    Div = "/",
    Mod = "%"
}

#[parsable]
struct WrappedOperator {
    operator: Operator
}

fn main() {
    let string = "+".to_string();
    let options = ParseOptions::default();
    let result = WrappedOperator::parse(string, options).unwrap();

    dbg!(result.location()); // It works!
}
```
## ParseError
On failure, `Parsable::parse()` returns `Err(ParseError)`. This structure has the following fields:
- `file: Rc<FileInfo>`: the file where the error occurred.
- `index: usize`: the index at which the error occurred.
- `expected: Vec<String>`: a list of item names that were expected at this index.
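
These fields are enough to build a readable diagnostic. The sketch below is one possible way to do it using only the standard library: `render_error` is a hypothetical helper that takes the parsed content, the error index and the expected item names as plain values, and it assumes ASCII input so that byte indices fall on character boundaries.

```rust
/// Renders the expected items, the offending line, and a caret under the
/// error index (hypothetical helper, assumes ASCII content).
fn render_error(content: &str, index: usize, expected: &[&str]) -> String {
    let index = index.min(content.len());

    // Bounds of the line that contains `index`.
    let line_start = content[..index].rfind('\n').map_or(0, |i| i + 1);
    let line_end = content[index..].find('\n').map_or(content.len(), |i| index + i);

    let caret_padding = " ".repeat(index - line_start);

    format!(
        "expected {}\n{}\n{}^",
        expected.join(" or "),
        &content[line_start..line_end],
        caret_padding
    )
}
```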
## Macro options
### Root attributes
- `located=<bool>`: on a structure, indicates whether or not the `location` field should be generated. Default: `true`.
- `cascade=<bool>`: if `true` on a structure, indicates that if an `Option` field is not matched, then the parser should not attempt to match other `Option` fields. It does not invalidate the overall struct parsing. Default: `false`.
- `name=<string>`: indicates the name of the struct or enum, which is used when a parsing error occurs. Default: the name of the struct or enum.
```rust
#[parsable(located=false)] // The `location` field will not be added
struct Operation {
    first_operand: Operand,
    other_operands: Vec<(Operator, Operand)>
}
```
### Field attributes
- `prefix=<string>`: attempt to parse the specified string before parsing the field. If the prefix parsing fails, then the field parsing fails.
- `suffix=<string>`: attempt to parse the specified string after parsing the field. If the suffix parsing fails, then the field parsing fails.
- `brackets=<string>`: shortcut to specify both a prefix and a suffix using the first two characters of the specified string.
- `exclude=<string>`: indicates that the parsing is only valid if the item does not match the specified regex.
- `followed_by=<string>`: indicates that the parsing is only valid if the item is followed by the specified regex.
- `not_followed_by=<string>`: indicates that the parsing is only valid if the item is not followed by the specified regex.
- `value=<string>`: on a `String` field, indicates that the field only matches the specified string.
- `regex=<string>`: on a `String` field, indicates that the field only matches the regex with the specified pattern (using the [`regex`](https://docs.rs/regex/latest/regex/) crate).
- `separator=<string>`: on a `Vec` field, specify the separator between items.
- `min=<integer>`: on a `Vec` field, specify the minimum amount of items for the parsing to be valid.
- `cascade=false`: indicates that this field ignores the root `cascade` option.
## Manually implementing the `Parsable` trait
Sometimes `#[parsable]` is not enough and you want to implement your own parsing mechanism. This is done by implementing the `parse_item`, `get_item_name` and `location` methods.
```rust
// `ItemLocation` is assumed to be exported at the crate root, like `ParseOptions`.
use parsable::{ItemLocation, Parsable, ParseOptions, StringReader};

struct MyInteger {
    value: u32,
    location: ItemLocation,
}

impl Parsable for MyInteger {
    fn parse_item(reader: &mut StringReader) -> Option<Self> {
        let start = reader.get_index();

        match reader.read_regex(r"\d+") {
            Some(string) => Some(MyInteger {
                value: string.parse().unwrap(),
                location: reader.get_item_location(start),
            }),
            None => None,
        }
    }

    // Only used in errors
    fn get_item_name() -> String {
        "integer".to_string()
    }

    // Not required, but convenient
    fn location(&self) -> &ItemLocation {
        &self.location
    }
}

fn main() {
    let number_string = "56";
    let number = MyInteger::parse(number_string.to_string(), ParseOptions::default()).unwrap();

    println!("{}", number.value);
}
```
`StringReader` wraps the string being parsed with an index that increases as the parsing goes on. It has the following methods:
- `content() -> &str`: returns the whole string
- `get_index() -> usize`: returns the current index in the string
- `set_index(index: usize) -> usize`: sets the current index in the string
- `as_str() -> &str`: returns the part of the string that has not been parsed yet (same as `&self.content()[self.get_index()..]`)
- `as_char() -> char`: returns the current character (same as `&self.content().as_bytes()[self.get_index()]`)
- `is_finished() -> bool`: indicates whether the end of the string has been reached
- `advance(length: usize) -> Option<&str>`: advances the current index by `length` and returns the corresponding substring. If `length` is `0`, returns `None`
- `eat_spaces()`: advances the current index until a non-blank and non-comment character is reached
- `read_string(string: &str) -> Option<&str>`: if the remaining string starts with `string`, advances the current index by `string`'s length and returns it, otherwise returns `None`
- `read_regex(pattern: &'static str) -> Option<&str>`: if the remaining string starts with a match of the specified regex pattern, advances the current index by the length of the matched string and returns it, otherwise returns `None`
- `peek_regex(pattern: &'static str) -> bool`: indicates whether the remaining string starts with a match of the specified regex pattern, without advancing the current index

If `parse_item` returns `None`, it must leave the reader's index exactly where it was when the function was called.
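
The usual way to honor this rule is to save the index on entry and restore it before returning `None`. The sketch below shows the pattern with `MiniReader`, a minimal stand-in that only mimics the `get_index`, `set_index` and `read_string` methods listed above; it is not the crate's `StringReader`, and `parse_empty_list` is a hypothetical parser for the two tokens `[` and `]`.

```rust
/// Minimal stand-in for `StringReader`, reduced to what this sketch needs.
struct MiniReader {
    content: String,
    index: usize,
}

impl MiniReader {
    fn get_index(&self) -> usize {
        self.index
    }

    fn set_index(&mut self, index: usize) {
        self.index = index;
    }

    /// Consumes `string` if the remaining input starts with it.
    fn read_string(&mut self, string: &str) -> bool {
        if self.content[self.index..].starts_with(string) {
            self.index += string.len();
            true
        } else {
            false
        }
    }
}

/// Parses `[` immediately followed by `]` (hypothetical grammar).
fn parse_empty_list(reader: &mut MiniReader) -> Option<()> {
    let start = reader.get_index();

    if reader.read_string("[") && reader.read_string("]") {
        Some(())
    } else {
        // `[` may already have been consumed: rewind before failing.
        reader.set_index(start);
        None
    }
}
```

On the input `"[1"`, the opening bracket is consumed before the failure is detected, so without the call to `set_index` the reader would be left at index `1` instead of `0`.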
## License
MIT