rusty_parser 0.1.4

A Generic Parser generator and pattern matching library written in Rust
# RustyParser
A Generic Parser generator and Pattern Matching Library written in Rust

## Example
`rusty_parser/src/example/example1.rs`

```rust
// import rusty_parser
use rusty_parser as rp;

// used to compare type names inside assert_eq!()
use std::any::type_name;
use std::any::type_name_of_val;

// trait Parser; must be imported for .parse( ... ) method
use rp::Parser;

#[test]
fn example1() {
    // target string to parse
    let target_string = "123456789";

    // define pattern: [0-9]
    let digit_parser = rp::range('0'..='9');

    // parse; put IntoIterator
    let res = digit_parser.parse(target_string.chars());

    // Output = (Character Type You Entered,)  -->  (char,)
    // Every Parser's Output must be a Tuple
    // res.output: Option< Output of the Parser >
    assert_eq!(
        type_name_of_val(&res.output),
        type_name::<Option<(char,)>>()
    );
    assert_eq!(res.output, Some(('1',)));

    // res.it: iterator after parsing
    // here, '1' is parsed, so the rest is "23456789"
    assert_eq!(res.it.collect::<String>(), target_string[1..]);

    // define pattern: 'a'
    let a_parser = rp::one('a');
    // this will fail
    let res = a_parser.parse(target_string.chars());
    assert_eq!(res.output, None);

    // iterator will not move if parsing failed
    assert_eq!(res.it.collect::<String>(), target_string);

    // define pattern: [0-9][0-9]
    // perform 'digit_parser', and then 'digit_parser', sequentially
    // Output = ( Output of first Parser, Output of second Parser, )  -->  (char, char,)
    let two_digit_parser = digit_parser.clone().seq(digit_parser);
    //                          ^ a move occurs here, so clone() it.

    // parse; put IntoIterator
    let res = two_digit_parser.parse(target_string.chars());
    assert_eq!(
        type_name_of_val(&res.output),
        type_name::<Option<(char, char,)>>()
    );
    assert_eq!(res.output, Some(('1', '2')));

    // Output mapping
    // ( char, char, )  -->  (i32, )
    // Parser's Output must be Tuple
    let int_parser = two_digit_parser.map(|(x, y)| -> (i32,) {
        let x_i32 = x as i32 - '0' as i32;
        let y_i32 = y as i32 - '0' as i32;
        (x_i32 * 10 + y_i32,)
    });

    let res = int_parser.parse(target_string.chars());
    assert_eq!(res.output, Some((12,)));

    // pattern matching
    // .match_pattern only checks whether the pattern matches
    // it does not extract data from the input string (e.g. no elements are pushed into a Vec)
    // Output = always ()
    let res = int_parser.match_pattern(target_string.chars());
    assert_eq!(res.output, Some(()));
}
```

## Structures
Every Parser implements `trait Parser<It>`.
`It` is the type of the iterator that the Parser works on.

`trait Parser` has an associated type `Output`, which is the type of the data extracted from the input string.

`trait Parser` has the following methods.

```rust
fn parse(&self, it: It) -> ParseResult<Self::Output, It>;
fn match_pattern(&self, it: It) -> ParseResult<(), It>;
```

Both methods take an iterator and return a `ParseResult`.
Use `match_pattern(...)` when you only want to check whether the pattern matches, without extracting data.
For some parsers, like `repeat`, calling `parse(...)` to get the output is expensive since it invokes `Vec::insert` internally.
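
The cost difference can be illustrated without the crate. The following is a minimal, std-only sketch and not RustyParser's implementation; `parse_digits` and `match_digits` are hypothetical names. Both scan the same leading digits, but only the first one stores them.

```rust
use std::str::Chars;

// Models `parse`: scans leading ASCII digits and stores each one in a Vec.
fn parse_digits<'a>(it: Chars<'a>) -> (Option<(Vec<char>,)>, Chars<'a>) {
    let mut digits = Vec::new();
    let mut rest = it.clone();
    loop {
        let mut probe = rest.clone();
        match probe.next() {
            Some(c) if c.is_ascii_digit() => {
                digits.push(c); // the extraction work happens here
                rest = probe;
            }
            _ => break,
        }
    }
    if digits.is_empty() {
        (None, it) // failure: return the untouched iterator
    } else {
        (Some((digits,)), rest)
    }
}

// Models `match_pattern`: the same scan, but nothing is stored.
fn match_digits<'a>(it: Chars<'a>) -> (Option<()>, Chars<'a>) {
    let mut rest = it.clone();
    let mut matched = false;
    loop {
        let mut probe = rest.clone();
        match probe.next() {
            Some(c) if c.is_ascii_digit() => {
                matched = true;
                rest = probe;
            }
            _ => break,
        }
    }
    if matched {
        (Some(()), rest)
    } else {
        (None, it)
    }
}
```

Called on `"123abc".chars()`, both functions leave `"abc"` unconsumed; only `parse_digits` allocates a `Vec`.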


`ParseResult` is a struct representing the result of parsing.

``` rust
pub struct ParseResult<Output, It>
where
    Output: Tuple,
    It: Iterator + Clone,
{
    // the output; extracted data
    // 'None' means parsing failed
    pub output: Option<Output>,

    // iterator after parsing
    // if parsing failed, this will be the same as the input iterator
    pub it: It,
}
```

Note that `Output` must be a Tuple (including the empty tuple `()`).
Even if the Parser extracts only one element, the output must be a Tuple.

Since `parse(...)` internally clones the iterator, the iterator must be cheaply cloneable.
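
To see why cloning matters, here is a std-only sketch of one way a failed parse can hand back the original iterator. It is an assumption about the general technique, not the crate's code; `attempt` and `digit` are hypothetical names.

```rust
// Hypothetical helper, not part of RustyParser: run `f` on a clone of the
// iterator and keep the advanced clone only if `f` succeeds.
fn attempt<It, O>(it: It, f: impl Fn(&mut It) -> Option<O>) -> (Option<O>, It)
where
    It: Iterator + Clone,
{
    let mut probe = it.clone(); // cheap for iterators such as std::str::Chars
    match f(&mut probe) {
        Some(out) => (Some(out), probe), // success: continue after the match
        None => (None, it),              // failure: the original is untouched
    }
}

// Example rule: accept exactly one ASCII digit.
fn digit(it: &mut std::str::Chars<'_>) -> Option<(char,)> {
    it.next().filter(|c| c.is_ascii_digit()).map(|c| (c,))
}
```

With this helper, `attempt("abc".chars(), digit)` fails and the returned iterator still yields `"abc"`, even though `digit` consumed one character from the clone.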


## Basic Parsers

### `one`: consumes one character if it is equal to `c`.
```rust
// signature (pseudocode): one( c: CharacterType )
let a_parser = rp::one('a');
```
`Output`: `(Iterator::Item,)`

### `range`: consumes one character if it is in the range `r`.
```rust
// signature (pseudocode): range( r: impl std::ops::RangeBounds )
let digit_parser = rp::range('0'..='9');
```
`Output`: `(Iterator::Item,)`

### `string`: consumes multiple characters if they are equal to `s`.
```rust
// signature (pseudocode): string( s: impl IntoIterator )
let hello_parser = rp::string("hello".chars()); // &str is not IntoIterator
```
`Output`: `()`

### Dictionary: build a Trie from a list of strings
```rust
// let mut parser = rp::DictBTree::new();
let mut parser = rp::DictHashMap::new();

parser.insert("hello".chars(), (1,));
parser.insert("hello_world".chars(), (2,));
parser.insert("world".chars(), (3,));

// this matches the longest possible key
let res = parser.parse("hello_world_abcdefg".chars());
assert_eq!(res.output, Some((2,)));
// 'hello_world' is parsed, so the rest is "_abcdefg"
assert_eq!(res.it.collect::<String>(), "_abcdefg");

// match 'hello' only
let res = parser.parse("hello_wo".chars());
assert_eq!(res.output, Some((1,)));
```
`Output`: the generic value type you supply to `insert`

There are two Dictionary types, `DictBTree` and `DictHashMap`, which differ in how the Trie is implemented.
Each has its own trade-offs in memory usage and search time complexity, so choose the one that fits your use case.
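
For intuition about longest-match lookup, here is a small std-only Trie sketch. It is not the crate's `DictHashMap`; the `Trie` type, its `i32` values, and `longest_match` are hypothetical and only illustrate the idea.

```rust
use std::collections::HashMap;

// One trie node: an optional value plus children keyed by character.
#[derive(Default)]
struct TrieNode {
    value: Option<i32>,
    children: HashMap<char, TrieNode>,
}

// Hypothetical trie for illustration only.
#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    fn insert(&mut self, key: &str, value: i32) {
        let mut node = &mut self.root;
        for ch in key.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.value = Some(value);
    }

    // Returns the value of the longest key that is a prefix of `input`,
    // together with the unconsumed remainder of `input`.
    fn longest_match<'a>(&self, input: &'a str) -> Option<(i32, &'a str)> {
        let mut node = &self.root;
        let mut best = None;
        for (idx, ch) in input.char_indices() {
            match node.children.get(&ch) {
                Some(child) => node = child,
                None => break,
            }
            if let Some(v) = node.value {
                // Remember the latest (longest) complete key seen so far.
                best = Some((v, &input[idx + ch.len_utf8()..]));
            }
        }
        best
    }
}
```

Replacing `HashMap` with `BTreeMap` in `TrieNode` would trade average constant-time hashed child lookups for ordered, logarithmic ones, which is the kind of trade-off the two Dictionary types expose.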

### `End`: succeeds if the end of input has been reached
```rust
let end_parser = rp::End::new();
let res = end_parser.parse("".chars());
assert_eq!( res.output, Some(()));
```

`Output`: `()`

## Combinators

### `seq`: sequence of parsers
```rust
let a_parser = rp::one('a');
let b_parser = rp::one('b');

// parser sequence
// 'a', and then 'b'
let ab_parser = a_parser.seq(b_parser);

let res = ab_parser.parse("abcd".chars());
assert_eq!(res.output, Some(('a', 'b')));
assert_eq!(res.it.collect::<String>(), "cd");
```
`Output`: `( L0, L1, ..., R0, R1, ... )`
where `(L0, L1, ...)` are the outputs of the first parser,
and `(R0, R1, ...)` are the outputs of the second parser.

### `or_`: or combinator

```rust
let a_parser = rp::one('a');
let b_parser = rp::one('b');

// parser alternation
// if 'a' is not matched, then try 'b'
// the order is preserved: if both parsers can match, the first one is used
let ab_parser = a_parser.or_(b_parser);

// 'a' is matched
let res = ab_parser.parse("abcd".chars());
assert_eq!(res.output, Some(('a',)));
assert_eq!(res.it.clone().collect::<String>(), "bcd");

// continue parsing from the rest
// 'a' is not matched, but 'b' is matched
let res = ab_parser.parse(res.it);
assert_eq!(res.output, Some(('b',)));
assert_eq!(res.it.clone().collect::<String>(), "cd");

// continue parsing from the rest
// 'a' is not matched, 'b' is not matched; failed
let res = ab_parser.parse(res.it);
assert_eq!(res.output, None);
assert_eq!(res.it.clone().collect::<String>(), "cd");
```
`Output`: the `Output` of the first and second parsers.
Note that both parsers must have the same `Output` type.


### `map`: map the output of the parser
```rust
let a_parser = rp::one('a');

// map the output
// (Character Type You Entered,)  -->  (i32, )
let int_parser = a_parser.map(|(ch,)| -> (i32,) { (ch as i32 - 'a' as i32,) });

let res = int_parser.parse("abcd".chars());
assert_eq!(res.output, Some((0,)));
assert_eq!(res.it.collect::<String>(), "bcd");
```
`Output`: return type of the closure ( must be Tuple )

### `repeat`: repeat the parser multiple times

```rust
let a_parser = rp::one('a');

// repeat 'a' 3 to 5 times (inclusive)
let multiple_a_parser = a_parser.repeat(3..=5);

let res = multiple_a_parser.parse("aaaabcd".chars());
// four 'a's are parsed
assert_eq!(res.output, Some((vec!['a', 'a', 'a', 'a',],)));
assert_eq!(res.it.collect::<String>(), "bcd");
```

`Output`:
 - if `Output` of the repeated parser is `()`, then `Output` is `()`
 - if `Output` of the repeated parser is `(T,)`, then `Output` is `(Vec<T>,)`
 - otherwise, `(Vec< Output of the Repeated Parser >,)`

### `void_`: ignore the output of the parser
Force the output to be `()`. 
It internally calls `match_pattern(...)` instead of `parse(...)`. 
This is useful when you only want to check if the pattern is matched or not. 
For more information, see `match_pattern(...)` above.

```rust
let a_parser = rp::one('a');
let a_parser = a_parser.map(|(_,)| -> (i32,) {
    // some expensive operations....
    panic!("This should not be called");
});
let multiple_a_parser = a_parser.repeat(3..=5);
let multiple_a_void_parser = multiple_a_parser.void_();

// ignore the output of parser
// this internally calls 'match_pattern(...)' instead of 'parse(...)'
let res = multiple_a_void_parser.parse("aaaabcd".chars());
assert_eq!(res.output, Some(()));
assert_eq!(res.it.collect::<String>(), "bcd");
```
`Output`: `()`


### `iter`: capture a [begin, end) iterator range on input string
```rust
let hello_parser = rp::string("hello".chars());
let digit_parser = rp::range('0'..='9').void_();
let parser = hello_parser.seq(
  digit_parser.repeat(3..=3)
).iter();

//                   <------> parsed range
let target_string = "hello0123";
//                   |       ^ end
//                   ^ begin
let res = parser.parse(target_string.chars());
assert_eq!(res.output.is_some(), true);
let (begin, end) = res.output.unwrap();
assert_eq!(begin.collect::<String>(), "hello0123");
assert_eq!(end.collect::<String>(), "3");
```
`Output`: `(It, It)`


## For complex, highly recursive patterns

By default, all the 'parser-generating' member functions consume `self` and return a new Parser.
`Parser::parse(&self)` takes an immutable reference to `self`.

However, in some cases you may want to define a recursive parser, which requires a 'reference-of-parser' or 'virtual-class-like' structure.

Luckily, the Rust standard library provides wrappers for these cases; `Rc`, `RefCell`, and `Box` are the most common ones.

RustyParser provides `BoxedParser`, `RCedParser`, and `RefCelledParser`, which are Parser wrappers for `Box`, `Rc`, and `RefCell`.
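
The reason these three wrappers fit together can be shown with std types alone. The sketch below does not use RustyParser; `Matcher`, `Shared`, `nested_parens`, and `run` are hypothetical names. A shared slot is created first, and the recursive rule is assigned into it afterwards.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A matcher returns the number of bytes it consumed, or None on failure.
type Matcher = Box<dyn Fn(&str) -> Option<usize>>;
// Rc shares the slot; RefCell allows replacing the boxed rule later.
type Shared = Rc<RefCell<Matcher>>;

// Grammar: nested := '(' nested ')' | ""
fn nested_parens() -> Shared {
    // Placeholder so that a handle exists before the real rule is defined.
    let placeholder: Matcher = Box::new(|_: &str| -> Option<usize> { None });
    let slot: Shared = Rc::new(RefCell::new(placeholder));

    // The rule refers to itself through a clone of the handle.
    // Note: this forms a reference cycle, which is acceptable for a sketch.
    let inner = Rc::clone(&slot);
    let rule: Matcher = Box::new(move |s: &str| -> Option<usize> {
        if let Some(rest) = s.strip_prefix('(') {
            let guard = inner.borrow();
            let n = (*guard)(rest)?; // recursive call through the shared slot
            rest[n..].strip_prefix(')').map(|_| n + 2)
        } else {
            Some(0) // the empty alternative always matches
        }
    });
    *slot.borrow_mut() = rule; // assign the real rule into the shared slot
    slot
}

// Convenience wrapper to invoke the shared matcher.
fn run(p: &Shared, s: &str) -> Option<usize> {
    (*p.borrow())(s)
}
```

Here `Box<dyn Fn>` erases the concrete rule type, `RefCell` permits the late assignment, and `Rc` lets the rule hold a handle to itself.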

### `boxed`: a `Box<dyn Parser>` wrapper

```rust
let hello_parser = rp::string("hello".chars());
let digit_parser = rp::range('0'..='9').void_(); // force the output to be ()

// this will wrap the parser into Box< dyn Parser >
let mut boxed_parser = hello_parser.boxed();
// Note. boxed_parser is mutable

let target_string = "hello0123";

let res_hello = boxed_parser.parse(target_string.chars());
// success
assert_eq!(res_hello.output, Some(()));
assert_eq!(res_hello.it.clone().collect::<String>(), "0123");

// now change boxed_parser to digit_parser
boxed_parser = digit_parser.boxed();
// this is the same as:
// boxed_parser.assign(digit_parser);

let res_digit = boxed_parser.parse(res_hello.it);
// success
assert_eq!(res_digit.output, Some(()));
assert_eq!(res_digit.it.collect::<String>(), "123");
```
`Output`: the `Output` of child parser

### `refcelled`: a `RefCell<Parser>` wrapper
`RefCelledParser` is useful when combined with `BoxedParser` or `RCedParser`, since it provides interior mutability.

```rust
let hello_parser = rp::string("hello".chars());
let digit_parser = rp::range('0'..='9').void_();

// this will wrap the parser into Box< dyn Parser >
let boxed_parser = hello_parser.boxed();
let refcelled_parser = boxed_parser.refcelled();
// Note. refcelled_parser is immutable

let target_string = "hello0123";

let res_hello = refcelled_parser.parse(target_string.chars());
// success
assert_eq!(res_hello.output, Some(()));
assert_eq!(res_hello.it.clone().collect::<String>(), "0123");

// now change refcelled_parser to digit_parser
refcelled_parser           // RefCelledParser
    .refcelled_parser()    // &RefCell<BoxedParser>
    .borrow_mut()          // RefMut<BoxedParser> --> &mut BoxedParser
    .assign(digit_parser); // assign new parser

let res_digit = refcelled_parser.parse(res_hello.it);
// success
assert_eq!(res_digit.output, Some(()));
assert_eq!(res_digit.it.collect::<String>(), "123");
```
`Output`: the `Output` of child parser

### `rced`: a `Rc<Parser>` wrapper
`RCedParser` is used to share the same parser.

```rust
let hello_parser = rp::string("hello".chars());
let digit_parser = rp::range('0'..='9').void_();

// this will wrap the parser into Box< dyn Parser >
let boxed_parser = hello_parser.boxed();
let refcelled_parser = boxed_parser.refcelled();
// Note. refcelled_parser is immutable

let rced_parser1 = refcelled_parser.rced();
let rced_parser2 = rp::RCed::clone(&rced_parser1);
// rced_parser2 is now pointing to the same parser as rced_parser1

let target_string = "hello0123";

let res_hello = rced_parser1.parse(target_string.chars());
// success
assert_eq!(res_hello.output, Some(()));
assert_eq!(res_hello.it.clone().collect::<String>(), "0123");

// now change rced_parser1 to digit_parser
rced_parser1               // RCedParser
    .rced_parser()         // &Rc<RefCelledParser>
    .refcelled_parser()    // &RefCell<BoxedParser>
    .borrow_mut()          // RefMut<BoxedParser> --> &mut BoxedParser
    .assign(digit_parser); // assign new parser

// rced_parser2 should also be digit_parser
let res_digit = rced_parser2.parse(res_hello.it);
// success
assert_eq!(res_digit.output, Some(()));
assert_eq!(res_digit.it.collect::<String>(), "123");
```
`Output`: the `Output` of child parser


## Making your own Parser
You can define your own Parser with:
```rust
parser( closure: impl Fn(&mut It) -> Option<NewOutput> ) -> impl Parser<It>
```

The closure takes a mutable reference to the iterator and returns `Option<NewOutput>`.

```rust
let parser = rp::parser(|it: &mut std::str::Chars| {
    if it.take(5).eq("hello".chars()) {
        Some((0,))
    } else {
        // no need to move the iterator back
        None
    }
});

let target_string = "hello0123";
let res = parser.parse(target_string.chars());
assert_eq!(res.output, Some((0,)));
assert_eq!(res.it.collect::<String>(), "0123");
```