/*
Copyright ⓒ 2016 Daniel Keep.

Licensed under the MIT license (see LICENSE or <http://opensource.org
/licenses/MIT>) or the Apache License, Version 2.0 (see LICENSE or
<http://www.apache.org/licenses/LICENSE-2.0>), at your option. All
files in the project carrying such notice may not be copied, modified,
or distributed except according to those terms.
*/
/*!

This crate provides some macros for quickly parsing values out of text.  Roughly speaking, it does the inverse of the `print!`/`format!` macros; or, in other words, a similar job to `scanf` from C.

The macros of interest are:

* [`readln!`](macro.readln!.html) - reads and scans a line from standard input.
* [`try_readln!`](macro.try_readln!.html) - like `readln!`, except it returns a `Result` instead of panicking.
* [`scan!`](macro.scan!.html) - scans the provided string.

Plus two convenience macros:

* [`let_scan!`](macro.let_scan!.html) - scans a string and binds captured values directly to local variables.  Only supports *one* pattern and panics if it doesn't match.
* [`let_readln!`](macro.let_readln!.html) - reads and scans a line from standard input, binding captured values directly to local variables.  Only supports *one* pattern and panics if it doesn't match.

If you are interested in implementing support for your own types, see the [`ScanFromStr`](scanner/trait.ScanFromStr.html) and [`ScanStr`](scanner/trait.ScanStr.html) traits.

The provided scanners can be found in the [`scanner`](scanner/index.html) module.

<style type="text/css">
.link-block { font-family: "Fira Sans"; }
.link-block > p { display: inline-block; }
.link-block > p > strong { font-weight: 500; margin-right: 1em; }
.link-block > ul { display: inline-block; padding: 0; list-style: none; }
.link-block > ul > li {
  font-size: 0.8em;
  background-color: #eee;
  border: 1px solid #ccc;
  padding: 0.3em;
  display: inline-block;
}
</style>
<span></span><div class="link-block">

**Links**

* [Latest Release](https://crates.io/crates/scan-rules/)
* [Latest Docs](https://danielkeep.github.io/rust-scan-rules/doc/scan_rules/index.html)
* [Repository](https://github.com/DanielKeep/rust-scan-rules)

<span></span></div>

## Compatibility

`scan-rules` is compatible with `rustc` version 1.6.0 and higher.

* Due to a breaking change, `scan-rules` is not compatible with `regex` version 0.1.66 or higher.

* `rustc` < 1.10 will not have the `let_readln!` macro.

* `rustc` < 1.7 will have only concrete implementations of `ScanFromStr` for the `Everything`, `Ident`, `Line`, `NonSpace`, `Number`, `Word`, and `Wordish` scanners for `&str` and `String` output types.  1.7 and higher will have generic implementations for all output types such that `&str: Into<Output>`.

* `rustc` < 1.6 is explicitly not supported, due to breaking changes in Rust itself.

## Features

The following [optional features](http://doc.crates.io/manifest.html#the-features-section) are available:

* `arrays-32`: implement scanning for arrays of up to 32 elements.  The default is up to 8 elements.

* `duration-iso8601-dates`: support scanning ISO 8601 durations with date components.

* `regex`: include support for the `re`, `re_a`, and `re_str` regular expression-based runtime scanners.  Adds a dependency on the `regex` crate.

* `tuples-16`: implement scanning for tuples of up to 16 elements.  The default is up to 4 elements.

* `unicode-normalization`: include support for `Normalized` and `IgnoreCaseNormalized` cursor types.  Adds a dependency on the `unicode-normalization` crate.

The following are only supported on nightly compilers, and may disappear/change at any time:

* `nightly-pattern`: adds the `until_pat`, `until_pat_a`, and `until_pat_str` runtime scanners using `Pattern`s.

## Important Notes

* There are no default scanners for `&str` or `String`; if you want a string, you should pick an appropriate abstract scanner from the [`scanner`](scanner/index.html) module.

* The macros in this crate are extremely complex.  Moderately complex usage can exhaust the standard macro recursion limit.  If this happens, you can raise the limit (from its default of 64) by adding the following attribute to your crate's root module:

  `#![recursion_limit="128"]`

## Quick Examples

Here is a simple CLI program that asks the user their name and age.  You can run this using `cargo run --example ask_age`.

```ignore
#[macro_use] extern crate scan_rules;

use scan_rules::scanner::Word;

fn main() {
    print!("What's your name? ");
    let name: String = readln! { (let name: Word<String>) => name };
    //                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^ rule
    //                                                       ^~~^ body
    //                           ^~~~~~~~~~~~~~~~~~~~~~~^ pattern
    //                            ^~~~~~~~~~~~~~~~~~~~~^ variable binding

    print!("Hi, {}.  How old are you? ", name);
    readln! {
        (let age) => {
    //   ^~~~~~^ implicitly typed variable binding
            let age: i32 = age;
            println!("{} years old, huh?  Neat.", age);
        },
        (..other) => println!("`{}` doesn't *look* like a number...", other),
    //   ^~~~~~^ bind to any input "left over"
    }

    print!("Ok.  What... is your favourite colour? (R, G, B): ");
    let_readln!(let r: f32, ",", let g: f32, ",", let b: f32);
    //          ^~~~^            ^~~~^            ^~~~^
    // Scans and binds three variables without nesting scope.
    // Panics if *anything* goes wrong.
    if !(g < r && g < b && b >= r * 0.25 && b <= r * 0.75) {
        println!("Purple's better.");
    } else {
        println!("Good choice!");
    }
}
```

This example shows how to parse one of several different syntaxes.  You can run this using `cargo run --example scan_data`.

```ignore
#[macro_use] extern crate scan_rules;

use std::collections::BTreeSet;

// `Word` is an "abstract" scanner; rather than scanning itself, it scans some
// *other* type using custom rules.  In this case, it scans a word into a
// string slice.  You can use `Word<String>` to get an owned string.
use scan_rules::scanner::Word;

#[derive(Debug)]
enum Data {
    Vector(i32, i32, i32),
    Truthy(bool),
    Words(Vec<String>),
    Lucky(BTreeSet<i32>),
    Other(String),
}

fn main() {
    print!("Enter some data: ");
    let data = readln! {
        ("<", let x, ",", let y, ",", let z, ">") => Data::Vector(x, y, z),
    //      ^ pattern terms are comma-separated
    //   ^~^ literal text match

        // Rules are tried top-to-bottom, stopping as soon as one matches.
        (let b) => Data::Truthy(b),
        ("yes") => Data::Truthy(true),
        ("no") => Data::Truthy(false),

        ("words:", [ let words: Word<String> ],+) => Data::Words(words),
    //             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^ repetition pattern
    //                                         ^ one or more matches
    //                                        ^ matches must be comma-separated

        ("lucky numbers:", [ let ns: i32 ]*: BTreeSet<_>) => Data::Lucky(ns),
    //          collect into specific type ^~~~~~~~~~~~^
    //                                    ^ zero or more (you might be unlucky!)
    //                                      (no separator this time)

        // Rather than scanning a sequence of values and collecting them into
        // a `BTreeSet`, we can instead scan the `BTreeSet` *directly*.  This
        // scans the syntax `BTreeSet` uses when printed using `{:?}`:
        // `{1, 5, 13, ...}`.
        ("lucky numbers:", let ns) => Data::Lucky(ns),

        (..other) => Data::Other(String::from(other))
    };
    println!("data: {:?}", data);
}
```

This example demonstrates using runtime scanners and the `let_scan!` convenience macro.  You can run this using `cargo run --example runtime_scanners`.

```ignore
//! **NOTE**: requires the `regex` feature.
#[macro_use] extern crate scan_rules;

fn main() {
    use scan_rules::scanner::{
        NonSpace, Number, Word,             // static scanners
        max_width_a, exact_width_a, re_str, // runtime scanners
    };

    // Adapted example from <http://en.cppreference.com/w/cpp/io/c/fscanf>.
    let inp = "25 54.32E-1 Thompson 56789 0123 56ß水";

    // `let_scan!` avoids the need for indentation and braces, but only supports
    // a single pattern, and panics if anything goes wrong.
    let_scan!(inp; (
        let i: i32, let x: f32, let str1 <| max_width_a::<NonSpace>(9),
    //               use runtime scanner ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^
    //          limit maximum width of a... ^~~~~~~~~~^
    //                      ...static NonSpace scanner... ^~~~~~~^
    //                                                      9 bytes ^
        let j <| exact_width_a::<i32>(2), let y: f32, let _: Number,
    //        ^~~~~~~~~~~~~~~~~~~~~~~~~^ scan an i32 with exactly 2 digits
        let str2 <| re_str(r"^[0-9]{1,3}"), let warr: Word
    //           ^~~~~~~~~~~~~~~~~~~~~~~~^ scan using a regular expression
    ));

    println!(
        "Converted fields:\n\
            i = {i:?}\n\
            x = {x:?}\n\
            str1 = {str1:?}\n\
            j = {j:?}\n\
            y = {y:?}\n\
            str2 = {str2:?}\n\
            warr = {warr:?}",
        i=i, j=j, x=x, y=y,
        str1=str1, str2=str2, warr=warr);
}
```

## Rule Syntax

Scanning rules are written as one or more arms like so:

```ignore
scan! { input_expression;
    ( pattern ) => body,
    ( pattern ) => body,
    ...
    ( pattern ) => body,
}
```

Note that the trailing comma on the last rule is optional.

Rules are checked top-to-bottom, stopping at the first that matches.

Patterns (explained under ["Pattern Syntax"](#pattern-syntax)) must be enclosed in parentheses.  If a pattern matches the provided input, the corresponding body is evaluated.
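
The first-match-wins ordering matters most when one pattern is more general than another.  The following is a minimal sketch of that dispatch order in plain Rust, without this crate's macros.  `Rule`, `first_match`, `integer_rule`, and `float_rule` are hypothetical names used only for illustration; this is a model of the behaviour described above, not how the macros are actually expanded.

```rust
// A plain-Rust model of rule ordering; it does not use scan-rules itself.
// A "rule" here is a function that either matches the input and produces a
// result, or declines by returning `None`.
type Rule = fn(&str) -> Option<String>;

// Matches any input that parses as an integer.
fn integer_rule(input: &str) -> Option<String> {
    input.trim().parse::<i64>().ok().map(|n| format!("integer {}", n))
}

// Matches any input that parses as a float, which includes "42".
fn float_rule(input: &str) -> Option<String> {
    input.trim().parse::<f64>().ok().map(|x| format!("float {}", x))
}

// Try each rule top-to-bottom and stop at the first one that matches.
// `filter_map` is lazy, so later rules are not evaluated after a match.
fn first_match(rules: &[Rule], input: &str) -> Option<String> {
    rules.iter().filter_map(|rule| rule(input)).next()
}

fn main() {
    // Specific rule first: "42" is reported as an integer.
    let specific_first: [Rule; 2] = [integer_rule, float_rule];
    println!("{:?}", first_match(&specific_first, "42"));

    // General rule first: the integer rule can never fire.
    let general_first: [Rule; 2] = [float_rule, integer_rule];
    println!("{:?}", first_match(&general_first, "42"));
}
```

In this model the first call prints `Some("integer 42")` and the second prints `Some("float 42")`.  The same reasoning suggests placing narrow patterns before catch-all patterns such as `(..other)` when writing real rules.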

### Pattern Syntax

A scanning pattern is made up of one or more pattern terms, separated by commas.  The following terms are supported:

* *strings* - any expression that evaluates to a string will be used as a literal match on the input.  Exactly *how* this match is done depends on the kind of input, but the default is to match whole words and individual non-letter characters case-sensitively, ignoring all whitespace.

  *E.g.* `"Two words"`, `"..."` (counts as three "words"), `&format!("{} {}", "Two", "words")`.

* `let` *name* \[ `:` *type* ] - scans a value out of the input text, and binds it to *name*.  If *type* is omitted, it will be inferred.

  *E.g.* `let x`, `let n: i32`, `let words: Vec<_>`, `let _: Word` (scans and discards a word).

* `let` *name* `<|` *expression* - scans a value out of the input text and binds it to *name*, using the value of *expression* to perform the scan.  The expression must evaluate to something that implements the `ScanStr` trait.

  *E.g.* `let n <| scan_a::<i32>()` (equivalent to `let n: i32` above), `let three_digits <| max_width_a::<u32>(3)` (scan a `u32` that is at most three bytes wide).

* `..` *name* - binds the remaining, unscanned input as a string to *name*.  This can *only* appear as the final term in a top-level pattern.

* `[` *pattern* `]` \[ *(nothing)* | `,` | `(` *separator pattern* `)` ] ( `?` | `*` | `+` | `{` *range* `}` ) \[ `:` *collection type* ] - scans *pattern* repeatedly.

  The first (mandatory) part of the term specifies the *pattern* that should be repeatedly scanned.

  The second (optional) part of the term controls whether repeats are separated and, if so, by what.  `,` is provided as a shortcut for the obvious common case; it is equivalent to writing `(",")`.  Otherwise, you may write any arbitrary *separator pattern*, including variable bindings and more repetitions.

  The third (mandatory) part of the term specifies how many times *pattern* should be scanned.  The available options are:

  * `?` - match zero or one times.
  * `*` - match zero or more times.
  * `+` - match one or more times.
  * `{n}` - match exactly *n* times.
  * `{a,}` - match at least *a* times.
  * `{,b}` - match at most *b* times.
  * `{a, b}` - match at least *a* times, and at most *b* times.

  The fourth (optional) part of the term specifies what type of collection scanned values should be added to.  Note that the type specified here applies to *all* values captured by this repetition.  As such, you typically want to use a partially inferred type such as `BTreeSet<_>`.  If omitted, it defaults to `Vec<_>`.

  *E.g.* `[ let nums: i32 ],+`, `[ "pretty" ]*, "please"`.
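
To make the repetition counts and the collection type concrete, here is a rough plain-Rust model of a pattern shaped like `[ let n: i32 ],{min,max}: C`.  It does not use this crate; `scan_repeat` is a hypothetical helper, and it simplifies matters by assuming the repetition is the entire pattern and that the separator is always a comma.

```rust
use std::collections::BTreeSet;
use std::iter::FromIterator;

// A plain-Rust model of `[ let n: i32 ],{min,max}: C`; it does not use
// scan-rules itself.  A `max` of `None` means "no upper limit".
fn scan_repeat<C>(input: &str, min: usize, max: Option<usize>) -> Option<C>
where
    C: FromIterator<i32>,
{
    // An empty input is zero repeats, not one empty element.
    let parts: Vec<&str> = if input.trim().is_empty() {
        Vec::new()
    } else {
        input.split(',').collect()
    };

    // Reject inputs with too few or too many repeats.
    if parts.len() < min || max.map_or(false, |m| parts.len() > m) {
        return None;
    }

    // Every repeat must scan as an `i32`.  Collecting into `Option<C>`
    // yields `None` as soon as any single element fails to parse.
    parts.iter().map(|p| p.trim().parse::<i32>().ok()).collect()
}

fn main() {
    // `+` corresponds to min = 1 with no maximum; `Vec<_>` keeps every value.
    let v: Option<Vec<i32>> = scan_repeat("3, 1, 3", 1, None);
    println!("{:?}", v);

    // The same input collected into a set is sorted and de-duplicated.
    let s: Option<BTreeSet<i32>> = scan_repeat("3, 1, 3", 1, None);
    println!("{:?}", s);

    // `{,2}` corresponds to min = 0, max = 2; three repeats are too many.
    let too_many: Option<Vec<i32>> = scan_repeat("3, 1, 3", 0, Some(2));
    println!("{:?}", too_many);
}
```

In this model, `?` corresponds to `(0, Some(1))`, `*` to `(0, None)`, `+` to `(1, None)`, and `{n}` to `(n, Some(n))`.  The collection type only needs to be buildable from the scanned values, which is why a partially inferred type such as `BTreeSet<_>` is usually enough.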

*/
#![cfg_attr(feature="nightly-pattern", feature(pattern))]
#![forbid(missing_docs)]
#![recursion_limit="128"]
#[macro_use] extern crate lazy_static;
extern crate itertools;
extern crate strcursor;
#[cfg(feature="regex")] extern crate regex;
#[cfg(feature="unicode-normalization")] extern crate unicode_normalization;

#[macro_use] mod macros;

pub use error::{ScanError, ScanErrorAt, ScanErrorKind};

mod error;
pub mod input;
pub mod internal;
pub mod scanner;
mod unicode;
mod util;