/*
Copyright ⓒ 2016 Daniel Keep.

Licensed under the MIT license (see LICENSE or <http://opensource.org
/licenses/MIT>) or the Apache License, Version 2.0 (see LICENSE or
<http://www.apache.org/licenses/LICENSE-2.0>), at your option. All
files in the project carrying such notice may not be copied, modified,
or distributed except according to those terms.
*/
/*!

This crate provides some macros for quickly parsing values out of text.  Roughly speaking, it does the inverse of the `print!`/`format!` macros; or, in other words, a similar job to `scanf` from C.

The macros of interest are:

* [`readln!`](macro.readln!.html) - reads and scans a line from standard input.
* [`try_readln!`](macro.try_readln!.html) - like `readln!`, except it returns a `Result` instead of panicking.
* [`scan!`](macro.scan!.html) - scans the provided string.

Plus two convenience macros:

* [`let_scan!`](macro.let_scan!.html) - scans a string and binds captured values directly to local variables.  Only supports *one* pattern and panics if it doesn't match.
* [`let_readln!`](macro.let_readln!.html) - reads and scans a line from standard input, binding captured values directly to local variables.  Only supports *one* pattern and panics if it doesn't match.

If you are interested in implementing support for your own types, see the [`ScanFromStr`](scanner/trait.ScanFromStr.html) and [`ScanStr`](scanner/trait.ScanStr.html) traits.

The provided scanners can be found in the [`scanner`](scanner/index.html) module.

<style type="text/css">
.link-block { font-family: "Fira Sans"; }
.link-block > p { display: inline-block; }
.link-block > p > strong { font-weight: 500; margin-right: 1em; }
.link-block > ul { display: inline-block; padding: 0; list-style: none; }
.link-block > ul > li {
  font-size: 0.8em;
  background-color: #eee;
  border: 1px solid #ccc;
  padding: 0.3em;
  display: inline-block;
}
</style>
<span></span><div class="link-block">

**Links**

* [Latest Release](https://crates.io/crates/scan-rules/)
* [Latest Docs](https://danielkeep.github.io/rust-scan-rules/doc/scan_rules/index.html)
* [Repository](https://github.com/DanielKeep/rust-scan-rules)

<span></span></div>

## Compatibility

`scan-rules` is compatible with `rustc` version 1.6.0 and higher.

* Due to a breaking change, `scan-rules` is not compatible with `regex` version 0.1.66 or higher.

* `rustc` < 1.10 will not have the `let_readln!` macro.

* `rustc` < 1.7 will have only concrete implementations of `ScanFromStr` for the `Everything`, `Ident`, `Line`, `NonSpace`, `Number`, `Word`, and `Wordish` scanners for `&str` and `String` output types.  1.7 and higher will have generic implementations for all output types such that `&str: Into<Output>`.

* `rustc` < 1.6 is explicitly not supported, due to breaking changes in Rust itself.

## Features

The following [optional features](http://doc.crates.io/manifest.html#the-features-section) are available:

* `arrays-32`: implement scanning for arrays of up to 32 elements.  The default is up to 8 elements.

* `duration-iso8601-dates`: support scanning ISO 8601 durations with date components.

* `regex`: include support for the `re`, `re_a`, and `re_str` regular expression-based runtime scanners.  Adds a dependency on the `regex` crate.

* `tuples-16`: implement scanning for tuples of up to 16 elements.  The default is up to 4 elements.

* `unicode-normalization`: include support for `Normalized` and `IgnoreCaseNormalized` cursor types.  Adds a dependency on the `unicode-normalization` crate.

The following are only supported on nightly compilers, and may disappear/change at any time:

* `nightly-pattern`: adds the `until_pat`, `until_pat_a`, and `until_pat_str` runtime scanners using `Pattern`s.

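Optional features are switched on in the dependent crate's `Cargo.toml`.  The fragment below is a minimal sketch: the version requirement is a placeholder assumption (check crates.io for the current release), and the two features are simply examples picked from the list above.

```toml
[dependencies.scan-rules]
# Placeholder version requirement (an assumption, not taken from this document).
version = "0.2"
# Any of the optional feature names listed above can go here.
features = ["arrays-32", "tuples-16"]
```
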
## Important Notes

* There are no default scanners for `&str` or `String`; if you want a string, you should pick an appropriate abstract scanner from the [`scanner`](scanner/index.html) module.

* The macros in this crate are extremely complex.  Moderately complex usage can exhaust the standard macro recursion limit.  If this happens, you can raise the limit (from its default of 64) by adding the following attribute to your crate's root module:

  `#![recursion_limit="128"]`

## Quick Examples

Here is a simple CLI program that asks the user their name and age.  You can run this using `cargo run --example ask_age`.

```ignore
#[macro_use] extern crate scan_rules;

use scan_rules::scanner::Word;

fn main() {
    print!("What's your name? ");
    let name: String = readln! { (let name: Word<String>) => name };
    //                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^ rule
    //                                                       ^~~^ body
    //                           ^~~~~~~~~~~~~~~~~~~~~~~^ pattern
    //                            ^~~~~~~~~~~~~~~~~~~~~^ variable binding

    print!("Hi, {}.  How old are you? ", name);
    readln! {
        (let age) => {
    //   ^~~~~~^ implicitly typed variable binding
            let age: i32 = age;
            println!("{} years old, huh?  Neat.", age);
        },
        (..other) => println!("`{}` doesn't *look* like a number...", other),
    //   ^~~~~~^ bind to any input "left over"
    }

    print!("Ok.  What... is your favourite colour? (R, G, B): ");
    let_readln!(let r: f32, ",", let g: f32, ",", let b: f32);
    //          ^~~~^            ^~~~^            ^~~~^
    // Scans and binds three variables without nesting scope.
    // Panics if *anything* goes wrong.
    if !(g < r && g < b && b >= r * 0.25 && b <= r * 0.75) {
        println!("Purple's better.");
    } else {
        println!("Good choice!");
    }
}
```

This example shows how to parse one of several different syntaxes.  You can run this using `cargo run --example scan_data`.

```ignore
#[macro_use] extern crate scan_rules;

use std::collections::BTreeSet;

// `Word` is an "abstract" scanner; rather than scanning itself, it scans some
// *other* type using custom rules.  In this case, it scans a word into a
// string slice.  You can use `Word<String>` to get an owned string.
use scan_rules::scanner::Word;

#[derive(Debug)]
enum Data {
    Vector(i32, i32, i32),
    Truthy(bool),
    Words(Vec<String>),
    Lucky(BTreeSet<i32>),
    Other(String),
}

fn main() {
    print!("Enter some data: ");
    let data = readln! {
        ("<", let x, ",", let y, ",", let z, ">") => Data::Vector(x, y, z),
    //      ^ pattern terms are comma-separated
    //   ^~^ literal text match

        // Rules are tried top-to-bottom, stopping as soon as one matches.
        (let b) => Data::Truthy(b),
        ("yes") => Data::Truthy(true),
        ("no") => Data::Truthy(false),

        ("words:", [ let words: Word<String> ],+) => Data::Words(words),
    //             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^ repetition pattern
    //                                         ^ one or more matches
    //                                        ^ matches must be comma-separated

        ("lucky numbers:", [ let ns: i32 ]*: BTreeSet<_>) => Data::Lucky(ns),
    //          collect into specific type ^~~~~~~~~~~~^
    //                                    ^ zero or more (you might be unlucky!)
    //                                      (no separator this time)

        // Rather than scanning a sequence of values and collecting them into
        // a `BTreeSet`, we can instead scan the `BTreeSet` *directly*.  This
        // scans the syntax `BTreeSet` uses when printed using `{:?}`:
        // `{1, 5, 13, ...}`.
        ("lucky numbers:", let ns) => Data::Lucky(ns),

        (..other) => Data::Other(String::from(other))
    };
    println!("data: {:?}", data);
}
```

This example demonstrates using runtime scanners and the `let_scan!` convenience macro.  You can run this using `cargo run --example runtime_scanners`.

```ignore
//! **NOTE**: requires the `regex` feature.
#[macro_use] extern crate scan_rules;

fn main() {
    use scan_rules::scanner::{
        NonSpace, Number, Word,             // static scanners
        max_width_a, exact_width_a, re_str, // runtime scanners
    };

    // Adapted example from <http://en.cppreference.com/w/cpp/io/c/fscanf>.
    let inp = "25 54.32E-1 Thompson 56789 0123 56ß水";

    // `let_scan!` avoids the need for indentation and braces, but only supports
    // a single pattern, and panics if anything goes wrong.
    let_scan!(inp; (
        let i: i32, let x: f32, let str1 <| max_width_a::<NonSpace>(9),
    //               use runtime scanner ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^
    //          limit maximum width of a... ^~~~~~~~~~^
    //                      ...static NonSpace scanner... ^~~~~~~^
    //                                                      9 bytes ^
        let j <| exact_width_a::<i32>(2), let y: f32, let _: Number,
    //        ^~~~~~~~~~~~~~~~~~~~~~~~~^ scan an i32 with exactly 2 digits
        let str2 <| re_str(r"^[0-9]{1,3}"), let warr: Word
    //           ^~~~~~~~~~~~~~~~~~~~~~~~^ scan using a regular expression
    ));

    println!(
        "Converted fields:\n\
            i = {i:?}\n\
            x = {x:?}\n\
            str1 = {str1:?}\n\
            j = {j:?}\n\
            y = {y:?}\n\
            str2 = {str2:?}\n\
            warr = {warr:?}",
        i=i, j=j, x=x, y=y,
        str1=str1, str2=str2, warr=warr);
}
```

## Rule Syntax

Scanning rules are written as one or more arms like so:

```ignore
scan! { input_expression;
    ( pattern ) => body,
    ( pattern ) => body,
    ...
    ( pattern ) => body,
}
```

Note that the trailing comma on the last rule is optional.

Rules are checked top-to-bottom, stopping at the first that matches.

Patterns (explained under ["Pattern Syntax"](#pattern-syntax)) must be enclosed in parentheses.  If a pattern matches the provided input, the corresponding body is evaluated.

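As a rough illustration of the top-to-bottom, first-match behaviour described above, the following std-only sketch mimics a small set of rules by hand.  It does not use this crate's macros and says nothing about how they are implemented; `Data` and `classify` are hypothetical names chosen for the illustration.

```rust
// Hand-written analogue of an ordered rule list (standard library only).
#[derive(Debug, PartialEq)]
enum Data {
    Truthy(bool),
    Number(i32),
    Other(String),
}

// Tries each "rule" in order and returns the body of the first one that matches.
fn classify(input: &str) -> Data {
    let text = input.trim();
    // Rule 1: anything that parses as a `bool` ("true" or "false").
    if let Ok(b) = text.parse::<bool>() {
        return Data::Truthy(b);
    }
    // Rules 2 and 3: literal matches, only reached if rule 1 failed.
    if text == "yes" {
        return Data::Truthy(true);
    }
    if text == "no" {
        return Data::Truthy(false);
    }
    // Rule 4: anything that parses as an `i32`.
    if let Ok(n) = text.parse::<i32>() {
        return Data::Number(n);
    }
    // Final rule: a catch-all that takes whatever is left.
    Data::Other(String::from(text))
}
```

Here `classify("true")` stops at the first rule, `classify("yes")` falls through to the second, and unrecognised input reaches the catch-all.  If the catch-all were listed first, it would shadow every rule below it, which is why a fallback rule belongs at the bottom.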
### Pattern Syntax

A scanning pattern is made up of one or more pattern terms, separated by commas.  The following terms are supported:

* *strings* - any expression that evaluates to a string will be used as a literal match on the input.  Exactly *how* this match is done depends on the kind of input, but the default is a case-sensitive match of whole words and individual non-letter characters, ignoring all whitespace.

  *E.g.* `"Two words"`, `"..."` (counts as three "words"), `&format!("{} {}", "Two", "words")`.

* `let` *name* \[ `:` *type* ] - scans a value out of the input text, and binds it to *name*.  If *type* is omitted, it will be inferred.

  *E.g.* `let x`, `let n: i32`, `let words: Vec<_>`, `let _: &str` (scans and discards a value).

* `let` *name* `<|` *expression* - scans a value out of the input text and binds it to *name*, using the value of *expression* to perform the scan.  The expression must evaluate to something that implements the `ScanStr` trait.

  *E.g.* `let n <| scan_a::<i32>()` (same as above example for `n`), `let three_digits <| max_width_a::<u32>(3)` (scan a `u32` of at most three digits).

* `..` *name* - binds the remaining, unscanned input as a string to *name*.  This can *only* appear as the final term in a top-level pattern.

* `[` *pattern* `]` \[ *(nothing)* | `,` | `(` *separator pattern* `)` ] ( `?` | `*` | `+` | `{` *range* `}` ) \[ `:` *collection type* ] - scans *pattern* repeatedly.

  The first (mandatory) part of the term specifies the *pattern* that should be repeatedly scanned.

  The second (optional) part of the term controls whether repeats are separated and, if so, by what.  `,` is provided as a short-cut for an obvious common case; it is equivalent to writing `(",")`.  Otherwise, you may write any arbitrary *separator pattern* as the separator, including variable bindings and more repetitions.

  The third (mandatory) part of the term specifies how many times *pattern* should be scanned.  The available options are:

  * `?` - match zero or one times.
  * `*` - match zero or more times.
  * `+` - match one or more times.
  * `{n}` - match exactly *n* times.
  * `{a,}` - match at least *a* times.
  * `{,b}` - match at most *b* times.
  * `{a, b}` - match at least *a* times, and at most *b* times.

  The fourth (optional) part of the term specifies what type of collection scanned values should be added to.  Note that the type specified here applies to *all* values captured by this repetition.  As such, you typically want to use a partially inferred type such as `BTreeSet<_>`.  If omitted, it defaults to `Vec<_>`.

  *E.g.* `[ let nums: i32 ],+`, `[ "pretty" ]*, "please"`.

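To make the repetition counts listed above concrete, here is a small std-only sketch that maps each form onto a minimum and an optional maximum number of matches.  It is purely illustrative: the crate's macros handle these forms themselves, and `Bounds`, `allows`, and `parse_repeat` are hypothetical names that are not part of this crate.

```rust
// Inclusive lower bound and optional inclusive upper bound on the repeat count.
#[derive(Debug, PartialEq)]
struct Bounds {
    min: usize,
    max: Option<usize>,
}

impl Bounds {
    // Returns true if `count` repeats would satisfy these bounds.
    fn allows(&self, count: usize) -> bool {
        count >= self.min && self.max.map_or(true, |max| count <= max)
    }
}

// Interprets one of the repetition forms from the list above.
fn parse_repeat(spec: &str) -> Option<Bounds> {
    let spec = spec.trim();
    match spec {
        "?" => return Some(Bounds { min: 0, max: Some(1) }),
        "*" => return Some(Bounds { min: 0, max: None }),
        "+" => return Some(Bounds { min: 1, max: None }),
        _ => {}
    }
    // Everything else must be one of the braced range forms.
    if !(spec.starts_with('{') && spec.ends_with('}')) {
        return None;
    }
    let inner = &spec[1..spec.len() - 1];
    match inner.find(',') {
        // `{n}`: exactly n times.
        None => {
            let n = inner.trim().parse::<usize>().ok()?;
            Some(Bounds { min: n, max: Some(n) })
        }
        // `{a,}`, `{,b}` or `{a, b}`: a missing side leaves that bound open.
        Some(comma) => {
            let lo = inner[..comma].trim();
            let hi = inner[comma + 1..].trim();
            let min = if lo.is_empty() { 0 } else { lo.parse::<usize>().ok()? };
            let max = if hi.is_empty() { None } else { Some(hi.parse::<usize>().ok()?) };
            Some(Bounds { min, max })
        }
    }
}
```

For instance, `parse_repeat("{1, 5}")` yields a minimum of 1 and a maximum of 5, so five repeats are allowed and six are not, while `parse_repeat("+")` has no upper bound but rejects zero repeats.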
*/
#![cfg_attr(feature="nightly-pattern", feature(pattern))]
#![forbid(missing_docs)]
#![recursion_limit="128"]
#[macro_use] extern crate lazy_static;
extern crate itertools;
extern crate strcursor;
#[cfg(feature="regex")] extern crate regex;
#[cfg(feature="unicode-normalization")] extern crate unicode_normalization;

#[macro_use] mod macros;

pub use error::{ScanError, ScanErrorAt, ScanErrorKind};

mod error;
pub mod input;
pub mod internal;
pub mod scanner;
mod unicode;
mod util;