scan_rules/lib.rs
/*
Copyright ⓒ 2016 Daniel Keep.

Licensed under the MIT license (see LICENSE or <http://opensource.org
/licenses/MIT>) or the Apache License, Version 2.0 (see LICENSE or
<http://www.apache.org/licenses/LICENSE-2.0>), at your option. All
files in the project carrying such notice may not be copied, modified,
or distributed except according to those terms.
*/
/*!

This crate provides some macros for quickly parsing values out of text. Roughly speaking, it does the inverse of the `print!`/`format!` macros; or, in other words, a similar job to `scanf` from C.

The macros of interest are:

* [`readln!`](macro.readln!.html) - reads and scans a line from standard input.
* [`try_readln!`](macro.try_readln!.html) - like `readln!`, except it returns a `Result` instead of panicking.
* [`scan!`](macro.scan!.html) - scans the provided string.

Plus two convenience macros:

* [`let_scan!`](macro.let_scan!.html) - scans a string and binds captured values directly to local variables. Only supports *one* pattern and panics if it doesn't match.
* [`let_readln!`](macro.let_readln!.html) - reads and scans a line from standard input, binding captured values directly to local variables. Only supports *one* pattern and panics if it doesn't match.

If you are interested in implementing support for your own types, see the [`ScanFromStr`](scanner/trait.ScanFromStr.html) and [`ScanStr`](scanner/trait.ScanStr.html) traits.

The provided scanners can be found in the [`scanner`](scanner/index.html) module.

<style type="text/css">
.link-block { font-family: "Fira Sans"; }
.link-block > p { display: inline-block; }
.link-block > p > strong { font-weight: 500; margin-right: 1em; }
.link-block > ul { display: inline-block; padding: 0; list-style: none; }
.link-block > ul > li {
    font-size: 0.8em;
    background-color: #eee;
    border: 1px solid #ccc;
    padding: 0.3em;
    display: inline-block;
}
</style>
<span></span><div class="link-block">

**Links**

* [Latest Release](https://crates.io/crates/scan-rules/)
* [Latest Docs](https://danielkeep.github.io/rust-scan-rules/doc/scan_rules/index.html)
* [Repository](https://github.com/DanielKeep/rust-scan-rules)

<span></span></div>

## Compatibility

`scan-rules` is compatible with `rustc` version 1.6.0 and higher.

* Due to a breaking change, `scan-rules` is not compatible with `regex` version 0.1.66 or higher.

* `rustc` < 1.10 will not have the `let_readln!` macro.

* `rustc` < 1.7 will have only concrete implementations of `ScanFromStr` for the `Everything`, `Ident`, `Line`, `NonSpace`, `Number`, `Word`, and `Wordish` scanners for `&str` and `String` output types. 1.7 and higher will have generic implementations for all output types such that `&str: Into<Output>`.

* `rustc` < 1.6 is explicitly not supported, due to breaking changes in Rust itself.

## Features

The following [optional features](http://doc.crates.io/manifest.html#the-features-section) are available:

* `arrays-32`: implement scanning for arrays of up to 32 elements. The default is up to 8 elements.

* `duration-iso8601-dates`: support scanning ISO 8601 durations with date components.

* `regex`: include support for the `re`, `re_a`, and `re_str` regular expression-based runtime scanners. Adds a dependency on the `regex` crate.

* `tuples-16`: implement scanning for tuples of up to 16 elements. The default is up to 4 elements.

* `unicode-normalization`: include support for `Normalized` and `IgnoreCaseNormalized` cursor types. Adds a dependency on the `unicode-normalization` crate.

The following are only supported on nightly compilers, and may disappear/change at any time:

* `nightly-pattern`: adds the `until_pat`, `until_pat_a`, and `until_pat_str` runtime scanners using `Pattern`s.

## Important Notes

* There are no default scanners for `&str` or `String`; if you want a string, you should pick an appropriate abstract scanner from the [`scanner`](scanner/index.html) module.

* The macros in this crate are extremely complex. Moderately complex usage can exhaust the standard macro recursion limit. If this happens, you can raise the limit (from its default of 64) by adding the following attribute to your crate's root module:

    `#![recursion_limit="128"]`

## Quick Examples

Here is a simple CLI program that asks the user their name and age. You can run this using `cargo run --example ask_age`.

```ignore
#[macro_use] extern crate scan_rules;

use scan_rules::scanner::Word;

fn main() {
    print!("What's your name? ");
    let name: String = readln! { (let name: Word<String>) => name };
    //                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^ rule
    //                                                       ^~~^ body
    //                           ^~~~~~~~~~~~~~~~~~~~~~~^ pattern
    //                            ^~~~~~~~~~~~~~~~~~~~~^ variable binding

    print!("Hi, {}. How old are you? ", name);
    readln! {
        (let age) => {
    //   ^~~~~~^ implicitly typed variable binding
            let age: i32 = age;
            println!("{} years old, huh? Neat.", age);
        },
        (..other) => println!("`{}` doesn't *look* like a number...", other),
    //   ^~~~~~^ bind to any input "left over"
    }

    print!("Ok. What... is your favourite colour? (R, G, B): ");
    let_readln!(let r: f32, ",", let g: f32, ",", let b: f32);
    //          ^~~~^            ^~~~^            ^~~~^
    // Scans and binds three variables without nesting scope.
    // Panics if *anything* goes wrong.
    if !(g < r && g < b && b >= r * 0.25 && b <= r * 0.75) {
        println!("Purple's better.");
    } else {
        println!("Good choice!");
    }
}
```

This example shows how to parse one of several different syntaxes. You can run this using `cargo run --example scan_data`.

```ignore
#[macro_use] extern crate scan_rules;

use std::collections::BTreeSet;

// `Word` is an "abstract" scanner; rather than scanning itself, it scans some
// *other* type using custom rules. In this case, it scans a word into a
// string slice. You can use `Word<String>` to get an owned string.
use scan_rules::scanner::Word;

#[derive(Debug)]
enum Data {
    Vector(i32, i32, i32),
    Truthy(bool),
    Words(Vec<String>),
    Lucky(BTreeSet<i32>),
    Other(String),
}

fn main() {
    print!("Enter some data: ");
    let data = readln! {
        ("<", let x, ",", let y, ",", let z, ">") => Data::Vector(x, y, z),
        //  ^ pattern terms are comma-separated
        //           ^~^ literal text match

        // Rules are tried top-to-bottom, stopping as soon as one matches.
        (let b) => Data::Truthy(b),
        ("yes") => Data::Truthy(true),
        ("no") => Data::Truthy(false),

        ("words:", [ let words: Word<String> ],+) => Data::Words(words),
        //         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^ repetition pattern
        //                                     ^ one or more matches
        //                                    ^ matches must be comma-separated

        ("lucky numbers:", [ let ns: i32 ]*: BTreeSet<_>) => Data::Lucky(ns),
        //      collect into specific type ^~~~~~~~~~~~^
        //                                ^ zero or more (you might be unlucky!)
        //                                  (no separator this time)

        // Rather than scanning a sequence of values and collecting them into
        // a `BTreeSet`, we can instead scan the `BTreeSet` *directly*. This
        // scans the syntax `BTreeSet` uses when printed using `{:?}`:
        // `{1, 5, 13, ...}`.
        ("lucky numbers:", let ns) => Data::Lucky(ns),

        (..other) => Data::Other(String::from(other))
    };
    println!("data: {:?}", data);
}
```

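The literal terms in the example above, such as `"lucky numbers:"`, are matched token by token rather than byte by byte. The following standard-library-only sketch approximates the default behaviour described under "Pattern Syntax" (whole words, single non-letter characters, whitespace ignored). The names `wordish_tokens`, `is_word_char`, and `literal_matches` are hypothetical and not part of this crate, and the crate's real cursor types may classify characters differently.

```rust
/// Assumption: a "word" character is alphanumeric or `_`.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Split text into tokens: maximal runs of word characters, or single
/// non-whitespace characters of any other kind. Whitespace is dropped.
fn wordish_tokens(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut chars = text.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        if c.is_whitespace() {
            continue;
        }
        let mut end = start + c.len_utf8();
        if is_word_char(c) {
            // Extend the token over the rest of the word.
            while let Some(&(i, next)) = chars.peek() {
                if !is_word_char(next) {
                    break;
                }
                end = i + next.len_utf8();
                chars.next();
            }
        }
        tokens.push(&text[start..end]);
    }
    tokens
}

/// Returns `true` if the tokens of `input` start with the tokens of `literal`.
fn literal_matches(literal: &str, input: &str) -> bool {
    let lit = wordish_tokens(literal);
    let inp = wordish_tokens(input);
    inp.len() >= lit.len() && lit.iter().zip(inp.iter()).all(|(a, b)| a == b)
}

fn main() {
    assert_eq!(wordish_tokens("lucky numbers:"), vec!["lucky", "numbers", ":"]);
    assert_eq!(wordish_tokens("..."), vec![".", ".", "."]);

    // Whitespace differences are ignored...
    assert!(literal_matches("lucky numbers:", "lucky   numbers : 1 2 3"));
    // ...but case and word boundaries are not.
    assert!(!literal_matches("lucky numbers:", "Lucky numbers: 1 2 3"));
    assert!(!literal_matches("Two", "Twofold"));
}
```

Under this simplified model, `"..."` is three one-character tokens, which is why it counts as three "words".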
This example demonstrates using runtime scanners and the `let_scan!` convenience macro. You can run this using `cargo run --example runtime_scanners`.

```ignore
//! **NOTE**: requires the `regex` feature.
#[macro_use] extern crate scan_rules;

fn main() {
    use scan_rules::scanner::{
        NonSpace, Number, Word,             // static scanners
        max_width_a, exact_width_a, re_str, // runtime scanners
    };

    // Adapted example from <http://en.cppreference.com/w/cpp/io/c/fscanf>.
    let inp = "25 54.32E-1 Thompson 56789 0123 56ß水";

    // `let_scan!` avoids the need for indentation and braces, but only supports
    // a single pattern, and panics if anything goes wrong.
    let_scan!(inp; (
        let i: i32, let x: f32, let str1 <| max_width_a::<NonSpace>(9),
        //           use runtime scanner ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        //      limit maximum width of a... ^~~~~~~~~~^
        //                  ...static NonSpace scanner... ^~~~~~~^
        //                                                  9 bytes ^
        let j <| exact_width_a::<i32>(2), let y: f32, let _: Number,
        //    ^~~~~~~~~~~~~~~~~~~~~~~~~^ scan an i32 with exactly 2 digits
        let str2 <| re_str(r"^[0-9]{1,3}"), let warr: Word
        //       ^~~~~~~~~~~~~~~~~~~~~~~~^ scan using a regular expression
    ));

    println!(
        "Converted fields:\n\
            i = {i:?}\n\
            x = {x:?}\n\
            str1 = {str1:?}\n\
            j = {j:?}\n\
            y = {y:?}\n\
            str2 = {str2:?}\n\
            warr = {warr:?}",
        i=i, j=j, x=x, y=y,
        str1=str1, str2=str2, warr=warr);
}
```

## Rule Syntax

Scanning rules are written as one or more arms like so:

```ignore
scan! { input_expression;
    ( pattern ) => body,
    ( pattern ) => body,
    ...
    ( pattern ) => body,
}
```

Note that the trailing comma on the last rule is optional.

Rules are checked top-to-bottom, stopping at the first that matches.

Patterns (explained under ["Pattern Syntax"](#pattern-syntax)) must be enclosed in parentheses. If a pattern matches the provided input, the corresponding body is evaluated.

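As a rough mental model of this first-match-wins ordering, the sketch below uses only the standard library rather than the crate's macros. The `Data` enum and the `rule_*` functions are hypothetical stand-ins for rule arms; they say nothing about how `scan!` is actually implemented.

```rust
#[derive(Debug, PartialEq)]
enum Data {
    Truthy(bool),
    Number(i32),
    Other(String),
}

// Hypothetical stand-ins for rule arms: each returns `Some` on a match.
fn rule_bool(input: &str) -> Option<Data> {
    input.trim().parse::<bool>().ok().map(Data::Truthy)
}

fn rule_yes(input: &str) -> Option<Data> {
    if input.trim() == "yes" {
        Some(Data::Truthy(true))
    } else {
        None
    }
}

fn rule_number(input: &str) -> Option<Data> {
    input.trim().parse::<i32>().ok().map(Data::Number)
}

/// Try each rule in order and return the result of the first that matches.
/// The final fallback plays the role of a trailing `(..other)` rule.
fn first_match(input: &str) -> Data {
    let rules: [fn(&str) -> Option<Data>; 3] = [rule_bool, rule_yes, rule_number];
    for rule in rules.iter() {
        if let Some(data) = rule(input) {
            return data;
        }
    }
    Data::Other(input.trim().to_string())
}

fn main() {
    assert_eq!(first_match("true"), Data::Truthy(true));
    assert_eq!(first_match("yes"), Data::Truthy(true));
    assert_eq!(first_match("42"), Data::Number(42));
    assert_eq!(first_match("hello"), Data::Other("hello".to_string()));
}
```

Because the fallback accepts anything, it has to come last; placed first, it would shadow every other rule.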
### Pattern Syntax

A scanning pattern is made up of one or more pattern terms, separated by commas. The following terms are supported:

* *strings* - any expression that evaluates to a string will be used as a literal match on the input. Exactly *how* this match is done depends on the kind of input, but the default is to match whole words and individual non-letter characters case-sensitively, ignoring all whitespace.

    *E.g.* `"Two words"`, `"..."` (counts as three "words"), `&format!("{} {}", "Two", "words")`.

* `let` *name* \[ `:` *type* ] - scans a value out of the input text, and binds it to *name*. If *type* is omitted, it will be inferred.

    *E.g.* `let x`, `let n: i32`, `let words: Vec<_>`, `let _: &str` (scans and discards a value).

* `let` *name* `<|` *expression* - scans a value out of the input text and binds it to *name*, using the value of *expression* to perform the scan. The expression must evaluate to something that implements the `ScanStr` trait.

    *E.g.* `let n <| scan_a::<i32>()` (same as above example for `n`), `let three_digits <| max_width_a::<u32>(3)` (scan a `u32` of at most three digits).

* `..` *name* - binds the remaining, unscanned input as a string to *name*. This can *only* appear as the final term in a top-level pattern.

* `[` *pattern* `]` \[ *(nothing)* | `,` | `(` *separator pattern* `)` ] ( `?` | `*` | `+` | `{` *range* `}` ) \[ `:` *collection type* ] - scans *pattern* repeatedly.

    The first (mandatory) part of the term specifies the *pattern* that should be repeatedly scanned.

    The second (optional) part of the term controls whether repeats are separated and, if so, by what. `,` is provided as a short-cut for an obvious common case; it is equivalent to writing `(",")`. Otherwise, you may write any arbitrary *separator pattern* as the separator, including variable bindings and more repetitions.

    The third (mandatory) part of the term specifies how many times *pattern* should be scanned. The available options are:

    * `?` - match zero or one times.
    * `*` - match zero or more times.
    * `+` - match one or more times.
    * `{n}` - match exactly *n* times.
    * `{a,}` - match at least *a* times.
    * `{,b}` - match at most *b* times.
    * `{a, b}` - match at least *a* times, and at most *b* times.

    The fourth (optional) part of the term specifies what type of collection scanned values should be added to. Note that the type specified here applies to *all* values captured by this repetition. As such, you typically want to use a partially inferred type such as `BTreeSet<_>`. If omitted, it defaults to `Vec<_>`.

    *E.g.* `[ let nums: i32 ],+`, `[ "pretty" ]*, "please"`.

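To make the repetition count and collection type parts of a repetition term more concrete, here is a plain-Rust analogy that uses only the standard library. `scan_repeated` is a hypothetical helper, not part of this crate. It also simplifies in one important way: it treats the whole input as the repetition, so surplus items make it fail, whereas a real pattern would stop repeating and leave the remaining input for the terms that follow.

```rust
use std::collections::BTreeSet;
use std::iter::FromIterator;
use std::str::FromStr;

/// Hypothetical helper: parse comma-separated values from `input`, require
/// between `min` and `max` of them (inclusive; no upper bound if `None`),
/// and collect them into `C`. Returns `None` if a value fails to parse or
/// the count is out of range.
fn scan_repeated<T, C>(input: &str, min: usize, max: Option<usize>) -> Option<C>
where
    T: FromStr,
    C: FromIterator<T>,
{
    let mut items = Vec::new();
    if !input.trim().is_empty() {
        for piece in input.split(',') {
            match piece.trim().parse::<T>() {
                Ok(value) => items.push(value),
                Err(_) => return None,
            }
        }
    }
    let count = items.len();
    if count < min || max.map_or(false, |m| count > m) {
        return None;
    }
    Some(items.into_iter().collect())
}

fn main() {
    // Like `[ let ns: i32 ],+`: one or more, default `Vec<_>` collection.
    let v = scan_repeated::<i32, Vec<i32>>("1, 5, 13", 1, None);
    assert_eq!(v, Some(vec![1, 5, 13]));

    // Like `[ let ns: i32 ],*: BTreeSet<_>`: zero or more, collected into a set.
    let s = scan_repeated::<i32, BTreeSet<i32>>("5, 1, 5", 0, None);
    assert_eq!(s, Some([1, 5].iter().cloned().collect::<BTreeSet<i32>>()));

    // Like `[ let ns: i32 ],{2, 3}`: at least two, at most three.
    assert_eq!(scan_repeated::<i32, Vec<i32>>("1", 2, Some(3)), None);
    assert_eq!(scan_repeated::<i32, Vec<i32>>("1, 2, 3, 4", 2, Some(3)), None);

    // `+` rejects empty input, while `*` accepts it.
    assert_eq!(scan_repeated::<i32, Vec<i32>>("", 1, None), None);
    assert_eq!(scan_repeated::<i32, Vec<i32>>("", 0, None), Some(vec![]));
}
```

The collection type only changes where the values end up: the same three scanned numbers become a two-element set once duplicates are merged.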
*/
#![cfg_attr(feature="nightly-pattern", feature(pattern))]
#![forbid(missing_docs)]
#![recursion_limit="128"]
#[macro_use] extern crate lazy_static;
extern crate itertools;
extern crate strcursor;
#[cfg(feature="regex")] extern crate regex;
#[cfg(feature="unicode-normalization")] extern crate unicode_normalization;

#[macro_use] mod macros;

pub use error::{ScanError, ScanErrorAt, ScanErrorKind};

mod error;
pub mod input;
pub mod internal;
pub mod scanner;
mod unicode;
mod util;