//! High-fidelity JSON lexer and parser.
//!
//! # Introduction
//!
//! JSON is a data format that is underspecified and sometimes contradictory.
//! As a reference, I recommend the excellent article "[Parsing JSON is a Minefield]".
//! In particular, it is ambiguous how to parse strings and numbers.
//! For example, JSON does not impose any restriction on the maximal size of numbers,
//! but in reality, most JSON parsers use a lossy representation,
//! for example 64-bit floating point.
//! This is allowed by the JSON specification; however,
//! at the same time, if we are allowed to fix arbitrary maximal sizes,
//! then a parser that fails on every input is a valid parser!
//! I hope to have convinced you by this point that this is all quite a mess.
//! The best I can do to help you around this mess is to give *you*
//! a tool to deal with this mess in the way that suits you most.
//! hifijson is this tool.
//!
//! [Parsing JSON is a Minefield]: http://seriot.ch/projects/parsing_json.html
//!
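//! To see what is at stake with lossy number representations, note that a 64-bit float
//! represents every integer up to 2^53 exactly, but not beyond
//! (plain Rust, independent of hifijson):
//!
//! ~~~
//! let exact: u64 = 1 << 53;
//! // 2^53 survives the round-trip through f64 ...
//! assert_eq!(exact as f64 as u64, exact);
//! // ... whereas 2^53 + 1 rounds back down to 2^53
//! assert_eq!((exact + 1) as f64 as u64, exact);
//! ~~~
//!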
//! What makes hifijson so flexible is that unlike most other JSON parsers,
//! it exposes its basic building blocks, called [*lexers*](#lexers),
//! that allow you to build your own parsers on top of them.
//!
//! Because hifijson exposes a variety of lexers and parsers,
//! you can combine them in a way that allows you to achieve your desired behaviour,
//! without having to write everything from scratch.
//! For example, suppose that your input data does not contain escape sequences (`\n`, `\uxxxx`);
//! then you can use the [`str::LexWrite::str_bytes`] function that is
//! guaranteed to never allocate memory when lexing from a slice,
//! making it suitable for usage in embedded environments.
//! Or suppose that you are reading an object `{"title": ..., "reviews": ...}`,
//! and you do not feel like caring about reviews today.
//! Then you can simply skip reading the value for reviews by using [`ignore::parse`].
//! Going wild and stretching the syntax a bit, you can also make
//! a parser that accepts any value (instead of only strings as mandated by JSON) as object key.
//! Or, if you just want to have a complete JSON value, then
//! you can use [`value::parse_unbounded`].
//! The choice is yours.
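//!
//! The trick behind escape-free string lexing can be sketched in a few lines of
//! plain Rust (this is only an illustration of the idea, not hifijson's actual implementation):
//!
//! ~~~
//! /// Locate the contents of a JSON string literal in a byte slice,
//! /// assuming the string contains no escape sequences.
//! /// (Hypothetical helper, not part of hifijson's API.)
//! fn str_contents(json: &[u8]) -> Option<&[u8]> {
//!     // skip the opening quote
//!     let rest = json.strip_prefix(b"\"")?;
//!     // the contents end at the closing quote -- a subslice of the input, no allocation
//!     let end = rest.iter().position(|&b| b == b'"')?;
//!     Some(&rest[..end])
//! }
//!
//! assert_eq!(str_contents(br#""hello""#), Some(&b"hello"[..]));
//! ~~~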
//!
//! In summary, hifijson aims to give you the tools to interpret JSON-like data
//! flexibly and efficiently.
//!
//! ## Lexers
//!
//! The hardest parts of lexing JSON are strings and numbers.
//! hifijson offers many different string/number lexers,
//! which differ most prominently in their memory allocation behaviour.
//! For example,
//! * [`str::Lex::str_ignore`] discards a string,
//! * [`str::LexWrite::str_bytes`] reads a string, but does not interpret escape sequences, and
//! * [`str::LexAlloc::str_string`] reads a string and interprets escape sequences.
//!
//! In particular,
//! lexers that implement the [`Lex`] trait *never* allocate memory;
//! lexers that implement the [`LexWrite`] trait only allocate memory when lexing from iterators,
//! and
//! lexers that implement the [`LexAlloc`] trait may allocate memory when lexing from both
//! iterators and slices.
//!
//! ## Slices and Iterators
//!
//! One important feature of hifijson is that it allows reading from both
//! [slices](SliceLexer) and [iterators](IterLexer) over bytes.
//! This is useful when your application should support reading from both
//! files and streams (such as standard input).
//!
//! ## Feature Flags
//!
//! If you build hifijson without the feature flag `alloc`, you disable all memory allocation.
//! If you build hifijson with the feature flag `serde`,
//! then you can use hifijson to deserialise JSON to data types implementing `serde::Deserialize`.
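//!
//! In `Cargo.toml`, this might look as follows
//! (the version is left unspecified here, and `alloc` is assumed to be among the default features):
//!
//! ~~~ toml
//! [dependencies]
//! # disable default features (including `alloc`) for a no-allocation build
//! hifijson = { version = "*", default-features = false }
//! # alternatively, enable `serde` support:
//! # hifijson = { version = "*", features = ["serde"] }
//! ~~~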
//!
//!
//! # Examples
//!
//! ## Parsing strings to values
//!
//! Let us consider a very simple usage:
//! Parsing a JSON value from a string.
//! For this, we first have to create a lexer from the string,
//! then call the value parser on the lexer:
//!
//! ~~~
//! // our input JSON that we want to parse
//! let json = br#"[null, true, false, "hello", 0, 3.1415, [1, 2], {"x": 1, "y": 2}]"#;
//!
//! // the lexer on our input -- just creating it does not actually run it yet
//! let mut lexer = hifijson::SliceLexer::new(json);
//!
//! use hifijson::token::Lex;
//! // now we are going -- we try to
//! // obtain exactly one JSON value from the lexer and
//! // parse it to a value, allowing for arbitrarily deep (unbounded) nesting
//! let value = lexer.exactly_one(Lex::ws_peek, hifijson::value::parse_unbounded);
//! let value = value.expect("parse");
//!
//! // yay, we got an array!
//! assert!(matches!(value, hifijson::value::Value::Array(_)));
//! assert_eq!(
//!     value.to_string(),
//!     // printing a value yields a compact representation with minimal spaces
//!     r#"[null,true,false,"hello",0,3.1415,[1,2],{"x":1,"y":2}]"#
//! );
//! ~~~
//!
//! ## Parsing files and streams
//!
//! The following example reads JSON from a file if an argument is given,
//! otherwise from standard input:
//!
//! ~~~ no_run
//! /// Parse a single JSON value and print it.
//! ///
//! /// Note that the `LexAlloc` trait indicates that this lexer allocates memory.
//! fn process<L: hifijson::LexAlloc>(mut lexer: L) {
//!     let value = lexer.exactly_one(L::ws_peek, hifijson::value::parse_unbounded);
//!     let value = value.expect("parse");
//!     println!("{}", value);
//! }
//!
//! let filename = std::env::args().nth(1);
//! if let Some(filename) = filename {
//!     let file = std::fs::read(filename).expect("read file");
//!     process(hifijson::SliceLexer::new(&file))
//! } else {
//!     use std::io::Read;
//!     process(hifijson::IterLexer::new(std::io::stdin().bytes()))
//! }
//! ~~~
//!
//! We just made a pretty printer (stretching the definition of pretty pretty far).
//!
//! ## Operating on the lexer
//!
//! Often, it is better for performance to operate directly on
//! the non-whitespace characters that the lexer yields,
//! rather than parsing everything into a value and then processing the value.
//! For example, the following code counts the number of values in the input JSON.
//! Unlike the previous examples, it requires only constant memory!
//!
//! ~~~
//! use hifijson::{Error, Expect, Lex};
//!
//! /// Recursively count the number of values in the value starting with the `next` character.
//! ///
//! /// The `Lex` trait indicates that this lexer does *not* allocate memory.
//! fn count<L: Lex>(next: u8, lexer: &mut L) -> Result<usize, Error> {
//!     match next {
//!         // the JSON values "null", "true", and "false"
//!         b'a'..=b'z' => Ok(lexer.null_or_bool().map(|_| 1).ok_or(Expect::Value)?),
//!         b'0'..=b'9' | b'-' => Ok(lexer.num_ignore().validate().map(|_| 1)?),
//!         b'"' => Ok(lexer.discarded().str_ignore().map(|_| 1)?),
//!
//!         // start of array
//!         b'[' => {
//!             // an array is a value itself, so start with 1
//!             let mut sum = 1;
//!             // perform the following for every item of the array
//!             lexer.discarded().seq(b']', L::ws_peek, |next, lexer| {
//!                 sum += count(next, lexer)?;
//!                 Ok::<_, Error>(())
//!             })?;
//!             Ok(sum)
//!         }
//!
//!         // start of object
//!         b'{' => {
//!             let mut sum = 1;
//!             // perform the following for every key-value pair of the object
//!             lexer.discarded().seq(b'}', L::ws_peek, |next, lexer| {
//!                 // read the key, ignoring it, and then the ':' after it
//!                 lexer.expect(|_| Some(next), b'"').ok_or(Expect::String)?;
//!                 lexer.str_ignore().map_err(Error::Str)?;
//!                 lexer.expect(L::ws_peek, b':').ok_or(Expect::Colon)?;
//!
//!                 // peek the next non-whitespace character
//!                 let next = lexer.ws_peek().ok_or(Expect::Value)?;
//!                 sum += count(next, lexer)?;
//!                 Ok::<_, Error>(())
//!             })?;
//!             Ok(sum)
//!         }
//!         _ => Err(Expect::Value)?,
//!     }
//! }
//!
//! fn process<L: Lex>(mut lexer: L) -> Result<usize, Error> {
//!     lexer.exactly_one(L::ws_peek, count)
//! }
//!
//! let json = br#"[null, true, false, "hello", 0, 3.1415, [1, 2], {"x": 1, "y": 2}]"#;
//! let lexer = hifijson::SliceLexer::new(json);
//! let n = process(lexer).unwrap();
//! assert_eq!(n, 13);
//! ~~~
//!
//! ## More Examples
//!
//! See the `cat` example for a more elaborate version of a JSON "pretty" printer
//! that can also be used to lazily filter parts of the data based on a path.
//! hifijson also powers all JSON reading in the [jaq](https://crates.io/crates/jaq) crate,
//! for which it was originally created.
extern crate alloc;
extern crate std;
use ;
pub use Read;
pub use Write;
pub use Expect;
/// Lexing without any need for memory allocation.
/// Lexing that does not allocate memory from slices, but from iterators.
/// Lexing that allocates memory both from slices and iterators.
/// JSON lexer from a shared byte slice.
/// JSON lexer from an iterator over (fallible) bytes.
///
/// This can be used to lex from a [`Read`](std::io::Read) as follows:
///
/// ~~~
/// use std::io::Read;
/// let read = std::io::stdin();
/// let lexer = hifijson::IterLexer::new(read.bytes());
/// ~~~
/// Parse error.
impl_from!;
impl_from!;
impl_from!;
impl_error!;
impl_error!;
impl_error!;
impl_error!;