# JSN
_A queryable, streaming, JSON pull-parser with low allocation overhead._
- **Pull parser?**: The parser is implemented as an iterator that emits tokens.
- **Streaming?**: The JSON document being parsed is never fully loaded into
memory. It is read & validated byte by byte. This makes it ideal for dealing
with large JSON documents.
- **Queryable?**: You can configure the iterator to only emit & allocate tokens
for the parts of the input you are interested in.

JSON is expected to conform to
[RFC 8259](https://datatracker.ietf.org/doc/html/rfc8259). It can come from any
source that implements the [`Read`](std::io::Read) trait (e.g. a file, byte
slice, network socket, etc.).
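To make the "pull parser over any `Read`" idea concrete, here is a minimal standard-library sketch. It is not how `jsn` is implemented: the `Event`, `Structure` and `structure_of` names are hypothetical, the iterator only reports object and array boundaries, and it does no RFC 8259 validation. It does show the shape, though: the caller pulls items from an iterator, and the input is consumed one byte at a time.

```rust
use std::io::{Bytes, Read};

/// Structural events emitted by this toy parser (not jsn's token type).
#[derive(Debug, PartialEq)]
enum Event {
    ObjectStart,
    ObjectEnd,
    ArrayStart,
    ArrayEnd,
}

/// Hypothetical pull parser: an iterator over any `Read` source.
struct Structure<R: Read> {
    bytes: Bytes<R>,
    in_string: bool,
    escaped: bool,
}

impl<R: Read> Structure<R> {
    fn new(reader: R) -> Self {
        Structure { bytes: reader.bytes(), in_string: false, escaped: false }
    }
}

impl<R: Read> Iterator for Structure<R> {
    type Item = Event;

    fn next(&mut self) -> Option<Event> {
        // Pull one byte at a time; stop at end of input or on the first I/O error.
        while let Some(Ok(byte)) = self.bytes.next() {
            if self.in_string {
                // Skip string contents so that braces inside strings are ignored.
                match byte {
                    _ if self.escaped => self.escaped = false,
                    b'\\' => self.escaped = true,
                    b'"' => self.in_string = false,
                    _ => {}
                }
                continue;
            }
            match byte {
                b'"' => self.in_string = true,
                b'{' => return Some(Event::ObjectStart),
                b'}' => return Some(Event::ObjectEnd),
                b'[' => return Some(Event::ArrayStart),
                b']' => return Some(Event::ArrayEnd),
                _ => {}
            }
        }
        None
    }
}

/// Collects the events for a string; any other `Read` source works the same way.
fn structure_of(json: &str) -> Vec<Event> {
    Structure::new(json.as_bytes()).collect()
}
```

Because `Structure` is generic over `R: Read`, the same code could be handed a `std::fs::File` or a `std::net::TcpStream` instead of a byte slice.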
## Basic Usage
```rust
use jsn::{Tokens, mask::*};
let data = r#"
    {
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    }
"#;
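// Assumed setup: a mask selecting the `age` key and the first element of
// `phones`, applied to an iterator over the input bytes.
let mask = key("age").or(key("phones").and(index(0)));
let mut iter = Tokens::new(data.as_bytes()).with_mask(mask).into_iter();

assert_eq!(iter.next().unwrap(), 43);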
assert_eq!(iter.next().unwrap(), "+44 1234567");
assert_eq!(iter.next(), None);
```
## Quick Explanation
Like traditional streaming parsers, the parser emits JSON tokens. The twist is
that you can query them in a "fun" way. The best analogy is
[bitmasks](https://stackoverflow.com/questions/10493411/what-is-bit-masking).
If you can use a logical `AND` to extract a bit pattern:
```text
input : 0101 0101
AND
bitmask : 0000 1111
=
pattern : 0000 0101
```
Why can't you use a logical `AND` to extract a JSON token pattern?
```text
input : { "hello": { "name" : "world" } }
AND
json mask : {something that extracts a "hello" key}
=
pattern : _ ________ { "name" : "world" } _
```
That `{something that extracts a "hello" key}` is what this crate provides.
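The analogy can be sketched with plain Rust, without touching the crate. The first function is the bit-level `AND` from the example above. The second is a toy token-level "mask" over `(path, value)` pairs. Both names are hypothetical and the flat path representation is an assumption made for illustration; it is not how `jsn` represents tokens or masks.

```rust
/// Bit-level masking: keep only the bits selected by `mask`.
fn apply_bitmask(input: u8, mask: u8) -> u8 {
    input & mask
}

/// Toy token-level "mask": keep the values whose dotted path starts at `key`.
fn apply_key_mask<'a>(tokens: &[(&'a str, &'a str)], key: &str) -> Vec<&'a str> {
    tokens
        .iter()
        // The first path segment is the top-level key the value lives under.
        .filter(|(path, _)| path.split('.').next() == Some(key))
        .map(|(_, value)| *value)
        .collect()
}
```

For instance, `apply_bitmask(0b0101_0101, 0b0000_1111)` yields `0b0000_0101`, and masking `[("hello.name", "world"), ("other", "x")]` with the key `"hello"` keeps only `"world"`.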
## Memory Footprint
`jsn` allows you to select the parts of your JSON that are of interest. What you
do with those parts and how long you keep them in memory is up to you.
To illustrate this, I'll use the Valgrind DHAT tool to profile the heap memory
usage of two similar programs. Both programs read & extract keys from a JSON
file. I'll be using the
[sf-city-lots JSON file](https://raw.githubusercontent.com/zemirco/sf-city-lots-json/33c27c137784a96d0fbd7f329dceda6cc7f49fa3/citylots.json)
(189 MB).
- `examples/store-tokens.rs`: This program keeps the extracted tokens in a `Vec`.
- `examples/print-tokens.rs`: This program prints the tokens as they are
encountered.
```shell
valgrind --tool=dhat ./target/profiling/examples/store-tokens ~/downloads/citylots.json
# ==1146722== Total: 13,823,524 bytes in 196,541 blocks
# ==1146722== At t-gmax: 7,529,044 bytes in 196,515 blocks
```
```shell
valgrind --tool=dhat ./target/profiling/examples/print-tokens ~/downloads/citylots.json
# ==1152944== Total: 1,240,708 bytes in 196,524 blocks
# ==1152944== At t-gmax: 9,367 bytes in 9 blocks
```
The first number (Total) is the total amount of heap memory that was allocated
by the program during its execution.
The second number (At t-gmax) is the maximum amount of heap memory that was
allocated at any one time during execution.
Unsurprisingly, `store-tokens.rs` has a higher footprint. Yet, the crate's
utility is still obvious because the total memory allocated (about 13.8 MB) is
still more than an order of magnitude smaller than the file itself (189 MB).
Things get better when you can operate immediately on tokens as they are
yielded (i.e. you do not accumulate them). Not only do you allocate less in
total, but your peak footprint is far smaller. `print-tokens.rs` ripped
through the file while using at most about 9 KB (9,367 bytes) of heap memory at
any one time.
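The difference between the two programs comes down to how the iterator is consumed. The sketch below is not the code in `examples/`; it uses a plain iterator of `String`s as a stand-in for the token stream, and the function names are hypothetical. It only contrasts the two consumption patterns.

```rust
use std::io::{self, Write};

/// Accumulating consumer: every token stays on the heap until the `Vec` is dropped.
fn store_tokens<I: Iterator<Item = String>>(tokens: I) -> Vec<String> {
    tokens.collect()
}

/// Streaming consumer: each token is written out and dropped before the next is pulled.
fn print_tokens<I, W>(tokens: I, out: &mut W) -> io::Result<usize>
where
    I: Iterator<Item = String>,
    W: Write,
{
    let mut count = 0;
    for token in tokens {
        writeln!(out, "{}", token)?;
        count += 1;
    }
    Ok(count)
}
```

With the accumulating version, peak heap usage grows with the number of tokens kept. With the streaming version, it is bounded by roughly the largest single token plus whatever buffering the writer does.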