serde_cursor 0.4.0

fetch the desired parts of a serde-compatible data format efficiently using a jq-like language
# `serde_cursor`

<!-- cargo-reedme: start -->

<!-- cargo-reedme: info-start

    Do not edit this region by hand
    ===============================

    This region was generated from Rust documentation comments by `cargo-reedme` using this command:

        cargo +nightly reedme

    for more info: https://github.com/nik-rev/cargo-reedme

cargo-reedme: info-end -->

[![crates.io](https://img.shields.io/crates/v/serde_cursor?style=flat-square&logo=rust)](https://crates.io/crates/serde_cursor)
[![docs.rs](https://img.shields.io/docsrs/serde_cursor?style=flat-square&logo=docs.rs)](https://docs.rs/serde_cursor)
![license](https://img.shields.io/badge/license-Apache--2.0_OR_MIT-blue?style=flat-square)
![msrv](https://img.shields.io/badge/msrv-1.85-blue?style=flat-square&logo=rust)
[![github](https://img.shields.io/github/stars/nik-rev/serde-cursor)](https://github.com/nik-rev/serde-cursor)

This crate allows you to declaratively specify how to fetch the desired parts of a serde-compatible data format (such as JSON)
efficiently, without loading it all into memory, using a jq-like language.

```toml
serde_cursor = "0.4"
```

## Examples

The `Cursor!` macro makes it extremely easy to extract nested fields from data.

### Get version from `Cargo.toml`

```rust
use serde_cursor::Cursor;

let data = r#"
    [workspace.package]
    version = "0.1"
"#;

let version: String = toml::from_str::<Cursor!(workspace.package.version)>(data)?.0;
assert_eq!(version, "0.1");
```

`Cursor!(workspace.package.version)` is the magic juice - this type-macro expands to a type that implements [`serde::Deserialize`](https://docs.rs/serde_core/1.0.228/serde_core/de/trait.Deserialize.html).

**Without `serde_cursor`**:

*Pain and suffering…*

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct CargoToml {
    workspace: Workspace
}

#[derive(Deserialize)]
struct Workspace {
    package: Package
}

#[derive(Deserialize)]
struct Package {
    version: String
}

let data = r#"
    [workspace.package]
    version = "0.1"
"#;

let version = toml::from_str::<CargoToml>(data)?.workspace.package.version;
```

### Get names of all dependencies from `Cargo.lock`

The index-all `[]` accesses every element in an array:

```rust
use serde_cursor::Cursor;

let file = r#"
    [[package]]
    name = "serde"

    [[package]]
    name = "rand"
"#;

let packages: Vec<String> = toml::from_str::<Cursor!(package[].name)>(file)?.0;

assert_eq!(packages, vec!["serde", "rand"]);
```

## Syntax

Specify the target type after the path, separated by a colon. Here the path is `package[].name` and the type is `Vec<String>`:

```rust
let packages = toml::from_str::<Cursor!(package[].name: Vec<String>)>(file)?.0;
```

The type can be omitted, in which case it will be inferred:

```rust
let packages: Vec<String> = toml::from_str::<Cursor!(package[].name)>(file)?.0;
```

Fields that consist of identifiers and `-`s can be used without quotes:

```rust
Cursor!(dev-dependencies.serde.version)
```

Fields that contain spaces or other special characters must be quoted:

```rust
Cursor!(ferris."🦀::<>".r#"""#)
```

You can access specific elements of an array:

```rust
Cursor!(package[0].name)
```

## `serde_cursor` + `monostate` = 🧡💛💚💙💜

The [`monostate`](https://github.com/dtolnay/monostate) crate provides the `MustBe!` macro, which returns a type that implements
[`serde::Deserialize`](https://docs.rs/serde_core/1.0.228/serde_core/de/trait.Deserialize.html), and can only ever deserialize from one specific value.

Together, these two crates provide an almost jq-like experience of data processing in Rust:

```rust
// early exit if the `reason` field is not equal to `"compiler-message"`
get!(reason: MustBe!("compiler-message"))?;
get!(message.message: MustBe!("trace_macro"))?;

Ok(Expansion {
    messages: get!(message.children[].message)?,
    byte_start: get!(message.spans[0].byte_start)?,
    byte_end: get!(message.spans[0].byte_end)?,
})
```

The jq version of the above processing looks like this:

```jq
select(.reason == "compiler-message")
| select(.message.message == "trace_macro")
| {
    messages: [.message.children[].message],
    byte_start: .message.spans[0].byte_start,
    byte_end: .message.spans[0].byte_end
}
```

The full code for the above example looks like this:

```rust
use monostate::MustBe;
use serde_cursor::Cursor;

struct Expansion {
    messages: Vec<String>,
    byte_start: u32,
    byte_end: u32,
}

impl Expansion {
    fn parse(value: &[u8]) -> serde_json::Result<Self> {
        macro_rules! get {
            ($($cursor:tt)*) => {
                serde_json::from_slice::<
                    Cursor!($($cursor)*)
                >(value).map(|it| it.0)
            };
        }

        get!(reason: MustBe!("compiler-message"))?;
        get!(message.message: MustBe!("trace_macro"))?;

        Ok(Expansion {
            messages: get!(message.children[].message)?,
            byte_start: get!(message.spans[0].byte_start)?,
            byte_end: get!(message.spans[0].byte_end)?,
        })
    }
}
```

<details>

<summary>

For reference, the same logic without `serde_cursor` or `monostate`

</summary>

```rust
use serde::Deserialize;

struct Expansion {
    messages: Vec<String>,
    byte_start: u32,
    byte_end: u32,
}

impl Expansion {
    fn from_slice(value: &[u8]) -> serde_json::Result<Self> {
        #[derive(Deserialize)]
        struct RawDiagnostic {
            reason: String,
            message: DiagnosticMessage,
        }

        #[derive(Deserialize)]
        struct DiagnosticMessage {
            message: String,
            children: Vec<DiagnosticChild>,
            spans: Vec<DiagnosticSpan>,
        }

        #[derive(Deserialize)]
        struct DiagnosticChild {
            message: String,
        }

        #[derive(Deserialize)]
        struct DiagnosticSpan {
            byte_start: u32,
            byte_end: u32,
        }

        let raw: RawDiagnostic = serde_json::from_slice(value)?;

        if raw.reason != "compiler-message" || raw.message.message != "trace_macro" {
            return Err(serde::de::Error::custom("..."));
        }

        let primary_span = raw.message.spans.first()
            .ok_or_else(|| serde::de::Error::custom("..."))?;

        Ok(Expansion {
            messages: raw.message.children.into_iter().map(|c| c.message).collect(),
            byte_start: primary_span.byte_start,
            byte_end: primary_span.byte_end,
        })
    }
}
```

</details>

## Ranges

Ranges are like `[]`, but select only the elements whose index falls in the range:

```rust
Cursor!(package[4..]);
Cursor!(package[..8]);
Cursor!(package[4..8]);
Cursor!(package[4..=8]);
```
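
To make the four forms concrete, here is a minimal `std`-only sketch that treats a range as an index filter over a slice. It assumes the ranges follow ordinary Rust semantics (`..` excludes the end, `..=` includes it). `select_in_range` is a hypothetical helper written for this illustration and is not part of `serde_cursor`:

```rust
use std::ops::RangeBounds;

/// Hypothetical helper (not part of `serde_cursor`): keep only the elements
/// whose index falls inside `range`.
fn select_in_range<T: Clone, R: RangeBounds<usize>>(items: &[T], range: R) -> Vec<T> {
    items
        .iter()
        .enumerate()
        // `index` is a `&usize` here, which is what `RangeBounds::contains` expects.
        .filter(|(index, _)| range.contains(index))
        .map(|(_, item)| item.clone())
        .collect()
}
```

With ten elements indexed `0` through `9`, `4..` keeps six of them, `..8` keeps eight, `4..8` keeps four, and `4..=8` keeps five.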

## Interpolations

It’s not uncommon for multiple queries to get quite repetitive:

```rust
let pressure: Vec<f64> = toml::from_str::<Cursor!(france.properties.timeseries[].data.instant.details.air_pressure_at_sea_level)>(france)?.0;
let humidity: Vec<f64> = toml::from_str::<Cursor!(japan.properties.timeseries[].data.instant.details.relative_humidity)>(japan)?.0;
let temperature: Vec<f64> = toml::from_str::<Cursor!(japan.properties.timeseries[].data.instant.details.air_temperature)>(japan)?.0;
```

`serde_cursor` supports **interpolations**. You can factor out a common path into a type `Details`, and then interpolate it with `$Details` in the path inside `Cursor!`:

```rust
type Details<RestOfPath> = serde_cursor::Path!(properties.timeseries[].data.instant.details + RestOfPath);

let pressure: Vec<f64> = toml::from_str::<Cursor!(france.$Details.air_pressure_at_sea_level)>(france)?.0;
let humidity: Vec<f64> = toml::from_str::<Cursor!(japan.$Details.relative_humidity)>(japan)?.0;
let temperature: Vec<f64> = toml::from_str::<Cursor!(japan.$Details.air_temperature)>(japan)?.0;
```

## `serde_cursor` vs [`serde_query`](https://github.com/pandaman64/serde-query)

`serde_query` also implements jq-like queries, but more verbosely.

### Single query

`serde_cursor`:

```rust
use serde_cursor::Cursor;

let data = r#"{ "commits": [{"author": "Ferris"}] }"#;

let authors: Vec<String> = serde_json::from_str::<Cursor!(commits[].author)>(data)?.0;
```

`serde_query`:

```rust
use serde_query::Deserialize;

#[derive(Deserialize)]
struct Data {
    #[query(".commits.[].author")]
    authors: Vec<String>,
}

let data = r#"{ "commits": [{"author": "Ferris"}] }"#;
let data: Data = serde_json::from_str(data)?;

let authors = data.authors;
```

### Storing queries in a `struct`

`serde_cursor`:

```rust
use serde::Deserialize;
use serde_cursor::Cursor;

#[derive(Deserialize)]
struct Data {
    #[serde(rename = "commits")]
    authors: Cursor!([].author: Vec<String>),
    count: usize,
}

let data = r#"{ "count": 1, "commits": [{"author": "Ferris"}] }"#;

let data: Data = serde_json::from_str(data)?;
```

`serde_query`:

```rust
use serde_query::Deserialize;

#[derive(Deserialize)]
struct Data {
    #[query(".commits.[].author")]
    authors: Vec<String>,
    #[query(".count")]
    count: usize,
}

let data = r#"{ "count": 1, "commits": [{"author": "Ferris"}] }"#;

let data: Data = serde_json::from_str(data)?;
```

## Great error messages

When deserialization fails, you get the exact path of where the failure occurred:

```rust
use serde_cursor::Cursor;

let data = serde_json::json!({ "author": { "id": "not-a-number" } });
let result = serde_json::from_value::<Cursor!(author.id: i32)>(data);
let err = result.unwrap_err().to_string();
assert_eq!(err, r#".author.id: invalid type: string "not-a-number", expected i32"#);
```

## `serde_with` integration

If `feature = "serde_with"` is enabled, the type returned by `Cursor!` will implement [`serde_with::DeserializeAs`](https://docs.rs/serde_with/latest/serde_with/trait.DeserializeAs.html) and [`serde_with::SerializeAs`](https://docs.rs/serde_with/latest/serde_with/trait.SerializeAs.html),
meaning you can use it with the `#[serde_as]` attribute:

```rust
use serde::{Serialize, Deserialize};
use serde_cursor::Cursor;
use serde_with::serde_as;

#[serde_as]
#[derive(Serialize, Deserialize)]
struct CargoToml {
    #[serde(rename = "workspace")]
    #[serde_as(as = "Cursor!(package.version)")]
    version: String,
}

let toml: CargoToml = toml::from_str("workspace = { package = { version = '0.1.0' } }")?;
assert_eq!(toml.version, "0.1.0");
assert_eq!(serde_json::to_string(&toml)?, r#"{"workspace":{"package":{"version":"0.1.0"}}}"#);
```

## How does it work?

The `Cursor!` macro expands to a recursive type that implements [`serde::Deserialize`](https://docs.rs/serde_core/1.0.228/serde_core/de/trait.Deserialize.html).
Information on how to access the nested fields is stored entirely inside the type system.

Consider this query, which gets the first dependency of every package in `Cargo.lock`:

```rust
Cursor!(package[].dependencies[0]: String)
```

For this `Cargo.lock`, it would extract `["libc", "find-msvc-tools"]`:

```toml
[[package]]
name = "android_system_properties"
dependencies = ["libc"]

[[package]]
name = "cc"
dependencies = ["find-msvc-tools", "shlex"]
```

That macro is expanded into a `Cursor` type, which implements [`serde::Deserialize`](https://docs.rs/serde_core/1.0.228/serde_core/de/trait.Deserialize.html) and [`serde::Serialize`](https://docs.rs/serde_core/1.0.228/serde_core/ser/trait.Serialize.html):

```rust
Cursor<
    String, // : String
    Path<
        Field<"package">, // .package
        Path<
            IndexAll, // []
            Path<
                Field<"dependencies">, // .dependencies
                Path<
                    Index<0>, // [0]
                    PathEnd
                >,
            >,
        >,
    >,
>
```

The above is essentially equivalent to:

```rust
vec![
    Segment::Field("package"), // .package
    Segment::IndexAll, // []
    Segment::Field("dependencies"), // .dependencies
    Segment::Index(0) // [0]
]
```

Except it exists entirely in the type system.
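
For comparison, here is a rough sketch of what interpreting such a segment list at runtime could look like, using only `std`. The `Value` enum, the `walk` function, and the `sample_lock` builder are all assumptions made for this illustration. The crate itself works through serde and does not build an in-memory tree like this:

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for parsed data (an assumption of this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Runtime counterpart of the path segments listed above.
#[derive(Debug, Clone, Copy)]
enum Segment {
    Field(&'static str),
    IndexAll,
    Index(usize),
}

/// Follow `path` through `value`; `None` means a segment did not apply.
fn walk(value: &Value, path: &[Segment]) -> Option<Value> {
    // An empty path means we have arrived at the target.
    let Some((first, rest)) = path.split_first() else {
        return Some(value.clone());
    };
    match (first, value) {
        (Segment::Field(name), Value::Table(table)) => walk(table.get(*name)?, rest),
        (Segment::Index(i), Value::Array(items)) => walk(items.get(*i)?, rest),
        // `[]` applies the rest of the path to every element and collects the results.
        (Segment::IndexAll, Value::Array(items)) => items
            .iter()
            .map(|item| walk(item, rest))
            .collect::<Option<Vec<_>>>()
            .map(Value::Array),
        _ => None,
    }
}

/// Build one `[[package]]` entry.
fn package(name: &str, deps: &[&str]) -> Value {
    let deps = deps.iter().map(|d| Value::Str(d.to_string())).collect();
    Value::Table(BTreeMap::from([
        ("name".to_string(), Value::Str(name.to_string())),
        ("dependencies".to_string(), Value::Array(deps)),
    ]))
}

/// The two-package `Cargo.lock` from the example above.
fn sample_lock() -> Value {
    Value::Table(BTreeMap::from([(
        "package".to_string(),
        Value::Array(vec![
            package("android_system_properties", &["libc"]),
            package("cc", &["find-msvc-tools", "shlex"]),
        ]),
    )]))
}
```

Calling `walk` on `sample_lock()` with the four segments above yields an array holding `"libc"` and `"find-msvc-tools"`.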

Each time the [`serde::Deserialize::deserialize()`](https://docs.rs/serde/latest/serde/trait.Deserialize.html#tymethod.deserialize) function is called,
the first segment of the path (`.package`) is processed, and the rest of the path (`[].dependencies[0]`) is passed to the
[`serde::Deserialize`](https://docs.rs/serde_core/1.0.228/serde_core/de/trait.Deserialize.html) trait, again, and again - until the path is empty.

Once the path is empty, we reach the type of the field (the `String` in the above example)
and call [`serde::Deserialize::deserialize()`](https://docs.rs/serde/latest/serde/trait.Deserialize.html#tymethod.deserialize) on it.
This `String` is then bubbled up the stack and returned from `<Cursor<String, _> as serde::Deserialize>::deserialize`.
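
The recursion described above can be imitated in a few lines of `std`-only Rust. This is a simplified sketch, not the crate's implementation: the `Extract` trait stands in for `serde::Deserialize`, `Value` is a hypothetical in-memory tree, and field names are supplied through a hypothetical `FieldName` trait because stable Rust does not accept string const generics such as `Field<"package">`:

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

/// Minimal stand-in for parsed data (an assumption of this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Supplies a field name at the type level.
trait FieldName {
    const NAME: &'static str;
}

// Path segments and the cons-list that chains them. None of them hold data.
struct Field<N>(PhantomData<N>);
struct Index<const I: usize>;
struct IndexAll;
struct Path<Head, Tail>(PhantomData<(Head, Tail)>);
struct PathEnd;

/// Hypothetical stand-in for `serde::Deserialize`.
trait Extract {
    fn extract(value: &Value) -> Option<Value>;
}

// Empty path: we have arrived, so hand back the value itself.
impl Extract for PathEnd {
    fn extract(value: &Value) -> Option<Value> {
        Some(value.clone())
    }
}

// `.field`: process the first segment, then delegate to the rest of the path.
impl<N: FieldName, Tail: Extract> Extract for Path<Field<N>, Tail> {
    fn extract(value: &Value) -> Option<Value> {
        match value {
            Value::Table(table) => Tail::extract(table.get(N::NAME)?),
            _ => None,
        }
    }
}

// `[I]`: pick one element, then delegate.
impl<const I: usize, Tail: Extract> Extract for Path<Index<I>, Tail> {
    fn extract(value: &Value) -> Option<Value> {
        match value {
            Value::Array(items) => Tail::extract(items.get(I)?),
            _ => None,
        }
    }
}

// `[]`: delegate once per element and collect the results.
impl<Tail: Extract> Extract for Path<IndexAll, Tail> {
    fn extract(value: &Value) -> Option<Value> {
        match value {
            Value::Array(items) => items
                .iter()
                .map(|item| Tail::extract(item))
                .collect::<Option<Vec<_>>>()
                .map(Value::Array),
            _ => None,
        }
    }
}

// Marker types naming the two fields used below.
struct Package;
impl FieldName for Package {
    const NAME: &'static str = "package";
}
struct Dependencies;
impl FieldName for Dependencies {
    const NAME: &'static str = "dependencies";
}

/// Type-level spelling of `package[].dependencies[0]`.
type FirstDeps =
    Path<Field<Package>, Path<IndexAll, Path<Field<Dependencies>, Path<Index<0>, PathEnd>>>>;

fn first_deps(document: &Value) -> Option<Value> {
    FirstDeps::extract(document)
}
```

No path data exists at runtime here: every segment is a zero-sized type, and the compiler picks the matching `impl` for each step from the shape of `FirstDeps`.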

<!-- cargo-reedme: end -->