serde_cursor 0.1.3

fetch the desired parts of a serde-compatible data format efficiently using a jq-like language

This crate allows you to declaratively specify how to fetch the desired parts of a serde-compatible data format efficiently, without loading it all into memory, using a jq-like language.

[dependencies]
serde_cursor = "0.1"

Examples

The Cursor! macro makes it extremely easy to extract nested fields from data.

Get version from Cargo.toml

# Cargo.toml
[workspace.package]
version = "0.1"

Accessed with workspace.package.version:

use std::fs;

use serde_cursor::Cursor;

let data = fs::read_to_string("Cargo.toml")?;

let version: String = toml::from_str::<Cursor!(workspace.package.version)>(&data)?.0;
assert_eq!(version, "0.1");

Cursor!(workspace.package.version) is the magic juice - this type-macro expands to a type that implements Deserialize.

Without serde_cursor:

Pain and suffering…

use std::fs;

use serde::Deserialize;

#[derive(Deserialize)]
struct CargoToml {
    workspace: Workspace
}

#[derive(Deserialize)]
struct Workspace {
    package: Package
}

#[derive(Deserialize)]
struct Package {
    version: String
}

let data = fs::read_to_string("Cargo.toml")?;

let version = toml::from_str::<CargoToml>(&data)?.workspace.package.version;

Get names of all dependencies from Cargo.lock

# Cargo.lock
[[package]]
name = "serde"
version = "1.0"

[[package]]
name = "rand"
version = "0.9"

The wildcard .* accesses every element in an array:

use std::fs;

use serde_cursor::Cursor;

let file = fs::read_to_string("Cargo.lock")?;

let packages: Vec<String> = toml::from_str::<Cursor!(package.*.name)>(&file)?.0;

assert_eq!(packages, vec!["serde", "rand"]);

Without serde_cursor:

use std::fs;

use serde::Deserialize;

#[derive(Deserialize)]
struct CargoLock {
    package: Vec<Package>
}

#[derive(Deserialize)]
struct Package {
    name: String
}

let file = fs::read_to_string("Cargo.lock")?;

let packages = toml::from_str::<CargoLock>(&file)?
    .package
    .into_iter()
    .map(|pkg| pkg.name)
    .collect::<Vec<_>>();

serde_cursor vs serde_query

serde_query also implements jq-like queries, but more verbosely.

Single query

serde_cursor:

use std::fs;

use serde_cursor::Cursor;

let data = fs::read_to_string("data.json")?;

let authors: Vec<String> = serde_json::from_str::<Cursor!(commits.*.author)>(&data)?.0;

serde_query:

use std::fs;

use serde_query::Deserialize;

#[derive(Deserialize)]
struct Data {
    #[query(".commits.[].author")]
    authors: Vec<String>,
}

let data = fs::read_to_string("data.json")?;
let data: Data = serde_json::from_str(&data)?;

let authors = data.authors;

Storing queries in a struct

serde_cursor:

use std::fs;

use serde::Deserialize;
use serde_cursor::Cursor;

#[derive(Deserialize)]
struct Data {
    #[serde(rename = "commits")]
    authors: Cursor!(*.author: Vec<String>),
    count: usize,
}

let data = fs::read_to_string("data.json")?;

let data: Data = serde_json::from_str(&data)?;

serde_query:

use std::fs;

use serde_query::Deserialize;

#[derive(Deserialize)]
struct Data {
    #[query(".commits.[].author")]
    authors: Vec<String>,
    #[query(".count")]
    count: usize,
}

let data = fs::read_to_string("data.json")?;

let data: Data = serde_json::from_str(&data)?;

serde_with integration

When the "serde_with" feature is enabled, Cursor implements serde_with::DeserializeAs and serde_with::SerializeAs, so you can use it with the #[serde_as] attribute:

use serde::{Deserialize, Serialize};
use serde_cursor::Cursor;
use serde_with::serde_as;

#[serde_as]
#[derive(Serialize, Deserialize)]
struct CargoToml {
    #[serde(rename = "workspace")]
    #[serde_as(as = "Cursor!(package.version)")]
    version: String,
}

let toml: CargoToml = toml::from_str("workspace = { package = { version = '0.1.0' } }")?;
assert_eq!(toml.version, "0.1.0");
assert_eq!(serde_json::to_string(&toml)?, r#"{"workspace":{"package":{"version":"0.1.0"}}}"#);

Great error messages

When deserialization fails, the error message includes the exact path at which the failure occurred.

use serde_cursor::Cursor;

let data = serde_json::json!({ "author": { "id": "not-a-number" } });
let result = serde_json::from_value::<Cursor!(author.id: i32)>(data);
let err = result.unwrap_err().to_string();
assert_eq!(err, r#".author.id: invalid type: string "not-a-number", expected i32"#);
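
To make the idea of a path-prefixed error concrete, here is a minimal, std-only sketch of one way such a message could be assembled. It is not the crate's implementation: `PathError`, `Value`, and `get_i32` are hypothetical names invented for this illustration, and the message wording simply mirrors the example above.

```rust
use std::fmt;

/// Hypothetical error type that pairs a failure with the path where it happened.
#[derive(Debug, PartialEq)]
struct PathError {
    path: String,
    message: String,
}

impl fmt::Display for PathError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.path, self.message)
    }
}

/// Tiny stand-in for parsed data (hypothetical, not serde's data model).
enum Value {
    Int(i32),
    Str(String),
    Map(Vec<(String, Value)>),
}

/// Follows `fields` from `root`, extending `path` at every step so that any
/// failure can report how far the lookup got.
fn get_i32(root: &Value, fields: &[&str]) -> Result<i32, PathError> {
    let mut path = String::new();
    let mut current = root;
    for &field in fields {
        // Only maps have named fields; anything else fails at the current path.
        let Value::Map(entries) = current else {
            return Err(PathError { path, message: "expected a map".to_string() });
        };
        path.push('.');
        path.push_str(field);
        current = entries
            .iter()
            .find(|(key, _)| key.as_str() == field)
            .map(|(_, value)| value)
            .ok_or_else(|| PathError {
                path: path.clone(),
                message: format!("missing field `{field}`"),
            })?;
    }
    // The path is exhausted: the value found here must be the target type.
    match current {
        Value::Int(n) => Ok(*n),
        Value::Str(s) => Err(PathError {
            path,
            message: format!("invalid type: string {s:?}, expected i32"),
        }),
        Value::Map(_) => Err(PathError {
            path,
            message: "invalid type: map, expected i32".to_string(),
        }),
    }
}
```

Calling `get_i32` with the fields `["author", "id"]` on a value shaped like the JSON above produces an error whose `to_string()` starts with `.author.id`, because the path is extended before each lookup and carried into whichever error is returned.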

How does it work?

The Cursor! macro is a “type-level” parser. It takes your jq-like query and transforms it into a nested, recursive type that implements serde::Deserialize.

Consider this query, which gets the first dependency of every package in Cargo.lock:

Cursor!(package.*.dependencies.0: String)

For this Cargo.lock, it would extract ["libc", "find-msvc-tools"]:

[[package]]
name = "android_system_properties"
dependencies = ["libc"]

[[package]]
name = "cc"
dependencies = ["find-msvc-tools", "shlex"]

That macro is expanded into a Cursor type, which implements Deserialize and Serialize:

Cursor<
    String,
    Cons<
        Field<"package">,
        Cons<
            Wildcard,
            Cons<
                Field<"dependencies">,
                Cons<Index<0>, Nil>,
            >,
        >,
    >,
>

The above is essentially equivalent to this pseudocode:

vec!["package", *, "dependencies", 0]

Except it exists entirely in the type system.

Each call to Deserialize::deserialize() removes the first element of the type-level list and passes the rest of the list on to the next Deserialize::deserialize() call.

This repeats until the list is exhausted. At that point only the type of the field is left (the String in the example above), and a final call to Deserialize::deserialize() on that type finishes things off.
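
The peeling process can be sketched without serde at all. The following is a minimal, std-only illustration of the technique, not the crate's actual implementation: `Value`, `Walk`, `Name`, and the marker types `Package` and `Dependencies` are hypothetical names invented for this sketch. `Field` takes a marker type here rather than a string literal, because string const generic parameters are not available on stable Rust.

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

/// Tiny stand-in for a self-describing document (hypothetical, not serde's data model).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    List(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

// Type-level list: `Nil` is the empty path, `Cons<H, T>` is segment `H` followed by path `T`.
struct Nil;
struct Cons<H, T>(PhantomData<(H, T)>);

// Path segments. `Name` supplies a field name through an associated constant.
trait Name {
    const NAME: &'static str;
}
struct Field<K: Name>(PhantomData<K>);
struct Index<const N: usize>;
struct Wildcard;

/// Resolves a type-level path against a value, one segment per call.
trait Walk {
    fn walk(value: &Value) -> Option<Value>;
}

// Empty list: nothing is left to peel off, so yield the value itself.
impl Walk for Nil {
    fn walk(value: &Value) -> Option<Value> {
        Some(value.clone())
    }
}

// Field segment: look up the key, then let the rest of the list (`T`) handle the inner value.
impl<K: Name, T: Walk> Walk for Cons<Field<K>, T> {
    fn walk(value: &Value) -> Option<Value> {
        match value {
            Value::Map(map) => T::walk(map.get(K::NAME)?),
            _ => None,
        }
    }
}

// Index segment: pick the N-th element, then continue with the rest of the list.
impl<const N: usize, T: Walk> Walk for Cons<Index<N>, T> {
    fn walk(value: &Value) -> Option<Value> {
        match value {
            Value::List(items) => T::walk(items.get(N)?),
            _ => None,
        }
    }
}

// Wildcard segment: apply the rest of the list to every element and collect the results.
impl<T: Walk> Walk for Cons<Wildcard, T> {
    fn walk(value: &Value) -> Option<Value> {
        match value {
            Value::List(items) => {
                let results: Option<Vec<Value>> = items.iter().map(T::walk).collect();
                results.map(Value::List)
            }
            _ => None,
        }
    }
}

// Marker types that carry the two field names used by the query.
struct Package;
impl Name for Package {
    const NAME: &'static str = "package";
}
struct Dependencies;
impl Name for Dependencies {
    const NAME: &'static str = "dependencies";
}

// Hand-written counterpart of the path `package.*.dependencies.0`.
type FirstDependencies =
    Cons<Field<Package>, Cons<Wildcard, Cons<Field<Dependencies>, Cons<Index<0>, Nil>>>>;

// Builds one `[[package]]` entry containing only its `dependencies` array.
fn package(deps: &[&str]) -> Value {
    let deps: Vec<Value> = deps.iter().map(|d| Value::Str(d.to_string())).collect();
    Value::Map(BTreeMap::from([("dependencies".to_string(), Value::List(deps))]))
}

// Builds the Cargo.lock example above as a `Value`.
fn sample_lockfile() -> Value {
    let packages = vec![package(&["libc"]), package(&["find-msvc-tools", "shlex"])];
    Value::Map(BTreeMap::from([("package".to_string(), Value::List(packages))]))
}
```

Calling `FirstDependencies::walk(&sample_lockfile())` yields a list containing "libc" and "find-msvc-tools". Each `impl` handles exactly one kind of head segment and delegates to `T::walk` for the tail, which is the same shape as the recursion described above, only over an in-memory value instead of a serde deserializer.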