
Crate serde_cursor




This crate allows you to declaratively specify how to fetch the desired parts of a serde-compatible data format (such as JSON) efficiently, without loading it all into memory, using a jq-like language.

serde_cursor = "0.4"

§Examples

The Cursor! macro makes it extremely easy to extract nested fields from data.

§Get version from Cargo.toml

use serde_cursor::Cursor;

let data = r#"
    [workspace.package]
    version = "0.1"
"#;

let version: String = toml::from_str::<Cursor!(workspace.package.version)>(data)?.0;
assert_eq!(version, "0.1");

Cursor!(workspace.package.version) is the magic juice: this type-macro expands to a type that implements serde::Deserialize and stores the extracted value in its .0 field.

Without serde_cursor:

Pain and suffering…

use serde::Deserialize;

#[derive(Deserialize)]
struct CargoToml {
    workspace: Workspace
}

#[derive(Deserialize)]
struct Workspace {
    package: Package
}

#[derive(Deserialize)]
struct Package {
    version: String
}

let data = r#"
    [workspace.package]
    version = "0.1"
"#;

let version = toml::from_str::<CargoToml>(data)?.workspace.package.version;

§Get names of all dependencies from Cargo.lock

The index-all [] accesses every element in an array:

use serde_cursor::Cursor;

let file = r#"
    [[package]]
    name = "serde"

    [[package]]
    name = "rand"
"#;

let packages: Vec<String> = toml::from_str::<Cursor!(package[].name)>(file)?.0;

assert_eq!(packages, vec!["serde", "rand"]);
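As a rough mental model of what package[].name selects (not how serde_cursor works internally, since it reads through the deserializer instead of building a tree), the std-only sketch below maps a field lookup over every element of an array. The Table alias and the index_all_field function are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one parsed `[[package]]` table: field name -> value.
type Table = BTreeMap<&'static str, &'static str>;

/// Mental model of `package[].name`: visit every element of the
/// `package` array (`[]`) and take its `name` field.
/// Returns `None` if any element lacks the field.
fn index_all_field(packages: &[Table], field: &str) -> Option<Vec<String>> {
    packages
        .iter()
        .map(|table| table.get(field).map(|value| value.to_string()))
        .collect()
}
```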

§Syntax

You can specify the target type, here Vec<String>, after the path package[].name, separated by a colon:

let packages = toml::from_str::<Cursor!(package[].name: Vec<String>)>(file)?.0;

The type can be omitted, in which case it will be inferred:

let packages: Vec<String> = toml::from_str::<Cursor!(package[].name)>(file)?.0;

Field names made up of identifier characters and hyphens (-) can be used without quotes:

Cursor!(dev-dependencies.serde.version)

Fields that contain spaces or other special characters must be quoted:

Cursor!(ferris."🦀::<>".r#"""#)

You can access specific elements of an array:

Cursor!(package[0].name)
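To make these rules concrete, here is a minimal runtime parser for a subset of the path syntax, assuming only bare fields (identifier characters and hyphens), simple double-quoted fields without escapes or the r#"..."# form, [] and [N]. The Segment enum and parse_path function are illustrative; the real Cursor! macro parses its tokens at compile time, and this sketch makes no claim about its exact grammar.

```rust
/// Hypothetical runtime representation of one path segment.
#[derive(Debug, PartialEq)]
enum Segment {
    Field(String), // name or "quoted name"
    IndexAll,      // []
    Index(usize),  // [N]
}

/// Characters allowed in an unquoted field (assumed rule for this sketch).
fn is_bare(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '-'
}

/// Parses a subset of the path syntax, e.g. `dev-dependencies.serde.version`.
fn parse_path(input: &str) -> Result<Vec<Segment>, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut segments = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // `.` only separates fields; skip it.
            '.' => i += 1,
            '[' => {
                // `[]` or `[N]`: read up to the closing bracket.
                let end = i + chars[i..]
                    .iter()
                    .position(|&c| c == ']')
                    .ok_or("unclosed '['")?;
                let inner: String = chars[i + 1..end].iter().collect();
                if inner.is_empty() {
                    segments.push(Segment::IndexAll);
                } else {
                    let n = inner
                        .parse::<usize>()
                        .map_err(|_| format!("invalid index: {inner}"))?;
                    segments.push(Segment::Index(n));
                }
                i = end + 1;
            }
            '"' => {
                // Quoted field: everything up to the next `"` (no escapes).
                let end = i + 1 + chars[i + 1..]
                    .iter()
                    .position(|&c| c == '"')
                    .ok_or("unclosed quote")?;
                segments.push(Segment::Field(chars[i + 1..end].iter().collect()));
                i = end + 1;
            }
            c if is_bare(c) => {
                // Bare field: a run of identifier characters and hyphens.
                let start = i;
                while i < chars.len() && is_bare(chars[i]) {
                    i += 1;
                }
                segments.push(Segment::Field(chars[start..i].iter().collect()));
            }
            c => return Err(format!("unexpected character {c:?}")),
        }
    }
    Ok(segments)
}
```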

§serde_cursor + monostate = 🧡💛💚💙💜

The monostate crate provides the MustBe! macro, which expands to a type that implements serde::Deserialize and can only ever deserialize from one specific value.
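The idea of a type that accepts exactly one value can be sketched in plain Rust without serde. In the hypothetical code below, the accepted value lives in an associated const of a marker type, and MustBe::parse rejects anything else. monostate's actual MustBe! works through serde::Deserialize, so treat this purely as an analogy.

```rust
use std::marker::PhantomData;

/// Hypothetical trait carrying the single accepted value at the type level.
trait Expected {
    const VALUE: &'static str;
}

/// Sketch of a "must be" type: it can only be constructed from `E::VALUE`.
struct MustBe<E: Expected>(PhantomData<E>);

impl<E: Expected> MustBe<E> {
    fn parse(input: &str) -> Result<Self, String> {
        if input == E::VALUE {
            Ok(MustBe(PhantomData))
        } else {
            Err(format!("expected {:?}, found {:?}", E::VALUE, input))
        }
    }
}

/// Hypothetical marker type standing in for `MustBe!("compiler-message")`.
struct CompilerMessage;

impl Expected for CompilerMessage {
    const VALUE: &'static str = "compiler-message";
}
```

Used as a gate, a failed parse gives the same early exit as get!(reason: MustBe!("compiler-message"))? in the example that follows.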

Together, these two crates provide an almost jq-like data-processing experience in Rust:

// early exit if the `reason` field is not equal to `"compiler-message"`
get!(reason: MustBe!("compiler-message"))?;
get!(message.message: MustBe!("trace_macro"))?;

Ok(Expansion {
    messages: get!(message.children[].message)?,
    byte_start: get!(message.spans[0].byte_start)?,
    byte_end: get!(message.spans[0].byte_end)?,
})

The jq version of the above processing looks like this:

select(.reason == "compiler-message")
| select(.message.message == "trace_macro")
| {
    messages: [.message.children[].message],
    byte_start: .message.spans[0].byte_start,
    byte_end: .message.spans[0].byte_end
}

The full code for the above example looks like this:

use monostate::MustBe;
use serde_cursor::Cursor;

struct Expansion {
    messages: Vec<String>,
    byte_start: u32,
    byte_end: u32,
}

impl Expansion {
    fn parse(value: &[u8]) -> serde_json::Result<Self> {
        macro_rules! get {
            ($($cursor:tt)*) => {
                serde_json::from_slice::<
                    Cursor!($($cursor)*)
                >(value).map(|it| it.0)
            };
        }

        get!(reason: MustBe!("compiler-message"))?;
        get!(message.message: MustBe!("trace_macro"))?;

        Ok(Expansion {
            messages: get!(message.children[].message)?,
            byte_start: get!(message.spans[0].byte_start)?,
            byte_end: get!(message.spans[0].byte_end)?,
        })
    }
}

For reference, here is the same logic without serde_cursor or monostate:

use serde::Deserialize;

struct Expansion {
    messages: Vec<String>,
    byte_start: u32,
    byte_end: u32,
}

impl Expansion {
    fn parse(value: &[u8]) -> serde_json::Result<Self> {
        #[derive(Deserialize)]
        struct RawDiagnostic {
            reason: String,
            message: DiagnosticMessage,
        }

        #[derive(Deserialize)]
        struct DiagnosticMessage {
            message: String,
            children: Vec<DiagnosticChild>,
            spans: Vec<DiagnosticSpan>,
        }

        #[derive(Deserialize)]
        struct DiagnosticChild {
            message: String,
        }

        #[derive(Deserialize)]
        struct DiagnosticSpan {
            byte_start: u32,
            byte_end: u32,
        }

        let raw: RawDiagnostic = serde_json::from_slice(value)?;

        if raw.reason != "compiler-message" || raw.message.message != "trace_macro" {
            return Err(serde::de::Error::custom("..."));
        }

        let primary_span = raw.message.spans.get(0)
            .ok_or_else(|| serde::de::Error::custom("..."))?;

        Ok(Expansion {
            messages: raw.message.children.into_iter().map(|c| c.message).collect(),
            byte_start: primary_span.byte_start,
            byte_end: primary_span.byte_end,
        })
    }
}

§Ranges

Ranges work like [], but select only the elements whose index falls within the range:

Cursor!(package[4..]);
Cursor!(package[..8]);
Cursor!(package[4..8]);
Cursor!(package[4..=8]);
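The selection rule can be modelled with the standard library's RangeBounds, which covers all four forms above. select_range is a hypothetical helper; it silently selects nothing for indices past the end, and this sketch does not tell you whether serde_cursor behaves the same way.

```rust
use std::ops::RangeBounds;

/// Mental model of `[range]`: keep only the elements whose index is in `range`.
/// Indices past the end are simply never visited in this sketch.
fn select_range<T: Clone, R: RangeBounds<usize>>(items: &[T], range: R) -> Vec<T> {
    items
        .iter()
        .enumerate()
        .filter(|(index, _)| range.contains(index))
        .map(|(_, item)| item.clone())
        .collect()
}
```

Note that 4..8 excludes index 8 while 4..=8 includes it, exactly as with Rust's own range syntax.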

§Interpolations

It’s not uncommon for multiple queries to get quite repetitive:

let pressure: Vec<f64> = toml::from_str::<Cursor!(france.properties.timeseries[].data.instant.details.air_pressure_at_sea_level)>(france)?.0;
let humidity: Vec<f64> = toml::from_str::<Cursor!(japan.properties.timeseries[].data.instant.details.relative_humidity)>(japan)?.0;
let temperature: Vec<f64> = toml::from_str::<Cursor!(japan.properties.timeseries[].data.instant.details.air_temperature)>(japan)?.0;

serde_cursor supports interpolations. You can factor the common part of the path into a type alias, Details, which appends its RestOfPath parameter with +, and then interpolate it with $Details inside Cursor!:

type Details<RestOfPath> = serde_cursor::Path!(properties.timeseries[].data.instant.details + RestOfPath);

let pressure: Vec<f64> = toml::from_str::<Cursor!(france.$Details.air_pressure_at_sea_level)>(france)?.0;
let humidity: Vec<f64> = toml::from_str::<Cursor!(japan.$Details.relative_humidity)>(japan)?.0;
let temperature: Vec<f64> = toml::from_str::<Cursor!(japan.$Details.air_temperature)>(japan)?.0;
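At runtime, the same factoring would just be concatenation: a fixed prefix plus a caller-supplied tail. The details, render and cursor_path functions below are hypothetical and only illustrate how $Details expands back into the full paths of the repetitive version; Path! performs this composition at the type level instead.

```rust
/// Hypothetical runtime analogue of the `Details` alias: a fixed prefix
/// followed by whatever rest of the path the caller supplies.
fn details(rest: &[&str]) -> Vec<String> {
    let prefix = ["properties", "timeseries", "[]", "data", "instant", "details"];
    prefix
        .iter()
        .chain(rest.iter())
        .map(|segment| segment.to_string())
        .collect()
}

/// Renders segments in path syntax: index segments such as `[]` attach
/// directly to the previous field, everything else is joined with `.`.
fn render(segments: &[String]) -> String {
    let mut out = String::new();
    for segment in segments {
        if !segment.starts_with('[') && !out.is_empty() {
            out.push('.');
        }
        out.push_str(segment);
    }
    out
}

/// `root.$Details.rest`: the root field followed by the interpolated path.
fn cursor_path(root: &str, rest: &[&str]) -> String {
    let mut segments = vec![root.to_string()];
    segments.extend(details(rest));
    render(&segments)
}
```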

§serde_cursor vs serde_query

serde_query also implements jq-like queries, but more verbosely.

§Single query

serde_cursor:

use serde_cursor::Cursor;

let data = r#"{ "commits": [{"author": "Ferris"}] }"#;

let authors: Vec<String> = serde_json::from_str::<Cursor!(commits[].author)>(data)?.0;

serde_query:

use serde_query::Deserialize;

#[derive(Deserialize)]
struct Data {
    #[query(".commits.[].author")]
    authors: Vec<String>,
}

let data = r#"{ "commits": [{"author": "Ferris"}] }"#;
let data: Data = serde_json::from_str(data)?;

let authors = data.authors;

§Storing queries in a struct

serde_cursor:

use serde::Deserialize;
use serde_cursor::Cursor;

#[derive(Deserialize)]
struct Data {
    #[serde(rename = "commits")]
    authors: Cursor!([].author: Vec<String>),
    count: usize,
}

let data = r#"{ "count": 1, "commits": [{"author": "Ferris"}] }"#;

let data: Data = serde_json::from_str(data)?;

serde_query:

use serde_query::Deserialize;

#[derive(Deserialize)]
struct Data {
    #[query(".commits.[].author")]
    authors: Vec<String>,
    #[query(".count")]
    count: usize,
}

let data = r#"{ "count": 1, "commits": [{"author": "Ferris"}] }"#;

let data: Data = serde_json::from_str(data)?;

§Great error messages

When deserialization fails, you get the exact path of where the failure occurred:

use serde_cursor::Cursor;

let data = serde_json::json!({ "author": { "id": "not-a-number" } });
let result = serde_json::from_value::<Cursor!(author.id: i32)>(data);
let err = result.unwrap_err().to_string();
assert_eq!(err, r#".author.id: invalid type: string "not-a-number", expected i32"#);
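One plausible way to produce such messages is to record the segments visited on the way down and prefix the inner error with them. The sketch below, with a hypothetical Segment type and with_path helper, reproduces the format of the message above; it is not necessarily how serde_cursor builds its errors.

```rust
use std::fmt;

/// Hypothetical path segment, used here only to render error locations.
enum Segment {
    Field(&'static str),
    Index(usize),
}

impl fmt::Display for Segment {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Segment::Field(name) => write!(f, ".{name}"),
            Segment::Index(n) => write!(f, "[{n}]"),
        }
    }
}

/// Prefixes an inner error message with the path at which it occurred.
fn with_path(path: &[Segment], inner: &str) -> String {
    let location: String = path.iter().map(|segment| segment.to_string()).collect();
    format!("{location}: {inner}")
}
```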

§serde_with integration

If feature = "serde_with" is enabled, the type returned by Cursor! will implement serde_with::DeserializeAs and serde_with::SerializeAs, meaning you can use it with the #[serde_as] attribute:

use serde::{Serialize, Deserialize};
use serde_cursor::Cursor;
use serde_with::serde_as;

#[serde_as]
#[derive(Serialize, Deserialize)]
struct CargoToml {
    #[serde(rename = "workspace")]
    #[serde_as(as = "Cursor!(package.version)")]
    version: String,
}

let toml: CargoToml = toml::from_str("workspace = { package = { version = '0.1.0' } }")?;
assert_eq!(toml.version, "0.1.0");
assert_eq!(serde_json::to_string(&toml)?, r#"{"workspace":{"package":{"version":"0.1.0"}}}"#);

§How does it work?

The Cursor! macro expands to a recursive type that implements serde::Deserialize. Information on how to access the nested fields is stored entirely inside the type system.

Consider this query, which gets the first dependency of every package in Cargo.lock:

Cursor!(package[].dependencies[0]: String)

For this Cargo.lock, it would extract ["libc", "find-msvc-tools"]:

[[package]]
name = "android_system_properties"
dependencies = ["libc"]

[[package]]
name = "cc"
dependencies = ["find-msvc-tools", "shlex"]

That macro is expanded into a Cursor type, which implements serde::Deserialize and serde::Serialize:

Cursor<
    String, // : String
    Path<
        Field<"package">, // .package
        Path<
            IndexAll, // []
            Path<
                Field<"dependencies">, // .dependencies
                Path<
                    Index<0>, // [0]
                    PathEnd
                >,
            >,
        >,
    >,
>

The above is essentially equivalent to:

vec![
    Segment::Field("package"), // .package
    Segment::IndexAll, // []
    Segment::Field("dependencies"), // .dependencies
    Segment::Index(0) // [0]
]

Except it exists entirely in the type system.
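The same idea can be tried on stable Rust with nothing but marker types and traits: each path type knows how to turn itself into the runtime segment list. The listing above writes Field<"package">; since string const generics are not available on stable Rust, this sketch assumes a hypothetical FieldName trait with an associated const instead. All names here (PathSpec, SegmentSpec, FieldName, Package, Dependencies, Query) are illustrative, not serde_cursor's API.

```rust
use std::marker::PhantomData;

/// Runtime view of a segment, matching the `Segment` list above.
#[derive(Debug, PartialEq)]
enum Segment {
    Field(&'static str),
    IndexAll,
    Index(usize),
}

/// Hypothetical replacement for `Field<"name">`: the name is an associated const.
trait FieldName {
    const NAME: &'static str;
}

struct Field<N>(PhantomData<N>);
struct IndexAll;
struct Index<const I: usize>;
struct Path<Head, Rest>(PhantomData<(Head, Rest)>);
struct PathEnd;

/// A single segment type knows which runtime segment it stands for.
trait SegmentSpec {
    fn segment() -> Segment;
}
impl<N: FieldName> SegmentSpec for Field<N> {
    fn segment() -> Segment { Segment::Field(N::NAME) }
}
impl SegmentSpec for IndexAll {
    fn segment() -> Segment { Segment::IndexAll }
}
impl<const I: usize> SegmentSpec for Index<I> {
    fn segment() -> Segment { Segment::Index(I) }
}

/// A whole path type can list its segments, recursing through `Rest`.
trait PathSpec {
    fn segments(out: &mut Vec<Segment>);
}
impl PathSpec for PathEnd {
    fn segments(_out: &mut Vec<Segment>) {}
}
impl<Head: SegmentSpec, Rest: PathSpec> PathSpec for Path<Head, Rest> {
    fn segments(out: &mut Vec<Segment>) {
        out.push(Head::segment()); // the first segment...
        Rest::segments(out); // ...then the rest of the path
    }
}

struct Package;
impl FieldName for Package { const NAME: &'static str = "package"; }
struct Dependencies;
impl FieldName for Dependencies { const NAME: &'static str = "dependencies"; }

/// `package[].dependencies[0]`, spelled out as nested types.
type Query = Path<Field<Package>, Path<IndexAll, Path<Field<Dependencies>, Path<Index<0>, PathEnd>>>>;
```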

Each time serde::Deserialize::deserialize() is called, it processes the first segment of the path (.package) and passes the rest of the path ([].dependencies[0]) on to the next serde::Deserialize implementation, again and again, until the path is empty.

Once the path is empty, we reach the type of the field (String in the example above) and call serde::Deserialize::deserialize() on it to finish things off. That String is then bubbled up the stack and returned from <Cursor<String, _> as serde::Deserialize>::deserialize.
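The recursion can be sketched over a hypothetical in-memory Value tree: handle the first segment, recurse with the rest, and return the leaf once the path is empty. serde_cursor performs these steps inside Deserialize implementations while the data streams past, so Value, Segment and walk below are stand-ins chosen for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory value; serde_cursor does not build a tree like this.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Hypothetical runtime path segment.
enum Segment {
    Field(&'static str),
    IndexAll,
    Index(usize),
}

/// Process the first segment, pass the rest of the path down recursively,
/// and return the leaf once the path is empty.
fn walk(value: &Value, path: &[Segment]) -> Result<Value, String> {
    let Some((first, rest)) = path.split_first() else {
        return Ok(value.clone()); // empty path: we have reached the leaf
    };
    match (first, value) {
        (Segment::Field(name), Value::Table(map)) => {
            let child = map.get(*name).ok_or_else(|| format!("missing field `{name}`"))?;
            walk(child, rest)
        }
        (Segment::Index(i), Value::Array(items)) => {
            let child = items.get(*i).ok_or_else(|| format!("index {i} out of bounds"))?;
            walk(child, rest)
        }
        (Segment::IndexAll, Value::Array(items)) => {
            // Apply the rest of the path to every element and collect the results.
            let results = items
                .iter()
                .map(|item| walk(item, rest))
                .collect::<Result<Vec<_>, _>>()?;
            Ok(Value::Array(results))
        }
        _ => Err("type mismatch".to_string()),
    }
}
```

Run on the Cargo.lock above with the path package[].dependencies[0], this sketch yields the strings "libc" and "find-msvc-tools".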

§Modules

implementation_details (doc)
Available if you need to implement a trait for the type returned by Cursor!, or implement the Sequence trait to have the index-all [] syntax work with more collections.

§Macros

Cursor
Access nested fields of serde-compatible data formats easily.
Path
Support for interpolations, Cursor!(japan.$Details.air_temperature).