unstructured 0.2.0-pre1

Generic types for unstructured data

Crate Documentation MIT License

Unstructured Data in Rust

This library provides types for usage with unstructured data. This is based on functionality from both serde_json and serde_value. Depending on your use case, it may make sense to use one of those instead.

These structures allow serialization and deserialization into an intermediate container with serde, and manipulation of the data while it is in this intermediate state.


So why not use one of the above libraries?

  • serde_json::value::Value is coupled with JSON serialization/deserialization pretty strongly. The purpose is to have an intermediate format for usage specifically with JSON. This can be a problem if you need something more generic (e.g. you need to support features that JSON does not) or do not wish to require dependence on JSON libraries. Document supports serialization to/from JSON without being limited to usage with JSON libraries.
  • serde_value::Value provides an intermediate format for serialization and deserialization like Document; however, it does not provide as many options for manipulating the data, such as indexing and easy type conversion.

Example Usage

The primary struct used in this repo is Document. Document provides methods for easy type conversion and manipulation.

use unstructured::Document;
use std::collections::BTreeMap;

let mut map = BTreeMap::new(); // Will be inferred as BTreeMap<Document, Document> though root element can be any supported type
map.insert("test".into(), 100u64.into()); // From<> is implemented for most basic data types
let doc: Document = map.into(); // Create a new Document where the root element is the map defined above
assert_eq!(doc["test"], Document::U64(100));
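
To illustrate how the conversions and indexing above could work, here is a minimal, self-contained sketch. The Doc enum, its four variants, the UNIT fallback, and build_example are hypothetical stand-ins, not the crate's definitions. Returning a unit value for a missing key is an assumption, borrowed from the way serde_json yields Null when indexing a missing key; Document may behave differently.

```rust
use std::collections::BTreeMap;
use std::ops::Index;

// Hypothetical stand-in for a document type; NOT the crate's definition.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Doc {
    Unit,
    U64(u64),
    String(String),
    Map(BTreeMap<Doc, Doc>),
}

// From impls are what make `.into()` work on plain values.
impl From<u64> for Doc {
    fn from(v: u64) -> Self {
        Doc::U64(v)
    }
}

impl<'a> From<&'a str> for Doc {
    fn from(v: &'a str) -> Self {
        Doc::String(v.to_owned())
    }
}

impl From<BTreeMap<Doc, Doc>> for Doc {
    fn from(v: BTreeMap<Doc, Doc>) -> Self {
        Doc::Map(v)
    }
}

// Fallback returned when a key is missing or the value is not a map
// (an assumption of this sketch, see the paragraph above).
static UNIT: Doc = Doc::Unit;

// Indexing by string key looks the key up in a map variant.
impl<'a> Index<&'a str> for Doc {
    type Output = Doc;

    fn index(&self, key: &'a str) -> &Doc {
        match self {
            Doc::Map(m) => m.get(&Doc::String(key.to_owned())).unwrap_or(&UNIT),
            _ => &UNIT,
        }
    }
}

// Builds the same shape as the example above: a map with one entry.
pub fn build_example() -> Doc {
    let mut map: BTreeMap<Doc, Doc> = BTreeMap::new();
    map.insert("test".into(), 100u64.into());
    map.into()
}
```

With these impls, build_example()["test"] compares equal to Doc::U64(100), mirroring the assertion above.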

Document implements Serialize and Deserialize, so it can be used where the data format is unknown and the data must be manipulated after it has been received.

#[macro_use]
extern crate serde_derive;
use unstructured::Document;

#[derive(Deserialize, Serialize)]
struct SomeStruct {
    key: String,
}

fn main() {
    // Deserialize JSON of unknown shape into a Document
    let from_service = "{\"key\": \"value\"}";
    let doc: Document = serde_json::from_str(from_service).unwrap();
    assert_eq!(doc["key"], "value".into());

    // Convert the Document into a concrete type
    let some_struct: SomeStruct = doc.try_into().unwrap();
    assert_eq!(some_struct.key, "value");

    // Convert a concrete type back into a Document
    let another_doc = Document::new(some_struct).unwrap();
    assert_eq!(another_doc["key"], "value".into());
}

JSON Pointer syntax can be used as well to quickly get a nested value. This will work regardless of the format that you deserialized from, so this syntax can be used to easily retrieve, for example, nested YAML values.

use unstructured::Document;

let doc: Document =
    serde_json::from_str("{\"some\": {\"nested\": {\"value\": \"is this value\"}}}").unwrap();
let doc_element = doc.pointer("/some/nested/value").unwrap(); // pointer() returns an Option; None if the path is not found
assert_eq!(*doc_element, "is this value".into());
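
The pointer lookup described above can be sketched independently of any serialization format. The following is a minimal illustration of JSON Pointer (RFC 6901) traversal over a nested enum; the Doc type, the free function pointer, and nested_example are hypothetical names for this sketch and are not the crate's implementation.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a document type; NOT the crate's definition.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Doc {
    String(String),
    Seq(Vec<Doc>),
    Map(BTreeMap<Doc, Doc>),
}

// Walks `path` one token at a time, descending into maps by key and
// into sequences by index. Returns None as soon as a step fails.
pub fn pointer<'a>(root: &'a Doc, path: &str) -> Option<&'a Doc> {
    // The empty pointer refers to the whole document.
    if path.is_empty() {
        return Some(root);
    }
    // Any non-empty pointer must start with '/'.
    if !path.starts_with('/') {
        return None;
    }
    let mut current = root;
    for raw in path.split('/').skip(1) {
        // Unescape per RFC 6901: "~1" becomes "/", then "~0" becomes "~".
        let token = raw.replace("~1", "/").replace("~0", "~");
        current = match current {
            Doc::Map(m) => m.get(&Doc::String(token))?,
            Doc::Seq(items) => items.get(token.parse::<usize>().ok()?)?,
            _ => return None,
        };
    }
    Some(current)
}

// Builds {"some": {"nested": {"value": "is this value"}}} by hand,
// without going through JSON.
pub fn nested_example() -> Doc {
    let mut inner = BTreeMap::new();
    inner.insert(
        Doc::String("value".to_owned()),
        Doc::String("is this value".to_owned()),
    );
    let mut middle = BTreeMap::new();
    middle.insert(Doc::String("nested".to_owned()), Doc::Map(inner));
    let mut outer = BTreeMap::new();
    outer.insert(Doc::String("some".to_owned()), Doc::Map(middle));
    Doc::Map(outer)
}
```

Because the traversal only inspects the in-memory variants, the same lookup works whether the value was built by hand, as in nested_example, or deserialized from JSON, YAML, or another format.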

Below are the Document enum types available:

use std::collections::BTreeMap;
pub enum Document {
    // Boolean

    // Unsigned

    // Signed

    // Floats

    // Char/String
    // Effectively 'Null'
    // Options
    // Newtypes
    // Arrays
    // Maps
    Map(BTreeMap<Document, Document>),
    // Raw data
}