unstructured/lib.rs
/*!
This library provides types for working with unstructured data. It is based on functionality from both
[serde_json](https://github.com/serde-rs/json) and [serde_value](https://github.com/arcnmx/serde-value). Depending
on your use case, it may make sense to use one of those instead.

These structures support serialization and deserialization into an intermediate container with serde, and manipulation
of the data while it is in this intermediate state.

# Purpose

So why not use one of the above libraries?

- **serde_json::value::Value** is strongly coupled with JSON serialization/deserialization. Its purpose is to provide
  an intermediate format specifically for use with JSON. This can be a problem if you need something more generic (e.g.
  you need to support features that JSON does not) or do not wish to depend on JSON libraries. Document supports
  serialization to/from JSON without being limited to usage with JSON libraries.
- **serde_value::Value** provides an intermediate format for serialization and deserialization like Document; however, it does
  not provide as many options for manipulating the data, such as indexing and easy type conversion.

In addition to many of the features provided by the above libraries, unstructured also provides:

- Easy comparisons with primitive types, e.g. ```Unstructured<T>::U64(100) == 100 as u64```
- Easy merging of multiple documents: ```doc1.merge(doc2)``` or ```doc = doc1 + doc2```
- Selectors for retrieving nested values within a document without cloning: ```doc.select(".path.to.key")```
- Filters to create new documents from an array of input documents: ```docs.filter("[0].path.to.key | [1].path.to.array[0:5]")```
- Convenience methods for is_type(), as_type(), take_type()
- From implementations for most basic data types, for easy document creation

# Example Usage

The primary struct used in this crate is ```Document```. Document provides methods for easy type conversion and manipulation.

```
use unstructured::{Document, Number};
use std::collections::BTreeMap;

let mut map = BTreeMap::new(); // Will be inferred as BTreeMap<Document, Document>, though the root element can be any supported type
map.insert("test".into(), 100u64.into()); // From<> is implemented for most basic data types
let doc: Document = map.into(); // Create a new Document where the root element is the map defined above
assert_eq!(doc["test"], Document::Number(Number::U64(100)));
```

Document implements ```Serialize``` and ```Deserialize``` so that it can be easily used where the data format is unknown and manipulated
after it has been received.

```
#[macro_use]
extern crate serde;
use unstructured::Document;

#[derive(Deserialize, Serialize)]
struct SomeStruct {
    key: String,
}

fn main() {
    // Deserialize JSON of unknown shape into a Document
    let from_service = "{\"key\": \"value\"}";
    let doc: Document = serde_json::from_str(from_service).unwrap();
    let expected: Document = "value".into();
    assert_eq!(doc["key"], expected);

    // Convert the Document into a concrete type
    let some_struct: SomeStruct = doc.try_into().unwrap();
    assert_eq!(some_struct.key, "value");

    // Convert the struct back into a Document
    let another_doc = Document::new(some_struct).unwrap();
    assert_eq!(another_doc["key"], expected);
}
```

Selectors can be used to retrieve a reference to nested values, regardless of the incoming format.

- [JSON Pointer syntax](https://tools.ietf.org/html/rfc6901): ```doc.select("/path/to/key")```
- A JQ-inspired syntax: ```doc.select(".path.to.[\"key\"]")```

```
use unstructured::Document;

let doc: Document =
    serde_json::from_str("{\"some\": {\"nested\": {\"value\": \"is this value\"}}}").unwrap();
let doc_element = doc.select("/some/nested/value").unwrap(); // Yields a reference to the nested Document; unwrap() panics if the path is not found
let expected: Document = "is this value".into();
assert_eq!(*doc_element, expected);
```
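
As a rough illustration of what pointer-style selection involves, the sketch below walks a `/`-separated path over a toy tree and returns a reference without cloning. It is not this crate's implementation: `Node` and `select_pointer` are hypothetical names introduced here, and the real selector may differ in its return type and error handling.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a document tree (hypothetical, not the crate's `Document`).
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Seq(Vec<Node>),
    Map(BTreeMap<String, Node>),
}

/// Walk an RFC 6901 style pointer such as "/some/nested/value".
/// Returns a reference into the tree, or None if any segment is missing.
fn select_pointer<'a>(root: &'a Node, pointer: &str) -> Option<&'a Node> {
    if pointer.is_empty() {
        return Some(root); // the empty pointer refers to the whole document
    }
    // A non-empty pointer must start with '/'.
    let rest = pointer.strip_prefix('/')?;
    let mut current = root;
    for raw in rest.split('/') {
        // RFC 6901 escapes: "~1" stands for '/', "~0" for '~' (decode "~1" first).
        let key = raw.replace("~1", "/").replace("~0", "~");
        current = match current {
            Node::Map(map) => map.get(&key)?,
            Node::Seq(items) => items.get(key.parse::<usize>().ok()?)?,
            Node::Text(_) => return None, // scalars have no children
        };
    }
    Some(current)
}
```

For a tree shaped like the JSON in the example above, `select_pointer(&root, "/some/nested/value")` yields a reference to the text node, while a path with a missing segment yields `None`.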

In addition to selectors, filters can be used to create new documents from an array of input documents.

- Document selection: ```"[0]", "[1]", "*"```
- Path navigation: ```"[0].path.to.key" "[0] /path/to/key" r#" [0] .["path"].["to"].["key"] "#```
- Index selection: ```"[0] .array.[0]"```
- Sequence selection: ```"[0] .array.[0:0]" "[0] .array.[:]" "[0] .array.[:5]"```
- Filtering multiple docs: ```"[0].key | [1].key"```
- Merging docs: ```"*" "[0].key.to.merge | [1].add.this.key.too | [2].key.to.merge"```

```
use unstructured::{Document, Number};

let docs: Vec<Document> = vec![
    serde_json::from_str(r#"{"some": {"nested": {"vals": [1,2,3]}}}"#).unwrap(),
    serde_json::from_str(r#"{"some": {"nested": {"vals": [4,5,6]}}}"#).unwrap(),
];
let result = Document::filter(&docs, "[0].some.nested.vals | [1].some.nested.vals").unwrap();
assert_eq!(result["some"]["nested"]["vals"][4], Document::Number(Number::U64(5)));
```
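
The filter example above suggests that when selections are combined with `|`, maps are merged key by key and sequences are concatenated, which is why index 4 of the merged `vals` holds 5. The sketch below illustrates that idea on a toy tree. It is not this crate's code: `Node` and `merge` are hypothetical names, and the rule that a conflicting scalar is replaced by the right-hand value is an assumption made for the sketch rather than documented behaviour.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a document tree (hypothetical, not the crate's `Document`).
#[derive(Debug, PartialEq)]
enum Node {
    Num(u64),
    Seq(Vec<Node>),
    Map(BTreeMap<String, Node>),
}

/// Merge `right` into `left` in place.
fn merge(left: &mut Node, right: Node) {
    match (left, right) {
        // Maps: merge recursively on shared keys, insert keys that are new.
        (Node::Map(l), Node::Map(r)) => {
            for (key, value) in r {
                match l.get_mut(&key) {
                    Some(existing) => merge(existing, value),
                    None => {
                        l.insert(key, value);
                    }
                }
            }
        }
        // Sequences: append the right-hand items after the left-hand ones.
        (Node::Seq(l), Node::Seq(r)) => l.extend(r),
        // Anything else (assumed rule): the right-hand value replaces the left.
        (slot, other) => *slot = other,
    }
}
```

With two maps holding `vals: [1, 2, 3]` and `vals: [4, 5, 6]`, `merge` leaves the left map with `vals: [1, 2, 3, 4, 5, 6]`.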
*/

#[macro_use]
extern crate serde;

#[cfg(test)]
mod test;

pub use number::*;
pub use crate::core::*;

mod selector;
mod core;
mod macros;
mod number;