Crate dson

Source
Expand description

§DSON: A Delta-State CRDT for JSON-like Data Structures

This crate provides a Rust implementation of DSON, a space-efficient, delta-state Conflict-Free Replicated Datatype (CRDT) for JSON-like data structures. It is based on the research paper “DSON: JSON CRDT Using Delta-Mutations For Document Stores” and inspired by the original author’s JavaScript implementation.

The primary goal of this library is to enable robust, and efficient multi-writer collaboration in extremely constrained environments (high latency and low bandwidth; opportunistic networking).

Unlike other CRDT libraries that expose a single “Document” type, DSON provides a set of composable primitives. This allows you to build the exact data structure you need. The most common top-level structure is an OrMap, which can contain other CRDTs, enabling nested, JSON-like objects. The entire state is typically wrapped in a CausalDotStore, which tracks the causal history.

§Core Concepts

DSON provides three fundamental, composable CRDTs:

  • OrMap: An Observed-Remove Map, mapping arbitrary keys to other CRDT values.
  • OrArray: An Observed-Remove Array, providing a list-like structure.
  • MvReg: A Multi-Value Register, for storing primitive values. When concurrent writes occur, the register holds all conflicting values. This is the only CRDT in this library that can represent value conflicts.

These primitives can be nested to create arbitrarily complex data structures, such as a map containing an array of other maps.

All modifications produce a delta. Instead of sending the entire state after each change, only this small delta needs to be transmitted to other replicas.

§Observed-Remove Semantics

DSON uses Observed-Remove (OR) semantics for its collections. This means an element can only be removed if its addition has been observed. If an element is updated concurrently with its removal, the update “wins,” and the element is preserved. OR-semantics are intuitive, and this is often the desired behavior. Consider a collaborative shopping list:

  1. Initial State: Both Alice and Bob see ["apples", "bananas"].
  2. Alice’s Action: Alice updates “bananas” to “blueberries”.
  3. Bob’s Action: Concurrently, Bob removes “bananas”.

With OR-semantics, the final list will be ["apples", "blueberries"]. Bob’s removal is overridden by Alice’s concurrent update because the update implies the continued existence of the item.

DSON can be extended with special CRDTs providing different semantics for specific use cases though.

§Causal CRDTs and Tombstone-Free Removals

DSON is a causal CRDT, meaning it uses causal history to resolve conflicts. This history is tracked in a CausalContext, which contains a set of “dots”—unique identifiers for every operation.

§Dots

A dot is a globally unique identifier for an operation (for example, adding or updating a value). It is the fundamental unit for tracking causality.

A Dot is a tuple (Identifier, Sequence):

  • Identifier: A unique ID for the actor (a specific application instance on a specific node) that performed the operation. It is composed of a NodeId and an ApplicationId. This structure allows multiple applications on the same machine to collaborate without their histories conflicting.
  • Sequence: A monotonically increasing number (effectively a Lamport timestamp) that is unique to that actor.

When a replica makes a change, it generates a new dot. This dot is broadcast to other replicas along with the delta describing the change.

The collection of all dots a replica has observed forms its CausalContext. This context represents the replica’s knowledge of the document’s history. By comparing its local CausalContext with the context from a received delta, a replica can determine which operations are new, which are concurrent, and which have already been seen. This allows DSON to merge changes correctly and guarantee convergence.

A key advantage of this model is the elimination of tombstones. In many other CRDTs, when an item is deleted, a “tombstone” marker is left behind to signify its removal. These tombstones are never garbage-collected and can cause unbounded metadata growth in long-lived documents.

DSON avoids this growth by tracking which operations are “live”. A removal is simply the absence of an operation’s dot from the causal context. When replicas sync, they can determine which items have been deleted by comparing their causal contexts, without needing explicit tombstone markers. This ensures that the metadata size remains proportional to the size of the live data, not the entire history of operations.

§Scope of this Crate

This crate provides the core data structures and algorithms for DSON. It is responsible for generating deltas from mutations and merging them to ensure eventual consistency. It is up to you to build your document structure by composing the provided CRDT primitives, most commonly by wrapping an OrMap in a CausalDotStore.

Note that this is a low-level library. You will likely want to build a typed abstraction layer on top of dson rather than use it directly in your application code.

It does not include any networking protocols.

You are responsible for implementing the transport layer to broadcast deltas to other replicas. The correctness of this library, particularly its causal consistency guarantees, relies on the transport layer delivering deltas in an order that respects the causal history of events. This is typically achieved with an anti-entropy algorithm that exchanges deltas and their causal metadata (CausalContext).

§Getting Started: A Simple Conflict

This section models a simple scenario where two users, Alice and Bob, concurrently edit the same piece of data, leading to a conflict that DSON resolves gracefully.

This example uses a top-level OrMap keyed by a string, holding a MvReg (Multi-Value Register) for a string value. While OrMap is generic over its key, it’s typically a String to model JSON-like objects.

use dson::{
    crdts::{mvreg::MvRegValue, snapshot::ToValue, NoExtensionTypes},
    CausalDotStore, Identifier, OrMap,
};

// 1. SETUP: TWO REPLICAS
// Create two replicas, Alice and Bob, each with a unique ID.
let alice_id = Identifier::new(0, 0);
let mut alice_store = CausalDotStore::<OrMap<String>>::default();

let bob_id = Identifier::new(1, 0);
let mut bob_store = CausalDotStore::<OrMap<String>>::default();

// 2. INITIAL STATE
// Alice creates an initial value. The key "123" holds a register.
let key = "123".to_string();
let delta_from_alice = dson::api::map::apply_to_register::<_, NoExtensionTypes, _>(
    |reg, ctx, id| reg.write("initial value".to_string().into(), ctx, id),
    key.clone(),
)(&alice_store.store, &alice_store.context, alice_id);

// Alice applies the change to her own store.
alice_store.join_or_replace_with(delta_from_alice.store.clone(), &delta_from_alice.context);

// 3. SYNC
// Bob receives Alice's initial change.
bob_store.join_or_replace_with(delta_from_alice.store, &delta_from_alice.context);
assert_eq!(alice_store, bob_store);

// 4. CONCURRENT EDITS
// Now, Alice and Bob make changes without syncing.

// Alice updates the value to "from Alice".
let delta_alice_edit = dson::api::map::apply_to_register::<_, NoExtensionTypes, _>(
    |reg, ctx, id| reg.write("from Alice".to_string().into(), ctx, id),
    key.clone(),
)(&alice_store.store, &alice_store.context, alice_id);
alice_store.join_or_replace_with(delta_alice_edit.store.clone(), &delta_alice_edit.context);

// Concurrently, Bob updates the value to "from Bob".
let delta_bob_edit = dson::api::map::apply_to_register::<_, NoExtensionTypes, _>(
    |reg, ctx, id| reg.write("from Bob".to_string().into(), ctx, id),
    key.clone(),
)(&bob_store.store, &bob_store.context, bob_id);
bob_store.join_or_replace_with(delta_bob_edit.store.clone(), &delta_bob_edit.context);

// 5. MERGE
// The replicas now exchange their changes.

// Alice merges Bob's edit.
alice_store.join_or_replace_with(delta_bob_edit.store, &delta_bob_edit.context);

// Bob merges Alice's edit.
bob_store.join_or_replace_with(delta_alice_edit.store, &delta_alice_edit.context);

// After merging, both stores are identical.
assert_eq!(alice_store, bob_store);

// 6. VERIFY CONFLICT
// The concurrent writes are preserved as a conflict in the MvReg.
let register_value = alice_store.store.get(&key).unwrap();
let values: Vec<_> = register_value.reg.values().into_iter().collect();

assert_eq!(values.len(), 2);
assert!(values.contains(&&MvRegValue::String("from Alice".to_string())));
assert!(values.contains(&&MvRegValue::String("from Bob".to_string())));

§Advanced Topics

§The Extension System

DSON includes an extension system that allows developers to define custom CRDTs by implementing the ExtensionType trait. This is for building domain-specific data structures that go beyond the standard JSON-like primitives.

By implementing the ExtensionType trait, you define how your custom type should be serialized, deserialized, and merged. The system handles conflict resolution based on the rules you define.

This can be used to implement custom data structures like counters, text objects, or more efficient state representation.

§Validation and Observation

DSON provides a Sentinel trait that allows you to observe or validate changes as they are applied during a merge. This can be used for implementing authorization, logging, or triggering side effects.

§Network and Consistency

DSON’s delta-based approach minimizes the amount of data that needs to be transmitted between replicas, making it efficient for low-bandwidth or high-latency networks.

However, much of the complexity of using DSON in practice lies in the correct design and implementation of the gossip protocol used to exchange deltas between replicas. An efficient gossip protocol is not trivial to implement. For guidance, refer to the research on opportunistic networking (oppnet).

It is also important to understand that DSON’s causal consistency guarantees are provided on a per-register basis. This means that while individual values are guaranteed to be causally consistent, the relationships between different values are not. This can lead to very unintuitive behavior. For example, if you have two registers, x and y, you write to x and then to y, another replica might see the write to y before the write to x.

§License

This project is licensed under either of

  • Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

§Features

  • json: Enables serialization and deserialization of DSON documents to and from serde_json::Value. This feature is enabled by default.
  • serde: Provides serde support for all CRDT types.
  • arbitrary: Implements quickcheck::Arbitrary for CRDT types, useful for property-based testing.
  • chrono: Enables chrono support for Timestamp. This feature is enabled by default.
  • ulid: Enables registers to hold ulids. This feature is enabled by default.

Re-exports§

pub use causal_context::CausalContext;
pub use causal_context::Dot;
pub use causal_context::Identifier;
pub use causal_context::MAX_APPLICATION_ID;
pub use causal_context::NodeId;
pub use causal_context::Priority;
pub use causal_context::ROOT_APP_ID;
pub use crdts::mvreg::MvReg;
pub use crdts::orarray::OrArray;
pub use crdts::ormap::OrMap;
pub use chrono;

Modules§

api
causal_context
Causal Context
crdts
Composable CRDTs for JSON-like Data
datetime_literal
macros
Macros usable for tests and initialization
sentinel
Observe and validate changes to a CRDT.

Macros§

causal_context
This macro parses the debug-representation of a CausalContext, giving an easy way to instantiate a cc with a given set of dots:
crdt_literal
Convenience macro for creating a TypeVariantValue, of either map, array or register type.
crdt_map_literal
Convenience macro for creating a OrMap instance.
crdt_map_store
datetime
Declarative macro to create a chrono::DateTime<chrono::Utc> suitable for const evaluation, as this is otherwise cumbersome.
dot
Convenience macro for creating dot values.

Structs§

CausalDotStore
A DotStore paired with a CausalContext.
ComputeDeletionsArg
DotFun
A map from Dot to V whose computed dots is its keyset.
DotFunMap
A map from Dot to V: DotStore, whose computed dots is the union of the dots of its values.
DotMap
A map from an arbitrary key type to a V: DotStore, whose computed dots is the union of the dots of its values.
DryJoinOutput
The outcome of performing a dry-join.
DsonRandomState
This is a small wrapper around the standard RandomState. This allows us to easily switch to a non-random RandomState for use in tests.

Enums§

DotChange
An observed change to a dot store.

Traits§

DotStore
A container for data-type specific information that stores the state of a 𝛿-based CRDT.
DotStoreJoin
A trait for dot stores that can be joined.
ExtensionType
A type that extends TypeVariantValue and friends with additional value types.

Functions§

compute_deletions_unknown_to
Returns dots that known_dots has deleted (by virtue of not being in live_dots) that are still present in ignorant.