diamond_types/lib.rs
1//! This is a super fast CRDT implemented in rust. It currently only supports plain text documents
2//! but the plan is to support all kinds of data.
3//!
4//! Diamond types is built on top of two core abstractions:
5//!
6//! 1. The [Operation Log](list::OpLog)
7//! 2. [Branches](list::Branch)
8//!
9//! A branch is a copy of the document state at some point in time. The most common & useful way to
10//! use branches is to make a single branch at the most recent version of the document. When more
11//! changes come in, a branch can be moved forward in time by calling [`merge`](list::Branch::merge).
12//!
13//! Branches in diamond types aren't the same as branches in git. They're a lower level construct.
14//! Diamond types doesn't store a list of the active branches in your data set. A branch is much
15//! simplier than that - internally its just a temporary in-memory tuple of
16//! (version, document state).
17//!
18//! Branches can change over time by referencing the *Operation Log* (OpLog). The oplog is an
19//! append-only log of all the changes which have happened to a document over time. The operation
20//! log can be replayed to generate a branch at any point of time within its range.
21//!
22//! For every operation in the oplog we store a few fields:
23//!
24//! - What the change actually is (eg *insert 'hi' at position 20*)
25//! - Parents (A logical clock of *when* an operation happened)
26//! - ID (Agent & Sequence number). The agent can be used to figure out who made the change.
27//!
28//! ## Example
29//!
30//! For local edits to an oplog, just use [`oplog.add_insert`](list::OpLog::add_insert) or
31//! [`oplog.add_delete_without_content`](list::OpLog::add_delete_without_content):
32//!
33//! ```
34//! use diamond_types::list::*;
35//!
36//! let mut oplog = OpLog::new();
37//! let fred = oplog.get_or_create_agent_id("fred");
38//! oplog.add_insert(fred, 0, "abc");
39//! oplog.add_delete_without_content(fred, 1..2); // Delete the 'b'
40//! ```
41//!
42//! There are also other methods like [`oplog.add_insert_at`](list::OpLog::add_insert_at) which
43//! append a change at some specific point in time. This is useful if you want to append a change to
44//! a branch.
45//!
46//! To create a branch from an oplog, use [`Branch::new` methods](list::Branch::new_at_tip):
47//!
48//! ```
49//! use diamond_types::list::*;
50//! let mut oplog = OpLog::new();
51//! // ...
52//! let mut branch = Branch::new_at_tip(&oplog);
53//! // Equivalent to let mut branch = Branch::new_at_local_version(&oplog, oplog.get_local_version());
54//! println!("branch content {}", branch.content().to_string());
55//! ```
56//!
57//! Once a branch has been created, you can merge new changes using [`branch.merge`](list::Branch::merge):
58//!
59//! ```
60//! use diamond_types::list::*;
61//! let mut oplog = OpLog::new();
62//! // ...
63//! let mut branch = Branch::new_at_tip(&oplog);
64//! let george = oplog.get_or_create_agent_id("george");
65//! oplog.add_insert(george, 0, "asdf");
66//! branch.merge(&oplog, oplog.local_version_ref());
67//! ```
68//!
69//! If you aren't using branches, you can use the simplified [`ListCRDT` API](list::ListCRDT). The
70//! ListCRDT struct simply wraps an oplog and a branch together so you don't need to muck about
71//! with manual merging. This API is also slightly faster.
72//!
73//! I'm holding off on adding examples using this API for now because the API is in flux. TODO: Fix!
74//!
75//!
76//! ## Consuming IDs
77//!
78//! The ID of a change is made up of an agent ID (usually an opaque string) and a sequence number.
79//! Each successive change from the same agent will use the next sequence number - eg: (*fred*, 0),
80//! (*fred*, 1), (*fred*, 2), etc.
81//!
82//! But its important to note what constitutes a change! In diamond types, every inserted character
83//! or deleted character increments (consumes) a sequence number. Typing a run of characters one at
84//! a time is indistinguishable from pasting the same run of characters all at once.
85//!
86//! Note that this is a departure from other CRDTs. Automerge does not work this way.
87//!
88//! For example,
89//!
90//! ```
91//! use diamond_types::list::*;
92//! let mut oplog = OpLog::new();
93//! let fred = oplog.get_or_create_agent_id("fred");
94//! oplog.add_insert(fred, 0, "a");
95//! oplog.add_insert(fred, 1, "b");
96//! oplog.add_insert(fred, 2, "c");
97//! ```
98//!
99//! Produces an identical oplog to this:
100//!
101//! ```
102//! use diamond_types::list::*;
103//! let mut oplog = OpLog::new();
104//! let fred = oplog.get_or_create_agent_id("fred");
105//! oplog.add_insert(fred, 0, "abc");
106//! ```
107//!
108//! Diamond types does this by very aggressively run-length encoding everything it can whenever
109//! possible.
110//!
111//! ### Warning: Do not reuse IDs 💣!
112//!
113//! Every ID in diamond types *must be unique*. If two operations are created with the same ID,
114//! peers will only merge one of them - and the document state will diverge. This is really bad!
115//!
116//! Its tempting to reuse agent IDs because they waste disk space. But there's lots of ways to
117//! introduce subtle bugs if you try. Disk space is cheap. Bugs are expensive.
118//!
119//! I recommend instead just generating a new agent ID in every editing session. So, in a text
120//! editor, generate an ID in memory when the user opens the document. Don't save the ID to disk.
121//! Just discard it when the user's editing session ends.
122//!
123//!
124//! ### Aside on atomic transactions
125//!
126//! Grouping changes in atomic blocks is out of the scope of diamond types. But you can implement it
127//! in the code you write which consumes diamond types. Briefly, either:
128//!
129//! 1. Make all the changes you want to make atomically in diamond types, but delay sending those
130//! changes over the network until you're ready, or
131//! 2. Add a special commit message to your network protocol which "commits" marks when a set of
132//! operations in the oplog is safe to merge.
133//!
134//! Diamond types does not (yet) support deleting operations from the oplog. If this matters to you,
135//! please start open an issue about it.
136//!
137//!
138//! ## Parents
139//!
140//! The parents list names the version of the document right before it was changed. An new,
141//! empty document always has the version of *ROOT*. After an operation has happened, the version of
142//! the document is the same as that operation's ID.
143//!
144//! Sometimes changes are concurrent. This can happen in realtime - for example, two users type in a
145//! collaborative document at the same time. Or it can happen asyncronously - for example, two users
146//! edit two different branches, and later merge their results. We can describe what happened with
147//! a *time DAG*, where each change is represented by a node in a DAG (Directed Acyclic Graph).
148//! Edges represent the *directly after* relationship. See [INTERNALS.md](INTERNALS.md) in this
149//! repository for more theoretical information.
150//!
151//! For example, in this time DAG operations `a` and `b` are concurrent:
152//!
153//! ```text
154//! ROOT
155//! / \
156//! a b
157//! \ /
158//! c
159//! ```
160//!
161//! Concurrent changes have some repercussions for the oplog:
162//!
163//! - The order of changes in the oplog isn't canonical. Other peers may have oplogs with a
164//! different order. This is fine. DT uses "local time" numbers heavily internally - which refer to
165//! the local index of a change, as if it were stored in an array. But these indexes cannot be
166//! shared with other peers. However, the order of changes must always obey the partial order of
167//! chronology. If operation A happened before operation B, they must maintain that relative order
168//! in the oplog. In the diagram above, the operations could be stored in the order `[a, b, c]` or
169//! `[b, a, c]` but not `[a, c, b]` because `c` comes after both `a` and `b`.
170//! - We represent a point in time in the oplog using a *list* of (agent, seq) pairs. This list
171//! usually only contains one entry - which is the ID of the preceding operation. But sometimes
172//! we need to merge two threads of history together. In this case, the parents list names all
173//! immediate predecessors. In the diagram above, operation `c` has a parents list of both `a` and
174//! `b`.
175//!
176//! Unlike git (and some other CRDTs), diamond types represents merges *implicitly*. We don't create
177//! a special node in the time DAG for merges. Merges simply happen whenever an operation has
178//! multiple parents.
179
180#![allow(clippy::module_inception)]
181
182extern crate core;
183
184use smallvec::SmallVec;
185use smartstring::alias::String as SmartString;
186use crate::dtrange::DTRange;
187use crate::rle::{KVPair, RleVec};
188
189pub mod list;
190mod rle;
191mod dtrange;
192mod unicount;
193mod remotespan;
194mod rev_range;
195mod history;
196mod frontier;
197mod history_tools;
198
199pub type AgentId = u32;
200const ROOT_AGENT: AgentId = AgentId::MAX;
201const ROOT_TIME: usize = usize::MAX;
202
203// TODO: Consider changing this to u64 to add support for very long lived documents even on 32 bit
204// systems.
205pub type Time = usize;
206
207/// A LocalVersion is a set of local Time values which point at the set of changes with no children
208/// at this point in time. When there's a single writer this will
209/// always just be the last order we've seen.
210///
211/// This is never empty.
212///
213/// At the start of time (when there are no changes), LocalVersion is usize::max (which is the root
214/// order).
215pub type LocalVersion = SmallVec<[Time; 2]>;
216
217#[derive(Clone, Debug)]
218struct ClientData {
219 /// Used to map from client's name / hash to its numerical ID.
220 name: SmartString,
221
222 /// This is a packed RLE in-order list of all operations from this client.
223 ///
224 /// Each entry in this list is grounded at the client's sequence number and maps to the span of
225 /// local time entries.
226 ///
227 /// A single agent ID might be used to modify multiple concurrent branches. Because of this, and
228 /// the propensity of diamond types to reorder operations for performance, the
229 /// time spans here will *almost* always (but not always) be monotonically increasing. Eg, they
230 /// might be ordered as (0, 2, 1). This will only happen when changes are concurrent. The order
231 /// of time spans must always obey the partial order of changes. But it will not necessarily
232 /// agree with the order amongst time spans.
233 item_times: RleVec<KVPair<DTRange>>,
234}