Crate assemblage_db

Distributed document/graph database for connected & overlapping pages.

AssemblageDB is a transactional high-level database for connected webs of pages, notes, texts and other media. Think of it like a personal web, but easily editable, with more connections and better navigation than the web. It is high-level in the sense that it defines a document model similar to HTML but vastly simpler and with graph-like 2-way links instead of tree-like 1-way jump links. The data model is both:

  • document-oriented: supports nested documents without a fixed schema
  • graph-based: documents can have multiple parents and form a directed, possibly cyclic graph
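To illustrate what "multiple parents" and "possibly cyclic" mean in practice, here is a minimal standalone sketch of such a graph. It does not use the AssemblageDB API: the `Graph` type and its `add`, `link` and `parents` methods are hypothetical names invented for this illustration, and nodes are identified by plain indices.

```rust
/// A toy document graph (hypothetical, not the assemblage_db types):
/// each node is identified by its index and stores the ids of its children.
struct Graph {
    children: Vec<Vec<usize>>,
}

impl Graph {
    fn new() -> Self {
        Graph { children: Vec::new() }
    }

    /// Adds a node with the given children and returns its id.
    fn add(&mut self, children: Vec<usize>) -> usize {
        self.children.push(children);
        self.children.len() - 1
    }

    /// Appends `child` to an existing `parent`; this is what makes cycles
    /// possible. Panics if `parent` is not a valid id.
    fn link(&mut self, parent: usize, child: usize) {
        self.children[parent].push(child);
    }

    /// Returns the ids of all nodes that list `id` as a child.
    fn parents(&self, id: usize) -> Vec<usize> {
        self.children
            .iter()
            .enumerate()
            .filter(|(_, children)| children.contains(&id))
            .map(|(parent, _)| parent)
            .collect()
    }
}
```

A node added as a child of two different lists has two parents, and linking a descendant back to one of its ancestors produces a cycle, which a strictly tree-shaped document model could not represent.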

Features

  • versioned: old values remain accessible until merged
  • transactional: snapshots are isolated through MVCC
  • storage-agnostic: supports native (files) and wasm (IndexedDB) targets
  • indexed: maintains an automatic index for similarity/overlap search
  • distributed: nodes can be published/subscribed as remote broadcasts
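The "versioned" and "transactional" properties can be pictured with a small multi-version store. This is only a sketch of the general MVCC idea under simplifying assumptions (a single writer, string keys and values, snapshots represented as version numbers); `VersionedStore` and all of its methods are hypothetical and say nothing about how AssemblageDB or its storage layer are actually implemented.

```rust
use std::collections::HashMap;

/// A toy multi-version store (hypothetical): every write appends a new
/// version instead of overwriting the previous value.
struct VersionedStore {
    version: u64,
    entries: HashMap<String, Vec<(u64, String)>>,
}

impl VersionedStore {
    fn new() -> Self {
        VersionedStore { version: 0, entries: HashMap::new() }
    }

    /// Writes a value and returns the version number of the write.
    fn put(&mut self, key: &str, value: &str) -> u64 {
        self.version += 1;
        self.entries
            .entry(key.to_string())
            .or_default()
            .push((self.version, value.to_string()));
        self.version
    }

    /// A snapshot is simply the version number at the time it was taken.
    fn snapshot(&self) -> u64 {
        self.version
    }

    /// Reads the newest value that was written at or before `snapshot`.
    fn get_at(&self, key: &str, snapshot: u64) -> Option<&str> {
        self.entries
            .get(key)?
            .iter()
            .rev()
            .find(|(version, _)| *version <= snapshot)
            .map(|(_, value)| value.as_str())
    }

    /// Drops all but the latest version of every key.
    fn merge(&mut self) {
        for versions in self.entries.values_mut() {
            if let Some(latest) = versions.pop() {
                versions.clear();
                versions.push(latest);
            }
        }
    }
}
```

In this model a reader holding an older snapshot keeps seeing the value from that point in time, even after later writes, until a merge discards the old versions.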

Distributed DBs

AssemblageDB is distributed in a sense very similar to a distributed version control system such as git: All content is stored and edited locally without any coordination with other distributed copies, but AssemblageDBs can broadcast parts or all of their content to a cloud service and allow other instances to fetch the content into their local instances. These local “borrowed” copies are only modified when updates are fetched from the cloud service, but never directly edited by anyone but the owner instance, ensuring that no conflicts arise. The connection between borrowed content and owned content is instead constructed implicitly through overlaps, automatic links between textually similar paragraphs.

In other words, AssemblageDBs form an overlapping network of document graphs where each DB can be independently edited by its owner and connections between different DBs are found automatically if their content is similar enough. A single AssemblageDB instance is always consistent, but consistency between different instances is explicitly not a goal. Instead of trying to achieve consensus between different owners, each DB has full control over its own graph of documents, with overlaps between graphs established through textual overlap search.

All content inserted into an AssemblageDB is automatically indexed and fully searchable. The search index is not particularly space-efficient, but general enough to find overlaps between arbitrary sequences of bytes, so that the strings “MyVeryInterestingString” and “MyVeryUninterestingString” would match with a large overlap.
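To give a feel for why the two strings above would match, here is a minimal sketch that scores overlap by counting shared byte n-grams. This is an assumption made purely for illustration: the text does not describe the algorithm behind the AssemblageDB index, and the `ngrams` and `overlap_score` functions are hypothetical helpers, not part of the crate.

```rust
use std::collections::HashSet;

/// Collects the set of all byte n-grams (windows of `n` bytes) in `bytes`.
/// Returns an empty set if `n` is 0 or larger than the input.
fn ngrams(bytes: &[u8], n: usize) -> HashSet<&[u8]> {
    if n == 0 {
        return HashSet::new();
    }
    bytes.windows(n).collect()
}

/// Scores how many n-grams of the input with fewer distinct n-grams also
/// occur in the other input: 0.0 means no shared n-grams, 1.0 means every
/// n-gram of the smaller set is shared.
fn overlap_score(a: &[u8], b: &[u8], n: usize) -> f64 {
    let (grams_a, grams_b) = (ngrams(a, n), ngrams(b, n));
    let smaller = grams_a.len().min(grams_b.len());
    if smaller == 0 {
        return 0.0;
    }
    let shared = grams_a.intersection(&grams_b).count();
    shared as f64 / smaller as f64
}
```

With trigrams (`n = 3`), “MyVeryInterestingString” and “MyVeryUninterestingString” share most of their n-grams and score well above 0.5, while two strings without any common trigram score 0.0. Because the sketch works on raw bytes, it applies to arbitrary byte sequences and not only to text.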

Data model

Nodes in an AssemblageDB can be either atomic (a line of text, for example) or nested, either in a list containing multiple children or in a styled node containing just a single child. data::Node::List nodes have a layout, which controls how children are laid out in relation to each other, while data::Node::Styled nodes have zero or more block styles or span styles that control how their (possibly nested) child is displayed. Examples of layouts and styles are:

  • data::Layout::Chain: lays out children as a consecutive chain of inline spans. With 2 text children “foo” and “bar”, the chain would be displayed as “foobar”.
  • data::Layout::Page: lays out children as blocks on a page, separated vertically by a new line. With 2 text children “foo” and “bar”, the page would be displayed as 2 lines, the first line containing “foo”, the second line containing “bar”.
  • data::SpanStyle::Italic: A span (inline) style that would display the child “foo” in italics.
  • data::BlockStyle::Heading: A block style that would display the child “foo” in its own block with a larger font size.

A node is always either a span or a block. Text nodes are considered to be spans by default and remain spans if styled using span styles such as data::SpanStyle::Italic or data::SpanStyle::Bold. However, a single block style (such as data::BlockStyle::Heading) in a set of styles is always “contagious” and turns a text node “foo” styled with both data::SpanStyle::Italic and data::BlockStyle::Heading into a block. Similarly, layouts control whether a list is displayed as a span or a block: data::Layout::Chain turns a list into a span, while data::Layout::Page turns a list into a sequence of blocks.
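The span/block classification described above can be written down as two small functions. The following is a standalone sketch: `Display`, `Style` and `Layout` are simplified stand-ins defined here for illustration, not the enums from the `data` module, and the function names are hypothetical.

```rust
/// Whether a node is displayed inline or as its own block.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Display {
    Span,
    Block,
}

/// Simplified stand-in that mixes span styles and block styles in one enum.
#[derive(Clone, Copy)]
enum Style {
    Italic,
    Bold,
    Heading,
}

/// Simplified stand-in for list layouts.
#[derive(Clone, Copy)]
enum Layout {
    Chain,
    Page,
}

/// In this sketch only `Heading` counts as a block style.
fn is_block_style(style: Style) -> bool {
    matches!(style, Style::Heading)
}

/// Text is a span by default; a single block style in the set is
/// "contagious" and turns the styled text into a block.
fn styled_text_display(styles: &[Style]) -> Display {
    if styles.iter().any(|style| is_block_style(*style)) {
        Display::Block
    } else {
        Display::Span
    }
}

/// A chain is displayed as a span, a page as a sequence of blocks.
fn list_display(layout: Layout) -> Display {
    match layout {
        Layout::Chain => Display::Span,
        Layout::Page => Display::Block,
    }
}
```

For example, `styled_text_display(&[Style::Italic, Style::Heading])` yields `Display::Block`, while text with no styles or only span styles stays a `Display::Span`.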

So, what happens when a span contains a block? Or when a list of blocks is styled using a set of span styles? There are a few rules that govern interactions between spans, blocks and styles:

  • Whenever styles apply to nested children, all styles are applied to all children. A span style such as data::SpanStyle::Italic applied to a list of blocks would thus style each child as italic, a block style such as data::BlockStyle::Heading would style each child as a heading block.
  • Whenever a block occurs inside a span, the block is displayed as a link to the block. These links are always displayed as (inline) spans, so that blocks are never directly displayed inside spans.
  • Whenever a list of blocks occurs as a child of a list of blocks, the child is “unwrapped” and displayed as if the parent list contained all these blocks directly. So if a page A contains the children “A1” and “A2” and another page B contains the children “B1”, the page A and “B2”, then B would be displayed as the blocks “B1”, “A1”, “A2”, “B2”.
  • Whenever a list of spans occurs as a child of a list of spans, the child is similarly “unwrapped” and displayed as if the parent list contained all these spans directly.
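The "unwrapping" rule for pages nested inside pages amounts to a recursive flattening. Here is a minimal sketch of that one rule, assuming a simplified tree in which a page contains either text blocks or other pages; the `Item` enum and `displayed_blocks` function are hypothetical and ignore styles, spans and links.

```rust
/// A simplified page child (hypothetical): either a block of text or a
/// nested page that itself contains children.
enum Item {
    Block(String),
    Page(Vec<Item>),
}

/// Returns the blocks of a page in display order, with nested pages
/// unwrapped as if the parent contained their blocks directly.
fn displayed_blocks(children: &[Item]) -> Vec<String> {
    let mut blocks = Vec::new();
    for child in children {
        match child {
            Item::Block(text) => blocks.push(text.clone()),
            Item::Page(nested) => blocks.extend(displayed_blocks(nested)),
        }
    }
    blocks
}
```

Applied to the example above, a page B with the children “B1”, the page A (containing “A1” and “A2”) and “B2” is displayed as “B1”, “A1”, “A2”, “B2”. Since this sketch uses an owned tree, it cannot contain cycles; a version operating on a real document graph would additionally need to keep track of visited nodes.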

Example

use assemblage_db::{Db, Result, data::{BlockStyle, Child, Id, Layout, Node, SpanStyle}, tx};
use assemblage_kv::{run, storage::{self, MemoryStorage}};

fn main() -> Result<()> {
    // The `run!` macro abstracts away the boilerplate of setting up the
    // right async environment and storage for native / wasm and is not
    // needed outside of doc tests.
    run!(async |storage| {
        let db = Db::open(storage).await?;

        // Nodes support layouts and styles, for example as a page of blocks...
        let page1_id = tx!(|db| {
            db.add(Node::list(
                Layout::Page,
                vec![
                    Node::styled(BlockStyle::Heading, Node::text("A Heading!")),
                    Node::text("This is the first paragraph."),
                    Node::text("Unsurprisingly this is the second one..."),
                ],
            ))
            .await?
        });

        // ...or as inline spans that are chained together:
        let page2_id = tx!(|db| {
            db.add(Node::list(
                Layout::Page,
                vec![Node::list(
                    Layout::Chain,
                    vec![
                        Node::text("And this is the "),
                        Node::styled(SpanStyle::Italic, Node::text("last")),
                        Node::text(" paragraph..."),
                    ],
                )],
            ))
            .await?
        });

        // Documents can form a graph, with nodes keeping track of all parents:
        tx!(|db| {
            db.add(Node::list(Layout::Page, vec![page1_id, page1_id]))
                .await?;

            assert_eq!(db.parents(page1_id).await?.len(), 2);
            assert_eq!(db.parents(page2_id).await?.len(), 0);
        });

        // All text is indexed, the DB supports "overlap" similarity search:
        tx!(|db| {
            let paragraph1_id = db.get(page1_id).await?.unwrap().children()[1].id()?;
            let paragraph3_id = db.get(page2_id).await?.unwrap().children()[0].id()?;

            let overlaps_of_p1 = db.overlaps(paragraph1_id).await?;
            assert_eq!(overlaps_of_p1.len(), 1);
            assert_eq!(overlaps_of_p1[0].id, paragraph3_id);
            assert!(overlaps_of_p1[0].score() > 0.5);
        });
        Ok(())
    })
}

Modules

Data structures for published or subscribed broadcasts.

Data structures for AssemblageDB nodes, children, parents and siblings.

Macros

Removes boilerplate and constructs closure-like DB transactions.

Structs

A versioned and transactional document/graph DB.

An isolated snapshot of a DB at a single point in time.

Enums

The error type for DB operations.

The result of a DbSnapshot::preview() call, if successful.

The result of a DbSnapshot::restore() call, if successful.

Type Definitions

A specialized Result type for DB operations.