# Jujube


## Disclaimer

This is not a Google product. It is an experimental version-control system
(VCS). It is not ready for use. It was written by me, Martin von Zweigbergk
(martinvonz@google.com). It is my personal hobby project. It does not indicate
any commitment or direction from Google.


## Introduction

I started the project mostly in order to test the viability of some UX ideas in
practice. I continue to use it for that, but my short-term goal now is to make
it useful as an alternative CLI for Git repos.

The storage design is similar to Git's in that it stores commits, trees, and
blobs. However, the blobs are actually split into three types: normal files,
symlinks (Unicode paths), and conflicts (more about that later).

The command-line tool is called `jj` for now because it's easy to type and easy
to replace (rare in English). The project is called "Jujube" (a fruit) because
that's the first word I could think of that matched "jj".


## Features

The following subsections describe the current features. The text is aimed at
readers who are already familiar with other VCSs.

### Compatible with Git

The tool currently has two backends. One is called "local store" and is very
simple and inefficient. The other backend uses a Git repo as storage. The
commits are stored as regular Git commits. Commits can be read from and written
to an existing Git repo. This makes it possible to create a Jujube repo and use
it as an alternative interface for a Git repo (it will be backed by the Git repo
just like additional Git worktrees are).

### Written as a library

The repo consists of two main parts: the lib crate and the main (CLI)
crate. Most of the code lives in the lib crate. The lib crate does not print
anything to the terminal. The separate lib crate should make it relatively
straightforward to add a GUI.


### Operations are performed repo-first

Almost all operations are done in the repo first and then possibly reflected in
the working copy. The only exception so far is when committing the working copy,
which naturally uses the working copy as input.

This makes operations faster because the working copy doesn't need to get
updated. It also means that the working copy won't see spurious changes, e.g.
during a rebase operation. It also makes it safe to update the working copy
while some operation is running.

### Supports Evolution

Jujube copies the Evolution feature from Mercurial. It keeps track of when a
commit gets rewritten. A commit has a list of predecessors in addition to the
usual list of parents. This lets the tool figure out where to rebase descendant
commits to when a commit has been rewritten (amended, rebased, etc.). See
https://www.mercurial-scm.org/wiki/ChangesetEvolution for more information.
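
As a rough illustration of how predecessor links can drive that decision, here
is a minimal Python sketch. It is not Jujube's implementation: the `Commit`
class and the `successors_of`, `latest_version`, and `rebase_targets` helpers
are hypothetical names, and the sketch assumes that rewrites are acyclic and
never diverge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    id: str
    parents: tuple = ()       # the usual parent links
    predecessors: tuple = ()  # older versions that this commit replaced


def successors_of(commits):
    """Map each rewritten commit id to the ids of the commits that replaced it."""
    result = {}
    for commit in commits:
        for old_id in commit.predecessors:
            result.setdefault(old_id, []).append(commit.id)
    return result


def latest_version(commit_id, successors):
    """Follow rewrites until reaching a commit that has not been rewritten."""
    while commit_id in successors:
        replacements = successors[commit_id]
        if len(replacements) != 1:
            raise ValueError(f"{commit_id} has divergent rewrites")
        commit_id = replacements[0]
    return commit_id


def rebase_targets(commits):
    """Return {commit id: new parent ids} for commits whose parents were rewritten."""
    successors = successors_of(commits)
    targets = {}
    for commit in commits:
        if commit.id in successors:
            continue  # this commit is itself an old version
        new_parents = tuple(latest_version(p, successors) for p in commit.parents)
        if new_parents != commit.parents:
            targets[commit.id] = new_parents
    return targets


history = [
    Commit("A"),
    Commit("B", parents=("A",)),
    Commit("C", parents=("B",)),
    Commit("B2", parents=("A",), predecessors=("B",)),  # B amended into B2
]
print(rebase_targets(history))  # {'C': ('B2',)}
```

Here C still points at the old B, so the sketch reports that C should be
rebased onto B2.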

### The working copy is a commit

The working copy gets automatically committed when you interact with the
tool. This simplifies both implementation and UX. It also means that the working
copy is frequently backed up.

Any changes to the working copy stay in place when you check out another
commit. That is different from Git and Mercurial, but I think it's more
intuitive for new users. To replicate the default behavior of Git/Mercurial, use
`jj rebase -r @ -d <destination>` (`@` is a name for the working copy
commit). There is no need to stash/unstash.

Commands become more consistent because the same command can operate on the
working copy or another commit. For example, `jj log` includes the working copy (much like
`gitk` and other tools include a node for the working copy). `jj squash`
squashes a commit into its parent, including if it's the working copy (like `git
commit --amend`/`hg amend`).

A commit description can be added to the working copy before "commit". The same
command (`jj describe`) is used for changing the description of any commit.

### Commits can contain conflicts

When a merge conflict happens, it is recorded within the tree object as a
special conflict object (not a file object with conflict markers). Conflicts are
stored as a list of states to add and another list of states to remove. A
regular 3-way merge adds [B,C] and removes [A] (the common ancestor). A
modify/remove conflict adds [B] and removes [A]. An add/add conflict adds
[B,C]. An octopus merge of N commits adds N states and removes N-1 states. A
non-conflict state A is equivalent to a conflict state that just adds [A]. A
"state" here can be a normal file, a symlink, or a tree. This support for
in-tree conflicts has some interesting effects on both implementation and UX.
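
The add/remove lists above can be modelled with a small Python sketch. This is
only an illustration, not the on-disk format: the `Conflict` class and the
constructor functions are hypothetical names, states are represented by plain
strings, and the octopus case assumes that all sides share a single common
base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    adds: tuple          # states to add
    removes: tuple = ()  # states to remove

    def is_resolved(self):
        """A single added state with nothing removed is an ordinary, non-conflict state."""
        return len(self.adds) == 1 and not self.removes


def three_way(base, left, right):
    return Conflict(adds=(left, right), removes=(base,))


def modify_remove(base, modified):
    return Conflict(adds=(modified,), removes=(base,))


def add_add(left, right):
    return Conflict(adds=(left, right))


def octopus(base, sides):
    # Assumption: every side diverged from the same base, so merging N sides
    # adds N states and removes the base N-1 times.
    return Conflict(adds=tuple(sides), removes=(base,) * (len(sides) - 1))


print(three_way("A", "B", "C"))         # Conflict(adds=('B', 'C'), removes=('A',))
print(Conflict(adds=("A",)).is_resolved())  # True
```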

It means that there is a consistent way of resolving conflicts: check out a
commit that contains conflicts, resolve the conflicts, and amend the resolution
into the conflicted commit. Then evolve descendant commits.

It naturally enables collaborative conflict resolution.

The in-tree conflicts mean that there is no need for book-keeping in
rebase-like commands to support continue/abort operations. Instead, the rebase
can simply continue and create the desired new DAG shape.

Conflicts get simplified on rebase by removing pairs of matching states in the
"add" and "remove" lists. For example, if B is based on A and then rebased to C,
and then to D, it will be a regular 3-way merge between B and D with A as base
(no trace of C). This means that you can keep old commits rebased to head
without resolving conflicts, and you still won't have messy recursive conflicts.
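
The cancellation can be traced with a short Python sketch. It is an
illustration under assumptions rather than the real code: `simplify` and
`rebase` are hypothetical helpers, states are plain strings, and a rebase is
modelled as adding the new parent's state and removing the old parent's state.

```python
from collections import Counter


def simplify(adds, removes):
    """Cancel states that appear in both lists, one matching pair at a time."""
    pending = Counter(removes)
    kept_adds = []
    for state in adds:
        if pending[state] > 0:
            pending[state] -= 1  # this add cancels one matching remove
        else:
            kept_adds.append(state)
    kept_removes = []
    for state in removes:  # keep the uncancelled removes in their original order
        if pending[state] > 0:
            pending[state] -= 1
            kept_removes.append(state)
    return kept_adds, kept_removes


def rebase(adds, removes, old_parent, new_parent):
    """Apply the change relative to old_parent on top of new_parent."""
    return simplify([new_parent] + adds, [old_parent] + removes)


# B is based on A and is rebased onto C: the states of C and B conflict.
adds, removes = rebase(["B"], [], old_parent="A", new_parent="C")
print(adds, removes)  # ['C', 'B'] ['A']

# Rebasing the result from C onto D cancels the add and the remove of C.
adds, removes = rebase(adds, removes, old_parent="C", new_parent="D")
print(adds, removes)  # ['D', 'B'] ['A']
```

After the second rebase the intermediate destination has dropped out of both
lists, which is why the conflict does not become nested.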

The conflict handling also results in some Darcs-/Pijul-like properties. For
example, if you rebase a commit and it results in conflicts, and you then back
out that commit, the conflict will go away. (I plan to make that work even if
there had been unrelated changes in the file, but I haven't gotten around to it
yet.)

The criss-cross merge case becomes simpler. In Git, the virtual ancestor may
have conflicts and you may get nested conflict markers in the working copy. In
Jujube, the result is a merge with multiple parts, which may even get simplified
to not be recursive.

The in-tree conflicts make it natural and easy to define the contents of a merge
commit to be the difference compared to the merged parents (the so-called "evil"
part of the merge), so that's what Jujube does. Rebasing merge commits therefore
works as you would expect (Git and Mercurial both handle rebasing of merge
commits poorly). It's even possible to change the number of parents while
rebasing, so if A is a non-merge commit, you can make it a merge commit with `jj
rebase -r A -d B -d C`. `jj diff -r <commit>` will show you the diff compared to
the merged parents.

I intend for commands that present the contents of a tree (such as listing
files) to use the "add" state(s) of the conflict, but that's not yet done.

### Operations are logged

Each write operation is logged to content-addressed storage, much like the
commit storage. The Operation object has an associated View object, much like
the Commit object has a Tree object. The view object contains all the heads
currently in the repo, as well as the checked-out commit. It will also contain
the refs if I add support for that. The operation object can have multiple
parent operations, so it forms a DAG just like the commit graph does. There is
normally only one parent operation, but there can be multiple parents if
concurrent operations happened.
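
To make the Operation/View relationship concrete, here is a minimal Python
sketch of a content-addressed store holding both kinds of objects. It is not
Jujube's storage format: the `Store` class, the JSON encoding, the SHA-256
hash, and the field names (`heads`, `checkout`, `view`, `parents`,
`description`) are all assumptions made for illustration.

```python
import hashlib
import json


class Store:
    """In-memory content-addressed storage: the id of an object is the hash of its bytes."""

    def __init__(self):
        self.objects = {}

    def write(self, obj):
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        object_id = hashlib.sha256(data).hexdigest()
        self.objects[object_id] = data
        return object_id

    def read(self, object_id):
        return json.loads(self.objects[object_id])


def write_view(store, heads, checkout):
    """A view records all heads in the repo plus the checked-out commit."""
    return store.write({"heads": sorted(heads), "checkout": checkout})


def write_operation(store, view_id, parents, description):
    """An operation points at one view and at zero or more parent operations."""
    return store.write(
        {"view": view_id, "parents": sorted(parents), "description": description}
    )


store = Store()
view0 = write_view(store, heads=["c1"], checkout="c1")
op0 = write_operation(store, view0, parents=[], description="initial")
view1 = write_view(store, heads=["c2"], checkout="c2")
op1 = write_operation(store, view1, parents=[op0], description="commit c2")
print(store.read(op1)["parents"] == [op0])  # True
```

Because ids are derived from content, writing the same view twice yields the
same id, and an operation with several entries in `parents` represents the
merge of concurrent operations.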

I added the operation log as a solution for the problem of making concurrent
repo edits safe. When the repo is loaded, it is loaded at a particular operation,
which provides an immutable view of the repo. For a caller of the library to
start making changes, they then have to start a transaction. Once they are done
making changes to the transaction, they commit. The operation object is then
created. This step cannot fail (except if the file system runs out of space or
such). Pointers to the heads of the operation DAG are kept as files in a
directory (the filename is the operation id). When a new operation object has
been created, its operation id is added to the directory. The transaction's base
operation id is then removed from that directory. If concurrent operations
happened, there would be multiple new operation ids in the directory and only
one base operation id would have been removed. If a reader sees the repo in this
state, it will attempt to merge the views and create a new operation with
multiple parents. If there are conflicts, the user will have to resolve them (I
haven't implemented that yet).
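
The head-pointer protocol described above can be sketched in Python using
empty files in a temporary directory. This is a simplified model, not the
actual code: `publish`, `current_heads`, and `load_head` are hypothetical
helpers, operation ids are short made-up strings, and the "merge" step only
invents a new id instead of really merging views.

```python
import os
import tempfile


def current_heads(op_heads_dir):
    """Each file name in the directory is the id of a head operation."""
    return sorted(os.listdir(op_heads_dir))


def publish(op_heads_dir, new_op_id, base_op_id):
    """Add the new operation id first, then remove the transaction's base id."""
    open(os.path.join(op_heads_dir, new_op_id), "w").close()
    try:
        os.remove(os.path.join(op_heads_dir, base_op_id))
    except FileNotFoundError:
        pass  # a concurrent transaction already removed the shared base


def load_head(op_heads_dir):
    """Return the single head, merging first if concurrent operations left several."""
    heads = current_heads(op_heads_dir)
    if not heads:
        raise RuntimeError("no operation heads found")
    if len(heads) == 1:
        return heads[0]
    merged_id = "merge-" + "-".join(heads)  # stand-in for a real merged operation
    open(os.path.join(op_heads_dir, merged_id), "w").close()
    for head in heads:
        os.remove(os.path.join(op_heads_dir, head))
    return merged_id


with tempfile.TemporaryDirectory() as op_heads_dir:
    open(os.path.join(op_heads_dir, "op1"), "w").close()
    # Two transactions start from op1 and commit concurrently.
    publish(op_heads_dir, "op2a", "op1")
    publish(op_heads_dir, "op2b", "op1")
    print(current_heads(op_heads_dir))  # ['op2a', 'op2b']
    print(load_head(op_heads_dir))      # merge-op2a-op2b
```

Adding the new id before removing the base means a reader never finds the
directory without a head, at the cost of sometimes seeing more than one.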

As a nice side-effect of adding the operation log to solve the concurrent-edits
problem, we get some very useful UX features. Many UX features come from mapping
commands that work on the commit graph onto the operation graph. For example, if
you map `git revert`/`hg backout` onto the operation graph, you get an operation
that undoes a previous operation (called `jj op undo`). Note that any operation
can be undone, not just the latest one. If you map `git restore`/`hg revert`
onto the operation graph, you get an operation that rewinds the repo state to an
earlier point (called `jj op restore`).
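
One way to picture the difference between the two commands is to treat a view
as nothing more than a set of head commits. The Python sketch below does that.
It is an assumption about the general idea, not a description of how `jj op
undo` or `jj op restore` are implemented, and the function names `undo` and
`restore` are hypothetical.

```python
def undo(current_view, op_view, parent_view):
    """Reverse the change an operation made (parent_view -> op_view) on top of current_view."""
    added = op_view - parent_view    # heads the operation introduced
    removed = parent_view - op_view  # heads the operation hid
    return (current_view - added) | removed


def restore(target_view):
    """Rewind to exactly the view recorded by an earlier operation."""
    return set(target_view)


view0 = {"A"}       # initial state: A is the only head
view1 = {"B"}       # operation 1: commit B on top of A
view2 = {"B", "C"}  # operation 2: add an unrelated head C

# Undoing operation 1 from the latest state keeps the later head C.
print(sorted(undo(view2, op_view=view1, parent_view=view0)))  # ['A', 'C']

# Restoring to operation 1 discards everything that happened afterwards.
print(sorted(restore(view1)))  # ['B']
```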

You can also see what the repo looked like at an earlier point with `jj
--at-op=<operation id> log`. As mentioned earlier, the checkout is also part of
the view, so that command will show you where the working copy was at that
operation. If you do `jj op restore -o <operation id>`, it will also update the
working copy accordingly. This is actually how the working copy is always
updated: we first commit a transaction with a pointer to the new checkout and
then the working copy is updated to reflect that.

## Future plans

TODO