# stromatekt 0.0.0

A re-imagined, parallelized OCI image builder on top of runc and btrfs.

-  Take advantage of the native snapshot/diff/overlay functionality of filesystems.
   Cheap calculation of multiple changesets/layers in a build history enables
   more granular layers.
-  Parallel builds based on a dataflow graph.
-  Selectively add, _remove_, and mix-and-match arbitrary base layers below the
   current build task. Forget about amalgamation images to support your mixed
   toolchains; apply tools from multiple pre-built images one after another.
-  Define custom image manifests. Unlocked via flexible build tool layers,
   manifest files are produced by the configuration as 'just another' step in a
   task. Select layers with your own code logic, cross-build multi-platform
   images to your heart's content, and more.

## How to use

1.  You will need a fresh btrfs subvolume mounted and owned by your current user.
    Additionally, `unprivileged_userns_clone` should be enabled and the kernel
    compiled with support for user namespaces (`CONFIG_USER_NS=y`).

    ```bash
    # mount -t btrfs -o rw,space_cache,user_subvol_rm_allowed,noacl,noatime,subvol=/stromatekt /dev/sdx /home/stromatekt
    btrfs filesystem df /home/stromatekt
    cat /proc/sys/kernel/unprivileged_userns_clone | grep 1
    cat /proc/config.gz | gunzip -c | grep CONFIG_USER_NS=y
    ```
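
    If the subvolume does not exist yet, one way to create it (as root, with
    the device and paths adjusted to your setup) is to mount the top-level
    volume, create the subvolume there, and hand the mount point to your user:

    ```bash
    # mount /dev/sdx /mnt
    # btrfs subvolume create /mnt/stromatekt
    # umount /mnt
    # chown "$USER": /home/stromatekt    # once mounted as shown above
    ```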

2.  Create `~/.config/stromatekt/config.json` with the path to the subvolume
    mount adjusted accordingly. It should look similar to:
    
    ```json
    {
    	"btrfs_root": "/home/stromatekt"
    }
    ```

3.  Prepare the example binary:

    ```bash
    pushd examples/prime && cargo build --release && popd
    ```

4.  Execute the example build:

    ```bash
    cargo run -- ./examples/parallel-dependency.json --no-dry-run
    ```

## Motivation

`docker build` is slow. The structure of a `Dockerfile` only permits a linear
sequence of instructions. Moreover, `docker compose` is even slower: it sends,
unpacks, and repacks image layers and the local file system a _lot_, which can
take a significant amount of time. The author has observed a build whose
`Dockerfile` does nothing but add a single link to the file system take more
than 4 minutes. That is unacceptable as development latency. Furthermore,
caching of layers is inherently limited by the linear sequence logic. Let's
address both.

## Structure of an OCI image

The main data within an OCI image is an ordered collection of layers. Each
layer is essentially a _diff_ against the previous one, usually in the form of
a `tar` archive. (For slightly surprising historical reasons, a deletion is
encoded as a file following special naming rules, a so-called whiteout entry.)
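
For example, deleting `/etc/foo.conf` in a build step would surface in that
layer's archive (a hypothetical `layer.tar` here) as a whiteout entry:

```bash
tar -tf layer.tar | grep '\.wh\.'
# etc/.wh.foo.conf    <- marks /etc/foo.conf as deleted by this layer
```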

When running a build, the builder will check out the layers of the underlying
image, run its commands, and finally compute the diff to encode into a new
layer. The two highly expensive filesystem tasks, checkout and diff, can be
implemented much more efficiently if we can utilize the checkpoint and
incremental diff logic of the filesystem itself.
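
A rough sketch of that mechanism on btrfs (the paths are purely illustrative,
not the layout stromatekt actually uses):

```bash
# Checkpoint the build root before the step; snapshots are cheap copy-on-write.
btrfs subvolume snapshot -r /home/stromatekt/build /home/stromatekt/before

# ... run the build step inside /home/stromatekt/build ...

# Checkpoint again afterwards, then emit only the incremental change.
btrfs subvolume snapshot -r /home/stromatekt/build /home/stromatekt/after
btrfs send -p /home/stromatekt/before /home/stromatekt/after > step.delta
```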

Furthermore, this task is likely IO-bound, which means we _should_ seek to
perform as much of it as possible in parallel. Note that the layer sequence of
an OCI image is not commutative. However, as long as the task definition itself
opts in by providing a canonical recombination order, there shouldn't be any
reproducibility problem with creating layers in a _different_ order.

Example:
- `A --(proc0)-> B0` yielding diff `C0`
- `A --(proc1)-> B1` yielding diff `C1`
- => export layers as: `[A, C0, C1]`
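
A minimal sketch of how the two diffs can be produced independently on btrfs
(names mirror the example above; assume `A` is a read-only snapshot, and note
these are not the exact commands stromatekt runs):

```bash
# Branch two writable working copies from the same read-only base snapshot A.
btrfs subvolume snapshot /home/stromatekt/A /home/stromatekt/work0
btrfs subvolume snapshot /home/stromatekt/A /home/stromatekt/work1

# proc0 and proc1 now run concurrently, each in its own working copy.

# Freeze the results and compute both diffs against A alone; neither diff
# depends on the other, so they can also be produced in parallel.
btrfs subvolume snapshot -r /home/stromatekt/work0 /home/stromatekt/B0
btrfs subvolume snapshot -r /home/stromatekt/work1 /home/stromatekt/B1
btrfs send -p /home/stromatekt/A /home/stromatekt/B0 > C0.delta &
btrfs send -p /home/stromatekt/A /home/stromatekt/B1 > C1.delta &
wait
```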

Actually, we could even allow swapping `A` for a totally unrelated `A*` as long
as the build manifest makes this explicit, for instance to apply a security
patch to an _underlying_ layer. Also, `proc0` and `proc1` can be executed with
_entirely different_ underlying technologies (e.g. one as an x86 process, the
other as a WASI executable).

## Planned extensions

1. Library files for build dependencies and maintainability. Define additional
   tasks in a separate file, then import specific changesets they define into
   another specification and let the dataflow resolver figure out a solution.
2. Reproducibility assertions via hashes, used for incremental builds.