A re-imagined OCI image builder.
- Take advantage of native snapshot/diff/overlay functionality of filesystems.
Cheap calculation of multiple changesets/layers in a build history enables
more granular layers.
- Parallel builds based on a dataflow graph.
- Selectively add, _remove_, and mix-and-match arbitrary base layers below the
current build task. Forget about amalgamation images for supporting your mixed
toolchains; apply tools from multiple pre-built images one after another.
- Define custom image manifests. Unlocked by flexible build-tool layers,
manifest files are produced by the configuration as 'just another' step in a
task. Select layers with your own code logic, cross-build multi-platform
images to your heart's content, and more.
## How to use
1. You will need a fresh Btrfs subvolume mounted and owned by your current
user. Additionally, `unprivileged_userns_clone` should be enabled and the
kernel compiled with user namespace support (`CONFIG_USER_NS`).
```bash
btrfs filesystem df /home/stromatekt
grep 1 /proc/sys/kernel/unprivileged_userns_clone
zcat /proc/config.gz | grep CONFIG_USER_NS=y
```
2. Create `~/.config/stromatekt/config.json` with the path to the subvolume
mount adjusted accordingly. It should look similar to:
```json
{
  "btrfs_root": "/home/stromatekt"
}
```
3. Prepare the example binary:
```bash
pushd examples/prime && cargo build --release && popd
```
4. Execute the example build:
```bash
cargo run -- ./examples/parallel-dependency.json --no-dry-run
```
## Motivation
`docker build` is slow. The structure of a `Dockerfile` only permits a linear
sequence of instructions. Moreover, `docker compose` is even slower: it sends,
unpacks, and repacks layers of images and the local file system a _lot_, which
can take a significant amount of time. The author has observed builds, with a
`Dockerfile` change consisting of adding a single link in the file system,
take more than 4 minutes. That is unacceptable development latency. Further,
caching of layers is unavoidably poor due to the linear sequence logic. Let's
address both.
## Structure of an OCI image
The main data within an OCI image is an ordered collection of layers. Each
layer is essentially a _diff_ against the previous one, usually in the form of
a `tar` archive. (Somewhat surprisingly, a deletion is encoded as a file with
special naming rules.)
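To make the deletion encoding concrete, here is a minimal sketch of the
whiteout convention from the OCI image spec: deleting a file relative to the
previous layer is expressed as an empty marker file whose name is the original
basename prefixed with `.wh.` (paths here are illustrative):

```shell
# A layer that deletes app/config.yml carries an empty .wh.-prefixed entry.
mkdir -p layer/app
: > layer/app/.wh.config.yml        # "config.yml was deleted in this layer"
tar -cf layer.tar -C layer app
tar -tf layer.tar                   # lists app/ and app/.wh.config.yml
```

Consumers of the image apply this entry by removing `app/config.yml` from the
assembled root filesystem instead of creating a file.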
When running a build, the builder will check out the layers of the underlying
container, run its commands, and finally compute the diff to encode into a new
layer. The two most expensive filesystem tasks, checkout and diff, can be
implemented much more efficiently if we utilize the checkpoint and incremental
diff logic of the filesystem itself.
Furthermore, this task is likely IO-bound, meaning we _should_ seek to perform
as much of it as possible in parallel. Note that the layer sequence of an OCI
image is not commutative. However, as long as the task definition itself opts
in by providing a canonical recombination order, there shouldn't be any
reproducibility problem with creating layers in a _different_ order.
Example:
- `A --(proc0)-> B0` yielding diff `C0`
- `A --(proc1)-> B1` yielding diff `C1`
- => export layers as: `[A, C0, C1]`
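The recombination above can be simulated with plain directories and tar
archives (a toy stand-in for the Btrfs snapshot/diff machinery; all names are
illustrative):

```shell
# Two independent build steps run from the same base A; each diff is
# captured separately; the final root is A plus both diffs applied in the
# canonical order [A, C0, C1].
mkdir -p A && echo base > A/base.txt

cp -r A B0 && echo tool0 > B0/tool0.txt   # A --(proc0)-> B0
cp -r A B1 && echo tool1 > B1/tool1.txt   # A --(proc1)-> B1

tar -cf C0.tar -C B0 tool0.txt            # diff C0: only what proc0 added
tar -cf C1.tar -C B1 tool1.txt            # diff C1: only what proc1 added

mkdir -p rootfs && cp -r A/. rootfs/      # check out base layer A
tar -xf C0.tar -C rootfs                  # apply C0
tar -xf C1.tar -C rootfs                  # apply C1
ls rootfs                                 # base.txt  tool0.txt  tool1.txt
```

In the real builder the `cp`/`tar` steps would be cheap filesystem snapshots
and incremental diffs, but the ordering argument is the same: `C0` and `C1`
were produced in parallel, yet the exported sequence is canonical.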
Actually, we could even allow swapping `A` for a totally unrelated `A*` as long
as the build manifest makes this explicit. For instance, to provide a security
patch of an _underlying_ layer. Also, `proc0` and `proc1` can be executed with
_entirely different_ underlying technologies (e.g. one as an x86 process, the
other as a WASI executable).
## Planned extensions
1. Library files for build dependencies and maintainability. Define additional
tasks in a separate file, then import specific changesets they define into
another specification and let the dataflow resolver figure out a solution.
2. Reproducibility assertions via hashes, used for incremental builds.
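As a sketch of what such a reproducibility assertion might look like (this is
not the project's actual mechanism): a layer archive can be pinned to a
content digest, which stays stable across rebuilds if the archive metadata is
normalized with GNU tar's determinism flags.

```shell
# Normalize ordering, ownership, and timestamps so identical file contents
# always produce identical archive bytes, then hash the result.
mkdir -p layer && echo hello > layer/greeting.txt
tar --sort=name --owner=0 --group=0 --numeric-owner \
    --mtime='UTC 2020-01-01' -cf layer.tar -C layer .
sha256sum layer.tar   # identical digest on every rebuild of identical content
```

An incremental build could then skip any task whose input layers still match
their recorded digests.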