Deterministic source-based docker image checksum
You have a CI pipeline that builds a monorepo with many Dockerfiles.
You want to efficiently avoid rebuilding Dockerfiles that haven't changed, even when the rest of the monorepo did.
docker-source-checksum calculates a hash of:
- all source files referenced by the Dockerfile (determined by parsing it)
- any additional arguments that might affect the build
It then hashes all of these together to give you a deterministic checksum before you even attempt to call docker build. You can use it as a deterministic, content-based ID to avoid rebuilding containers that were already built (e.g. by tagging them with that checksum).
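The general idea can be illustrated with standard command-line tools. The function below is a hypothetical sketch, not docker-source-checksum's algorithm. It hashes every file under a directory (instead of only the files a Dockerfile references) in a fixed, locale-independent order, includes each file's relative path, and mixes in extra strings the way build arguments would be. The function name `source_checksum` is illustrative, and the sketch assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -0 -r`).

```sh
# Hypothetical sketch, not docker-source-checksum's real algorithm.
# Prints a SHA-256 over every file in a directory plus any extra strings.
# Assumes GNU coreutils/findutils (sha256sum, sort -z, xargs -0 -r).
source_checksum() {
  local ctx="$1" s
  shift
  {
    # Relative paths, sorted bytewise (LC_ALL=C) so the order, and therefore
    # the hash, does not depend on the filesystem or the locale.
    # sha256sum prints "<hash>  <path>", so both contents and paths count.
    (cd "$ctx" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
    # Extra inputs that affect the build (e.g. build arguments).
    for s in "$@"; do
      printf 'extra:%s\n' "$s"
    done
  } | sha256sum | cut -d ' ' -f 1
}
```

Calling it twice on an unchanged directory prints the same 64-character hex digest; editing or renaming any file, or passing a different extra string, changes the digest.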
Using it in your CI pipeline
Let's say your CI pipeline normally just runs docker build for every Dockerfile.
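For illustration, such an "always build" step might look roughly like the hypothetical function below. The function name, the repository argument, and the scheme of tagging with the first 12 characters of the image ID are placeholders chosen for this sketch. It relies on `docker build -q`, which prints only the resulting image ID.

```sh
# Hypothetical "always build" CI step. The repository name passed in and
# the tag-from-image-ID scheme are illustrative placeholders.
ci_build_always() {
  local repo="$1" dockerfile="$2" context="$3" image_id short tag
  # The whole build context is sent to the daemon on every run.
  # `docker build -q` prints only the resulting image ID, which is
  # available only once the build has finished.
  image_id=$(docker build -q -f "$dockerfile" "$context") || return 1
  # Tag with the first 12 hex characters of the image ID, then push.
  short=$(printf '%.12s' "${image_id#sha256:}")
  tag="$repo:$short"
  docker tag "$image_id" "$tag" && docker push "$tag"
}
```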
Some problems with this method are:
- It takes some time for all the files of this build to be sent to the docker daemon. This part alone can take a substantial amount of time, even in the happy case where nothing needs rebuilding because the container image is already cached locally.
- If exactly the same build was already done on some different machine, it will not be reused on this one, unless you have some smarter system set up to share them.
- You need to wait for docker build to complete to get a unique ID of the build.
With docker-source-checksum you would instead do:
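BUILD_FULL_ID=$(docker-source-checksum --hex -f Dockerfile .)
BUILD_ID=${BUILD_FULL_ID:0:8} # take just first 8 characters
TAG_NAME=my-docker-repository.com/${IMAGE_NAME}:${BUILD_ID} # IMAGE_NAME: placeholder for your image name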
In less than a second, even for a big project, you get a deterministic cryptographic ID of the build without attempting to build anything yet. At this point, you can speculatively start parts of your CI with an already known docker image URL.
The rest of your CI script can quickly check whether this exact build already exists in your registry with:
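if DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect "$TAG_NAME" > /dev/null ; then
  exit 0 # already built and pushed, nothing more to do
fi
(You can use docker pull "$TAG_NAME" instead if you want it cached locally too.)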
Only if it was never built do you build it locally and push it to your registry.
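Taken together, the registry check and the build can be wrapped in one function. This is a hypothetical sketch: the function name and arguments are illustrative, and it assumes a Docker CLI that provides `docker manifest inspect` (older releases only enable it with `DOCKER_CLI_EXPERIMENTAL=enabled`). It treats any failure of that command as "not built yet", which also covers network or authentication errors.

```sh
# Hypothetical "build only if missing" step. $1 is the checksum-derived tag.
build_if_missing() {
  local tag="$1" dockerfile="$2" context="$3"
  # If the registry already has a manifest for this tag, the same inputs
  # were built before (possibly on another machine): skip the build.
  if DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect "$tag" > /dev/null 2>&1; then
    echo "skip: $tag already in registry"
    return 0
  fi
  # Otherwise build locally and push, so later runs can skip this step.
  docker build -t "$tag" -f "$dockerfile" "$context" && docker push "$tag"
}
```

Because the tag is derived from the sources rather than from the finished image, the check can run before any build context is sent to the daemon.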
Warnings and missing features
- don't use it on untrusted
- the exact checksum is not stable yet and can change between versions
- variable expansion is not performed, so variables inside source paths in COPY will not work
- the ["src1", "src", "dst"] syntax of COPY is not supported (PRs welcome)
- file ownership is ignored
- it was put together in 2 hours, so if you plan to use it in production, maybe... review the code or something and tell me what you think
Having said that, it seems to work great.
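Several of the limitations above come from how source paths are extracted from the Dockerfile. As a rough illustration of that step only (not the tool's implementation), the hypothetical function below uses awk to print the source paths of shell-form COPY (and, in this sketch, ADD) instructions. It shares the listed limitations: no variable expansion and no JSON-array form. It also ignores line continuations, skips flags such as `--chown=...`, and skips `COPY --from=...` entirely, since those sources come from another stage or image rather than the build context; whether docker-source-checksum handles `--from` the same way is an assumption.

```sh
# Hypothetical, simplified approximation of Dockerfile source extraction.
# Prints one source path per line for shell-form COPY/ADD instructions.
dockerfile_sources() {
  awk '
    toupper($1) == "COPY" || toupper($1) == "ADD" {
      if ($0 ~ /\[/) next              # JSON-array form: not supported here
      from = 0; n = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^--from=/) { from = 1; continue }
        if ($i ~ /^--/) continue       # other flags such as --chown=...
        args[++n] = $i
      }
      if (from) next                   # sources come from another stage/image
      # All but the last argument are sources; the last is the destination.
      for (i = 1; i < n; i++) print args[i]
    }
  ' "$1"
}
```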
See the docker-source-checksum releases, or install it with cargo install docker-source-checksum.
Somewhat similar to
$ docker-source-checksum --help
docker-source-checksum 0.2.0
Dockerfile source checksum

USAGE:
    docker-source-checksum [FLAGS] [OPTIONS] <context-path>

FLAGS:
    -h, --help       Prints help information
        --hex        Output hash in hex
    -V, --version    Prints version information

OPTIONS:
        --extra-path <extra-path>...        Path relative to context to include in the checksum
        --extra-string <extra-string>...    String (like arguments to dockerfile) to include in the checksum
    -f, --file <file>                       Path to `Dockerfile`
        --ignore-path <ignore-path>...      Path relative to context to ignore in the checksum

ARGS:
    <context-path>    Dockerfile build context path
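The options above can be combined in a CI script. The hypothetical wrapper below (POSIX shell; the function name, the repository argument and the 8-character truncation are illustrative choices) forwards each build argument as a separate `--extra-string`, so changing a build argument changes the resulting tag. As a precaution, it passes `<context-path>` before the repeatable `--extra-string` option, so the path cannot be taken as one more extra string.

```sh
# Hypothetical wrapper: print "<repo>:<first 8 hex chars of the checksum>".
# Every argument after the first three is forwarded as --extra-string.
checksum_tag() {
  local repo="$1" dockerfile="$2" context="$3" n a full
  shift 3
  n=$#
  # Append "--extra-string <arg>" for each build argument, then drop the
  # original arguments so only the flag/value pairs remain in "$@".
  for a in "$@"; do
    set -- "$@" --extra-string "$a"
  done
  shift "$n"
  full=$(docker-source-checksum --hex -f "$dockerfile" "$context" "$@") || return 1
  printf '%s:%.8s\n' "$repo" "$full"
}
```

For example, `TAG_NAME=$(checksum_tag my-docker-repository.com/app Dockerfile . "VERSION=1.2")` would yield a tag that changes whenever either the referenced sources or the `VERSION` value changes.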