Build postprocessors to reset metadata fields and hardlink files for build reproducibility
This crate provides two related programs for package build postprocessing.
add-det
add-det takes one or more paths as arguments,
and will recursively process normal files in those paths,
attempting to run a set of type-specific handlers on any files with extensions that match.
(Each argument can be either a single file or a directory to be processed recursively.)
For each processed file, a temporary file is opened, the contents are rewritten, the modification timestamp is copied from the original file to the temporary copy, and the copy is renamed over the original.
If processing fails, a warning is emitted, but no modifications are made and the program returns success.
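The rewrite cycle described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's actual code: `rewrite_in_place`, the `transform` callback, and the `.tmp` suffix are hypothetical names chosen for the example.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Rewrites `path` in place. `transform` stands in for a type-specific handler
/// (hypothetical): it returns `Ok(Some(bytes))` with new contents, `Ok(None)`
/// if nothing needs to change, or `Err` if the file could not be processed.
/// Returns `Ok(true)` if the file was replaced.
pub fn rewrite_in_place<F>(path: &Path, transform: F) -> io::Result<bool>
where
    F: Fn(&[u8]) -> io::Result<Option<Vec<u8>>>,
{
    let original = fs::read(path)?;
    let rewritten = match transform(&original) {
        Ok(Some(bytes)) => bytes,
        Ok(None) => return Ok(false),
        Err(e) => {
            // Processing failed: warn, leave the file untouched, carry on.
            eprintln!("warning: {}: {}", path.display(), e);
            return Ok(false);
        }
    };

    // The temporary file lives next to the original so that the rename
    // stays on one filesystem. The ".tmp" suffix is an arbitrary choice.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mtime = fs::metadata(path)?.modified()?;
    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(&rewritten)?;
    tmp.set_modified(mtime)?; // copy the mtime from the original
    drop(tmp);

    // Rename the copy over the original.
    fs::rename(&tmp_path, path)?;
    Ok(true)
}
```

A real implementation would also pick a collision-free temporary name and preserve ownership and permissions; those steps are left out to keep the sketch short.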
The purpose of this tool is to eliminate common sources of non-determinism in builds, making it easier to create reproducible (package) builds.
Usage
Standalone usage
$ add-det /path/to/file /path/to/directory
Note that the program works in-place, replacing input files with the rewritten versions (if any modifications are made).
Some useful options:
- -v: enable debug output
- -j [N]: use N workers (or as many as there are CPUs, if N is not given)
- --handler list|HANDLER|-HANDLER: constrain the list of handlers. Takes a comma-separated list of names, either a list of "positive" names, in which case only the listed handlers will be used, or a list of "negative" names, each prefixed by a minus sign, in which case the listed handlers will not be used. By default, handlers that cannot be initialized are skipped with a warning. If a "positive" list is given, failure to initialize a handler will cause an error. The special value list can be used to list known handlers.
- --brp: enable "build root program" mode, see below.
- -V, --version: print program version
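To make the positive/negative semantics of --handler concrete, here is a minimal sketch of how such a value could be parsed. The names `Selection` and `parse_handlers` are hypothetical. Rejecting unknown names and rejecting a mix of positive and negative names are assumptions of this example, not documented behavior.

```rust
/// Handler names, taken from the "Processors" section of this README.
pub const KNOWN: &[&str] = &["ar", "jar", "javadoc", "gzip", "pyc", "pyc-zero-mtime", "zip"];

#[derive(Debug, PartialEq)]
pub enum Selection {
    /// `--handler list`: print the known handlers.
    List,
    /// Positive list: use only these handlers.
    Only(Vec<String>),
    /// Negative list: use the defaults minus these handlers.
    Except(Vec<String>),
}

pub fn parse_handlers(value: &str) -> Result<Selection, String> {
    if value == "list" {
        return Ok(Selection::List);
    }
    let mut positive: Vec<String> = Vec::new();
    let mut negative: Vec<String> = Vec::new();
    for name in value.split(',') {
        // A leading minus marks a "negative" name.
        let (bucket, bare) = match name.strip_prefix('-') {
            Some(rest) => (&mut negative, rest),
            None => (&mut positive, name),
        };
        if !KNOWN.contains(&bare) {
            // Assumption: unknown names are an error.
            return Err(format!("unknown handler: {bare:?}"));
        }
        bucket.push(bare.to_string());
    }
    match (positive.is_empty(), negative.is_empty()) {
        (false, true) => Ok(Selection::Only(positive)),
        (true, false) => Ok(Selection::Except(negative)),
        // Assumption: the two kinds of names cannot be mixed.
        _ => Err("cannot mix positive and negative handler names".to_string()),
    }
}
```

With this sketch, `parse_handlers("ar,zip")` yields `Selection::Only` and `parse_handlers("-pyc")` yields `Selection::Except`. The caller would then treat initialization failures as errors only in the `Only` case.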
In an rpm build environment
When invoked with --brp, the $RPM_BUILD_ROOT environment variable must be defined and not empty.
All arguments must be below $RPM_BUILD_ROOT.
This option is intended to be used in rpm macros that define post-install steps.
See redhat-rpm-config pull request #293,
which added a call to add-det in %__os_install_post.
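The --brp preconditions amount to a small validation step. The sketch below is one way to express it. The function name `check_brp_args` is hypothetical, the comparison is purely lexical (no symlink or `..` resolution), and accepting the build root itself as an argument is an assumption.

```rust
use std::path::Path;

/// `build_root` is the value of `$RPM_BUILD_ROOT`, if set
/// (for example from `std::env::var("RPM_BUILD_ROOT").ok()`).
pub fn check_brp_args(build_root: Option<&str>, args: &[&str]) -> Result<(), String> {
    let root = match build_root {
        Some(r) if !r.is_empty() => Path::new(r),
        _ => return Err("$RPM_BUILD_ROOT is not set or is empty".to_string()),
    };
    for arg in args {
        // `Path::starts_with` compares whole components, so
        // "/buildroot-other" is not considered to be below "/buildroot".
        if !Path::new(arg).starts_with(root) {
            return Err(format!("{arg} is not below {}", root.display()));
        }
    }
    Ok(())
}
```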
Verification instead of modification
When invoked with --check, the tool processes all files,
but does not actually save any modifications.
Instead, it'll fail if any files would have been modified.
It also returns an error if any files cannot be read.
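The difference between the normal mode and --check can be summarized as a function from per-file outcomes to overall success. This is only a sketch of the rules stated in this README; `Outcome` and `run_succeeds` are hypothetical names. Treating unreadable files as non-fatal in normal mode is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    /// The file was left as it was.
    Unchanged,
    /// The file was rewritten (or, with --check, would have been).
    Modified,
    /// The file could not be read.
    Unreadable,
}

/// Returns whether the whole run counts as a success.
pub fn run_succeeds(check: bool, outcomes: &[Outcome]) -> bool {
    if !check {
        // Normal mode: problems are reported as warnings only
        // (assumption: this includes unreadable files).
        return true;
    }
    // --check: any file that would change, or cannot be read, is a failure.
    outcomes.iter().all(|o| *o == Outcome::Unchanged)
}
```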
Processors
ar
Accepts *.a.
Resets the embedded modification times to $SOURCE_DATE_EPOCH and owner:group to 0:0.
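As an illustration of what resetting these fields involves, the sketch below walks the 60-byte member headers of the common ar format (name 16 bytes, mtime 12, uid 6, gid 6, mode 8, size 10, terminator 2) and overwrites the mtime, uid, and gid fields. It is not the crate's implementation. The function names are hypothetical, and leaving blank fields blank (as found on some special members) is an assumption.

```rust
const MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Overwrites a space-padded decimal header field, leaving blank fields blank.
fn put_field(field: &mut [u8], value: u64) -> Result<(), String> {
    if field.iter().all(|&b| b == b' ') {
        return Ok(());
    }
    let text = format!("{:<width$}", value, width = field.len());
    if text.len() != field.len() {
        return Err(format!("{value} does not fit in {} bytes", field.len()));
    }
    field.copy_from_slice(text.as_bytes());
    Ok(())
}

/// Sets mtime to `epoch` and uid/gid to 0 in every member header.
/// Returns the number of member headers visited.
pub fn reset_ar_metadata(data: &mut [u8], epoch: u64) -> Result<usize, String> {
    if !data.starts_with(MAGIC) {
        return Err("not an ar archive".to_string());
    }
    let mut pos = MAGIC.len();
    let mut members = 0;
    while pos + HEADER_LEN <= data.len() {
        let header = &mut data[pos..pos + HEADER_LEN];
        if &header[58..60] != b"`\n" {
            return Err(format!("bad member header at offset {pos}"));
        }
        put_field(&mut header[16..28], epoch)?; // mtime
        put_field(&mut header[28..34], 0)?; // uid
        put_field(&mut header[34..40], 0)?; // gid
        let size: usize = std::str::from_utf8(&header[48..58])
            .map_err(|e| e.to_string())?
            .trim()
            .parse()
            .map_err(|_| format!("bad size field at offset {pos}"))?;
        // Member data is padded to an even length.
        pos += HEADER_LEN + size + (size % 2);
        members += 1;
    }
    Ok(members)
}
```

In a real run, `epoch` would come from parsing `$SOURCE_DATE_EPOCH`. The sketch takes it as a parameter so that it stays free of environment access.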
jar
Accepts *.jar.
This rewrites the zip file using the zip crate.
The modification times of archive entries are clamped to $SOURCE_DATE_EPOCH.
Extra metadata, i.e. primarily timestamps in UNIX format and DOS permissions,
is stripped (in part because the crate does not support it).
javadoc
Accepts *.html.
This looks at the <head> portion of an HTML file and finds standard
lines inserted by Javadoc that specify the file creation date.
For example,
<!-- Generated by javadoc (<version>) on <date> --> is replaced by a version without the version and date,
and <meta name="dc.created" content="<date>"> is replaced by a version with $SOURCE_DATE_EPOCH.
gzip
Accepts *.gz.
This clamps the modification timestamp of the embedded content
to $SOURCE_DATE_EPOCH.
When files that were generated during the build are compressed with gzip,
the timestamp is embedded.
We clamp it to restore build reproducibility.
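Clamping here means replacing the timestamp only when it is later than $SOURCE_DATE_EPOCH. A minimal sketch, assuming the standard gzip header layout (RFC 1952: two magic bytes, then the MTIME field in bytes 4..8, little-endian), could look as follows. The function name `clamp_gzip_mtime` is hypothetical, and only the first member of a multi-member stream is handled.

```rust
/// Clamps the MTIME field of a gzip stream to `epoch`.
/// Returns `true` if the header was changed.
pub fn clamp_gzip_mtime(data: &mut [u8], epoch: u32) -> bool {
    // A gzip header is at least 10 bytes and starts with 0x1f 0x8b.
    if data.len() < 10 || data[0] != 0x1f || data[1] != 0x8b {
        return false;
    }
    let mtime = u32::from_le_bytes([data[4], data[5], data[6], data[7]]);
    if mtime <= epoch {
        // Already at or before the epoch (0 means "no timestamp").
        return false;
    }
    data[4..8].copy_from_slice(&epoch.to_le_bytes());
    true
}
```

Because only four header bytes change, the compressed payload and its checksum stay valid.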
pyc
Accepts *.pyc.
This handler implements a .pyc file parser for Python bytecode files
and cleans up unused "flag references".
It is a Rust reimplementation of
the MarshalParser Python module.
pyc-zero-mtime
Accepts *.pyc.
This handler sets the internal timestamp in the .pyc file header to 0,
and sets the mtime of the corresponding source .py file to 0.
This is intended to be used on OSTree
systems where mtimes are discarded,
causing a mismatch between the timestamp embedded in the .pyc file
and the filesystem metadata of the .py file.
This handler is not enabled by default and must be explicitly requested
via --handler pyc-zero-mtime.
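The two steps of this handler can be sketched as follows, assuming the PEP 552 header layout used by Python 3.7 and later (magic, flags, source mtime, source size; four bytes each, little-endian). The function names are hypothetical, and the crate's actual handling of edge cases may differ.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Zeroes the source mtime in a timestamp-based .pyc header.
/// Returns `true` if the header was changed.
pub fn zero_pyc_mtime(data: &mut [u8]) -> bool {
    if data.len() < 16 {
        return false;
    }
    let flags = u32::from_le_bytes([data[4], data[5], data[6], data[7]]);
    if flags & 0x1 != 0 {
        // Hash-based .pyc: bytes 8..16 hold a source hash, not an mtime.
        return false;
    }
    if data[8..12] == [0u8; 4] {
        return false; // already zero
    }
    data[8..12].copy_from_slice(&[0u8; 4]);
    true
}

/// Sets the mtime of the matching .py source file to the Unix epoch.
pub fn zero_source_mtime(py_path: &Path) -> io::Result<()> {
    File::options().write(true).open(py_path)?.set_modified(UNIX_EPOCH)
}
```

After both steps, the timestamp recorded in the .pyc header and the mtime of the .py file agree at 0, which matches what a system that discards mtimes would report.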
zip
Accepts *.zip.
This rewrites the zip file using the zip crate.
The modification times of archive entries are clamped to $SOURCE_DATE_EPOCH.
Extra metadata, i.e. primarily timestamps in UNIX format and DOS permissions,
is stripped (in part because the crate does not support it).
Printing of .pyc files
When invoked with -p, this tool will print the contents of a .pyc file.
Special effort is made to show flags on objects and references to them.
Currently the actual code
(arrays of bytes in the Code object's code field)
is not printed.
To do this nicely we would need to disassemble the code.
Contributions welcome!
$ add-det -p /path/to/pyc-file
Code "<module>" 🚩204/(ref to 204)"<module>" 🚩0
  (ref to 22)"/usr/lib/python3.12/site-packages/elftools/construct/adapters.py":1
  argcount=0 posonlyargcount=0 kwonlyargcount=0 stacksize=5 flags=0
  -code: [560 bytes]
  -consts: (
    1 🚩2,
    ("Adapter" 🚩3, "AdaptationError" 🚩4, "Pass" 🚩5),
    ("int_to_bin" 🚩6, "bin_to_int" 🚩7, "swap_bytes" 🚩8),
    ("FlagsContainer" 🚩9, "HexString" 🚩10),
    ("BytesIO" 🚩11, "decodebytes" 🚩12),
    Code (ref to 14)"BitIntegerError"/(ref to 14)"BitIntegerError"
      "/usr/lib/python3.12/site-packages/elftools/construct/adapters.py" 🚩22:10
      argcount=0 posonlyargcount=0 kwonlyargcount=0 stacksize=1 flags=0
      -code: [16 bytes]
      -consts: ("BitIntegerError" 🚩14, None)
      -names: ("__name__" 🚩16, "__module__" 🚩17, "__qualname__" 🚩18, "__slots__" 🚩19) 🚩15
      -locals+names: () 🚩20
      -locals+kinds: [] 🚩21
      -linetable: [7 bytes]
      -exceptiontable: (ref to 21)[],
    (ref to 14)"BitIntegerError",
...
linkdupes
🚧🚧🚧 This program is currently experimental. 🚧🚧🚧
linkdupes takes one or more paths as arguments,
and will recursively process normal files in those paths,
looking for files that have the same contents.
Files with the same contents, ownership, and mode will be hardlinked.
This program is similar to programs that hardlink files,
but takes $SOURCE_DATE_EPOCH into account:
file modification timestamps are clamped to $SOURCE_DATE_EPOCH.
Without this clamping,
hardlinking of files produced during a build is not stable,
because depending on the machine speed and file system timestamp granularity,
files might or might not be considered identical.
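The effect of clamping on the comparison can be shown with a small sketch. `Meta` and `link_key` are hypothetical names. The key covers only metadata, so file contents would still have to be compared separately.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Meta {
    pub mtime: u64, // seconds since the Unix epoch
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
}

/// Metadata key for deciding whether two files with equal contents
/// may be hardlinked. The mtime is clamped to `source_date_epoch` first.
pub fn link_key(meta: &Meta, source_date_epoch: u64) -> (u64, u32, u32, u32) {
    (meta.mtime.min(source_date_epoch), meta.mode, meta.uid, meta.gid)
}
```

Two files written one second apart during a build have different raw mtimes, but both are later than $SOURCE_DATE_EPOCH, so their keys are equal after clamping. A file older than the epoch keeps its own mtime and therefore a distinct key.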
Usage
Standalone usage
$ linkdupes /path/to/file /path/to/directory
Note that the program works in-place, replacing duplicate files with hardlinks (if any duplicates are found).
Some useful options:
- -v…: enable debug output
- -n, --dry-run: just print what would be done
- --brp: enable "build root program" mode, see below
- -V, --version: print program version
- --ignore-mtimes, --ignore-mode, --ignore-owner: ignore modification timestamps, access mode, or owner and group when comparing files.
In an rpm build environment
When invoked with --brp, the $RPM_BUILD_ROOT environment variable must be defined and not empty.
All arguments must be below $RPM_BUILD_ROOT.
This option is intended to be used in rpm macros that define post-install steps.
Notes
This project is inspired by strip-nondeterminism, but is written from scratch in Rust. For Debian, build tools are written in Perl, so more Perl is not an issue. But in Fedora/RHEL/…, tools are written in Bash, Python, or compiled languages, and we don't want to pull Perl into all buildroots. In addition, the details of what kinds of processing we want to do differ. For example, Debian does not distribute Python bytecode files.