sysroot-cleaner 1.0.1

[//]: # SPDX-FileCopyrightText: Matteo Settenvini <matteo.settenvini@montecristosoftware.eu>

[//]: # SPDX-License-Identifier: CC-BY-SA-4.0

# 🧹 Sysroot Cleaner

A tool to clean up sysroots for Linux embedded devices in order to save storage space.

Here, _sysroot_ means the final _target system_ filesystem, rather than the _staging folder_ that may contain intermediate cross-compilation byproducts.

## What does it do?

_Sysroot cleaner_ is a simple tool that removes unnecessary files from a target folder holding the filesystem of an ELF-based OS (such as Linux). The folder can be, for instance, a cross-compiled device target tree or a folder being prepared for a local chroot jail. The tool recurses into all subfolders **that are part of the same filesystem** and looks for files that can be safely removed to reduce space usage.

The full list of found files is passed to a few modules (aka "cleaners") that can decide whether to keep or remove a specific file. These are:

* **dso**: maps all ELF files and their library dependencies to a directed acyclic graph, then removes every library that is not reachable, directly or transitively, from an executable binary. **Note**: libraries that are opened dynamically at runtime (for example via `dlopen()`) need to be allow-listed manually. If there is interest, we might support [.note.dlopen](https://github.com/systemd/systemd/blob/main/docs/ELF_DLOPEN_METADATA.md) as it gains more widespread adoption.
* **allow-/block-list**: given a file of [gitignore patterns](https://git-scm.com/docs/gitignore#_pattern_format), marks each matching file either for keeping (if the pattern is in an allowlist) or for removal (if it is in a blocklist).
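To make the reachability idea behind the **dso** cleaner concrete, here is a minimal Bash sketch on a toy graph. It is not the tool's actual implementation: the node names and the `needed` table are hypothetical, and a real run would read the dependencies from the ELF files themselves.

```bash
#!/bin/bash
# Toy dependency graph (hypothetical names): node -> space-separated dependencies.
declare -A needed=(
    [bin/app]="libfoo.so libc.so"
    [libfoo.so]="libc.so"
    [libc.so]=""
    [libunused.so]="libc.so libhelper.so"
    [libhelper.so]=""
)
executables=(bin/app)

# Depth-first walk from every executable, marking each node that is reached.
declare -A reachable=()
stack=("${executables[@]}")
while [ "${#stack[@]}" -gt 0 ]; do
    last=$(( ${#stack[@]} - 1 ))
    node="${stack[last]}"
    unset "stack[last]"
    if [ -n "${reachable[$node]:-}" ]; then
        continue
    fi
    reachable[$node]=1
    # Word splitting is intentional: dependencies are space-separated.
    for dep in ${needed[$node]:-}; do
        stack+=("$dep")
    done
done

# Every node that was never marked is a removal candidate.
removable=()
for node in "${!needed[@]}"; do
    if [ -z "${reachable[$node]:-}" ]; then
        removable+=("$node")
    fi
done
mapfile -t removable < <(printf '%s\n' "${removable[@]}" | sort)
printf 'would remove: %s\n' "${removable[@]}"
# prints:
#   would remove: libhelper.so
#   would remove: libunused.so
```

Note that `libhelper.so` is removed along with `libunused.so`, because its only user is itself unreachable. A library loaded only via `dlopen()` would look exactly like `libunused.so` in this graph, which is why such libraries need an allowlist entry.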

## Commandline Options

Usage: `sysroot-cleaner [option…] <sysroot>`, where `<sysroot>` is the mandatory path to the root of the sysroot to clean up.

Options can be:

* `-n`, `--dry-run`: Simulate operations without carrying them out.
* `--split-to <dir>`: Instead of simply removing files, move them to the given location, preserving their relative folder structure.
* `--allowlist <file>`: An allowlist of files to keep, in `.gitignore` format. Can be passed multiple times. **Note**: this will take precedence over all other removal decisions.
* `--blocklist <file>`: A blocklist of files to remove, in `.gitignore` format. Can be passed multiple times.
* `--output-dotfile <file>`: An optional path to save the file graph of the DSO cleaner in GraphViz format. Useful for debugging.
* `--ld-path <dir>`: An additional path to consider when resolving libraries, relative to the sysroot root. It behaves similarly to the `LD_LIBRARY_PATH` environment variable as interpreted by the dynamic linker.
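As an illustration of what the files passed to `--allowlist` and `--blocklist` might contain, the following sketch writes two small lists. The file names and every pattern are hypothetical examples, and it assumes patterns are matched against paths relative to the sysroot root, as gitignore patterns are relative to the repository root.

```bash
#!/bin/bash
# Hypothetical location for the lists; in practice this would be your board folder.
lists_dir=$(mktemp -d)

# Files to always keep, even if another cleaner would remove them.
cat > "${lists_dir}/project.allowlist" <<'EOF'
# NSS modules are loaded at runtime, so the dso cleaner cannot see them
libnss_*.so*
EOF

# Files to always remove.
cat > "${lists_dir}/project.blocklist" <<'EOF'
# Documentation and development files, rarely needed on the target
/usr/share/man/
/usr/share/doc/
/usr/include/
*.a
EOF

echo "Lists written to ${lists_dir}"
```

Because the allowlist takes precedence, a file matching both lists would be kept.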

The log level can be controlled via the `LOG_LEVEL` environment variable, which can be one of: `error`, `warn`, `info`, `debug`, `trace`, or `off` (run completely silently).
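One cautious way to combine these options is to do a dry run first and then move files aside rather than deleting them. The sketch below uses only the flags listed above; the `output/target` and `output/removed` paths and the `dso-graph.dot` file name are assumptions for illustration.

```bash
#!/bin/bash
# Hypothetical locations; adjust to your build layout.
sysroot="${SYSROOT:-output/target}"
removed_dir="${REMOVED_DIR:-output/removed}"

# First pass: only report what would be removed.
dry_run_cmd=(sysroot-cleaner --dry-run "${sysroot}")

# Second pass: move files aside instead of deleting them, so they can be
# restored, and save the DSO graph for later inspection.
split_cmd=(sysroot-cleaner --split-to "${removed_dir}"
           --output-dotfile dso-graph.dot "${sysroot}")

if command -v sysroot-cleaner >/dev/null 2>&1 && [ -d "${sysroot}" ]; then
    LOG_LEVEL=debug "${dry_run_cmd[@]}"
    LOG_LEVEL=info "${split_cmd[@]}"
else
    # Tool or sysroot not available: just show the commands.
    printf 'would run: %s\n' "${dry_run_cmd[*]}" "${split_cmd[*]}"
fi
```

If Graphviz is installed, the saved graph can be rendered with `dot -Tsvg dso-graph.dot -o dso-graph.svg`.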

## Example Usage

Assume that you have built a filesystem image, for instance through a tool like [buildroot](https://buildroot.org/downloads/manual/manual.html).

You could add a simple shell script to invoke `sysroot-cleaner`:

```bash
#!/bin/bash

# file: post_build.sh

set -e -o pipefail

# Directory containing this script, used to locate the base lists.
readonly SCRIPT_DIR=$(realpath "$(dirname "$0")")
readonly TARGET_DIR=$1

if [ ! -d "${TARGET_DIR}" ]; then
    echo "Expecting the rootfs folder as first argument" >&2
    exit 1
fi

# Base lists
allow_lists=("${SCRIPT_DIR}/base.allowlist")
block_lists=("${SCRIPT_DIR}/base.blocklist")

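# Build the options as an array so that paths containing spaces survive intact.
args=()
for list in "${allow_lists[@]}"; do args+=(--allowlist "${list}"); done
for list in "${block_lists[@]}"; do args+=(--blocklist "${list}"); done

LOG_LEVEL=info sysroot-cleaner "${args[@]}" "${TARGET_DIR}"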
```

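Then, set `BR2_ROOTFS_POST_BUILD_SCRIPT` in your Buildroot configuration to the path of `post_build.sh`. Buildroot passes the target directory to the script as its first argument.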

## Changelog

### v1.0.1

* Bump dependencies.

### v1.0.0

* Initial stable release.