<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>monitord - know how happy your systemd is!</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Heebo:wght@400;600&display=swap" rel="stylesheet">
<style>
:root {
--sd-brand-black: hsl(270, 19%, 13%);
--sd-brand-green: hsl(145, 66%, 51%);
--sd-brand-white: #fff;
--sd-black: hsl(270, 7%, 13%);
--sd-green: hsl(145, 66%, 43%);
--sd-gray-extralight: hsl(30, 10%, 96%);
--sd-gray-light: hsl(30, 10%, 92%);
--sd-gray: hsl(30, 10%, 85%);
--sd-gray-dark: hsl(257, 23%, 20%);
--sd-gray-extradark: hsl(257, 23%, 16%);
--sd-font-weight-normal: 400;
--sd-font-weight-bold: 600;
--sd-foreground-color: var(--sd-gray-extradark);
--sd-background-color: var(--sd-gray-extralight);
--sd-link-color: var(--sd-green);
--sd-link-font-weight: var(--sd-font-weight-bold);
}
@media (prefers-color-scheme: dark) {
:root {
color-scheme: dark;
--sd-foreground-color: var(--sd-gray);
--sd-background-color: var(--sd-black);
--sd-link-color: var(--sd-brand-green);
--sd-link-font-weight: var(--sd-font-weight-normal);
}
}
* {
-moz-box-sizing: border-box;
-webkit-box-sizing: border-box;
box-sizing: border-box;
}
html, body {
margin: 0;
padding: 0;
font-size: 1rem;
font-family: "Heebo", sans-serif;
font-weight: 400;
line-height: 1.6;
}
body {
color: var(--sd-foreground-color);
background-color: var(--sd-background-color);
}
.container {
width: 80%;
max-width: 720px;
margin: 0 auto;
text-align: center;
}
.page-logo {
display: block;
padding: 5rem 0 3rem;
color: var(--sd-foreground-color);
}
.page-logo > svg {
display: block;
width: 16em;
height: auto;
margin: 0 auto;
}
h1 {
text-align: center;
font-size: 1.87rem;
font-weight: 400;
font-style: normal;
margin: 0 0 2rem;
line-height: 1.25;
}
@media screen and (min-width: 650px) {
h1 {
font-size: 2.375em;
}
}
hr {
margin: 3rem auto 4rem;
width: 40%;
opacity: 40%;
}
a {
font-weight: var(--sd-link-font-weight);
text-decoration: none;
color: var(--sd-link-color);
cursor: pointer;
}
a:hover {
text-decoration: underline;
}
ul {
list-style: disc;
display: inline-block;
text-align: left;
padding-left: 1.5em;
margin: 0;
}
li {
margin: 0.25em 0;
font-size: 1.1rem;
}
.readme-content {
text-align: left;
margin-top: 2rem;
}
.readme-content h2 {
font-size: 1.25rem;
margin-top: 2.5em;
}
.readme-content h3 {
font-size: 1.15rem;
}
.readme-content h4 {
font-size: 1.05rem;
}
.readme-content p {
margin: 0.75em 0;
}
.readme-content ul, .readme-content ol {
display: block;
padding-left: 2em;
margin: 0.75em 0;
}
.readme-content li {
font-size: 1rem;
}
.readme-content pre {
padding: 1em;
border-radius: 5px;
overflow-x: auto;
background-color: var(--sd-highlight-bg, rgba(255, 255, 255, 1));
}
.readme-content code {
font-family: "SFMono-Regular", Consolas, "Liberation Mono", Menlo, monospace;
font-size: 0.875rem;
}
.readme-content p code,
.readme-content li code {
padding: 2px 6px;
border-radius: 3px;
background-color: var(--sd-highlight-inline-bg, rgba(0, 0, 0, 0.07));
}
.readme-content pre code {
padding: 0;
background-color: transparent;
}
.readme-content strong {
font-weight: 600;
}
@media (prefers-color-scheme: light) {
.readme-content pre {
background-color: rgba(255, 255, 255, 1);
}
.readme-content p code,
.readme-content li code {
background-color: rgba(0, 0, 0, 0.07);
}
}
@media (prefers-color-scheme: dark) {
.readme-content pre {
background-color: rgba(0, 0, 0, 0.6);
}
.readme-content p code,
.readme-content li code {
background-color: rgba(255, 255, 255, 0.1);
}
}
footer {
text-align: center;
padding: 3em 0 3em;
font-size: 1em;
margin-top: 4rem;
}
</style>
</head>
<body>
<div class="container">
<div class="page-logo">
<svg role="img" aria-label="monitord logo" width="800" height="200" viewBox="0 0 800 200" xmlns="http://www.w3.org/2000/svg">
<path fill="currentColor" d="M 34 40 L 50 40 C 51.104568 40 52 40.895432 52 42 L 52 158 C 52 159.104568 51.104568 160 50 160 L 34 160 C 32.895432 160 32 159.104568 32 158 L 32 42 C 32 40.895432 32.895432 40 34 40 Z"/>
<path fill="currentColor" d="M 34 40 L 65 40 C 66.104568 40 67 40.895432 67 42 L 67 58 C 67 59.104568 66.104568 60 65 60 L 34 60 C 32.895432 60 32 59.104568 32 58 L 32 42 C 32 40.895432 32.895432 40 34 40 Z"/>
<path fill="currentColor" d="M 34 140 L 65 140 C 66.104568 140 67 140.895432 67 142 L 67 158 C 67 159.104568 66.104568 160 65 160 L 34 160 C 32.895432 160 32 159.104568 32 158 L 32 142 C 32 140.895432 32.895432 140 34 140 Z"/>
<text x="87" y="130" font-family="Helvetica" font-size="68" fill="#30d158">1</text>
<path fill="#ff453a" stroke="#ff453a" stroke-width="6" d="M 173 85 C 173 94.941124 164.941132 103 155 103 C 145.058868 103 137 94.941124 137 85 C 137 75.058876 145.058868 67 155 67 C 164.941132 67 173 75.058876 173 85 Z"/>
<path fill="#000000" stroke="#ff453a" stroke-width="8" stroke-linecap="round" d="M 168 98 L 185 115"/>
<path fill="currentColor" d="M 220 40 L 236 40 C 237.104568 40 238 40.895432 238 42 L 238 158 C 238 159.104568 237.104568 160 236 160 L 220 160 C 218.895432 160 218 159.104568 218 158 L 218 42 C 218 40.895432 218.895432 40 220 40 Z"/>
<path fill="currentColor" d="M 205 40 L 236 40 C 237.104568 40 238 40.895432 238 42 L 238 58 C 238 59.104568 237.104568 60 236 60 L 205 60 C 203.895432 60 203 59.104568 203 58 L 203 42 C 203 40.895432 203.895432 40 205 40 Z"/>
<path fill="currentColor" d="M 205 140 L 236 140 C 237.104568 140 238 140.895432 238 142 L 238 158 C 238 159.104568 237.104568 160 236 160 L 205 160 C 203.895432 160 203 159.104568 203 158 L 203 142 C 203 140.895432 203.895432 140 205 140 Z"/>
<text x="279" y="132" font-family="Helvetica" font-size="72" fill="currentColor">monitord</text>
</svg>
</div>
<h1>monitord ... know how happy your systemd is! <span role="img" aria-label="smiling face">😊</span></h1>
<hr>
<ul>
<li><a href="https://github.com/cooperlees/monitord">GitHub</a></li>
<li><a href="https://crates.io/crates/monitord">monitord on crates.io</a></li>
<li><a href="https://crates.io/crates/monitord-exporter">monitord-exporter on crates.io</a></li>
<li><a href="monitord/index.html">Rust Docs</a></li>
</ul>
<hr>
<div class="readme-content">
<h2>Requirements</h2>
<ul>
<li><strong>Linux</strong> with <strong>systemd</strong> (monitord uses D-Bus and procfs APIs that are Linux-specific)</li>
<li>systemd-networkd installed (for networkd metrics; the collector can be disabled in config)</li>
<li>PID 1 stats require procfs (<code>/proc</code>) — available on all standard Linux systems</li>
<li>D-Bus system bus accessible (default: <code>unix:path=/run/dbus/system_bus_socket</code>)</li>
<li>Varlink metrics require systemd v260+ (optional; falls back to D-Bus automatically)</li>
</ul>
<h2>What does monitord monitor?</h2>
<p>monitord collects systemd health metrics via D-Bus (and optionally Varlink) and outputs them as JSON. It provides visibility into:</p>
<ul>
<li><strong>Unit counts</strong> — totals by type (service, mount, socket, timer, etc.) and state (active, failed, inactive, loaded, masked)</li>
<li><strong>Per-service stats</strong> — CPU usage, memory, I/O, restart count, task count, watchdog status, and state timestamps for specific services</li>
<li><strong>Unit state tracking</strong> — active state, load state, and health for individual units (with allowlist/blocklist filtering)</li>
<li><strong>systemd-networkd</strong> — per-interface operational, carrier, admin, and address states</li>
<li><strong>PID 1 health</strong> — CPU time, memory usage, file descriptor count, and task count for systemd (PID 1) via procfs</li>
<li><strong>Timers</strong> — trigger times, accuracy, delays, and associated service state for systemd timers</li>
<li><strong>Boot blame</strong> — the N slowest units at boot, similar to <code>systemd-analyze blame</code></li>
<li><strong>D-Bus daemon stats</strong> — connection counts, match rules, and per-peer/per-cgroup/per-user breakdowns (dbus-broker and dbus-daemon)</li>
<li><strong>Containers / machines</strong> — recursively collects the same metrics from systemd-nspawn containers and VMs via <code>systemd-machined</code></li>
<li><strong>Unit verification</strong> — runs <code>systemd-analyze verify</code> and reports failing unit counts by type</li>
</ul>
<h2>Run Modes</h2>
<p>monitord supports the following run modes:</p>
<ul>
<li>systemd-timer (legacy cron would work too)<ul>
<li>Refer to the <a href="monitord.timer">monitord.timer</a> and <a href="monitord.service">monitord.service</a> unit files</li>
<li>Ensure daemon mode is not enabled in <code>monitord.conf</code> (<code>daemon = false</code>)</li>
</ul>
</li>
<li>daemon mode<ul>
<li>Enable daemon mode in the configuration file</li>
<li>Stats will be written to stdout every <code>daemon_stats_refresh_secs</code> seconds</li>
</ul>
</li>
</ul>
<p>We are open to more output formats and run methods; open an issue to discuss. Whether one gets added mostly depends on the dependencies it would pull in.</p>
<p><code>monitord</code> is a config-driven binary. We plan to keep CLI arguments to a minimum.</p>
<p><strong>INFO</strong> level logging is enabled to stderr by default. Use <code>-l LEVEL</code> to increase or decrease logging.</p>
<h2>Quick Start</h2>
<ol>
<li>
<p>Install monitord:</p>
<pre><code class="language-bash">cargo install monitord
</code></pre>
</li>
<li>
<p>Create a minimal config at <code>/etc/monitord.conf</code>:</p>
<pre><code class="language-ini">[monitord]
output_format = json-pretty

[units]
enabled = true

[pid1]
enabled = true
</code></pre>
</li>
<li>
<p>Run it:</p>
<pre><code class="language-bash">monitord
</code></pre>
</li>
</ol>
<p>This will collect unit counts and PID 1 stats, then print JSON to stdout and exit. Enable additional collectors in the config as needed (see <a href="#config">Configuration</a> below).</p>
<h2>Install</h2>
<h3>Pre-built binaries</h3>
<p>Download pre-built binaries from <a href="https://github.com/cooperlees/monitord/releases">GitHub Releases</a>:</p>
<ul>
<li><code>monitord-linux-amd64</code> — x86_64</li>
<li><code>monitord-linux-aarch64</code> — ARM64</li>
</ul>
<pre><code class="language-bash"># Example: download and install the latest release (x86_64)
curl -L -o /usr/local/bin/monitord \
https://github.com/cooperlees/monitord/releases/latest/download/monitord-linux-amd64
chmod +x /usr/local/bin/monitord
</code></pre>
<h3>From crates.io</h3>
<p>Install via cargo, or use monitord as a dependency in your <code>Cargo.toml</code>.</p>
<ul>
<li><code>cargo install monitord</code></li>
<li>Create a <code>monitord.conf</code> (copy the one from the repo)<ul>
<li>monitord looks for it at <code>/etc/monitord.conf</code> by default</li>
</ul>
</li>
<li><code>monitord --help</code></li>
<li>The <code>MONITORD_CONFIG</code> environment variable can also be used to set the config path</li>
</ul>
<pre><code class="language-console">crl-linux:monitord cooper$ monitord --help
monitord: Know how happy your systemd is! 😊
Usage: monitord [OPTIONS]
Options:
-c, --config <CONFIG>
Location of your monitord config
[default: /etc/monitord.conf]
-l, --log-level <LOG_LEVEL>
Adjust the console log-level
[default: Info]
[possible values: error, warn, info, debug, trace]
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
</code></pre>
<h3>Config</h3>
<p>Each monitord component (collector) can be enabled or disabled individually by setting the
following in <code>monitord.conf</code>. This file uses the <a href="https://en.wikipedia.org/wiki/INI_file">INI format</a>
to match systemd unit files.</p>
<pre><code class="language-ini"># Pure ini - no yes/no for bools
[monitord]
# Set a custom dbus address to connect to
# OPTIONAL: If not set, we default to the Unix socket below
dbus_address = unix:path=/run/dbus/system_bus_socket
# Timeout in seconds for dbus connection/collections
# OPTIONAL: default is 30 seconds
dbus_timeout = 30
# Run as a daemon or 1 time
daemon = false
# Time to refresh systemd stats in seconds
# Daemon mode only
daemon_stats_refresh_secs = 60
# Prefix flat-json key with this value
# The value automatically gets a '.' appended (so don't add one here)
key_prefix = monitord
# cron/systemd timer output format
# Supported: json, json-flat, json-pretty
output_format = json
# Grab as many stats as we can from the running dbus daemon
# via the DBus GetStats call
# Better tested with the dbus-broker daemon
[dbus]
# Summary counters - both dbus-broker + dbus-daemon
enabled = false
# dbus.user.* metrics: user stats as reported by dbus-broker
user_stats = false
# dbus.peer.* metrics: peer stats as reported by dbus-broker
peer_stats = false
# dbus.cgroup.* stats is an aggregation of peer_stats by cgroup
# by dbus-broker
cgroup_stats = false
# Grab networkd stats from files + networkctl
[networkd]
enabled = true
link_state_dir = /run/systemd/netif/links
# Enable grabbing PID 1 stats via procfs
[pid1]
enabled = true
# Services to grab extra stats for
# .service is important as that's what DBus returns from `list_units`
[services]
foo.service
[timers]
enabled = true
[timers.allowlist]
foo.timer
[timers.blocklist]
bar.timer
# Grab unit status counts via dbus
[units]
enabled = true
state_stats = true
# Filter which units you want to collect state stats for
# If both lists are configured, the blocklist is preferred
# If neither exists, all units' states will generate counters
[units.state_stats.allowlist]
foo.service
[units.state_stats.blocklist]
bar.service
# machines config
[machines]
enabled = true
# Same rules apply as state_stats lists above
[machines.allowlist]
foo
[machines.blocklist]
bar
# Boot blame metrics - shows the N slowest units at boot
# Similar to `systemd-analyze blame`
# Disabled by default
[boot]
enabled = false
# Cache boot blame stats in /run/monitord/<boot_id>.boot_blame.bin
# Enabled by default; set false to force recalculation every run
cache_enabled = true
# Number of slowest units to report
num_slowest_units = 5
# Optional: only include specific units in boot blame (if empty, all units are checked)
# Same rules apply as state_stats lists above
[boot.allowlist]
# slow-startup.service
# Optional: exclude specific units from boot blame
[boot.blocklist]
# noisy-but-expected.service
# Unit verification using systemd-analyze verify
# Disabled by default as it can be slow on large systems
[verify]
enabled = false
# Optional: only verify specific units (if empty, all units are checked)
[verify.allowlist]
# example.service
# example.timer
# Optional: skip verification for specific units
[verify.blocklist]
# noisy.service
# broken.timer
</code></pre>
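<p>The allowlist/blocklist rules in the config comments above can be read as a small filter. The sketch below is one interpretation, not monitord's actual code: <code>should_collect</code> is a hypothetical helper, and the exact precedence (a blocked unit is skipped even when it is also allowed) is an assumption.</p>
<pre><code class="language-rust">// Hypothetical helper, not monitord's actual code: one reading of the
// allowlist/blocklist rules described in the config comments.
fn should_collect(unit: &str, allowlist: &[&str], blocklist: &[&str]) -> bool {
    // Assumption: the blocklist takes precedence, so a blocked unit is always skipped.
    if blocklist.contains(&unit) {
        return false;
    }
    // With no allowlist configured, every remaining unit is collected.
    if allowlist.is_empty() {
        return true;
    }
    // Otherwise only explicitly allowed units are collected.
    allowlist.contains(&unit)
}

fn main() {
    let allow = ["foo.service"];
    let block = ["bar.service"];
    for unit in ["foo.service", "bar.service", "baz.service"] {
        println!("{}: {}", unit, should_collect(unit, &allow, &block));
    }
}
</code></pre>
<p>With both lists empty the helper accepts every unit, which matches the "if neither exist" rule for state stats.</p>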
<p>When using the provided <code>monitord.service</code>, systemd creates <code>/run/monitord</code> via
<code>RuntimeDirectory=monitord</code> and assigns ownership to the configured service <code>User</code>/<code>Group</code>.
If you run monitord another way, ensure <code>/run/monitord</code> exists and is writable by the
monitord process user so boot cache files can be created.</p>
<h2>Machines support</h2>
<p>From version <code>0.11</code> onwards, monitord supports obtaining the same set of keys from
systemd 'machines' (i.e. those listed by <code>machinectl list</code>).</p>
<p>The keys use the same format as the <code>json-flat</code> output below but are prefixed with
the <code>machine</code> keyword and the machine name. For example:</p>
<pre><code class="language-json"># $KEY_PREFIX.machine.$MACHINE_NAME
{
...
"monitord.machine.foo.pid1.fd_count": 69,
...
}
</code></pre>
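<p>As a rough illustration of that naming scheme, a key could be assembled like this. <code>machine_key</code> is a hypothetical helper, not part of monitord's API, and the behavior for an empty prefix is an assumption.</p>
<pre><code class="language-rust">// Hypothetical helper for illustration only; monitord builds these keys internally.
fn machine_key(key_prefix: &str, machine: &str, metric: &str) -> String {
    let machine_part = format!("machine.{}.{}", machine, metric);
    if key_prefix.is_empty() {
        // Assumption: with no prefix configured there is no leading '.'.
        machine_part
    } else {
        // The '.' after key_prefix is added for you, so the prefix itself has none.
        format!("{}.{}", key_prefix, machine_part)
    }
}

fn main() {
    // Reproduces the key shown in the example above.
    println!("{}", machine_key("monitord", "foo", "pid1.fd_count"));
}
</code></pre>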
<h2>Output Formats</h2>
<h3>json</h3>
<p>Standard compact (non-pretty) <code>serde_json</code> output, all on one line. This is the most compact format.</p>
<h3>json-flat</h3>
<p>Moves all key/value pairs to the top level, using dot (<code>.</code>) notation for components and sub-values.
The output is custom and semi-pretty (one key per line), and is covered by unit tests.</p>
<p><code>stat_collection_run_time_ms</code> is emitted in <strong>milliseconds</strong> (with <code>_ms</code> suffix) to follow
Prometheus metric naming conventions for duration units, which keeps unit semantics
clear and consistent when these keys are transformed into Prometheus metric names.</p>
<pre><code class="language-json">{
"boot.blame.dnf5-automatic.service": 204.159,
"boot.blame.cpe_chef.service": 103.05,
"boot.blame.sys-module-fuse.device": 16.21,
"boot.blame.dev-ttyS0.device": 15.809,
"boot.blame.systemd-networkd-wait-online.service": 1.674,
"collection_timings.list_units_ms": 5.26,
"collection_timings.per_unit_loop_ms": 42.99,
"collection_timings.service_dbus_fetches": 0,
"collection_timings.state_dbus_fetches": 0,
"collection_timings.timer_dbus_fetches": 24,
"collector_timings.boot_blame.elapsed_ms": 53.36,
"collector_timings.boot_blame.start_offset_ms": 0.08,
"collector_timings.boot_blame.success": 1,
"collector_timings.units.elapsed_ms": 53.24,
"collector_timings.units.start_offset_ms": 0.06,
"collector_timings.units.success": 1,
"dbus.active_connections": 10,
"dbus.bus_names": 16,
"dbus.incomplete_connections": 0,
"dbus.match_rules": 26,
"dbus.peak_bus_names": 33,
"dbus.peak_bus_names_per_connection": 2,
"dbus.peak_match_rules": 33,
"dbus.peak_match_rules_per_connection": 13,
"dbus.cgroup.system.slice-systemd-logind.service.activation_request_bytes": 0,
"dbus.cgroup.system.slice-systemd-logind.service.activation_request_fds": 0,
"dbus.cgroup.system.slice-systemd-logind.service.incoming_bytes": 16,
"dbus.cgroup.system.slice-systemd-logind.service.incoming_fds": 0,
"dbus.cgroup.system.slice-systemd-logind.service.match_bytes": 6942,
"dbus.cgroup.system.slice-systemd-logind.service.matches": 5,
"dbus.cgroup.system.slice-systemd-logind.service.name_objects": 1,
"dbus.cgroup.system.slice-systemd-logind.service.outgoing_bytes": 0,
"dbus.cgroup.system.slice-systemd-logind.service.outgoing_fds": 0,
"dbus.cgroup.system.slice-systemd-logind.service.reply_objects": 0,
"dbus.peer.org.freedesktop.systemd1.activation_request_bytes": 0,
"dbus.peer.org.freedesktop.systemd1.activation_request_fds": 0,
"dbus.peer.org.freedesktop.systemd1.incoming_bytes": 16,
"dbus.peer.org.freedesktop.systemd1.incoming_fds": 0,
"dbus.peer.org.freedesktop.systemd1.match_bytes": 46533,
"dbus.peer.org.freedesktop.systemd1.matches": 33,
"dbus.peer.org.freedesktop.systemd1.name_objects": 1,
"dbus.peer.org.freedesktop.systemd1.outgoing_bytes": 0,
"dbus.peer.org.freedesktop.systemd1.outgoing_fds": 0,
"dbus.peer.org.freedesktop.systemd1.reply_objects": 0,
"dbus.user.cooper.bytes": 919236,
"dbus.user.cooper.fds": 78,
"dbus.user.cooper.matches": 510,
"dbus.user.cooper.objects": 80,
"networkd.eno4.address_state": 3,
"networkd.eno4.admin_state": 4,
"networkd.eno4.carrier_state": 5,
"networkd.eno4.ipv4_address_state": 3,
"networkd.eno4.ipv6_address_state": 2,
"networkd.eno4.oper_state": 9,
"networkd.eno4.required_for_online": 1,
"networkd.managed_interfaces": 2,
"networkd.wg0.address_state": 3,
"networkd.wg0.admin_state": 4,
"networkd.wg0.carrier_state": 5,
"networkd.wg0.ipv4_address_state": 3,
"networkd.wg0.ipv6_address_state": 3,
"networkd.wg0.oper_state": 9,
"networkd.wg0.required_for_online": 1,
"pid1.cpu_time_kernel": 48,
"pid1.cpu_user_kernel": 41,
"pid1.fd_count": 245,
"pid1.memory_usage_bytes": 19165184,
"pid1.tasks": 1,
"services.chronyd.service.active_enter_timestamp": 1683556542382710,
"services.chronyd.service.active_exit_timestamp": 0,
"services.chronyd.service.cpuusage_nsec": 328951000,
"services.chronyd.service.inactive_exit_timestamp": 1683556541360626,
"services.chronyd.service.ioread_bytes": 18446744073709551615,
"services.chronyd.service.ioread_operations": 18446744073709551615,
"services.chronyd.service.memory_available": 18446744073709551615,
"services.chronyd.service.memory_current": 5214208,
"services.chronyd.service.nrestarts": 0,
"services.chronyd.service.restart_usec": 100000,
"services.chronyd.service.state_change_timestamp": 1683556542382710,
"services.chronyd.service.status_errno": 0,
"services.chronyd.service.tasks_current": 1,
"services.chronyd.service.timeout_clean_usec": 18446744073709551615,
"services.chronyd.service.watchdog_usec": 0,
"stat_collection_run_time_ms": 87.4013,
"system-state": 3,
"timers.fstrim.timer.accuracy_usec": 3600000000,
"timers.fstrim.timer.fixed_random_delay": 0,
"timers.fstrim.timer.last_trigger_usec": 1743397269608978,
"timers.fstrim.timer.last_trigger_usec_monotonic": 0,
"timers.fstrim.timer.next_elapse_usec_monotonic": 0,
"timers.fstrim.timer.next_elapse_usec_realtime": 1744007133996149,
"timers.fstrim.timer.persistent": 1,
"timers.fstrim.timer.randomized_delay_usec": 6000000000,
"timers.fstrim.timer.remain_after_elapse": 1,
"timers.fstrim.timer.service_unit_last_state_change_usec": 1743517244700135,
"timers.fstrim.timer.service_unit_last_state_change_usec_monotonic": 639312703,
"unit_states.chronyd.service.active_state": 1,
"unit_states.chronyd.service.loaded_state": 1,
"unit_states.chronyd.service.unhealthy": 0,
"units.activating_units": 0,
"units.active_units": 403,
"units.automount_units": 1,
"units.device_units": 150,
"units.failed_units": 0,
"units.inactive_units": 159,
"units.jobs_queued": 0,
"units.loaded_units": 497,
"units.masked_units": 25,
"units.mount_units": 52,
"units.not_found_units": 38,
"units.path_units": 4,
"units.scope_units": 17,
"units.service_units": 199,
"units.slice_units": 7,
"units.socket_units": 28,
"units.target_units": 54,
"units.timer_units": 20,
"units.total_units": 562,
"verify.failing.device": 43,
"verify.failing.mount": 15,
"verify.failing.service": 31,
"verify.failing.slice": 1,
"verify.failing.total": 97,
"version": "255.7-1.fc40"
}
</code></pre>
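<p>The dot-notation flattening described for <code>json-flat</code> can be sketched with the standard library alone. This is a simplified illustration, not monitord's implementation: the <code>Value</code> enum is a hypothetical stand-in for a JSON value (monitord itself uses <code>serde_json</code>), and <code>flatten</code> and <code>sample</code> are made-up names.</p>
<pre><code class="language-rust">use std::collections::BTreeMap;

// Hypothetical, minimal stand-in for a JSON value.
enum Value {
    Num(f64),
    Text(String),
    Object(BTreeMap<String, Value>),
}

// Walk nested objects and emit "parent.child" keys at the top level.
fn flatten(prefix: &str, value: &Value, out: &mut BTreeMap<String, String>) {
    match value {
        Value::Object(map) => {
            for (key, child) in map {
                let path = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", prefix, key)
                };
                flatten(&path, child, out);
            }
        }
        Value::Num(n) => {
            out.insert(prefix.to_string(), n.to_string());
        }
        Value::Text(s) => {
            out.insert(prefix.to_string(), format!("\"{}\"", s));
        }
    }
}

// A tiny nested document using values from the sample output.
fn sample() -> Value {
    let mut pid1 = BTreeMap::new();
    pid1.insert("fd_count".to_string(), Value::Num(245.0));
    pid1.insert("tasks".to_string(), Value::Num(1.0));
    let mut root = BTreeMap::new();
    root.insert("pid1".to_string(), Value::Object(pid1));
    root.insert("version".to_string(), Value::Text("255.7-1.fc40".to_string()));
    Value::Object(root)
}

fn main() {
    let mut flat = BTreeMap::new();
    flatten("", &sample(), &mut flat);
    for (key, value) in &flat {
        println!("\"{}\": {}", key, value);
    }
}
</code></pre>
<p>The nested <code>pid1</code> object becomes the top-level keys <code>pid1.fd_count</code> and <code>pid1.tasks</code>, alongside <code>version</code>.</p>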
<h3>json-pretty</h3>
<p>Standard <code>serde_json</code> pretty-printed representation of each component's structs.</p>
<h3>Per-collector timing metrics</h3>
<p><code>monitord</code> records the wall time each collector future spends inside a single
<code>stat_collector</code> cycle and exposes the result on <code>MonitordStats::collector_timings</code>,
plus an inner phase breakdown for the units collector
(<code>SystemdUnitStats::collection_timings</code>).</p>
<table>
<thead>
<tr>
<th>Field</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>collector_timings.<name>.start_offset_ms</code></td>
<td>ms from the top of the cycle until the spawned future was first polled. Should be sub-ms when collectors are running in parallel; a non-trivial value means the spawn loop or runtime is delaying first poll.</td>
</tr>
<tr>
<td><code>collector_timings.<name>.elapsed_ms</code></td>
<td>ms from first poll to completion for that collector.</td>
</tr>
<tr>
<td><code>collector_timings.<name>.success</code></td>
<td>1 if the collector returned Ok, 0 otherwise.</td>
</tr>
<tr>
<td><code>collection_timings.list_units_ms</code></td>
<td>ms for the systemd <code>ListUnits</code> D-Bus call (one batched call).</td>
</tr>
<tr>
<td><code>collection_timings.per_unit_loop_ms</code></td>
<td>ms spent walking each listed unit, including any per-unit D-Bus calls (timer/state/service).</td>
</tr>
<tr>
<td><code>collection_timings.timer_dbus_fetches</code></td>
<td>Count of timer D-Bus property fetches this run.</td>
</tr>
<tr>
<td><code>collection_timings.state_dbus_fetches</code></td>
<td>Count of unit-state D-Bus fetches (only when <code>state_stats_time_in_state</code> is enabled).</td>
</tr>
<tr>
<td><code>collection_timings.service_dbus_fetches</code></td>
<td>Count of per-service D-Bus property fetches.</td>
</tr>
</tbody>
</table>
<p>Comparing <code>sum(collector_timings.*.elapsed_ms)</code> against
<code>stat_collection_run_time_ms</code> gives an effective parallelism ratio
(<code>sum / wall ≈ N</code> means N-way parallelism, ≈ 1 means effectively serial).</p>
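<p>As a small worked example of that ratio, the sketch below uses the two <code>collector_timings.*.elapsed_ms</code> values and the <code>stat_collection_run_time_ms</code> value from the <code>json-flat</code> sample. <code>parallelism_ratio</code> is a hypothetical helper for consumers of the output, not something monitord ships.</p>
<pre><code class="language-rust">// Hypothetical helper: sum of per-collector elapsed_ms divided by the wall time.
fn parallelism_ratio(elapsed_ms: &[f64], wall_ms: f64) -> f64 {
    let busy_ms: f64 = elapsed_ms.iter().sum();
    // Guard against a zero or negative wall time rather than dividing by it.
    if wall_ms > 0.0 {
        busy_ms / wall_ms
    } else {
        0.0
    }
}

fn main() {
    // boot_blame (53.36) and units (53.24) elapsed_ms, and
    // stat_collection_run_time_ms (87.4013), from the json-flat sample.
    let ratio = parallelism_ratio(&[53.36, 53.24], 87.4013);
    println!("effective parallelism: {:.2}", ratio);
}
</code></pre>
<p>For those sample numbers the ratio is roughly 1.22, i.e. the two collectors overlapped for part of the cycle rather than running fully in parallel.</p>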
<p>The per-collector lines are also emitted to logs at <code>debug!</code> level. The end-of-cycle
"stat collection run took {}ms" summary stays at <code>info!</code>.</p>
<h4>Varlink-vs-D-Bus parity</h4>
<p><code>collection_timings</code> is populated identically by the D-Bus path
(<code>units::parse_unit_state</code>) and the varlink path
(<code>varlink_units::parse_metrics</code>). In the varlink case, <code>list_units_ms</code> is the
bulk varlink <code>List</code> call on <code>io.systemd.Manager</code> and <code>per_unit_loop_ms</code> is the
local parse loop; the <code>*_dbus_fetches</code> counters stay at zero, which is itself a
useful signal that the varlink path is not paying per-unit D-Bus cost. This
makes <code>varlink.enabled = true</code> vs <code>false</code> directly comparable on the same host.</p>
<p><strong>Convention for new collectors moved to varlink:</strong> when porting a collector
from D-Bus to varlink, add the equivalent inner timings so the two
implementations remain comparable. The minimum is wall time of the bulk
fetch (analogous to <code>list_units_ms</code>) and the local parse loop (analogous to
<code>per_unit_loop_ms</code>), recorded onto a struct nested inside the collector's
public stats type. Single-shot varlink calls (e.g. networkd <code>Describe</code>) do
not need an inner split — the outer <code>collector_timings.<name>.elapsed_ms</code>
already covers them.</p>
<h3>Metric Value Reference</h3>
<p>Many metrics are serialized as integers. Here are the enum mappings:</p>
<p><strong>system-state</strong></p>
<table>
<thead>
<tr>
<th>Value</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>unknown</td>
</tr>
<tr>
<td>1</td>
<td>initializing</td>
</tr>
<tr>
<td>2</td>
<td>starting</td>
</tr>
<tr>
<td>3</td>
<td>running</td>
</tr>
<tr>
<td>4</td>
<td>degraded</td>
</tr>
<tr>
<td>5</td>
<td>maintenance</td>
</tr>
<tr>
<td>6</td>
<td>stopping</td>
</tr>
<tr>
<td>7</td>
<td>offline</td>
</tr>
</tbody>
</table>
<p><strong>active_state</strong> (unit_states.*.active_state)</p>
<table>
<thead>
<tr>
<th>Value</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>unknown</td>
</tr>
<tr>
<td>1</td>
<td>active</td>
</tr>
<tr>
<td>2</td>
<td>reloading</td>
</tr>
<tr>
<td>3</td>
<td>inactive</td>
</tr>
<tr>
<td>4</td>
<td>failed</td>
</tr>
<tr>
<td>5</td>
<td>activating</td>
</tr>
<tr>
<td>6</td>
<td>deactivating</td>
</tr>
</tbody>
</table>
<p><strong>loaded_state</strong> (unit_states.*.loaded_state)</p>
<table>
<thead>
<tr>
<th>Value</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>unknown</td>
</tr>
<tr>
<td>1</td>
<td>loaded</td>
</tr>
<tr>
<td>2</td>
<td>error</td>
</tr>
<tr>
<td>3</td>
<td>masked</td>
</tr>
<tr>
<td>4</td>
<td>not-found</td>
</tr>
</tbody>
</table>
<p><strong>networkd address_state / ipv4_address_state / ipv6_address_state</strong></p>
<table>
<thead>
<tr>
<th>Value</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>unknown</td>
</tr>
<tr>
<td>1</td>
<td>off</td>
</tr>
<tr>
<td>2</td>
<td>degraded</td>
</tr>
<tr>
<td>3</td>
<td>routable</td>
</tr>
</tbody>
</table>
<p><strong>networkd admin_state</strong></p>
<table>
<thead>
<tr>
<th>Value</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>unknown</td>
</tr>
<tr>
<td>1</td>
<td>pending</td>
</tr>
<tr>
<td>2</td>
<td>failed</td>
</tr>
<tr>
<td>3</td>
<td>configuring</td>
</tr>
<tr>
<td>4</td>
<td>configured</td>
</tr>
<tr>
<td>5</td>
<td>unmanaged</td>
</tr>
<tr>
<td>6</td>
<td>linger</td>
</tr>
</tbody>
</table>
<p><strong>networkd carrier_state</strong></p>
<table>
<thead>
<tr>
<th>Value</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>unknown</td>
</tr>
<tr>
<td>1</td>
<td>off</td>
</tr>
<tr>
<td>2</td>
<td>no-carrier</td>
</tr>
<tr>
<td>3</td>
<td>dormant</td>
</tr>
<tr>
<td>4</td>
<td>degraded-carrier</td>
</tr>
<tr>
<td>5</td>
<td>carrier</td>
</tr>
<tr>
<td>6</td>
<td>enslaved</td>
</tr>
</tbody>
</table>
<p><strong>networkd oper_state</strong></p>
<table>
<thead>
<tr>
<th>Value</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>unknown</td>
</tr>
<tr>
<td>1</td>
<td>missing</td>
</tr>
<tr>
<td>2</td>
<td>off</td>
</tr>
<tr>
<td>3</td>
<td>no-carrier</td>
</tr>
<tr>
<td>4</td>
<td>dormant</td>
</tr>
<tr>
<td>5</td>
<td>degraded-carrier</td>
</tr>
<tr>
<td>6</td>
<td>carrier</td>
</tr>
<tr>
<td>7</td>
<td>degraded</td>
</tr>
<tr>
<td>8</td>
<td>enslaved</td>
</tr>
<tr>
<td>9</td>
<td>routable</td>
</tr>
</tbody>
</table>
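<p>When consuming the output, the tables above can be turned into simple lookup arrays. The sketch below covers two of them. <code>state_name</code> is a hypothetical helper, and treating out-of-range values as <code>unknown</code> is an assumption.</p>
<pre><code class="language-rust">// State names copied from the tables above; the array index is the metric value.
const SYSTEM_STATES: [&str; 8] = [
    "unknown", "initializing", "starting", "running",
    "degraded", "maintenance", "stopping", "offline",
];
const OPER_STATES: [&str; 10] = [
    "unknown", "missing", "off", "no-carrier", "dormant",
    "degraded-carrier", "carrier", "degraded", "enslaved", "routable",
];

// Hypothetical helper: map a metric value back to its state name.
// Assumption: values outside the table are reported as "unknown".
fn state_name(table: &[&'static str], value: u64) -> &'static str {
    table.get(value as usize).copied().unwrap_or("unknown")
}

fn main() {
    // From the json-flat sample: "system-state": 3 and "networkd.eno4.oper_state": 9.
    println!("system-state = {}", state_name(&SYSTEM_STATES, 3));
    println!("oper_state = {}", state_name(&OPER_STATES, 9));
}
</code></pre>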
<h2>dbus stats</h2>
<p>You need to be root, or be granted permission, to pull dbus stats.
For dbus-broker, here is an example config that allows a <code>monitord</code> user to call
<code>GetStats</code>:</p>
<pre><code class="language-xml">[cooper@l33t ~]# cat /etc/dbus-1/system.d/allow_monitord_stats.conf
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE busconfig PUBLIC
"-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
"http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
<policy user="monitord">
<allow send_destination="org.freedesktop.DBus"
send_interface="org.freedesktop.DBus.Debug.Stats"
send_member="GetStats"
send_path="/org/freedesktop/DBus"
send_type="method_call"/>
</policy>
</busconfig>
</code></pre>
<h2>Development</h2>
<p>To do test runs (requires <code>systemd</code> and <code>systemd-networkd</code> <em>installed</em>,
depending on what you have enabled in your config):</p>
<ul>
<li>
<p><code>cargo run -- -c monitord.conf -l debug</code></p>
</li>
</ul>
<p>Ensure the following pass before submitting a PR (CI checks):</p>
<ul>
<li><code>cargo test</code></li>
<li><code>cargo clippy</code></li>
<li><code>cargo fmt</code></li>
</ul>
<h3>Releasing a new version</h3>
<ol>
<li>Increment the version in <code>Cargo.toml</code></li>
<li>Run <code>./build_docs.sh</code> to regenerate docs</li>
<li>Commit with message: <code>Move to version X.Y.Z for release + update docs</code></li>
<li>If you have commit bit, push directly to main. Otherwise, push a branch and open a PR.</li>
<li>Cut a GitHub release: <code>gh release create X.Y.Z --title "X.Y.Z" --generate-notes</code></li>
</ol>
<h3>Generate codegen APIs</h3>
<ul>
<li><code>cargo install zbus_xmlgen</code></li>
<li><code>zbus-xmlgen system org.freedesktop.systemd1 /org/freedesktop/systemd1/unit/chronyd_2eservice</code></li>
</ul>
<p>Then add the following attributes to the generated code to silence compiler and clippy warnings:</p>
<pre><code class="language-rust">#![allow(warnings)]
#![allow(clippy::all)]
</code></pre>
<h3>Non Linux development</h3>
<p>Sometimes I develop from my Mac OS X laptop, so I've documented how I build a Fedora Rawhide
container and mount the local repo at <code>/repo</code> in the container to run and test monitord.</p>
<ul>
<li>Build the image (with git, Rust tools and systemd)<ul>
<li><code>docker build -t monitord-dev .</code></li>
</ul>
</li>
<li>Start via systemd and mount the monitord repo at <code>/repo</code><ul>
<li><code>docker run --rm --name monitord-dev -it --privileged --tmpfs /run --tmpfs /tmp -v $(pwd):/repo monitord-dev /sbin/init</code><ul>
<li><code>--rm</code> is optional but will remove the container when stopped</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>You can now log into the container to build, run the tests, and run the binary against systemd.</p>
<ul>
<li><code>docker exec -it monitord-dev bash</code></li>
<li><code>cd /repo ; cargo run -- -c monitord.conf</code></li>
<li>networkd etc. are not running by default but can be started ...<ul>
<li><code>systemctl start systemd-networkd</code><ul>
<li>No interfaces will be managed by default in the container, though ...</li>
</ul>
</li>
</ul>
</li>
</ul>
<h2>Troubleshooting</h2>
<p><strong>"Connection refused" or D-Bus connection errors</strong></p>
<p>Ensure the system D-Bus daemon is running and the socket exists at <code>/run/dbus/system_bus_socket</code>. If using a custom address, set <code>dbus_address</code> in <code>[monitord]</code> config. Increase <code>dbus_timeout</code> if running on slow systems.</p>
<p><strong>Empty or missing networkd metrics</strong></p>
<p>systemd-networkd must be installed and running (<code>systemctl start systemd-networkd</code>). If networkd is not in use on your system, disable the collector with <code>enabled = false</code> in <code>[networkd]</code>.</p>
<p><strong>Permission denied for D-Bus stats</strong></p>
<p>The <code>[dbus]</code> collector requires permission to call <code>org.freedesktop.DBus.Debug.Stats.GetStats</code>. Either run monitord as root or add a D-Bus policy file — see the <a href="#dbus-stats">dbus stats</a> section.</p>
<p><strong>PID 1 stats unavailable</strong></p>
<p>PID 1 stats require Linux with procfs mounted at <code>/proc</code>. This collector is compiled out on non-Linux targets. If <code>/proc</code> is not available (some container runtimes), disable with <code>enabled = false</code> in <code>[pid1]</code>.</p>
<p><strong>Collector errors don't crash monitord</strong></p>
<p>When an individual collector fails (e.g., networkd not running, D-Bus timeout), monitord logs a warning and continues with the remaining collectors. Check stderr output or increase the log level (<code>-l debug</code>) to see which collectors had issues.</p>
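<p>The behaviour described above can be pictured roughly as follows. This is a sketch under assumptions, not monitord's actual code: the <code>Collector</code> type, <code>run_collectors</code>, and the two sample collectors are hypothetical names used only for illustration.</p>
<pre><code class="language-rust">// Hypothetical collector signature: a metric value or an error message.
type Collector = fn() -&gt; Result&lt;u64, String&gt;;

// Illustrative stand-ins for real collectors.
fn units_ok() -&gt; Result&lt;u64, String&gt; {
    Ok(42)
}

fn networkd_down() -&gt; Result&lt;u64, String&gt; {
    Err("networkd not running".to_string())
}

/// Run every collector; log failures to stderr and keep going.
fn run_collectors(collectors: &amp;[(&amp;str, Collector)]) -&gt; Vec&lt;(String, u64)&gt; {
    let mut collected = Vec::new();
    for (name, collect) in collectors {
        match collect() {
            Ok(value) =&gt; collected.push((name.to_string(), value)),
            // A failing collector only produces a warning.
            Err(err) =&gt; eprintln!("warning: {name} collector failed: {err}"),
        }
    }
    collected
}
</code></pre>
<p>Running it with both sample collectors returns only the <code>units</code> result and prints a warning for <code>networkd</code> on stderr.</p>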
<p><strong>Large u64 values (18446744073709551615) in output</strong></p>
<p>These represent <code>u64::MAX</code> and mean "not available" or "not tracked" for that metric. This is how systemd reports fields that are unsupported or not configured for the unit (e.g., <code>memory_available</code> when <code>MemoryMax=</code> is not set).</p>
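<p>Consumers of the output may want to treat that sentinel as a missing value rather than a real measurement. A minimal Rust sketch, where the helper name <code>available</code> is an illustrative assumption and not part of monitord's API:</p>
<pre><code class="language-rust">/// Map systemd's "not available" sentinel (u64::MAX) to None.
/// `available` is a hypothetical helper, not a monitord function.
fn available(raw: u64) -&gt; Option&lt;u64&gt; {
    if raw == u64::MAX {
        None
    } else {
        Some(raw)
    }
}
</code></pre>
<p>With this, <code>available(18446744073709551615)</code> yields <code>None</code>, while any other value passes through unchanged as <code>Some(value)</code>.</p>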
<h2>Library API</h2>
<p>monitord can be used as a Rust library. See the full API documentation at <a href="https://monitord.xyz/monitord/index.html">monitord.xyz</a>.</p>
<h2>DBus</h2>
<p>All of monitord's D-Bus communication goes through the async (tokio) <a href="https://crates.io/crates/zbus">zbus</a> crate.</p>
<p>systemd D-Bus APIs are used in the following modules:</p>
<ul>
<li>machines<ul>
<li><code>ManagerProxy::list_machines()</code></li>
<li>Most other calls can then be made against the machine's own systemd/D-Bus</li>
</ul>
</li>
<li>networkd<ul>
<li><code>ManagerProxy::list_links()</code></li>
<li>Interface state files at <code>/run/systemd/netif/links</code> are used by default; the varlink
<code>io.systemd.Network.Describe</code> API can be enabled instead (see below)</li>
</ul>
</li>
<li>system<ul>
<li><code>ManagerProxy::get_version()</code></li>
<li><code>ManagerProxy::system_state()</code></li>
</ul>
</li>
<li>timer<ul>
<li><code>TimerProxy::unit()</code> - Find service unit of timer</li>
<li><code>ManagerProxy::get_unit()</code></li>
<li><code>UnitProxy::state_change_timestamp()</code></li>
<li><code>UnitProxy::state_change_timestamp_monotonic()</code></li>
</ul>
</li>
<li>units<ul>
<li><code>ManagerProxy::list_units()</code> - Main counting of unit stats</li>
<li><code>ServiceProxy::cpuusage_nsec()</code></li>
<li><code>ServiceProxy::ioread_bytes()</code></li>
<li><code>ServiceProxy::ioread_operations()</code></li>
<li><code>ServiceProxy::memory_current()</code></li>
<li><code>ServiceProxy::memory_available()</code></li>
<li><code>ServiceProxy::nrestarts()</code></li>
<li><code>ServiceProxy::get_processes()</code></li>
<li><code>ServiceProxy::restart_usec()</code></li>
<li><code>ServiceProxy::status_errno()</code></li>
<li><code>ServiceProxy::tasks_current()</code></li>
<li><code>ServiceProxy::timeout_clean_usec()</code></li>
<li><code>ServiceProxy::watchdog_usec()</code></li>
<li><code>UnitProxy::active_enter_timestamp()</code></li>
<li><code>UnitProxy::active_exit_timestamp()</code></li>
<li><code>UnitProxy::inactive_exit_timestamp()</code></li>
<li><code>UnitProxy::state_change_timestamp()</code> - Used for raw stat + time_in_state</li>
</ul>
</li>
</ul>
<p>Some of these modules can be disabled via configuration, so monitord does not
necessarily make all of these D-Bus calls on every run.</p>
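<p>As an example of how one of the values listed above might be used: systemd reports timestamps such as the unit state change timestamp in microseconds, so a time-in-state figure can be derived by subtracting it from the current time on the same clock. The sketch below is an assumption about that arithmetic, not monitord's actual code; <code>time_in_state</code> and its parameter names are hypothetical.</p>
<pre><code class="language-rust">use std::time::Duration;

/// Time spent in the current state, given "now" and the state change
/// timestamp, both in microseconds on the same clock.
/// Hypothetical helper for illustration; not part of monitord.
fn time_in_state(now_usec: u64, state_change_usec: u64) -&gt; Duration {
    // saturating_sub avoids underflow if the timestamp is ahead of "now".
    Duration::from_micros(now_usec.saturating_sub(state_change_usec))
}
</code></pre>
<p>Returning a <code>Duration</code> leaves the choice of output unit (seconds, milliseconds) to the caller.</p>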
<h2>Varlink</h2>
<p>monitord supports collecting unit statistics via systemd's <a href="https://github.com/systemd/systemd/pull/39202">Varlink metrics API</a>,
available in systemd v260+. When enabled, monitord connects to the <code>io.systemd.Metrics</code> interface
at <code>/run/systemd/report/io.systemd.Manager</code> to collect unit counts, active/load states, and restart counts.</p>
<h3>Enabling Varlink</h3>
<p>Set <code>enabled = true</code> in the <code>[varlink]</code> section of <code>monitord.conf</code>:</p>
<pre><code class="language-ini">[varlink]
enabled = true
</code></pre>
<p>When varlink is enabled, monitord will attempt to collect stats via the varlink APIs first,
automatically falling back to D-Bus or file-based collection when a varlink socket is unavailable
(e.g., older systemd versions).</p>
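<p>The order of attempts can be pictured with the following sketch. It is an illustration under assumptions, not monitord's implementation: <code>collect_with_fallback</code> and its closure parameters are hypothetical, and the closures stand in for the real varlink and D-Bus/file collectors.</p>
<pre><code class="language-rust">/// Try the varlink collector first (when enabled); on any error,
/// log it and use the fallback collector instead.
/// Hypothetical helper for illustration; not part of monitord.
fn collect_with_fallback&lt;T, E: std::fmt::Display&gt;(
    varlink_enabled: bool,
    varlink: impl FnOnce() -&gt; Result&lt;T, E&gt;,
    fallback: impl FnOnce() -&gt; T,
) -&gt; T {
    if varlink_enabled {
        match varlink() {
            Ok(stats) =&gt; return stats,
            Err(err) =&gt; eprintln!("varlink unavailable ({err}), falling back"),
        }
    }
    // Reached when varlink is disabled or the varlink attempt failed.
    fallback()
}
</code></pre>
<p>With varlink disabled the first closure is never called, which mirrors the default configuration where only D-Bus or file-based collection runs.</p>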
<h3>Metrics collected via Varlink</h3>
<p><strong>Units</strong> (<code>io.systemd.Metrics</code>, systemd v260+):</p>
<ul>
<li>Unit counts by type (service, mount, socket, target, device, automount, timer, path, slice, scope)</li>
<li>Unit counts by state (active, failed, inactive)</li>
<li>Per-unit active state and load state (with allowlist/blocklist filtering)</li>
<li>Per-unit health status (computed from active + load state)</li>
<li>Per-service restart counts (<code>nrestarts</code>)</li>
<li>Falls back to D-Bus collection if the socket is unavailable</li>
</ul>
<p><strong>Networkd interfaces</strong> (<code>io.systemd.Network.Describe</code>, systemd v257+):</p>
<ul>
<li>Per-interface operational, carrier, admin, and address states</li>
<li>Falls back to parsing <code>/run/systemd/netif/links</code> state files if the socket is unavailable</li>
</ul>
<h3>Containers</h3>
<p>For systemd-nspawn containers, monitord connects to the container's varlink socket via
<code>/proc/&lt;leader_pid&gt;/root/run/systemd/report/io.systemd.Manager</code>, similar to how D-Bus uses
the container-scoped bus socket. Networkd stats use
<code>/proc/&lt;leader_pid&gt;/root/run/systemd/netif/io.systemd.Network</code>, with the same file-based fallback.</p>
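<p>Building such a path from a container's leader PID can be sketched as below. <code>container_socket</code> is a hypothetical helper written for this example; monitord's own code may construct the path differently.</p>
<pre><code class="language-rust">use std::path::PathBuf;

/// Build the host-side path to a socket inside a container, given the
/// container leader's PID. Hypothetical helper; not part of monitord.
fn container_socket(leader_pid: u32, socket_in_container: &amp;str) -&gt; PathBuf {
    // join() would discard the /proc prefix if handed an absolute path,
    // so strip the leading '/' first.
    PathBuf::from(format!("/proc/{leader_pid}/root"))
        .join(socket_in_container.trim_start_matches('/'))
}
</code></pre>
<p>For a leader PID of 1234, <code>container_socket(1234, "/run/systemd/report/io.systemd.Manager")</code> gives <code>/proc/1234/root/run/systemd/report/io.systemd.Manager</code>.</p>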
<h3>varlink 101</h3>
<p>Varlink might one day replace our D-Bus usage. Here are some notes on working with systemd's varlink
interfaces, since there is little documentation outside the <code>man</code> pages.</p>
<h4>Checking interfaces</h4>
<ul>
<li><code>varlinkctl</code> is your friend: <a href="https://man7.org/linux/man-pages/man1/varlinkctl.1.html">varlinkctl(1) man page</a></li>
</ul>
<p>Here is an example with networkd's interfaces:</p>
<pre><code># Show service information and the interfaces offered on the socket
varlinkctl info unix:/run/systemd/netif/io.systemd.Network
# Show the methods and types of the io.systemd.Network interface
varlinkctl introspect unix:/run/systemd/netif/io.systemd.Network io.systemd.Network
# Call GetStates and pretty-print the JSON reply
varlinkctl call unix:/run/systemd/netif/io.systemd.Network io.systemd.Network.GetStates '{}' -j | jq
{
  "AddressState": "routable",
  "IPv4AddressState": "routable",
  "IPv6AddressState": "routable",
  "CarrierState": "carrier",
  "OnlineState": "online",
  "OperationalState": "routable"
}
</code></pre>
</div>
<footer>
<small><a href="https://github.com/cooperlees/monitord">Website source on GitHub</a></small>
</footer>
</div>
</body>
</html>