linprov 0.2.21

eBPF mark-of-the-web for Linux: tag network-touched files and enforce who can exec them.
# linprov

eBPF-based mark-of-the-web for Linux. Every file written by a process that
touched the network gets tagged with a provenance xattr; every `execve` of
a tagged file is logged, and — optionally — blocked unless the path is on
an explicit allowlist.

## Quickstart

Needs a Linux **6.5+** kernel with BPF LSM enabled and vmlinux BTF — see
[Requirements](#requirements) to turn BPF LSM on if it isn't.

```sh
cat /sys/kernel/security/lsm     # 1. check deps — must contain `bpf`

cargo install bpf-linker         # 2. install (build dep + the daemon;
cargo install linprov            #    pulls the aya-friday-* fork automatically)
sudo $(which linprov) setup      # 3. setup — interactive; installs to /usr/local/bin

sudo linprov run --mode soak     # 4. soak — use the machine normally, ^C when done
$EDITOR /etc/linprov/list.allow  # 5. review the learned rules

# 6. enforce — set mode = "enforce" in /etc/linprov/config.toml, then:
sudo systemctl enable --now linprov.service
```

`setup` is interactive on a terminal (guided soak → enforce, plus an optional
desktop [tray UI](#tray-agent--linprov-notify-swaywayland)). The sections below
cover each step, the [allowlist format](#allowlist-format), and approvals
([`linprov allow`](#approving-a-blocked-exec--linprov-allow)) in full.

## How it works

Three sleepable BPF LSM hooks plus one cleanup tracepoint:

| Hook | What it does |
|---|---|
| `socket_connect` | When a PID `connect()`s to a non-loopback `AF_INET`/`AF_INET6` address, mark the PID as network-touched in an LRU hash map. Loopback connects (`127.0.0.0/8`, `::1`) are skipped by default — pass `--mark-localhost` or `LINPROV_MARK_LOCALHOST=1` to include them (e.g. for the smoke tests, which use a local HTTP server). |
| `file_open` | **Write:** if the opener is a *mark source* and the file is opened for write, write the OriginRecord into a `BPF_MAP_TYPE_INODE_STORAGE` map keyed on the file's inode and emit a ringbuf event. A mark source is either a network-touched PID (fresh record, the opener is the creator) or a *taint-propagating* PID (inherits the source file's record). **Read:** look the inode up in INODE_MARKS; on a miss, fall back to the `bpf_get_file_xattr` kfunc and *promote* the result back into INODE_MARKS (so the kfunc cost is paid once per inode per boot). If the file is marked, taint the opener (`PROP_PIDS`) carrying that record — so files it later writes inherit the mark (this is how `tar`/`unzip`/`cp` propagate provenance). And if the opener's `comm` is a known **interpreter** (`bash`, `python`, …; the `INTERPRETERS` map) that hasn't yet been cleared (`APPROVED_INTERP`), this read is a *script being loaded for execution* — run the same allowlist check `bprm_check_security` uses against the script's path and, in enforce mode, return `-EPERM` when not permitted. On success the interpreter PID is approved so its later marked reads (the script's own data files) pass unchecked, like an allowlisted ELF. This closes `bash foo.sh` / `python foo.py` / `. foo.sh` (the interpreter is unmarked, so the script never reaches the execve hook); shebang `./foo.sh` was already covered by `bprm_check_security`. |
| `bprm_check_security` | On every exec, look the inode up in INODE_MARKS first; if absent, fall back to the `bpf_get_file_xattr` kfunc. If either source has the mark, emit a ringbuf event — and in enforce mode, return `-EPERM` for paths not on the allowlist. |
| `sched_process_exit` (tracepoint) | Reap the network-touched and taint-propagating PID entries (`NET_PIDS`, `PROP_PIDS`) on task teardown. |
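
The loopback rule in the `socket_connect` row can be modeled in userspace Rust. This is a sketch, not linprov's BPF code. `marks_pid` is a hypothetical name. The sketch treats an IPv4-mapped address such as `::ffff:127.0.0.1` as non-loopback, because that is what `Ipv6Addr::is_loopback` does. The table does not say how linprov handles that case.

```rust
use std::net::{IpAddr, SocketAddr};

/// Userspace model of the `socket_connect` filter: should a connect() to
/// `addr` mark the calling PID as network-touched? Loopback destinations
/// (127.0.0.0/8, ::1) are skipped unless `mark_localhost` is set, which
/// stands in for `--mark-localhost` / `LINPROV_MARK_LOCALHOST=1`.
fn marks_pid(addr: &SocketAddr, mark_localhost: bool) -> bool {
    let loopback = match addr.ip() {
        IpAddr::V4(v4) => v4.is_loopback(), // true for all of 127.0.0.0/8
        IpAddr::V6(v6) => v6.is_loopback(), // true only for ::1
    };
    mark_localhost || !loopback
}
```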

Userspace consumes the ringbuf, applies the `security.bpf.linprov.origin`
xattr (the kernel restricts `bpf_set_dentry_xattr` to LSM hooks that
natively take a trusted dentry, which `file_open` isn't), and — in
enforce mode — seeds an in-kernel hash map of permitted paths. It also
back-fills the augmented record (with the resolved creator exe-path hash)
into INODE_MARKS, since the in-kernel `file_open` copy can't resolve the
creator path itself — this lets the `bprm` fast path skip the xattr kfunc
and lets read-taint propagation inherit the full creator identity.

The two mark sources play different roles:

- **INODE_MARKS** is the same-boot fast path. Synchronous in `file_open`,
  so by the time the very next `execve` runs, the mark is already
  visible to `bprm_check_security`. Closes the race window where a freshly
  downloaded binary could exec before userspace landed the xattr.
- **The xattr** is the durability layer. Survives daemon restart,
  reboots, and inode cache eviction. Written off-band by userspace; read
  in-kernel as fallback.

Either source produces the same OriginRecord — enforcement / logging
doesn't care which fired.
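
The lookup order for the `file_open` read path can also be modeled in userspace. The order is INODE_MARKS first, then the xattr on a miss, then promotion so the fallback runs only once per inode. The `Marks` and `Origin` types and the `xattr_reads` counter are hypothetical. They stand in for the BPF maps and the 320-byte record.

```rust
use std::collections::HashMap;

/// Stand-in for the OriginRecord; only the creator comm is kept here.
#[derive(Clone, Debug, PartialEq)]
struct Origin {
    creator_comm: String,
}

/// `inode_marks` plays the INODE_MARKS (inode storage) map, `xattrs` the
/// on-disk `security.bpf.linprov.origin` values. Inodes are plain u64 keys.
struct Marks {
    inode_marks: HashMap<u64, Origin>,
    xattrs: HashMap<u64, Origin>,
    xattr_reads: usize, // counts xattr fallbacks, to show promotion
}

impl Marks {
    /// Fast path first; on a miss, fall back to the xattr and promote the
    /// result so later lookups on the same inode skip the fallback.
    fn lookup(&mut self, ino: u64) -> Option<Origin> {
        if let Some(rec) = self.inode_marks.get(&ino) {
            return Some(rec.clone());
        }
        self.xattr_reads += 1;
        let rec = self.xattrs.get(&ino)?.clone();
        self.inode_marks.insert(ino, rec.clone());
        Some(rec)
    }
}
```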

## Requirements

- Linux **6.5+** kernel with BPF LSM enabled (`CONFIG_BPF_LSM=y`, `bpf` in
  the active `lsm=` boot parameter). Confirm with:
  ```
  cat /sys/kernel/security/lsm  # must contain `bpf`
  ```
  On Pop!_OS / Ubuntu with systemd-boot:
  ```
  sudo kernelstub -a "lsm=$(cat /sys/kernel/security/lsm),bpf"
  # then reboot
  ```
- vmlinux BTF (`/sys/kernel/btf/vmlinux`) — needed for LSM hook resolution.
- Rust nightly (pinned via `rust-toolchain.toml`).
- The userspace daemon runs as **root** (BPF program load + LSM attach +
  `security.bpf.*` xattr writes all need it).
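
The kernel checks above can be sketched in plain Rust over the files named in the list, together with the BTF file test. The function names are hypothetical, and this is not `linprov setup`'s actual code. `kernel_at_least` compares only the leading `MAJOR.MINOR` of the release string.

```rust
use std::path::Path;

/// True if the comma-separated LSM list (contents of
/// /sys/kernel/security/lsm) includes `bpf`.
fn lsm_has_bpf(lsm_list: &str) -> bool {
    lsm_list.trim().split(',').any(|name| name == "bpf")
}

/// Parses the leading "MAJOR.MINOR" of a release such as "6.8.0-45-generic"
/// and compares it against a minimum version.
fn kernel_at_least(release: &str, min_major: u32, min_minor: u32) -> bool {
    let mut parts = release.split(|c: char| !c.is_ascii_digit());
    let major = parts.next().and_then(|s| s.parse::<u32>().ok());
    let minor = parts.next().and_then(|s| s.parse::<u32>().ok());
    match (major, minor) {
        (Some(maj), Some(min)) => (maj, min) >= (min_major, min_minor),
        _ => false,
    }
}

/// Reads the live values and combines the three requirements.
fn host_meets_requirements() -> std::io::Result<bool> {
    let lsm = std::fs::read_to_string("/sys/kernel/security/lsm")?;
    let release = std::fs::read_to_string("/proc/sys/kernel/osrelease")?;
    let btf = Path::new("/sys/kernel/btf/vmlinux").exists();
    Ok(lsm_has_bpf(&lsm) && kernel_at_least(release.trim(), 6, 5) && btf)
}
```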

## Install

```
cargo install bpf-linker
cargo install linprov
sudo $(which linprov) setup
```

`cargo install` drops the binary in `~/.cargo/bin/`, which isn't on
root's `secure_path` — that's why the first invocation needs the
absolute path. `linprov setup` immediately copies itself to
`/usr/local/bin/linprov`, so every later `sudo linprov ...` (and
`linprov upgrade`) resolves without help. Uses an aya fork published
as `aya-friday-*` on crates.io — pulled in automatically as a regular
dependency.

## Build from source

```
cargo build --release
```

## Tests

```
# Unit tests + doctests (no kernel needed):
cargo test --workspace

# Smoke suite (needs root + BPF LSM kernel; see tests/README.md):
cargo build
sudo ./tests/smoke/run-all.sh
```

## Run

`linprov` is driven by subcommands (`setup`, `run`, `upgrade`, `allow`,
`notify`). The recommended end-to-end flow is **setup → soak → review → enforce**.

### 1. `linprov setup`

Feature-checks the kernel (≥ 6.5, `bpf` in active `lsm=`, `vmlinux`
BTF), copies the running binary to `/usr/local/bin/linprov`, writes a
commented `/etc/linprov/config.toml`, an empty allowlist at
`/etc/linprov/list.allow`, and a systemd unit (writes only — doesn't
enable). The config it writes starts in `mode = "observe"`; don't
enable the unit yet.

```
sudo $(which linprov) setup    # first time only; sudo can't find ~/.cargo/bin
```

After this, the binary's at `/usr/local/bin/linprov` (on root's
`secure_path`), so `sudo linprov ...` works from anywhere.

On a terminal, `setup` then walks you through the rest interactively: it
explains the observe → soak → enforce flow, and — if it detects a graphical
session — offers to set up the **desktop tray UI** for you (enable
`notifications = "tray"`, add you to the `linprov` group, and install a
`systemd --user` service that autostarts `linprov notify`). Every change is
gated on a y/n prompt; decline any and it just prints the command instead. It
finishes by offering to drop you straight into a soak. Pass `--yes`/`-y` (or
pipe/redirect stdin, as CI does) to skip the walkthrough and get the classic
write-files-and-print-next-steps behavior.

### 2. Soak interactively to build an allowlist

Run the daemon in the foreground while you use your machine
normally. Every marked `execve` appends one rule to the allowlist
file. `^C` when you've covered enough — the rules persist on disk.

```
sudo linprov run --mode soak
# journalctl is not involved here; logs stream to your terminal.
```

The `--mode soak` flag overrides the config's `mode = "observe"`;
the rest of the config (allowlist path, soak dims, etc.) is still
honored. Watch the file grow:

```
tail -f /etc/linprov/list.allow
```

### 3. Review the allowlist

Trim anything you didn't actually want permitted:

```
cat /etc/linprov/list.allow
$EDITOR /etc/linprov/list.allow
```

### 4. Flip to enforce and start the unit

Edit `/etc/linprov/config.toml` and change `mode = "observe"` to
`mode = "enforce"`. Then enable the systemd unit:

```
sudo systemctl daemon-reload
sudo systemctl enable --now linprov.service
journalctl -u linprov.service -f
```

A marked execve that doesn't match any rule now gets blocked with
`-EPERM` from `security_bprm_check` — the shell sees
`Operation not permitted` and `$?` is `126`.

### `linprov upgrade`

After `cargo install --force linprov` drops a new binary in
`~/.cargo/bin/`:

```
sudo linprov upgrade
```

The running binary is `/usr/local/bin/linprov` (an *old* version);
`upgrade` resolves your `~/.cargo/bin/linprov` automatically — via
`$SUDO_USER` / `$DOAS_USER` / `$PKEXEC_UID` / `logname` / euid's home,
falling back to a unique-match scan of `/etc/passwd` — then copies it
over `/usr/local/bin/linprov` and runs `systemctl daemon-reload` +
`systemctl restart linprov.service`. If autodetect fails (multi-user
host, weird shell setup), point it explicitly: `sudo linprov upgrade
--source /path/to/new/linprov`.

If the source already matches the install path byte-for-byte,
`upgrade` reports it and skips the restart instead of bouncing the
daemon for nothing.
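
Here is a minimal sketch of the byte-for-byte comparison. It assumes a plain whole-file equality check, since the text does not say how `upgrade` compares. `already_installed` is a hypothetical name. A missing install path counts as "not installed" rather than as an error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Ok(true) if `src` and `dst` have identical contents; Ok(false) if they
/// differ or `dst` does not exist yet. Other I/O errors propagate.
fn already_installed(src: &Path, dst: &Path) -> io::Result<bool> {
    let new = fs::read(src)?;
    match fs::read(dst) {
        Ok(old) => Ok(old == new),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```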

### `linprov run` reference

Reads `/etc/linprov/config.toml` by default; CLI flags + env vars
override. The systemd unit calls `linprov run --config
/etc/linprov/config.toml`. Three modes:

- **observe** (default): mark files, log marked execs, never block.
- **soak**: like observe plus appending one allowlist rule per
  PROVENANCE-EXEC. `--soak creator_process,creator_uid,…` (also
  settable as `soak = [...]` in the config) controls which dims each
  emitted rule AND-joins.
- **enforce**: block any marked execve whose origin doesn't match a
  rule.
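
The sketch below shows how `--soak` dims could turn one PROVENANCE-EXEC into an AND-joined rule line. It uses a hypothetical `ExecEvent` that holds already-resolved plaintext values. Leaving out a dim that has no value (for instance `creator_process` when the writer exited early) is an assumption; the real emitter may behave differently.

```rust
/// Attributes of one marked exec, already resolved to plaintext
/// (hypothetical; the daemon's event carries hashes resolved via the audit db).
struct ExecEvent<'a> {
    target_filename: &'a str,         // full exec path
    landing_path: &'a str,            // where the file was first written
    creator_process: Option<&'a str>, // None if the writer exited too early
    creator_comm: &'a str,
    creator_uid: u32,
    execution_uid: u32,
}

/// "/x/y/foo" -> "/x/y/" (exact folder form, trailing slash kept).
fn folder_of(path: &str) -> Option<String> {
    path.rfind('/').map(|i| path[..=i].to_string())
}

/// Builds one allowlist line that AND-joins the requested dims in order.
fn soak_rule(ev: &ExecEvent, dims: &[&str]) -> String {
    let mut conds = Vec::new();
    for &dim in dims {
        let value = match dim {
            "target_filename" => Some(ev.target_filename.to_string()),
            "target_folder" => folder_of(ev.target_filename),
            "landing_filename" => ev.landing_path.rsplit('/').next().map(str::to_owned),
            "landing_folder" => folder_of(ev.landing_path),
            "creator_process" => ev.creator_process.map(str::to_owned),
            "creator_comm" => Some(ev.creator_comm.to_string()),
            "creator_uid" => Some(ev.creator_uid.to_string()),
            "execution_uid" => Some(ev.execution_uid.to_string()),
            _ => None, // unknown dim names are ignored in this sketch
        };
        if let Some(v) = value {
            conds.push(format!("{dim}={v}"));
        }
    }
    conds.join(";")
}
```

Take the `curl -o /tmp/foo …; mv` example from the allowlist section and the dims `target_folder`, `landing_folder`, `landing_filename`. This sketch emits `target_folder=/home/user/.local/bin/;landing_folder=/tmp/;landing_filename=foo`.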

Enforcement also covers **interpreter-invoked scripts** (`bash foo.sh`,
`python foo.py`, `. foo.sh`), not just shebang execs: a known
interpreter reading a marked file is checked against the allowlist by
the script's path, so a rule like `target_filename=/x/script.py` or
`target_folder=/x/` permits both the interpreter and shebang forms
alike. The interpreter set is configurable — `--interpreters bash,sh,…`
(or `interpreters = [...]` in the config); it defaults to the common
shells / runtimes (bash, sh, python, perl, node, …). Pass an empty value
(`--interpreters ''` or `interpreters = []`) to disable script
enforcement. Blocked / observed scripts log as `BLOCKED-SCRIPT` /
`PROVENANCE-SCRIPT`, surfacing the script (not the interpreter) as the
unit. The check fires only on the *first* marked read per interpreter
invocation — the script — so an allowlisted script may then open its own
marked data files freely (just like an allowlisted ELF). An interpreter
reading a marked file *without* having been cleared to run a script
(interactive use, a local script reading downloaded data) is still
denied — allowlist it or narrow the interpreter set if that's undesirable.
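
The decision described above can be modeled as a small state machine. This sketch is userspace Rust with hypothetical names (`ScriptGate`, `on_marked_read`). The `interpreters` and `approved` sets stand in for the `INTERPRETERS` and `APPROVED_INTERP` maps. A plain set of allowed script paths replaces full rule matching.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny, // enforce mode would return -EPERM here
}

struct ScriptGate {
    interpreters: HashSet<String>,      // comm names treated as interpreters
    approved: HashSet<u32>,             // PIDs already cleared to run a script
    allowlisted_paths: HashSet<String>, // stand-in for full rule matching
}

impl ScriptGate {
    /// Called when `pid` (running as `comm`) opens a *marked* file for read.
    fn on_marked_read(&mut self, pid: u32, comm: &str, path: &str) -> Verdict {
        if !self.interpreters.contains(comm) || self.approved.contains(&pid) {
            return Verdict::Allow; // not an interpreter, or already cleared
        }
        // First marked read by this interpreter: treat it as the script.
        if self.allowlisted_paths.contains(path) {
            self.approved.insert(pid); // later marked data reads pass
            Verdict::Allow
        } else {
            Verdict::Deny
        }
    }
}
```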

Send the daemon **`SIGHUP`** to reload the allowlist file and re-seed the
in-kernel rules live — no restart, no re-attach (`sudo systemctl reload
linprov` or `sudo pkill -HUP -x linprov`). Edit `list.allow`, SIGHUP, and
the new rules enforce on the next exec. A reload whose file can't be read
is rejected with a warning and the running rules stay in force; an
over-capacity file (more than `MAX_RULES`, currently 8192) loads the first
`MAX_RULES` and warns rather than failing — so neither a bad edit nor a
long soak run that outgrew the ceiling can crash the daemon. Only the
allowlist reloads — mode and other config need a restart.
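
Here is a sketch of the reload policy: keep the old rules on a read error, and truncate past `MAX_RULES`. It assumes only whole-line `#` comments. `reload` and `rule_lines` are hypothetical names, and the kernel re-seed is reduced to a comment.

```rust
use std::fs;
use std::path::Path;

/// Cap on loaded rules; 8192 is the figure given above for `MAX_RULES`.
const MAX_RULES: usize = 8192;

/// Non-comment, non-blank lines of the allowlist.
fn rule_lines(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .map(str::to_owned)
        .collect()
}

/// SIGHUP policy: an unreadable file leaves `current` untouched; an
/// over-capacity file loads the first MAX_RULES rules and warns.
fn reload(path: &Path, current: &mut Vec<String>) {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(err) => {
            eprintln!("warning: cannot read {}: {}; keeping {} rules", path.display(), err, current.len());
            return;
        }
    };
    let mut rules = rule_lines(&text);
    if rules.len() > MAX_RULES {
        eprintln!("warning: {} rules exceed MAX_RULES ({}); loading the first {}", rules.len(), MAX_RULES, MAX_RULES);
        rules.truncate(MAX_RULES);
    }
    // In the daemon, this is where the in-kernel rule map would be re-seeded.
    *current = rules;
}
```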

By default logs go to stderr (journald captures them under
systemd). Set `log_file = "/path/to/file"` in the config (or
`--log-file`) to append-log to a file instead — handy for non-systemd
setups.

Sample log lines for observe / enforce:

```
PROVENANCE-EXEC target=/usr/local/bin/foo landing=/tmp/foo pid=12345 \
  comm=zsh origin={v:4,…,comm:curl,path:/usr/bin/curl}
BLOCKED-EXEC target=/tmp/sketchy landing=/tmp/sketchy pid=12346 comm=zsh \
  origin={v:4,…,comm:curl,path:/usr/bin/curl} (LSM verdict -1) [allow: 9f3a1c07]
```

### Approving a blocked exec — `linprov allow`

Each `BLOCKED-EXEC` / `BLOCKED-SCRIPT` line ends with `[allow: <token>]` —
a short stable handle for the most-specific rule that would have permitted
that exec. To permit it without hand-editing the allowlist:

```
sudo linprov allow 9f3a1c07          # append the rule to list.allow (permanent) + apply live
sudo linprov allow --once 9f3a1c07   # apply in memory only — not written to the file
```

`allow` talks to the running daemon over a root-only control socket
(`/run/linprov/control.sock`) and re-seeds the in-kernel rules immediately —
no restart. `--once` rules live in the daemon's memory: active right away
and across `SIGHUP` reloads, but never persisted, so they vanish on daemon
restart (handy for a one-off you don't want to allowlist forever). Tokens
are per-daemon-session — if the daemon restarted since the block, re-run the
command to get a fresh token. `allow` needs the daemon running (and root);
with it down, edit `list.allow` and `SIGHUP` instead.
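
A client for the control socket might look like the sketch below. The verbs `allow` and `once` come from the tray section. The wire format (one `verb token` line out, one reply line back) is an assumption, so treat `send_verb` as illustrative rather than as the daemon's protocol.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Sends `<verb> <token>\n` and returns the daemon's one-line reply.
/// Newline framing and the reply shape are assumptions of this sketch.
fn send_verb(sock: &Path, verb: &str, token: &str) -> io::Result<String> {
    if verb != "allow" && verb != "once" {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "unknown verb"));
    }
    let mut stream = UnixStream::connect(sock)?;
    writeln!(stream, "{verb} {token}")?;
    stream.flush()?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```

Against the real daemon the path would be `/run/linprov/control.sock`, which is root-only, so the caller needs root.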

### Tray agent — `linprov notify` (sway/Wayland)

For a graphical workflow, `linprov notify` is a user-session tray agent: it
shows a StatusNotifierItem icon whose menu lists recent blocked execs, each
with **Allow once / Allow always / Close**, and fires a passive desktop
notification as an alert. Menu clicks drive the same control-socket verbs as
[`linprov allow`](#approving-a-blocked-exec--linprov-allow) — Allow once →
`once <token>` (in-memory), Allow always → `allow <token>` (persisted).

Because the daemon is root (system bus) and the tray is on your session bus,
the agent runs as your user and reaches the daemon through a group-readable
socket.

**Easiest setup: run `linprov setup` on the desktop and accept the tray
prompts.** It enables `notifications = "tray"`, adds you to the `linprov`
group, and installs + enables the `systemd --user` service below.

To wire it up by hand:

```bash
# 1. daemon side: expose the socket to the group
sudo groupadd -f linprov                 # `linprov setup` also does this
# set `notifications = "tray"` in /etc/linprov/config.toml, then:
sudo systemctl restart linprov           # `notifications` is launch config (not SIGHUP-reloadable)

# 2. join the group (re-login, or `newgrp linprov`, for it to take effect)
sudo usermod -aG linprov "$USER"

# 3. autostart the agent with a systemd --user service
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/linprov-notify.service <<'EOF'
[Unit]
Description=linprov desktop tray agent
PartOf=graphical-session.target
After=graphical-session.target
[Service]
ExecStart=/usr/local/bin/linprov notify
Restart=on-failure
RestartSec=2
[Install]
WantedBy=graphical-session.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now linprov-notify.service
```

The service tracks `graphical-session.target`, so it starts and stops with your
desktop session — most desktops activate that target; a bare sway session may
need it wired (or just `exec linprov notify` in your sway config instead — the
agent retries tray registration, so it survives launching before waybar's tray
is up).

Requires a StatusNotifierHost — on sway, enable waybar's `tray` module.
Enforcement is synchronous, so the prompt is post-hoc: the blocked exec
already failed; allowing it permits the **re-run**. Anyone in the `linprov`
group can approve execs, so only add trusted users.

## Allowlist format

One rule per line. `#` starts a comment; blank lines are ignored. Each
line is one rule whose `<dim>=<value>;<dim>=<value>` conditions **AND**
together. Multiple lines **OR**: a marked execve is permitted if any
single rule's conditions all match.

```
# uid 1000 downloading with curl is fine, anywhere
creator_uid=1000;creator_comm=curl

# uid 1000 may exec firefox-dropped binaries that ended up in ~/.local/bin
execution_uid=1000;creator_comm=firefox;target_folder=/home/user/.local/bin
```
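
The AND-within-a-line, OR-across-lines semantics fit in a few lines of Rust. `parse_allowlist` and `permitted` are hypothetical helpers. They match values by plain string equality and ignore the folder `*` and hashing behavior described below.

```rust
use std::collections::HashMap;

/// One allowlist line: all of its (dim, value) conditions must hold (AND).
type Rule = Vec<(String, String)>;

/// One rule per line, `dim=value` conditions separated by `;`,
/// blank lines and `#` comment lines skipped.
fn parse_allowlist(text: &str) -> Result<Vec<Rule>, String> {
    let mut rules = Vec::new();
    for (n, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut rule = Rule::new();
        for cond in line.split(';').map(str::trim).filter(|c| !c.is_empty()) {
            let (dim, value) = cond
                .split_once('=')
                .ok_or_else(|| format!("line {}: expected dim=value, got {cond:?}", n + 1))?;
            rule.push((dim.to_string(), value.to_string()));
        }
        rules.push(rule);
    }
    Ok(rules)
}

/// OR across rules, AND within a rule.
fn permitted(rules: &[Rule], attrs: &HashMap<&str, &str>) -> bool {
    rules.iter().any(|rule| {
        rule.iter()
            .all(|(dim, value)| attrs.get(dim.as_str()) == Some(&value.as_str()))
    })
}
```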

| dim | example | matches if … |
|---|---|---|
| `target_filename` | `/usr/bin/foo` | the executed binary's full path equals this |
| `target_folder` | `/opt/my-app/` | the executed binary is **directly in** this folder (exact). Add a trailing `*` (`/opt/my-app/*`) to also match **any descendant** (`/opt/my-app/bin/foo`, `/opt/my-app/plugins/bin/bar`, …) |
| `landing_filename` | `installer.sh` | the **basename** of the file's download path equals this |
| `landing_folder` | `/home/user/Downloads/` | the file's download folder is exactly this; with a trailing `*` (`/home/user/Downloads/*`), this folder is the download folder **or any of its ancestors** (up to 32 levels), i.e. the file landed anywhere beneath it |
| `creator_process` | `/usr/bin/curl` | the full `exe` path of the writer matches |
| `creator_comm` | `curl` | the 16-byte `comm` of the writer matches |
| `creator_uid` | `1000` | the writer's UID matches |
| `execution_uid` | `0` | the UID running the `execve` matches |

`target_*` reflects the file's location at execve time; `landing_*` is
where it was first written. They diverge when the file is moved between
download and execve — e.g. `curl -o /tmp/foo http://…; mv /tmp/foo
~/.local/bin/foo; ~/.local/bin/foo` has `landing_folder=/tmp/`,
`landing_filename=foo`, and `target_filename=/home/user/.local/bin/foo`.

Folder rules must end in `/` (userspace normalizes). Folder matching is
**exact by default** — `target_folder=/opt/app/` permits `/opt/app/x`
but not `/opt/app/bin/x`. A trailing `*` makes it **recursive**:
`target_folder=/opt/app/*` permits any depth below `/opt/app/`. Same for
`landing_folder`. `soak` emits exact rules (least privilege); add the
`*` yourself to broaden. (Only a trailing `*` is supported — no
mid-path wildcards or regexes, since matching walks `/`-delimited hash
prefixes.)
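
Exact versus recursive `target_folder` matching can be sketched over plain strings. The real matcher compares hashes, and `target_folder_matches` is a hypothetical name. The recursive case walks the `/`-delimited prefixes of the exec path's folder, mirroring the prefix walk described above.

```rust
/// "/opt/app/"  -> the file is directly in /opt/app/ (exact).
/// "/opt/app/*" -> the file is anywhere below /opt/app/ (recursive).
fn target_folder_matches(rule: &str, exec_path: &str) -> bool {
    let Some(slash) = exec_path.rfind('/') else {
        return false;
    };
    let parent = &exec_path[..=slash]; // "/opt/app/bin/foo" -> "/opt/app/bin/"
    match rule.strip_suffix('*') {
        // Recursive: the rule folder must be one of parent's `/`-ended prefixes.
        Some(prefix) => parent
            .match_indices('/')
            .any(|(i, _)| &parent[..=i] == prefix),
        // Exact: a single comparison.
        None => parent == rule,
    }
}
```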

### Path length and matching model

There is **no length limit** on any path-shaped rule value — they're all
stored as FNV-1a-64 hashes in a fixed 320-byte record, so a 4096-byte
path hashes to the same 8 bytes as a short one. What differs is *how far
the recursive (`*`) form can nest*, because the exec-time path is
available live at the gate while the landing path is only present (as
hashes) in the stored record:

- **`target_folder*`** walks the live exec path at the gate, so it
  matches nested at **any depth**, for paths up to `PATH_MAX` (4096).
- **`landing_folder*`** matches against up to 32 ancestor-folder hashes
  recorded in the file's mark — so its nesting is bounded to 32 *levels*
  (not bytes), far deeper than any real download path.
- Exact (no `*`) forms are a single hash compare, any length.
- **`landing_filename`** is the basename only; **`target_filename`** is
  the full exec path.
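
FNV-1a-64 itself is tiny, which is why a path of any length costs the same 8 bytes. The sketch uses the standard 64-bit offset basis and prime. `ancestor_hashes` is a hypothetical helper showing one way a 32-entry ancestor list could be built. The text does not say whether linprov hashes exactly these bytes (trailing `/` included) or whether it records `/` itself. Don't expect these values to line up with `hashes.tsv`.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a, 64-bit: xor each byte in, then multiply by the prime (wrapping).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h = FNV_OFFSET;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hashes of a folder and its ancestors, deepest first, capped at 32.
fn ancestor_hashes(landing_folder: &str) -> Vec<u64> {
    let mut out = Vec::new();
    let mut folder = landing_folder;
    while !folder.is_empty() && out.len() < 32 {
        out.push(fnv1a64(folder.as_bytes()));
        // Drop the last component: "/a/b/" -> "/a/", "/" -> "" (stop).
        let parent_end = folder.strip_suffix('/').unwrap_or(folder).rfind('/');
        folder = match parent_end {
            Some(i) => &folder[..=i],
            None => "",
        };
    }
    out
}
```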

Up to `MAX_RULES` rules per allowlist (currently 8192; the cap is a BPF
verifier budget, so bump `MAX_RULES` and rebuild for more).

`creator_process` is populated by userspace via `readlink /proc/$pid/exe`
when handling the file-open event. If the creator process exits before
userspace lands the augmented xattr, rules requiring `creator_process`
won't match for that file — use `creator_comm` (always populated by BPF)
as the fallback dim.
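
Here is a sketch of that resolution and its fallback, using `std::fs::read_link`. `creator_process` and `creator_dim` are hypothetical helpers. Choosing the dim per file only illustrates why `creator_comm` is the dependable one.

```rust
use std::fs;
use std::path::PathBuf;

/// readlink(/proc/<pid>/exe); None once the process has exited
/// (or the link is otherwise unreadable).
fn creator_process(pid: u32) -> Option<PathBuf> {
    fs::read_link(format!("/proc/{pid}/exe")).ok()
}

/// Prefer the exe path; fall back to the BPF-captured comm.
fn creator_dim(pid: u32, comm: &str) -> String {
    match creator_process(pid) {
        Some(exe) => format!("creator_process={}", exe.display()),
        None => format!("creator_comm={comm}"),
    }
}
```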

### The audit db: resolving hashes back to paths

Because the record stores hashes, not strings, the daemon keeps a
plaintext, append-only map of every hash it stores → the path it came
from, at `/var/lib/linprov/hashes.tsv` (configurable via `hash_db` /
`--hash-db`):

```
$ grep Downloads /var/lib/linprov/hashes.tsv
7ba1f0cc8598e793	/home/user/Downloads/
```

This is what lets the daemon log readable paths, lets `soak` emit
plaintext rules, and lets you audit what's been marked with `grep`. It
**persists across reboots**, so resolution still works for files marked
in a previous boot. Enforcement never consults it — the BPF program
matches on hashes alone, so losing or pruning the db costs only human
readability, never correctness.
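
Reading the audit db back only requires splitting each line on the tab. This sketch assumes the `<16 hex digits>\t<path>` shape shown above. `load_hash_db` and `resolve` are hypothetical names.

```rust
use std::collections::HashMap;

/// Parses hashes.tsv text into hash -> path. Malformed lines are skipped;
/// a later duplicate hash overwrites an earlier one.
fn load_hash_db(text: &str) -> HashMap<u64, String> {
    text.lines()
        .filter_map(|line| {
            let (hex, path) = line.split_once('\t')?;
            let hash = u64::from_str_radix(hex.trim(), 16).ok()?;
            Some((hash, path.to_string()))
        })
        .collect()
}

/// The path if known, else the raw hash as 16 hex digits.
fn resolve(db: &HashMap<u64, String>, hash: u64) -> String {
    db.get(&hash).cloned().unwrap_or_else(|| format!("{hash:016x}"))
}
```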

## Inspecting the xattr by hand

```
getfattr -d -m '.*' /path/to/file
# security.bpf.linprov.origin=0sBAAAAA...
```

The value is the binary `OriginRecord` (v4 layout, 320 bytes):

```
version u32 | pid u32 | ts_boot_ns u64 | comm[16] | creator_uid u32 |
_pad u32 | creator_path_hash u64 | landing_folder_hash u64 |
landing_basename_hash u64 | landing_ancestor_hashes[32]
```

The daemon's log lines already format it, resolving the hashes via the
audit db. Earlier-version xattrs from prior linprov builds are ignored
(treated as unmarked), so after an upgrade such a file is marked again only
when a mark source next opens it for write.
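
The listed layout maps onto a `#[repr(C)]` Rust struct whose size comes out to exactly 320 bytes. The sketch below decodes a raw xattr value and assumes little-endian fields. That assumption is consistent with the `0sBAAAAA` prefix above, which decodes to the bytes `04 00 00 00` (version 4). `parse_record` and `comm_str` are hypothetical helpers, not linprov-common's API.

```rust
/// Mirror of the v4 layout; repr(C) keeps field order and padding.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct OriginRecord {
    version: u32,
    pid: u32,
    ts_boot_ns: u64,
    comm: [u8; 16],
    creator_uid: u32,
    _pad: u32,
    creator_path_hash: u64,
    landing_folder_hash: u64,
    landing_basename_hash: u64,
    landing_ancestor_hashes: [u64; 32],
}

const RECORD_LEN: usize = std::mem::size_of::<OriginRecord>();
const _: () = assert!(RECORD_LEN == 320); // compile error if the layout drifts

fn u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes(b[off..off + 4].try_into().unwrap())
}

fn u64_at(b: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(b[off..off + 8].try_into().unwrap())
}

/// Wrong-sized values and versions other than 4 return None (unmarked).
fn parse_record(b: &[u8]) -> Option<OriginRecord> {
    if b.len() != RECORD_LEN || u32_at(b, 0) != 4 {
        return None;
    }
    let mut comm = [0u8; 16];
    comm.copy_from_slice(&b[16..32]);
    let mut ancestors = [0u64; 32];
    for (i, h) in ancestors.iter_mut().enumerate() {
        *h = u64_at(b, 64 + 8 * i);
    }
    Some(OriginRecord {
        version: 4,
        pid: u32_at(b, 4),
        ts_boot_ns: u64_at(b, 8),
        comm,
        creator_uid: u32_at(b, 32),
        _pad: u32_at(b, 36),
        creator_path_hash: u64_at(b, 40),
        landing_folder_hash: u64_at(b, 48),
        landing_basename_hash: u64_at(b, 56),
        landing_ancestor_hashes: ancestors,
    })
}

impl OriginRecord {
    /// The 16-byte comm up to its first NUL, as text.
    fn comm_str(&self) -> String {
        let end = self.comm.iter().position(|&c| c == 0).unwrap_or(self.comm.len());
        String::from_utf8_lossy(&self.comm[..end]).into_owned()
    }
}
```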

## Roadmap

See [`ROADMAP.md`](ROADMAP.md).

## Repository layout

```
linprov/         userspace daemon (clap, tokio, aya)
linprov-ebpf/    BPF programs (no_std, aya-ebpf, inline asm for kfuncs)
linprov-common/  types shared between the two
tests/smoke/     end-to-end tests against a real kernel
.github/         CI workflows
```

## License

Userspace crates: dual-licensed under MIT or Apache-2.0 at your option.
See `LICENSE-MIT` and `LICENSE-APACHE` at the repo root.

The BPF program (`linprov-ebpf`) declares `Dual MIT/GPL` in its
`license` ELF section so the kernel verifier accepts it as
GPL-compatible — required for the `bpf_d_path` and `bpf_get_file_xattr`
helpers, which are `gpl_only`. The source itself is the same
MIT-OR-Apache-2.0 as the rest of the workspace; the GPL token is a
license-compatibility statement to the kernel, not a relicense.