systemd-journal-sdk-engine 0.7.2

Async query engine components for the pure Rust systemd journal SDK
# Rust Journal SDK

This workspace contains pure-Rust systemd journal reader and writer components.
It does not link to libsystemd or other system journal libraries for SDK
behavior.

## crates.io Usage

The public Rust SDK package is `systemd-journal-sdk`. Use a Cargo dependency
alias if existing code should import it as `journal`:

```toml
[dependencies]
journal = { package = "systemd-journal-sdk", version = "0.7.2" }
```

The workspace also publishes project-prefixed lower-level packages for
consumers that need direct access to the same internal layers used by the SDK:

- `systemd-journal-sdk-common`
- `systemd-journal-sdk-core`
- `systemd-journal-sdk-registry`
- `systemd-journal-sdk-log-writer`
- `systemd-journal-sdk-index`
- `systemd-journal-sdk-engine`

Current writer scope:

- regular journal files by default and compact journal files with
  `JournalFileOptions::with_compact(true)` or `Config::with_compact(true)`;
- uncompressed DATA objects by default;
- optional zstd, xz, and lz4-compressed DATA object writing through
  `JournalFileOptions` and `journal::Config`, using systemd's 512-byte default
  threshold and 8-byte minimum clamp;
- keyed hash tables using the journal file ID;
- byte-safe field values through `&[u8]` field payloads;
- direct-file writing through `journal_core`;
- high-level directory writing through `journal::Log`;
- systemd-compatible `0640` journal file permissions by default, configurable
  for newly-created files through `JournalFileOptions::with_file_mode()` and
  `Config::with_file_mode()`;
- chain active naming by default, with
  `Config::with_strict_systemd_naming(true)` available for strict systemd
  `<source>.journal` active naming;
- shared field-name policy layers for direct-file and directory writers:
  default `FieldNamePolicy::Journald`, app-facing
  `FieldNamePolicy::JournalApp`, and structure-level `FieldNamePolicy::Raw`;
- entry-count, file-size, and duration rotation;
- tracked journal-file-count, byte-size, and age retention;
- optional pure cross-SDK cooperative lockfile with stale-owner detection when
  callers explicitly acquire `journal_core::file::lock::WriterLock`;
- Forward Secure Sealing TAG writing through `SealOptions`, including stock
  `journalctl --verify --verify-key` coverage for sealed files generated by
  this writer;
- FSS `SealOptions::start_usec` normalization to systemd's verification-key
  epoch boundary, so unaligned source timestamps still produce sealed files that
  stock `journalctl --verify --verify-key` can validate;
- low-level `EntryWriteOptions::seqnum(...)` and
  `EntryWriteOptions::boot_id(...)` exact-regeneration support for preserving
  ENTRY sequence gaps and per-entry boot IDs when rewriting existing journal
  files. Leave them unset for normal auto-incrementing sequence numbers and the
  writer-wide boot ID;
- native systemd writers do not participate in the SDK lock protocol and remain
  an operational exclusion;
- live stock-reader validation for the current writer slice with `journalctl
  --file`, `journalctl --file --follow --no-tail --boot=all`, and libsystemd
  reader APIs, including live sequence-order checks;
- configurable explicit live-reader publication cadence through
  `JournalWriter::set_live_publish_every_entries()` and
  `Config::with_live_publish_every_entries()`, defaulting to systemd-compatible
  publication after every entry.
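
The compression threshold rule from the list above (512-byte default, 8-byte minimum clamp) can be illustrated with a minimal sketch. The constant and function names here are hypothetical and are not part of the SDK API. The sketch also assumes that a payload becomes a compression candidate when its length is at least the threshold.

```rust
/// Default DATA compression threshold in bytes, as stated in the writer scope.
const DEFAULT_COMPRESS_THRESHOLD: u64 = 512;
/// Minimum value a caller-supplied threshold is clamped up to.
const MIN_COMPRESS_THRESHOLD: u64 = 8;

/// Hypothetical helper: resolve the effective threshold from an optional request.
fn effective_threshold(requested: Option<u64>) -> u64 {
    match requested {
        None => DEFAULT_COMPRESS_THRESHOLD,
        Some(bytes) => bytes.max(MIN_COMPRESS_THRESHOLD),
    }
}

/// Assumption: a payload is a compression candidate when `len >= threshold`.
fn is_compression_candidate(payload_len: u64, threshold: u64) -> bool {
    payload_len >= threshold
}

fn main() {
    let threshold = effective_threshold(None);
    println!("default threshold: {threshold}");
    println!("requested 1 byte, clamped to: {}", effective_threshold(Some(1)));
    println!(
        "511-byte payload is a candidate: {}",
        is_compression_candidate(511, threshold)
    );
}
```

Small payloads stay uncompressed under this rule, which keeps short fields such as `PRIORITY=6` cheap to read back.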

Deferred scope:

- appending to arbitrary historical or systemd-created journal variants. In
  particular, append-open on historical unkeyed-hash files is unsupported and
  returns a controlled error before entry mutation;
- the imported legacy `jf` `journal_file::JournalWriter` remains available for
  compatibility with that crate's public surface, but it is not the supported
  production writer path. It also returns a controlled unsupported-file error
  for unkeyed append targets instead of panicking. New writer integrations
  should use `journal_core::file::JournalWriter` or the high-level
  `journal::Log` directory writer;
- full systemd object-graph verification parity beyond the current repository
  verification API.

Current reader scope:

- regular and compact journal files;
- `.journal`, `.journal~`, `.journal.zst`, and `.journal~.zst` files;
- zstd-compressed fixture files;
- zstd, lz4, and xz-compressed DATA objects through pure-Rust dependencies;
- directory reading across active and archived files with bounded recursive
  traversal, symlink-cycle protection, and interleaved multi-file ordering,
  including mixed regular/compact, compressed/uncompressed,
  sealed/unsealed, and whole-file `.journal.zst` files in one directory;
- forward/backward iteration, cursors, realtime and monotonic timestamps,
  seqnum metadata, field enumeration, binary field values, repeated field
  values, stateful current-entry data enumeration, unique value enumeration,
  and export/json/text formatting;
- byte-preserving RAW field-name access through `Entry::raw_fields()`,
  `Entry::get_raw()`, and `Entry::get_raw_values()`. `Entry.fields` and
  `Entry.field_values` are UTF-8 string-keyed convenience maps and do not
  synthesize lossy names for non-UTF-8 RAW field names;
- export byte output preserves non-UTF8 RAW field names; JSON output, field
  enumeration, unique queries, and `get_data` facade helpers remain UTF-8
  field-name surfaces;
- libsystemd-compatible facade functions for open file/directory/files, close,
  seek head/tail/realtime/cursor, next/previous/skip, match groups,
  current-entry data enumeration, field enumeration, unique value enumeration,
  realtime/monotonic/seqnum/cursor metadata, and boot listing;
- facade cursor seeking follows libsystemd semantics: valid missing cursors are
  accepted as seek locations, while `test_cursor` checks exact current position;
- current-entry facade data enumeration returns borrowed `FIELD=value` bytes
  for the current DATA object, matching libsystemd-style validity until the
  current row is reset or the reader advances; uncompressed DATA is returned
  directly from the mmap-backed journal payload, while compressed DATA is copied
  into row-owned stable storage so later compressed DATA reads cannot invalidate
  earlier pointers from the same row;
- direct facade unique queries return language-native `(field, value)` pairs;
  stateful unique enumeration returns full binary-safe `FIELD=value` payloads;
- `FileReader::visit_unique_values()` and
  `DirectoryReader::visit_unique_values()` stream indexed unique values without
  first materializing the full result set;
- `FileReader::explore()` provides an optimized single-file query surface for
  log-explorer workloads: exact indexed filters, selected facet counters,
  optional histogram, optional FTS, optional returned rows, and query counters.
  It lazily classifies reusable DATA objects by DATA offset during candidate-row
  traversal, groups facets that share the same effective filter set into one
  traversal pass, and expands all fields only for returned rows.
  `ExplorerAnchor::Auto` is the default: forward queries start from the lower
  time bound or file head, while backward queries start from the upper time
  bound or file tail. `ExplorerFieldMode::FirstValue` is the default explorer
  accounting mode: one selected facet/histogram/source field contributes at
  most one value per row, so traversal may stop after all required fields are
  found. `ExplorerFieldMode::AllValues` is available when a caller needs exact
  duplicate-value accounting and accepts the slower full-row scan. `explore()`
  owns the reader position and replaces the reader match state while it runs;
  callers should explicitly seek and reapply any manual matches before
  continuing normal iteration after an explorer query;
- `FileReader::explore_with_strategy()` exposes explicit strategy selection.
  `ExplorerStrategy::Traversal` is the default behavior used by `explore()`.
  `ExplorerStrategy::Index` derives all-values facet and histogram counts from
  FIELD/DATA indexes and DATA entry posting lists. It rejects default
  first-value semantics, FTS, and source-realtime-bounded queries instead of
  returning approximate results. `ExplorerStrategy::Compare` runs traversal and
  index, fails if their logical outputs differ, and returns timing/counter
  diagnostics in `ExplorerResult::comparison`. There is no automatic planner
  because index aggregation is faster only for some query shapes;
- `journal::netdata` provides the Netdata-specific Rust function boundary over
  the explorer. It is the SDK API intended to replace Netdata's generic
  `systemd-journal.plugin` logs function. `NetdataJournalFunction::systemd_journal()`
  runs a `systemd-journal` request JSON against a journal directory and returns
  Netdata-shaped function JSON. This layer owns Netdata request parsing,
  default facets, default display columns, histogram defaults, field
  presentation transforms, row options, and zero-count vocabulary padding for
  filtered requests. The default profile keeps UID/GID values as raw journal
  data and does not resolve host user or group names. The separate
  `NetdataJournalFunction::systemd_journal_plugin_compatible()` constructor
  opts into host user/group name presentation to emulate Netdata's installed
  plugin, with per-query UID/GID display caching so repeated values do not
  repeatedly call host name-service lookups. This layer is intentionally
  separate from the core journal file-format reader. Consumers that need
  Netdata function control can use
  `run_directory_request_json_with_options()` or
  `run_directory_request_bytes_with_options()` with
  `NetdataFunctionRunOptions` to supply a timeout, progress callback,
  cancellation callback, and optional caller-owned `NetdataFunctionState`.
  Progress is reported against the files selected for the query after source
  and time-window preselection, including file-end progress for small or fast
  files. Cancellation is checked before each selected file, during active
  Explorer scans, and after file-end progress callbacks. The optional state hook
  lets Netdata pass registry-provided source type/name metadata and persist
  per-file learned journal-vs-source-realtime drift. Without state, the wrapper
  falls back to journal headers and plugin-compatible filename classification
  for built-in `__logs_sources` groups.
  `NetdataFunctionConfig::source_selector_name` and `source_selector_help`
  customize only the displayed selector label/help while preserving the
  `__logs_sources` wire id. Sampling uses plugin-compatible sampled, unsampled,
  and estimated counters for full-analysis sliced requests and is disabled for
  data-only requests. The `query` request member uses Netdata `SIMPLE_PATTERN`
  behavior: ordered `|` terms, leading `!` negative terms, escaped separators,
  substring `*` parts, and case-insensitive matching. The SDK Netdata boundary
  always executes indexed slice semantics. The `slice` request member is
  retained in the normalized echo because it is part of the plugin request
  shape; it does not select a slower non-slice fallback path.
  Cancellation and no-change responses use Netdata's compact function error
  envelope; timeout returns a partial table response;
- `src/internal/testcmd/netdata_function_wrapper` is a thin offline test adapter
  over the SDK Netdata boundary. It exposes the same CLI shape as Netdata's
  plugin test path:
  `netdata_function_wrapper --test systemd-journal --dir <journal-dir>
  --timeout <seconds> < <request.json>`. The request JSON is read from stdin
  to avoid privileged file reads in test binaries. The comparison tools under
  `../tests/netdata_function/` compare semantic function output against an
  external `systemd-journal.plugin` binary. The wrapper has diagnostic-only
  `--progress-jsonl`, `--cancel-immediately`, and `--cancel-after-progress`
  switches to validate the SDK run-control API; production consumers should
  call `journal::netdata` directly and wire callbacks to their own function
  framework;
- default reader options use live/windowed mmap with a 32 MiB window. Smaller
  windows are available for constrained environments, but high-cardinality
  indexed queries can become remap-bound with very small windows;
- `--output export` uses systemd's size-prefixed binary field encoding and
  blank-line entry separator;
- JSON output includes realtime and monotonic timestamps, preserves valid UTF-8
  strings, and encodes binary values as arrays of unsigned bytes;
- libsystemd-style match behavior: AND between different fields, OR between
  values for the same field, `SdJournalAddDisjunction()` for `+`, and
  `SdJournalAddConjunction()` for explicit AND groups;
- a file-backed `journalctl` command under `src/cmd/journalctl` with
  `--since`, `--until`, `--boot`, and `--follow` support for repository-backed
  files and directories;
- verification APIs: `journal::verify_file()` for structural verification and
  `journal::verify_file_with_key()` for sealed TAG/HMAC verification;
- a conformance adapter under `src/adapter`.
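
The size-prefixed export encoding mentioned in the list above can be sketched independently of the SDK. This follows the systemd Journal Export Format: text values are written as `FIELD=value\n`, binary values as `FIELD\n`, a 64-bit little-endian length, the raw bytes, and `\n`, with a blank line closing each entry. The function names are hypothetical, and the text/binary test used here (valid UTF-8 without control characters other than tab) is a simplifying assumption rather than the SDK's exact rule.

```rust
/// Assumption: a value is "text" when it is valid UTF-8 and contains no
/// control characters other than tab. Everything else is written as binary.
fn is_plain_text(value: &[u8]) -> bool {
    match std::str::from_utf8(value) {
        Ok(text) => text.chars().all(|c| c == '\t' || !c.is_control()),
        Err(_) => false,
    }
}

/// Append one field in export form to `out`.
fn write_export_field(out: &mut Vec<u8>, name: &[u8], value: &[u8]) {
    out.extend_from_slice(name);
    if is_plain_text(value) {
        // Text form: FIELD=value
        out.push(b'=');
        out.extend_from_slice(value);
    } else {
        // Binary form: FIELD, newline, u64 little-endian length, raw bytes
        out.push(b'\n');
        out.extend_from_slice(&(value.len() as u64).to_le_bytes());
        out.extend_from_slice(value);
    }
    out.push(b'\n');
}

/// Append one entry: all fields followed by the blank-line separator.
fn write_export_entry(out: &mut Vec<u8>, fields: &[(&[u8], &[u8])]) {
    for &(name, value) in fields {
        write_export_field(out, name, value);
    }
    out.push(b'\n');
}

fn main() {
    let mut out = Vec::new();
    write_export_entry(
        &mut out,
        &[
            (b"MESSAGE".as_slice(), b"plugin started".as_slice()),
            (b"BINARY_PAYLOAD".as_slice(), b"\x00\x01\x02\xff".as_slice()),
        ],
    );
    println!("{} bytes written", out.len());
}
```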

Platform behavior:

- Linux is the validated reference runtime and keeps mmap-backed hot paths,
  monotonic timestamps, Unix directory sync, and SIGBUS handling.
- FreeBSD and macOS builds use monotonic timestamps and the same pure file
  reader/writer paths. Optional identity and lock helpers are separate from the
  core file-format writer.
- Windows builds use unbiased interrupt time for automatic writer timestamps
  and no-op directory fsync/SIGBUS hooks. Optional identity and lock helpers
  are separate from the core file-format writer.
- Non-Linux build checks are compilation evidence only unless runtime evidence
  from that OS is recorded separately. Files written on non-Linux targets must
  still pass Linux stock `journalctl --verify --file` and repository
  interoperability checks before production compatibility is claimed.

Reader limitations:

- `list_boots` uses file-level boot metadata in this slice;
- full systemd object-graph verification parity is tracked separately;
- daemon-only journalctl operations are not implemented.

Basic directory writer usage:

```rust
use journal::{Config, Log, Origin, RetentionPolicy, RotationPolicy, Source};

let origin = Origin {
    machine_id: None,
    namespace: None,
    source: Source::System,
};
let config = Config::new(
    origin,
    RotationPolicy::default()
        .with_number_of_entries(100000)
        .with_duration_of_journal_file(std::time::Duration::from_secs(3600)),
    RetentionPolicy::default()
        .with_number_of_journal_files(10)
        .with_duration_of_journal_files(std::time::Duration::from_secs(7 * 24 * 3600)),
);
let mut log = Log::new("/var/log/journal-sdk", config)?;

log.write_entry(
    &[
        b"MESSAGE=plugin started".as_slice(),
        b"PRIORITY=6".as_slice(),
        b"SYSLOG_IDENTIFIER=example-plugin".as_slice(),
    ],
    None,
)?;
log.sync()?;
log.close()?;
# Ok::<(), Box<dyn std::error::Error>>(())
```

`Log` stores files below `<directory>/<machine-id>/`. By default the active file
uses the chain filename form
`<source>@<seqnum-id>-<head-seqnum>-<head-realtime>.journal`; call
`Config::with_strict_systemd_naming(true)` to use `<source>.journal` as the
active file.
If strict naming opens a directory with a stale chain-named `ONLINE` active
file, it archives that file before creating `<source>.journal`, so the directory
does not keep parallel active files.
If an existing active file is rejected by the low-level append-open path as
unsupported, `Log` follows journald's reliable-open behavior: it uses readable
header metadata to continue sequence identity where possible, moves the old
active file to a collision-safe `*.journal~` disposed name, and creates a fresh
active file. Direct low-level append-open still returns an unsupported error.
Unset rotation and retention limits are disabled. Retention counts the tracked
active/current file in file-count and committed-byte limits, but deletion only
selects older unprotected files owned by the configured source; the tracked
active/current file is never deleted to satisfy a retention limit. Duration
rotation is checked before append using the incoming entry realtime and the
active file head realtime.
Call `Log::enforce_retention()` to apply age/count/byte retention without
waiting for another append-triggered rotation or close. Call `Log::close()` to
archive the current file and enforce retention; `Drop` only performs best-effort
state persistence.
Retention also runs once when a writer opens or creates the active file:
existing-active reopen and `LogOpenMode::Eager` enforce it during construction,
while lazy archived-only construction defers enforcement until the first append
opens the active file, before the first entry is written.
Use `Config::with_open_mode(LogOpenMode::Eager)` to create/open the active file
during construction, and `Config::with_identity_mode(LogIdentityMode::Strict)`
plus `Origin.machine_id` and `Config::with_boot_id()` to require explicit
identity. `LogIdentityMode::Auto` uses explicit IDs when provided and otherwise
generates SDK-local IDs; it does not read host identity sources.
`Log::configured_directory()`, `Log::journal_directory()`,
`Log::active_path()`, `Log::machine_id()`, `Log::boot_id()`, and
`Log::source()` expose the same directory/identity contract as the other SDKs.
Lifecycle observers receive `Created`, `Rotated`, and `RetainedDeleted` events;
`Log::with_artifact_sizer()` includes per-journal sidecar bytes in retained-size
decisions. `write_entry_with_timestamps()` accepts
`EntryTimestamps::source_realtime_usec` for `_SOURCE_REALTIME_TIMESTAMP`
injection and clamps non-progressing realtime and monotonic overrides forward.
The low-level `JournalWriter::add_entry()` path preserves explicit
caller-provided realtime and monotonic timestamps without clamping or rejecting
them; callers using that raw API are responsible for not producing same-boot
backward monotonic entries unless they are intentionally creating invalid
fixtures. On reopen, `Log` seeds the monotonic clamp floor from a persisted
chain tail only when the tail entry boot ID matches the current writer boot ID.
`Log` is a single-writer object; callers must serialize method calls on one
instance. The journal file contract is one writer per file. Acquire
`journal_core::file::lock::WriterLock` when the caller wants the optional
cooperating-writer lock helper to reject another SDK writer for the same file.
`Config::with_field_name_policy()` selects the high-level writer field-name
layer. The default `FieldNamePolicy::Journald` preserves trusted systemd fields
such as `_HOSTNAME` and `_TRANSPORT`. `FieldNamePolicy::JournalApp` drops caller
fields that journald would reject from untrusted applications and fails only
when no caller fields remain. `FieldNamePolicy::Raw` accepts any non-empty
field name that does not contain `=`, but RAW-mode files are not guaranteed to
be accepted by stock systemd tooling. Producer-specific field transformations
belong outside the SDK.
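
The file-count retention rule described above (the active file counts toward the limit but is never deleted, and an unset limit is disabled) can be sketched as follows. `TrackedFile` and `select_for_deletion` are hypothetical names for illustration only. The sketch ignores byte-size and age limits, source ownership, and protected files.

```rust
/// Hypothetical view of one tracked journal file.
struct TrackedFile {
    name: &'static str,
    head_realtime_usec: u64,
    active: bool,
}

/// Return the names to delete, oldest first, to satisfy a file-count limit.
/// `None` means the limit is unset and therefore disabled.
fn select_for_deletion(files: &[TrackedFile], max_files: Option<usize>) -> Vec<&'static str> {
    let Some(max_files) = max_files else {
        return Vec::new();
    };
    // The active file is included in the count...
    let excess = files.len().saturating_sub(max_files);
    // ...but only archived files are deletion candidates.
    let mut archived: Vec<&TrackedFile> = files.iter().filter(|f| !f.active).collect();
    archived.sort_by_key(|f| f.head_realtime_usec);
    archived.into_iter().take(excess).map(|f| f.name).collect()
}

fn main() {
    let files = [
        TrackedFile { name: "a.journal", head_realtime_usec: 100, active: false },
        TrackedFile { name: "b.journal", head_realtime_usec: 200, active: false },
        TrackedFile { name: "c.journal", head_realtime_usec: 300, active: true },
    ];
    println!("{:?}", select_for_deletion(&files, Some(2)));
}
```

With a limit of zero, this sketch still leaves the active file in place, which mirrors the statement that the active file is never deleted to satisfy a retention limit.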

Journal files are created with systemd journald's `0640` default permissions.
Use `JournalFileOptions::with_file_mode()` for direct-file writers or
`Config::with_file_mode()` for directory writers when a consumer needs another
mode. The override applies only to newly-created files; existing files keep
their current filesystem permissions. POSIX modes remain subject to the
process umask, matching systemd/open semantics. Non-POSIX platforms may ignore
POSIX mode bits.
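
The interaction between a requested mode and the process umask is plain bit arithmetic. A minimal sketch, using a hypothetical `effective_mode` helper that is not part of the SDK:

```rust
/// Permission bits that remain after the process umask is applied,
/// following POSIX `open` semantics: requested & !umask.
fn effective_mode(requested: u32, umask: u32) -> u32 {
    requested & !umask & 0o7777
}

fn main() {
    // The 0640 default survives the common 0022 and 0027 umasks unchanged.
    println!("{:o}", effective_mode(0o640, 0o022));
    println!("{:o}", effective_mode(0o640, 0o027));
    // A wider request is narrowed by the umask.
    println!("{:o}", effective_mode(0o660, 0o027));
}
```

A caller that needs group-writable journal files therefore has to consider the umask of the writing process as well as the configured mode.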

Live-reader publication can be tuned when the consumer does not need immediate
stock follow-reader wakeups:

```rust
let config = config.with_live_publish_every_entries(64);
```

`1` is the default and publishes after every entry. `0` disables explicit SDK
live publication for poll/snapshot consumers. `N > 1` publishes after every
`N` entries. This is not an `fsync` or durability setting.
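
The three cadence cases above can be modeled with a small counter. This is only a sketch of the documented behavior; `PublishCadence` is a hypothetical type and does not reflect how the SDK implements publication internally.

```rust
/// Hypothetical model of the live-publication cadence setting.
struct PublishCadence {
    every: u64,
    pending: u64,
}

impl PublishCadence {
    fn new(every: u64) -> Self {
        Self { every, pending: 0 }
    }

    /// Record one appended entry; returns true when a publication is due.
    fn on_entry(&mut self) -> bool {
        if self.every == 0 {
            // 0 disables explicit publication entirely.
            return false;
        }
        self.pending += 1;
        if self.pending >= self.every {
            self.pending = 0;
            true
        } else {
            false
        }
    }
}

/// Count how many publications happen while appending `entries` entries.
fn publications(every: u64, entries: u64) -> u64 {
    let mut cadence = PublishCadence::new(every);
    (0..entries).filter(|_| cadence.on_entry()).count() as u64
}

fn main() {
    for every in [1, 0, 64] {
        println!("every={every}: {} publications per 128 entries", publications(every, 128));
    }
}
```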

Binary-safe values:

```rust
log.write_entry(
    &[
        b"MESSAGE=sample with binary payload".as_slice(),
        b"BINARY_PAYLOAD=\x00\x01\x02\xff".as_slice(),
    ],
    None,
)?;
# Ok::<(), Box<dyn std::error::Error>>(())
```

Basic reader usage:

```rust
use journal::FileReader;

let mut reader = FileReader::open("/path/to/system.journal")?;
reader.add_match(b"PRIORITY=6");
reader.seek_head();

while reader.next()? {
    let entry = reader.get_entry()?;
    if let Some(message) = entry.get_str("MESSAGE") {
        println!("{message}");
    }
}
# Ok::<(), Box<dyn std::error::Error>>(())
```

Optimized single-file explorer usage:

```rust
use journal::{ExplorerQuery, FileReader};

let mut reader = FileReader::open("/path/to/system.journal")?;
let result = reader.explore(&ExplorerQuery {
    facets: vec![b"PRIORITY".to_vec()],
    limit: 0,
    ..ExplorerQuery::default()
})?;

if let Some(priority) = result.facets.get(b"PRIORITY".as_slice()) {
    for (value, count) in priority {
        println!("{} {count}", String::from_utf8_lossy(value));
    }
}
# Ok::<(), Box<dyn std::error::Error>>(())
```

The default first-value mode counts at most one value per selected field per
row. Use `ExplorerFieldMode::AllValues` when a row may contain repeated values
for a selected facet or histogram field and every duplicate value must count.
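
The difference between the two accounting modes can be shown on in-memory rows without touching a journal file. In this sketch, `FieldMode` and `count_facet` are hypothetical stand-ins for `ExplorerFieldMode` and the explorer's facet counting. They model only the counting rule, not the indexed traversal.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the two explorer accounting modes.
#[derive(Clone, Copy)]
enum FieldMode {
    FirstValue,
    AllValues,
}

/// Count values of `field` across rows; each row is a list of (name, value) pairs.
fn count_facet(rows: &[Vec<(&str, &str)>], field: &str, mode: FieldMode) -> BTreeMap<String, u64> {
    let mut counts = BTreeMap::new();
    for row in rows {
        for &(name, value) in row {
            if name != field {
                continue;
            }
            *counts.entry(value.to_owned()).or_insert(0) += 1;
            if matches!(mode, FieldMode::FirstValue) {
                // First-value mode stops at the first occurrence in this row.
                break;
            }
        }
    }
    counts
}

fn main() {
    let rows = vec![
        vec![("PRIORITY", "3"), ("PRIORITY", "4")],
        vec![("PRIORITY", "3"), ("MESSAGE", "boot")],
    ];
    println!("{:?}", count_facet(&rows, "PRIORITY", FieldMode::FirstValue));
    println!("{:?}", count_facet(&rows, "PRIORITY", FieldMode::AllValues));
}
```

In first-value mode the repeated `PRIORITY=4` in the first row is never counted, which is what allows a traversal to stop early once every selected field has been seen.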

Explorer column catalogs are built from FIELD indexes. Do not use row traversal
to discover columns in production; a comparison that needs
`debug_collect_column_fields_by_row_traversal` has found a bug in the explorer
or its column-catalog setup, not a valid operating mode.

Specialized callers can select an execution strategy:

```rust
use journal::{ExplorerFieldMode, ExplorerQuery, ExplorerStrategy, FileReader};

let mut reader = FileReader::open("/path/to/system.journal")?;
let result = reader.explore_with_strategy(
    &ExplorerQuery {
        facets: vec![b"PRIORITY".to_vec()],
        field_mode: ExplorerFieldMode::AllValues,
        use_source_realtime: false,
        limit: 0,
        ..ExplorerQuery::default()
    },
    ExplorerStrategy::Index,
)?;
# Ok::<(), Box<dyn std::error::Error>>(())
```

The index strategy is exact for its supported subset, but it is not a universal
speedup. It can be much faster for narrow unfiltered all-values facets and
histograms, and slower for many facets or selective filters. Use
`ExplorerStrategy::Compare` when validating a query shape before relying on the
index strategy; successful compare results include traversal and index timings
and stats in `ExplorerResult::comparison`.

The default `ExplorerAnchor::Auto` chooses the natural scan start for the query
direction. Use explicit `Head`, `Tail`, or `Realtime(usec)` anchors only for
manual paging or when the caller intentionally wants a non-default start point.
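
The automatic anchor rule can be written as a small decision function. The `Direction` and `Start` enums below are hypothetical illustrations of that rule, not the SDK's `ExplorerAnchor` type.

```rust
/// Hypothetical query direction.
#[derive(Debug)]
enum Direction {
    Forward,
    Backward,
}

/// Hypothetical resolved scan start.
#[derive(Debug, PartialEq)]
enum Start {
    Head,
    Tail,
    Realtime(u64),
}

/// Forward queries start at the lower time bound or the file head;
/// backward queries start at the upper time bound or the file tail.
fn auto_start(direction: Direction, lower_usec: Option<u64>, upper_usec: Option<u64>) -> Start {
    match direction {
        Direction::Forward => lower_usec.map_or(Start::Head, Start::Realtime),
        Direction::Backward => upper_usec.map_or(Start::Tail, Start::Realtime),
    }
}

fn main() {
    println!("{:?}", auto_start(Direction::Forward, None, None));
    println!("{:?}", auto_start(Direction::Backward, None, Some(1_700_003_600_000_000)));
}
```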

For RAW-mode files, use the byte-keyed entry surface when field names are not
guaranteed to be UTF-8:

```rust
if let Some(value) = entry.get_raw(b"\xffRAW") {
    assert_eq!(value, b"raw value");
}

for field in entry.raw_fields() {
    let name_bytes = field.name;
    let value_bytes = field.value;
}
```

File-backed journalctl:

```sh
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --file fixtures/systemd/test-data/no-rtc/system.journal.zst \
  --head 1 \
  --output json
```

Repeated matches for the same field are OR alternatives. Matches for different
fields are ANDed. A separate `+` argument creates an explicit disjunction:

```sh
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --file ./sample.journal \
  PRIORITY=3 PRIORITY=4 + MESSAGE=boot
```
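
The match rules above can be modeled as a small evaluator over in-memory entries. This is a sketch of the semantics only; the function names are hypothetical and it does not use the SDK's index-backed matching. Each inner `Vec` is one group of terms, and groups are separated by `+` on the command line.

```rust
/// A group matches when every field named in it is satisfied, and a field is
/// satisfied when the entry carries any one of the values listed for it.
fn group_matches(entry: &[(&str, &str)], group: &[(&str, &str)]) -> bool {
    group.iter().all(|(field, _)| {
        group
            .iter()
            .filter(|(f, _)| f == field)
            .any(|term| entry.contains(term))
    })
}

/// Groups are alternatives: the entry matches when any group matches.
/// With no groups at all, every entry matches.
fn entry_matches(entry: &[(&str, &str)], groups: &[Vec<(&str, &str)>]) -> bool {
    groups.is_empty() || groups.iter().any(|group| group_matches(entry, group))
}

fn main() {
    // Equivalent to: PRIORITY=3 PRIORITY=4 + MESSAGE=boot
    let groups = vec![
        vec![("PRIORITY", "3"), ("PRIORITY", "4")],
        vec![("MESSAGE", "boot")],
    ];
    let entry = [("PRIORITY", "4"), ("MESSAGE", "plugin started")];
    println!("matches: {}", entry_matches(&entry, &groups));
}
```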

Realtime ranges, boot filters, and follow mode are supported for file-backed
inputs:

```sh
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --directory ./journals --boot=all --since @1700000000 --until @1700003600
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --file ./active.journal --follow --no-tail --boot=all
```