# Rust Journal SDK

This workspace contains pure-Rust systemd journal reader and writer components.
It does not link to libsystemd or other system journal libraries for SDK
behavior.

## crates.io Usage

The public Rust SDK package is `systemd-journal-sdk`. Use a Cargo dependency
alias if existing code should import it as `journal`:

```toml
[dependencies]
journal = { package = "systemd-journal-sdk", version = "0.7.2" }
```

The workspace also publishes project-prefixed lower-level packages for
consumers that need direct access to the same internal layers used by the SDK:

- `systemd-journal-sdk-common`
- `systemd-journal-sdk-core`
- `systemd-journal-sdk-registry`
- `systemd-journal-sdk-log-writer`
- `systemd-journal-sdk-index`
- `systemd-journal-sdk-engine`

Current writer scope:

- regular journal files by default and compact journal files with
  `JournalFileOptions::with_compact(true)` or `Config::with_compact(true)`;
- uncompressed DATA objects by default;
- optional zstd, xz, and lz4-compressed DATA object writing through
  `JournalFileOptions` and `journal::Config`, using systemd's 512-byte default
  threshold and 8-byte minimum clamp;
- keyed hash tables using the journal file ID;
- byte-safe field values through `&[u8]` field payloads;
- direct-file writing through `journal_core`;
- high-level directory writing through `journal::Log`;
- systemd-compatible `0640` journal file permissions by default, configurable
  for newly-created files through `JournalFileOptions::with_file_mode()` and
  `Config::with_file_mode()`;
- chain active naming by default, with
  `Config::with_strict_systemd_naming(true)` available for strict systemd
  `<source>.journal` active naming;
- shared field-name policy layers for direct-file and directory writers:
  default `FieldNamePolicy::Journald`, app-facing
  `FieldNamePolicy::JournalApp`, and structure-level `FieldNamePolicy::Raw`;
- entry-count, file-size, and duration rotation;
- tracked journal-file-count, byte-size, and age retention;
- optional pure cross-SDK cooperative lockfile with stale-owner detection when
  callers explicitly acquire `journal_core::file::lock::WriterLock`;
- Forward Secure Sealing TAG writing through `SealOptions`, including stock
  `journalctl --verify --verify-key` coverage for sealed files generated by
  this writer;
- FSS `SealOptions::start_usec` normalization to systemd's verification-key
  epoch boundary, so unaligned source timestamps still produce sealed files that
  stock `journalctl --verify --verify-key` can validate;
- low-level `EntryWriteOptions::seqnum(...)` and
  `EntryWriteOptions::boot_id(...)` exact-regeneration support for preserving
  ENTRY sequence gaps and per-entry boot IDs when rewriting existing journal
  files. Leave them unset for normal auto-incrementing sequence numbers and the
  writer-wide boot ID;
- native systemd writers do not participate in the SDK lock protocol and remain
  an operational exclusion;
- live stock-reader validation for the current writer slice with `journalctl
  --file`, `journalctl --file --follow --no-tail --boot=all`, and libsystemd
  reader APIs, including live sequence-order checks;
- configurable explicit live-reader publication cadence through
  `JournalWriter::set_live_publish_every_entries()` and
  `Config::with_live_publish_every_entries()`, defaulting to systemd-compatible
  publication after every entry.
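
The compression threshold rule from the list above can be sketched in isolation. This is a minimal illustration, not the SDK's code: `effective_threshold` and `is_compression_candidate` are hypothetical names, and the `>=` comparison is assumed to mirror systemd's behavior.

```rust
/// systemd's default DATA compression threshold, in bytes.
const DEFAULT_COMPRESS_THRESHOLD: u64 = 512;
/// Smallest accepted threshold; lower requests are raised to this value.
const MIN_COMPRESS_THRESHOLD: u64 = 8;

/// Resolve the effective threshold. `None` means "use the default".
/// Hypothetical helper, not part of the SDK API.
fn effective_threshold(requested: Option<u64>) -> u64 {
    match requested {
        None => DEFAULT_COMPRESS_THRESHOLD,
        Some(bytes) => bytes.max(MIN_COMPRESS_THRESHOLD),
    }
}

/// Assumption: a payload becomes a compression candidate once its length
/// reaches the threshold.
fn is_compression_candidate(payload_len: u64, threshold: u64) -> bool {
    payload_len >= threshold
}

fn main() {
    let threshold = effective_threshold(None);
    println!("default threshold: {threshold}");
    println!("request of 1 byte is clamped to: {}", effective_threshold(Some(1)));
    println!(
        "600-byte payload is a candidate: {}",
        is_compression_candidate(600, threshold)
    );
}
```

Small payloads stay uncompressed because the compression header and CPU cost would outweigh any saving.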

Deferred scope:

- appending to arbitrary historical or systemd-created journal variants. In
  particular, append-open on historical unkeyed-hash files is unsupported and
  returns a controlled error before entry mutation;
- the imported legacy `jf` `journal_file::JournalWriter` remains available for
  compatibility with that crate's public surface, but it is not the supported
  production writer path. It also returns a controlled unsupported-file error
  for unkeyed append targets instead of panicking. New writer integrations
  should use `journal_core::file::JournalWriter` or the high-level
  `journal::Log` directory writer;
- full systemd object-graph verification parity beyond the current repository
  verification API.

Current reader scope:

- regular and compact journal files;
- `.journal`, `.journal~`, `.journal.zst`, and `.journal~.zst` files;
- zstd-compressed fixture files;
- zstd, lz4, and xz-compressed DATA objects through pure-Rust dependencies;
- directory reading across active and archived files with bounded recursive
  traversal, symlink-cycle protection, and interleaved multi-file ordering,
  including mixed regular/compact, compressed/uncompressed,
  sealed/unsealed, and whole-file `.journal.zst` files in one directory;
- forward/backward iteration, cursors, realtime and monotonic timestamps,
  seqnum metadata, field enumeration, binary field values, repeated field
  values, stateful current-entry data enumeration, unique value enumeration,
  and export/json/text formatting;
- byte-preserving RAW field-name access through `Entry::raw_fields()`,
  `Entry::get_raw()`, and `Entry::get_raw_values()`. `Entry.fields` and
  `Entry.field_values` are UTF-8 string-keyed convenience maps and do not
  synthesize lossy names for non-UTF-8 RAW field names;
- export byte output preserves non-UTF-8 RAW field names; JSON output, field
  enumeration, unique queries, and `get_data` facade helpers remain UTF-8
  field-name surfaces;
- libsystemd-compatible facade functions for open file/directory/files, close,
  seek head/tail/realtime/cursor, next/previous/skip, match groups,
  current-entry data enumeration, field enumeration, unique value enumeration,
  realtime/monotonic/seqnum/cursor metadata, and boot listing;
- facade cursor seeking follows libsystemd semantics: valid missing cursors are
  accepted as seek locations, while `test_cursor` checks exact current position;
- current-entry facade data enumeration returns borrowed `FIELD=value` bytes
  for the current DATA object, matching libsystemd-style validity until the
  current row is reset or the reader advances; uncompressed DATA is returned
  directly from the mmap-backed journal payload, while compressed DATA is copied
  into row-owned stable storage so later compressed DATA reads cannot invalidate
  earlier pointers from the same row;
- direct facade unique queries return language-native `(field, value)` pairs;
  stateful unique enumeration returns full binary-safe `FIELD=value` payloads;
- `FileReader::visit_unique_values()` and
  `DirectoryReader::visit_unique_values()` stream indexed unique values without
  first materializing the full result set;
- `FileReader::explore()` provides an optimized single-file query surface for
  log-explorer workloads: exact indexed filters, selected facet counters,
  optional histogram, optional FTS, optional returned rows, and query counters.
  It lazily classifies reusable DATA objects by DATA offset during candidate-row
  traversal, groups facets that share the same effective filter set into one
  traversal pass, and expands all fields only for returned rows.
  `ExplorerAnchor::Auto` is the default: forward queries start from the lower
  time bound or file head, while backward queries start from the upper time
  bound or file tail. `ExplorerFieldMode::FirstValue` is the default explorer
  accounting mode: one selected facet/histogram/source field contributes at
  most one value per row, so traversal may stop after all required fields are
  found. `ExplorerFieldMode::AllValues` is available when a caller needs exact
  duplicate-value accounting and accepts the slower full-row scan. `explore()`
  owns the reader position and replaces the reader match state while it runs;
  callers should explicitly seek and reapply any manual matches before
  continuing normal iteration after an explorer query;
- `FileReader::explore_with_strategy()` exposes explicit strategy selection.
  `ExplorerStrategy::Traversal` is the default behavior used by `explore()`.
  `ExplorerStrategy::Index` derives all-values facet and histogram counts from
  FIELD/DATA indexes and DATA entry posting lists. It rejects default
  first-value semantics, FTS, and source-realtime-bounded queries instead of
  returning approximate results. `ExplorerStrategy::Compare` runs traversal and
  index, fails if their logical outputs differ, and returns timing/counter
  diagnostics in `ExplorerResult::comparison`. There is no automatic planner
  because index aggregation is faster only for some query shapes;
- `journal::netdata` provides the Netdata-specific Rust function boundary over
  the explorer. It is the SDK API intended to replace Netdata's generic
  `systemd-journal.plugin` logs function. `NetdataJournalFunction::systemd_journal()`
  runs a `systemd-journal` request JSON against a journal directory and returns
  Netdata-shaped function JSON. This layer owns Netdata request parsing,
  default facets, default display columns, histogram defaults, field
  presentation transforms, row options, and zero-count vocabulary padding for
  filtered requests. The default profile keeps UID/GID values as raw journal
  data and does not resolve host user or group names. The separate
  `NetdataJournalFunction::systemd_journal_plugin_compatible()` constructor
  opts into host user/group name presentation to emulate Netdata's installed
  plugin, with per-query UID/GID display caching so repeated values do not
  repeatedly call host name-service lookups. This layer is intentionally
  separate from the core journal file-format reader. Consumers that need
  Netdata function control can use
  `run_directory_request_json_with_options()` or
  `run_directory_request_bytes_with_options()` with
  `NetdataFunctionRunOptions` to supply a timeout, progress callback,
  cancellation callback, and optional caller-owned `NetdataFunctionState`.
  Progress is reported against the files selected for the query after source
  and time-window preselection, including file-end progress for small or fast
  files. Cancellation is checked before each selected file, during active
  Explorer scans, and after file-end progress callbacks. The optional state hook
  lets Netdata pass registry-provided source type/name metadata and persist
  per-file learned journal-vs-source-realtime drift. Without state, the wrapper
  falls back to journal headers and plugin-compatible filename classification
  for built-in `__logs_sources` groups.
  `NetdataFunctionConfig::source_selector_name` and `source_selector_help`
  customize only the displayed selector label/help while preserving the
  `__logs_sources` wire id. Sampling uses plugin-compatible sampled, unsampled,
  and estimated counters for full-analysis sliced requests and is disabled for
  data-only requests. The `query` request member uses Netdata `SIMPLE_PATTERN`
  behavior: ordered `|` terms, leading `!` negative terms, escaped separators,
  substring `*` parts, and case-insensitive matching. The SDK Netdata boundary
  always executes indexed slice semantics. The `slice` request member is
  retained in the normalized echo because it is part of the plugin request
  shape; it does not select a slower non-slice fallback path. Cancellation and
  no-change responses use Netdata's compact function error envelope; timeout
  returns a partial table response;
- `src/internal/testcmd/netdata_function_wrapper` is a thin offline test adapter
  over the SDK Netdata boundary. It exposes the same CLI shape as Netdata's
  plugin test path:
  `netdata_function_wrapper --test systemd-journal --dir <journal-dir>
  --timeout <seconds> < <request.json>`. The request JSON is read from stdin
  to avoid privileged file reads in test binaries. The comparison tools under
  `../tests/netdata_function/` compare semantic function output against an
  external `systemd-journal.plugin` binary. The wrapper has diagnostic-only
  `--progress-jsonl`, `--cancel-immediately`, and `--cancel-after-progress`
  switches to validate the SDK run-control API; production consumers should
  call `journal::netdata` directly and wire callbacks to their own function
  framework;
- default reader options use live/windowed mmap with a 32 MiB window. Smaller
  windows are available for constrained environments, but high-cardinality
  indexed queries can become remap-bound with very small windows;
- `--output export` uses systemd's size-prefixed binary field encoding and
  blank-line entry separator;
- JSON output includes realtime and monotonic timestamps, preserves valid UTF-8
  strings, and encodes binary values as arrays of unsigned bytes;
- libsystemd-style match behavior: AND between different fields, OR between
  values for the same field, `SdJournalAddDisjunction()` for `+`, and
  `SdJournalAddConjunction()` for explicit AND groups;
- a file-backed `journalctl` command under `src/cmd/journalctl` with
  `--since`, `--until`, `--boot`, and `--follow` support for repository-backed
  files and directories;
- verification APIs: `journal::verify_file()` for structural verification and
  `journal::verify_file_with_key()` for sealed TAG/HMAC verification;
- a conformance adapter under `src/adapter`.
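
The export encoding mentioned in the list above can be illustrated with a small standalone serializer. This is a sketch of the journal export format, not the SDK's implementation: `export_field` and `export_entry` are hypothetical names, and the text-versus-binary decision is simplified to "valid UTF-8 without control characters other than tab".

```rust
/// Serialize one field in journal export format.
/// Simplifying assumption: a value is written in text form only when it is
/// valid UTF-8 and contains no control characters other than tab.
fn export_field(name: &[u8], value: &[u8], out: &mut Vec<u8>) {
    let text_safe = std::str::from_utf8(value)
        .map(|s| s.chars().all(|c| c == '\t' || !c.is_control()))
        .unwrap_or(false);
    out.extend_from_slice(name);
    if text_safe {
        // Text form: NAME=value
        out.push(b'=');
        out.extend_from_slice(value);
    } else {
        // Binary form: NAME, newline, 64-bit little-endian length, raw bytes.
        out.push(b'\n');
        out.extend_from_slice(&(value.len() as u64).to_le_bytes());
        out.extend_from_slice(value);
    }
    out.push(b'\n');
}

/// Serialize an entry: every field, then a blank line as the separator.
fn export_entry(fields: &[(&[u8], &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (name, value) in fields {
        export_field(name, value, &mut out);
    }
    out.push(b'\n');
    out
}

fn main() {
    let bytes = export_entry(&[
        (b"MESSAGE".as_slice(), b"hello".as_slice()),
        (b"BLOB".as_slice(), b"\x00\xff".as_slice()),
    ]);
    println!("serialized entry is {} bytes long", bytes.len());
}
```

The length prefix is what lets a reader skip over values that contain newlines or arbitrary bytes without scanning for a terminator.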

Platform behavior:

- Linux is the validated reference runtime and keeps mmap-backed hot paths,
  monotonic timestamps, Unix directory sync, and SIGBUS handling.
- FreeBSD and macOS builds use monotonic timestamps and the same pure file
  reader/writer paths. Optional identity and lock helpers are separate from the
  core file-format writer.
- Windows builds use unbiased interrupt time for automatic writer timestamps
  and no-op directory fsync/SIGBUS hooks. Optional identity and lock helpers
  are separate from the core file-format writer.
- Non-Linux build checks are compilation evidence only unless runtime evidence
  from that OS is recorded separately. Files written on non-Linux targets must
  still pass Linux stock `journalctl --verify --file` and repository
  interoperability checks before production compatibility is claimed.

Reader limitations:

- `list_boots` uses file-level boot metadata in this slice;
- full systemd object-graph verification parity is tracked separately;
- daemon-only journalctl operations are not implemented.

Basic directory writer usage:

```rust
use journal::{Config, Log, Origin, RetentionPolicy, RotationPolicy, Source};

let origin = Origin {
    machine_id: None,
    namespace: None,
    source: Source::System,
};
let config = Config::new(
    origin,
    // Rotate the active file after 100,000 entries or one hour.
    RotationPolicy::default()
        .with_number_of_entries(100_000)
        .with_duration_of_journal_file(std::time::Duration::from_secs(3600)),
    // Keep at most 10 journal files and drop files older than 7 days.
    RetentionPolicy::default()
        .with_number_of_journal_files(10)
        .with_duration_of_journal_files(std::time::Duration::from_secs(7 * 24 * 3600)),
);
let mut log = Log::new("/var/log/journal-sdk", config)?;

log.write_entry(
    &[
        b"MESSAGE=plugin started".as_slice(),
        b"PRIORITY=6".as_slice(),
        b"SYSLOG_IDENTIFIER=example-plugin".as_slice(),
    ],
    None,
)?;
log.sync()?;
log.close()?;
# Ok::<(), Box<dyn std::error::Error>>(())
```

`Log` stores files below `<directory>/<machine-id>/`. By default the active file
uses the chain filename form
`<source>@<seqnum-id>-<head-seqnum>-<head-realtime>.journal`; call
`Config::with_strict_systemd_naming(true)` to use `<source>.journal` as the
active file.

If strict naming opens a directory with a stale chain-named `ONLINE` active
file, it archives that file before creating `<source>.journal`, so the directory
does not keep parallel active files.

If an existing active file is rejected by the low-level append-open path as
unsupported, `Log` follows journald's reliable-open behavior: it uses readable
header metadata to continue sequence identity where possible, moves the old
active file to a collision-safe `*.journal~` disposed name, and creates a fresh
active file. Direct low-level append-open still returns an unsupported error.

Unset rotation and retention limits are disabled. Retention counts the tracked
active/current file in file-count and committed-byte limits, but deletion only
selects older unprotected files owned by the configured source; the tracked
active/current file is never deleted to satisfy a retention limit. Duration
rotation is checked before append using the incoming entry realtime and the
active file head realtime.
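
The retention rule described above (the active file counts toward limits but is never deleted) can be sketched as follows. This is a simplified illustration under stated assumptions, not the SDK's algorithm: `TrackedFile` and `select_for_deletion` are hypothetical, files are assumed to be ordered oldest to newest, and source ownership, protection, and age limits are ignored.

```rust
/// Hypothetical view of one tracked journal file.
#[derive(Debug, Clone, PartialEq)]
struct TrackedFile {
    name: &'static str,
    bytes: u64,
    active: bool,
}

/// Pick files to delete, oldest first, until both limits hold or only the
/// active file is left. `None` means the limit is disabled.
fn select_for_deletion(
    files: &[TrackedFile],
    max_files: Option<usize>,
    max_bytes: Option<u64>,
) -> Vec<&'static str> {
    let mut count = files.len();
    let mut total: u64 = files.iter().map(|f| f.bytes).sum();
    let mut doomed = Vec::new();
    for file in files {
        let over_count = max_files.map_or(false, |limit| count > limit);
        let over_bytes = max_bytes.map_or(false, |limit| total > limit);
        if !over_count && !over_bytes {
            break;
        }
        if file.active {
            // Counted toward the limits above, but never selected.
            continue;
        }
        doomed.push(file.name);
        count -= 1;
        total -= file.bytes;
    }
    doomed
}

fn main() {
    let files = [
        TrackedFile { name: "old-1", bytes: 100, active: false },
        TrackedFile { name: "old-2", bytes: 100, active: false },
        TrackedFile { name: "active", bytes: 100, active: true },
    ];
    println!("{:?}", select_for_deletion(&files, Some(2), None));
    println!("{:?}", select_for_deletion(&files, None, Some(50)));
}
```

In the second call the byte limit cannot be met, yet only the two archived files are selected; the active file survives even though the directory stays over budget.
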

Call `Log::enforce_retention()` to apply age/count/byte retention without
waiting for another append-triggered rotation or close. Call `Log::close()` to
archive the current file and enforce retention; `Drop` only performs best-effort
state persistence.

Retention also runs once when a writer opens or creates the active file:
existing-active reopen and `LogOpenMode::Eager` enforce it during construction,
while lazy archived-only construction defers enforcement until the first append
opens the active file, before the first entry is written.

Use `Config::with_open_mode(LogOpenMode::Eager)` to create/open the active file
during construction, and `Config::with_identity_mode(LogIdentityMode::Strict)`
plus `Origin.machine_id` and `Config::with_boot_id()` to require explicit
identity. `LogIdentityMode::Auto` uses explicit IDs when provided and otherwise
generates SDK-local IDs; it does not read host identity sources.

`Log::configured_directory()`, `Log::journal_directory()`,
`Log::active_path()`, `Log::machine_id()`, `Log::boot_id()`, and
`Log::source()` expose the same directory/identity contract as the other SDKs.

Lifecycle observers receive `Created`, `Rotated`, and `RetainedDeleted` events;
`Log::with_artifact_sizer()` includes per-journal sidecar bytes in retained-size
decisions.

`write_entry_with_timestamps()` accepts
`EntryTimestamps::source_realtime_usec` for `_SOURCE_REALTIME_TIMESTAMP`
injection and clamps non-progressing realtime and monotonic overrides forward.
The low-level `JournalWriter::add_entry()` path preserves explicit
caller-provided realtime and monotonic timestamps without clamping or rejecting
them; callers using that raw API are responsible for not producing same-boot
backward monotonic entries unless they are intentionally creating invalid
fixtures. On reopen, `Log` seeds the monotonic clamp floor from a persisted
chain tail only when the tail entry boot ID matches the current writer boot ID.

`Log` is a single-writer object; callers must serialize method calls on one
instance. The journal file contract is one writer per file. Acquire
`journal_core::file::lock::WriterLock` when the caller wants the optional
cooperating-writer lock helper to reject another SDK writer for the same file.

`Config::with_field_name_policy()` selects the high-level writer field-name
layer. The default `FieldNamePolicy::Journald` preserves trusted systemd fields
such as `_HOSTNAME` and `_TRANSPORT`. `FieldNamePolicy::JournalApp` drops caller
fields that journald would reject from untrusted applications and fails only
when no caller fields remain. `FieldNamePolicy::Raw` accepts any non-empty
field name that does not contain `=`, but RAW-mode files are not guaranteed to
be accepted by stock systemd tooling. Producer-specific field transformations
belong outside the SDK.
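
To make the `JournalApp` behavior concrete, the sketch below applies journald's field-name rule for untrusted clients and drops whatever fails it. It is an illustration only: `journald_name_ok` and `filter_app_fields` are hypothetical helpers, and the SDK's actual policy code may differ in details.

```rust
/// journald's field-name rule: 1 to 64 bytes drawn from `A-Z`, `0-9`, and `_`,
/// not starting with a digit. A leading `_` marks a trusted (protected) field.
fn journald_name_ok(name: &[u8], allow_protected: bool) -> bool {
    let first = match name.first() {
        Some(byte) => *byte,
        None => return false,
    };
    name.len() <= 64
        && !first.is_ascii_digit()
        && (allow_protected || first != b'_')
        && name
            .iter()
            .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit() || *b == b'_')
}

/// Hypothetical app-facing filter: drop fields an untrusted client could not
/// set, and fail only when nothing remains.
fn filter_app_fields<'a>(fields: &[&'a [u8]]) -> Result<Vec<&'a [u8]>, &'static str> {
    let kept: Vec<&'a [u8]> = fields
        .iter()
        .copied()
        .filter(|field| match field.iter().position(|b| *b == b'=') {
            Some(eq) => journald_name_ok(&field[..eq], false),
            None => false,
        })
        .collect();
    if kept.is_empty() {
        Err("no caller fields remain after filtering")
    } else {
        Ok(kept)
    }
}

fn main() {
    let fields: [&[u8]; 3] = [b"MESSAGE=hello", b"_HOSTNAME=spoofed", b"bad-name=x"];
    let kept = filter_app_fields(&fields).expect("at least one valid field");
    for field in kept {
        println!("{}", String::from_utf8_lossy(field));
    }
}
```

Only `MESSAGE=hello` survives here: `_HOSTNAME` is reserved for trusted writers and `bad-name` contains characters outside the allowed set.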

Journal files are created with systemd journald's `0640` default permissions.
Use `JournalFileOptions::with_file_mode()` for direct-file writers or
`Config::with_file_mode()` for directory writers when a consumer needs another
mode. The override applies only to newly-created files; existing files keep
their current filesystem permissions. POSIX modes remain subject to the
process umask, matching systemd/open semantics. Non-POSIX platforms may ignore
POSIX mode bits.
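
The interaction between a requested mode and the process umask follows the usual POSIX rule: umask bits are cleared from the requested mode. The helper below is a hypothetical illustration of that arithmetic and does not call into the SDK.

```rust
/// Permission bits a newly created file ends up with under POSIX `open(2)`:
/// the requested mode with every umask bit cleared.
fn effective_mode(requested: u32, umask: u32) -> u32 {
    requested & !umask & 0o7777
}

fn main() {
    for umask in [0o022, 0o027, 0o077] {
        println!(
            "umask {umask:03o} -> mode {:04o}",
            effective_mode(0o640, umask)
        );
    }
}
```

With the default `0640` request, common umasks such as `022` and `027` leave the mode unchanged, while `077` reduces it to `0600`.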

Live-reader publication can be tuned when the consumer does not need immediate
stock follow-reader wakeups:

```rust
let config = config.with_live_publish_every_entries(64);
```

`1` is the default and publishes after every entry. `0` disables explicit SDK
live publication for poll/snapshot consumers. `N > 1` publishes after every
`N` entries. This is not an `fsync` or durability setting.
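
The three cadence settings can be modeled with a small counter. This is a sketch of the documented semantics only; `should_publish` and `count_publications` are hypothetical names, and the SDK's internal bookkeeping is not shown in this README.

```rust
/// Decide whether to publish after an append.
/// `every` is the configured cadence; `pending` is the number of entries
/// appended since the last publication, including the one just written.
fn should_publish(every: u64, pending: u64) -> bool {
    every != 0 && pending >= every
}

/// Count how many publications happen while appending `entries` entries.
fn count_publications(every: u64, entries: u64) -> u64 {
    let mut pending = 0;
    let mut publications = 0;
    for _ in 0..entries {
        pending += 1;
        if should_publish(every, pending) {
            publications += 1;
            pending = 0;
        }
    }
    publications
}

fn main() {
    for every in [0, 1, 64] {
        println!(
            "every={every}: {} publications for 128 entries",
            count_publications(every, 128)
        );
    }
}
```

A larger cadence trades follow-reader latency for fewer wakeups; entries written between publications are still in the file for readers that poll.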

Binary-safe values:

```rust
log.write_entry(
    &[
        b"MESSAGE=sample with binary payload".as_slice(),
        b"BINARY_PAYLOAD=\x00\x01\x02\xff".as_slice(),
    ],
    None,
)?;
# Ok::<(), Box<dyn std::error::Error>>(())
```

Basic reader usage:

```rust
use journal::FileReader;

let mut reader = FileReader::open("/path/to/system.journal")?;
reader.add_match(b"PRIORITY=6");
reader.seek_head();

while reader.next()? {
    let entry = reader.get_entry()?;
    if let Some(message) = entry.get_str("MESSAGE") {
        println!("{message}");
    }
}
# Ok::<(), Box<dyn std::error::Error>>(())
```

Optimized single-file explorer usage:

```rust
use journal::{ExplorerQuery, FileReader};

let mut reader = FileReader::open("/path/to/system.journal")?;
let result = reader.explore(&ExplorerQuery {
    facets: vec![b"PRIORITY".to_vec()],
    limit: 0,
    ..ExplorerQuery::default()
})?;

if let Some(priority) = result.facets.get(b"PRIORITY".as_slice()) {
    for (value, count) in priority {
        println!("{} {count}", String::from_utf8_lossy(value));
    }
}
# Ok::<(), Box<dyn std::error::Error>>(())
```

The default first-value mode counts at most one value per selected field per
row. Use `ExplorerFieldMode::AllValues` when a row may contain repeated values
for a selected facet or histogram field and every duplicate value must count.
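
The difference between the two accounting modes shows up only on rows that repeat a field. The standalone model below uses a local `FieldMode` enum and a hypothetical `facet_counts` function rather than the SDK types, so it can run without a journal file.

```rust
use std::collections::BTreeMap;

/// Local stand-in for the explorer accounting modes.
#[derive(Clone, Copy)]
enum FieldMode {
    FirstValue,
    AllValues,
}

/// Count values of `field` across rows. Each row is a list of
/// (field, value) pairs in which a field may appear more than once.
fn facet_counts<'a>(
    rows: &[Vec<(&'a str, &'a str)>],
    field: &str,
    mode: FieldMode,
) -> BTreeMap<&'a str, u64> {
    let mut counts = BTreeMap::new();
    for row in rows {
        for (name, value) in row {
            if *name != field {
                continue;
            }
            *counts.entry(*value).or_insert(0) += 1;
            if let FieldMode::FirstValue = mode {
                // One value per row is enough; stop scanning this row.
                break;
            }
        }
    }
    counts
}

fn main() {
    let rows = vec![
        vec![("PRIORITY", "3"), ("PRIORITY", "4")],
        vec![("PRIORITY", "3"), ("MESSAGE", "hello")],
    ];
    // First-value mode sees only "3" in the first row.
    println!("{:?}", facet_counts(&rows, "PRIORITY", FieldMode::FirstValue));
    // All-values mode also counts the repeated "4".
    println!("{:?}", facet_counts(&rows, "PRIORITY", FieldMode::AllValues));
}
```

The early `break` is the reason first-value mode can be cheaper: traversal of a row ends as soon as the selected field has been seen.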

Explorer column catalogs are built from FIELD indexes. Do not use row traversal
to discover columns in production; a comparison that needs
`debug_collect_column_fields_by_row_traversal` has found a bug in the explorer
or its column-catalog setup, not a valid operating mode.

Specialized callers can select an execution strategy:

```rust
use journal::{ExplorerFieldMode, ExplorerQuery, ExplorerStrategy, FileReader};

let mut reader = FileReader::open("/path/to/system.journal")?;
let result = reader.explore_with_strategy(
    &ExplorerQuery {
        facets: vec![b"PRIORITY".to_vec()],
        field_mode: ExplorerFieldMode::AllValues,
        use_source_realtime: false,
        limit: 0,
        ..ExplorerQuery::default()
    },
    ExplorerStrategy::Index,
)?;
# Ok::<(), Box<dyn std::error::Error>>(())
```

The index strategy is exact for its supported subset, but it is not a universal
speedup. It can be much faster for narrow unfiltered all-values facets and
histograms, and slower for many facets or selective filters. Use
`ExplorerStrategy::Compare` when validating a query shape before relying on the
index strategy; successful compare results include traversal and index timings
and stats in `ExplorerResult::comparison`.

The default `ExplorerAnchor::Auto` chooses the natural scan start for the query
direction. Use explicit `Head`, `Tail`, or `Realtime(usec)` anchors only for
manual paging or when the caller intentionally wants a non-default start point.
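
The `Auto` rule can be written out as a small decision function. The types below are local stand-ins, not the SDK's `ExplorerAnchor`, and `auto_start` is a hypothetical name; the sketch only restates the documented rule that forward queries start at the lower time bound or the head, and backward queries at the upper time bound or the tail.

```rust
/// Local stand-in for a resolved scan start.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Start {
    Head,
    Tail,
    Realtime(u64),
}

#[derive(Debug, Clone, Copy)]
enum Direction {
    Forward,
    Backward,
}

/// Resolve the natural scan start for a query direction and optional
/// realtime bounds in microseconds.
fn auto_start(direction: Direction, since_usec: Option<u64>, until_usec: Option<u64>) -> Start {
    match direction {
        Direction::Forward => since_usec.map_or(Start::Head, Start::Realtime),
        Direction::Backward => until_usec.map_or(Start::Tail, Start::Realtime),
    }
}

fn main() {
    let since = Some(1_700_000_000_000_000);
    let until = Some(1_700_003_600_000_000);
    println!("{:?}", auto_start(Direction::Forward, since, until));
    println!("{:?}", auto_start(Direction::Backward, since, until));
    println!("{:?}", auto_start(Direction::Backward, None, None));
}
```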

For RAW-mode files, use the byte-keyed entry surface when field names are not
guaranteed to be UTF-8:

```rust
// `entry` is obtained from `reader.get_entry()?`, as in the reader example above.
if let Some(value) = entry.get_raw(b"\xffRAW") {
    assert_eq!(value, b"raw value");
}

for field in entry.raw_fields() {
    let name_bytes = field.name;
    let value_bytes = field.value;
}
```

File-backed journalctl:

```sh
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --file fixtures/systemd/test-data/no-rtc/system.journal.zst \
  --head 1 \
  --output json
```

Repeated matches for the same field are OR alternatives. Matches for different
fields are ANDed. A separate `+` argument creates an explicit disjunction:

```sh
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --file ./sample.journal \
  PRIORITY=3 PRIORITY=4 + MESSAGE=boot
```
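
The match rules can be modeled without a journal file. The sketch below parses arguments like the ones in the command above into groups and evaluates them against an in-memory entry. `parse_groups` and `entry_matches` are hypothetical helpers written for illustration, not SDK functions, and arguments without `=` are simply ignored.

```rust
use std::collections::BTreeMap;

/// One group: field name -> accepted values for that field.
type Group<'a> = BTreeMap<&'a str, Vec<&'a str>>;

/// Split match arguments into groups separated by `+`.
fn parse_groups<'a>(args: &[&'a str]) -> Vec<Group<'a>> {
    let mut groups: Vec<Group<'a>> = vec![BTreeMap::new()];
    for &arg in args {
        if arg == "+" {
            groups.push(BTreeMap::new());
        } else if let Some((field, value)) = arg.split_once('=') {
            if let Some(group) = groups.last_mut() {
                group.entry(field).or_default().push(value);
            }
        }
    }
    groups
}

/// An entry matches when any group matches. A group matches when every field
/// in it has at least one accepted value present in the entry.
fn entry_matches(groups: &[Group<'_>], entry: &[(&str, &str)]) -> bool {
    let active: Vec<&Group<'_>> = groups.iter().filter(|g| !g.is_empty()).collect();
    if active.is_empty() {
        return true; // no matches installed: everything passes
    }
    active.iter().any(|group| {
        group.iter().all(|(field, accepted)| {
            entry
                .iter()
                .any(|(name, value)| name == field && accepted.contains(value))
        })
    })
}

fn main() {
    let groups = parse_groups(&["PRIORITY=3", "PRIORITY=4", "+", "MESSAGE=boot"]);
    let entry = [("PRIORITY", "6"), ("MESSAGE", "boot")];
    println!("matches: {}", entry_matches(&groups, &entry));
}
```

The sample entry matches through the second group: its priority is neither 3 nor 4, but `MESSAGE=boot` satisfies the alternative after `+`.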

Realtime ranges, boot filters, and follow mode are supported for file-backed
inputs:

```sh
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --directory ./journals --boot=all --since @1700000000 --until @1700003600
cargo run --manifest-path rust/Cargo.toml -p journalctl -- \
  --file ./active.journal --follow --no-tail --boot=all
```