global-state-detector 0.1.0

Detect persistent writable global state between fuzzer iterations (Rust bindings for the global-state-detector C library)
# global-state-detector-rs

Rust bindings for [`global-state-detector`](https://github.com/AFLplusplus/global-state-detector),
a small C helper that reports persistent writable global state between fuzzer
iterations. This is useful when a fuzz target is supposed to be deterministic
and iteration-local but hidden `.data` / `.bss` state makes later
inputs depend on earlier ones. It surfaces the same kind of instability
that AFL++'s `afl-fuzz` and LibAFL flag, but at byte granularity and with
symbol attribution.
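
To make the problem concrete, here is a minimal sketch of a target whose result depends on call history. The names `CALLS` and `process` are hypothetical and exist only for this illustration; they are not part of this crate or of the upstream library.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical hidden state: a process-wide counter held in a `static`.
static CALLS: AtomicU64 = AtomicU64::new(0);

/// Hypothetical target. It looks like a pure function of `data`, but the
/// result also depends on how many times it was called before.
pub fn process(data: &[u8]) -> u64 {
    // `fetch_add` returns the value *before* the increment.
    let previous_calls = CALLS.fetch_add(1, Ordering::Relaxed);
    let sum: u64 = data.iter().map(|&b| u64::from(b)).sum();
    sum + previous_calls
}
```

Feeding the same input twice yields two different results, which is the kind of cross-iteration dependency the detector is meant to point at.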

## What it detects

* Writable, non-executable `PT_LOAD` segments (`.data` / `.bss`) of the
  main binary.
* Writable, non-executable `PT_LOAD` segments of every loaded shared
  object discovered through `dl_iterate_phdr`.
* Page-level changes via a fast hash, followed by byte-range reporting
  for changed pages with `dladdr`-resolved symbol attribution.
* Clang sanitizer coverage counters are ignored when the
  `__sancov_cntrs` linker-provided range is present, so libFuzzer's
  own coverage bitmap does not dominate reports.

## What it does NOT detect

* Heap or `mmap`-backed state (anonymous mappings).
* Thread-local storage (`thread_local!`, `__thread`, glibc TLS).
* External process state — files, sockets, pipes, IPC.
* Writable state in deliberately filtered noisy modules: `libc.so*`,
  `ld-linux*`, `libpthread*`, `libstdc++*`, `linux-vdso.so*`.

## Platform support

Linux ELF processes only. Uses `dl_iterate_phdr`, `dladdr`, and ELF
program headers from `<elf.h>` / `<link.h>`. macOS and Windows are not
supported.

The crate's `build.rs` invokes `cc` with the system C compiler. Use a
clang-based fuzzer toolchain (`afl-clang-fast`, `clang`) when
instrumenting your target. The detector itself only needs a working
C compiler to build.

## Installation

Clone with submodules — the C source ships under `csrc/`:

```sh
git clone --recurse-submodules https://github.com/AFLplusplus/global-state-detector-rs
# or, if you already cloned without submodules:
git submodule update --init --recursive
```

Add the dependency to your fuzz harness's `Cargo.toml`:

```toml
[dependencies]
global-state-detector = { path = "../path/to/global-state-detector-rs" }
```

Or, once published:

```toml
[dependencies]
global-state-detector = "0.1"
```

## Required linker flags

Cargo does **not** propagate `rustc-link-arg` from rlib dependencies to
downstream binaries, so the consuming crate must arrange for the
linker to receive these flags itself. Add a `.cargo/config.toml`
alongside the harness. For cargo-fuzz that is `fuzz/.cargo/config.toml`:

```toml
[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-rdynamic", "-C", "link-arg=-Wl,-z,now"]
```

| Flag             | Why it matters                                                                                                  |
| ---------------- | --------------------------------------------------------------------------------------------------------------- |
| `-rdynamic`      | Adds all global symbols to the dynamic symbol table so `dladdr` can resolve them. Without it, reports show `?+0x...`. |
| `-Wl,-z,now`     | Disables lazy PLT/GOT binding. Without it, the first iteration reports massive churn as GOT entries are resolved on demand. |

> **cargo-fuzz users:** `.cargo/config.toml` rustflags do **not**
> survive cargo-fuzz. cargo-fuzz sets its own `RUSTFLAGS` environment
> variable, and env-var rustflags *override* config-file rustflags
> rather than merging. Emit the same flags from a `fuzz/build.rs`
> instead — `cargo:rustc-link-arg-bins` goes through cargo's metadata
> channel and is not affected:
>
> ```rust
> // fuzz/build.rs
> fn main() {
>     println!("cargo:rustc-link-arg-bins=-rdynamic");
>     println!("cargo:rustc-link-arg-bins=-Wl,-z,now");
> }
> ```
>
> See [`fuzz/build.rs`](fuzz/build.rs) in this repo for the working version.

## API

```rust
pub fn init();
pub fn check(rebaseline: bool) -> i32;
pub fn rebaseline();
```

* [`init`] — snapshots all writable `PT_LOAD` segments. Call once
  after one-time target initialization is complete.
* [`check`] — diffs current memory against the last snapshot. Returns
  the number of pages that changed. With `rebaseline = true`, updates
  the snapshot so the next call only shows new deltas. Pass `false`
  for cumulative drift across the entire run.
* [`rebaseline`] — re-snapshots without reporting. Use it to refresh
  the baseline immediately before invoking the target.

[`init`]: https://docs.rs/global-state-detector/latest/global_state_detector/fn.init.html
[`check`]: https://docs.rs/global-state-detector/latest/global_state_detector/fn.check.html
[`rebaseline`]: https://docs.rs/global-state-detector/latest/global_state_detector/fn.rebaseline.html

## Recommended harness pattern

Call `rebaseline` immediately before the target and `check(true)`
immediately after. That window attributes drift to the target rather than
to the fuzzer's own bookkeeping between callbacks.
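
The window can also be written as a small helper. The sketch below is only an illustration: the `Detector` trait, `run_in_window`, and `RecordingDetector` are hypothetical names that stand in for the crate's free functions so the snippet compiles on its own.

```rust
/// Hypothetical stand-in for the functions listed under "API". In a real
/// harness, call `global_state_detector::rebaseline` and
/// `global_state_detector::check` directly.
pub trait Detector {
    fn rebaseline(&mut self);
    fn check(&mut self, rebaseline: bool) -> i32;
}

/// Runs `target` inside a rebaseline/check window and returns the target's
/// result together with the value reported by `check(true)`.
pub fn run_in_window<D: Detector, T>(
    detector: &mut D,
    target: impl FnOnce() -> T,
) -> (T, i32) {
    detector.rebaseline(); // refresh the snapshot right before the target
    let result = target();
    let changed = detector.check(true); // diff right after, then rebaseline
    (result, changed)
}

/// Hypothetical detector that only records the call order. It exists to
/// exercise the helper without linking the C library.
#[derive(Default)]
pub struct RecordingDetector {
    pub calls: Vec<&'static str>,
}

impl Detector for RecordingDetector {
    fn rebaseline(&mut self) {
        self.calls.push("rebaseline");
    }

    fn check(&mut self, rebaseline: bool) -> i32 {
        self.calls
            .push(if rebaseline { "check(true)" } else { "check(false)" });
        0
    }
}
```

Keeping the two detector calls adjacent to the target call is the point of the helper; anything the harness does outside that window is excluded from the diff.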

### AFL++ persistent mode (afl.rs/cargo-afl)

```rust
use afl::fuzz;
use std::sync::Once;

static INIT: Once = Once::new();

fn main() {
    fuzz!(|data: &[u8]| {
        if !INIT.is_completed() {
            INIT.call_once(|| {
                global_state_detector::init();
            });
        } else {
            global_state_detector::rebaseline();
        }

        let _ = my_target::process(data);

        global_state_detector::check(true);
    });
}
```

### cargo-fuzz / libFuzzer

```rust
#![no_main]
use libfuzzer_sys::fuzz_target;
use std::sync::Once;

static INIT: Once = Once::new();

fuzz_target!(|data: &[u8]| {
    if !INIT.is_completed() {
        INIT.call_once(|| {
            // any one-time target init goes here:
            // my_target::init_global_resources();
            global_state_detector::init();
        });
    } else {
        global_state_detector::rebaseline();
    }

    let _ = my_target::process(data);

    global_state_detector::check(/* rebaseline = */ true);
});
```

## Running the bundled example

The repo ships a runnable demo split across two directories on purpose:

* **`example/example.rs`** is the user-shaped template — what you
  would replicate in your own project. It contains the canonical
  harness pattern plus a tiny inline stand-in for the target under
  test (a `static` accumulator that mutates on every call, mirroring
  `csrc/harness_example.c`). In a real harness, replace the inline
  `target_process` with a call into your own crate.
* **`fuzz/`** is the cargo-fuzz scaffold that points at the example so
  it actually runs from this repo. Its `[[bin]]` references
  `../example/example.rs` directly — no copy, no duplication.
  `fuzz/build.rs` supplies the linker flags (see the cargo-fuzz note
  above).

Prerequisites: nightly Rust, cargo-fuzz, and rustfilt for readable symbol names.

```sh
rustup toolchain install nightly
cargo install cargo-fuzz rustfilt
git submodule update --init --recursive
```

Build and run:

```sh
cargo fuzz build example
cargo fuzz run example -- -runs=10 2>&1 | rustfilt
```

You should see:

```text
[global-state-detector] init: N regions, M bytes, K modules skipped
[global-state-detector] CHANGE 0x... len=...  ACCUMULATOR+0x0  ([main])
               was: ...
               now: ...
```

cargo-fuzz's default AddressSanitizer is fine — the upstream C
library is ASAN-aware (it reads through ASAN red zones safely). If
you previously saw `global-buffer-overflow` from `memcpy` inside
`global_state_detector_check`, update the `csrc/` submodule to pick up
the fix.

### libFuzzer self-state in reports

You will see some changes attributed to `_ZN6fuzzer3TPCE+...` —
libFuzzer's own coverage/program-counter tables. The detector skips
`__sancov_cntrs` but not the rest of libFuzzer's writable globals.
Treat those as fuzzer bookkeeping, not target drift.
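
If you post-process reports, a simple heuristic can separate these entries from target drift. The sketch below assumes that libFuzzer's globals live in the C++ `fuzzer` namespace, so their mangled names begin with `_ZN6fuzzer` (as in the symbol above) and their demangled names begin with `fuzzer::`. The function name is hypothetical, and the check will not catch every libFuzzer symbol.

```rust
/// Heuristic: does this reported symbol look like libFuzzer's own state?
/// Assumption: such symbols start with `_ZN6fuzzer` when mangled, or with
/// `fuzzer::` when demangled.
pub fn is_libfuzzer_symbol(symbol: &str) -> bool {
    symbol.starts_with("_ZN6fuzzer") || symbol.starts_with("fuzzer::")
}
```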

## Sample report

```text
[global-state-detector] init: 14 regions, 921600 bytes, 5 modules skipped
[global-state-detector] CHANGE 0x55c1a04b3020 len=8  target_accumulator+0x0  ([main])
               was: 00 00 00 00 00 00 00 00
               now: 7f 00 00 00 00 00 00 00
```

Format:

```text
[global-state-detector] CHANGE <addr> len=<bytes>  <symbol>+<offset>  (<module>)
               was: <up to 16 bytes hex>
               now: <up to 16 bytes hex>
```

Up to 32 byte-runs per `check` call are reported; further changes are
counted but not dumped to keep output bounded.
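
For scripted triage, the `CHANGE` line can be parsed into fields. This is a minimal sketch based only on the format shown above; the `Change` struct and `parse_change` function are hypothetical and not part of the crate. It assumes the symbol contains no whitespace, so run it on raw output rather than on demangled text.

```rust
/// One parsed `CHANGE` line (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct Change {
    pub addr: u64,
    pub len: usize,
    pub symbol: String,
    pub offset: u64,
    pub module: String,
}

/// Parses a `[global-state-detector] CHANGE ...` line. Returns `None` for
/// any other line, including the `was:` / `now:` hex dumps.
pub fn parse_change(line: &str) -> Option<Change> {
    let rest = line
        .trim()
        .strip_prefix("[global-state-detector] CHANGE ")?;
    let mut fields = rest.split_whitespace();

    // <addr> is printed as hex with a `0x` prefix.
    let addr = u64::from_str_radix(fields.next()?.strip_prefix("0x")?, 16).ok()?;
    // len=<bytes> is decimal.
    let len = fields.next()?.strip_prefix("len=")?.parse::<usize>().ok()?;
    // <symbol>+<offset>: split at the last `+0x` so symbols containing `+` survive.
    let (symbol, offset) = fields.next()?.rsplit_once("+0x")?;
    let offset = u64::from_str_radix(offset, 16).ok()?;
    // (<module>) keeps its inner text, e.g. `[main]`.
    let module = fields.next()?.strip_prefix('(')?.strip_suffix(')')?;

    Some(Change {
        addr,
        len,
        symbol: symbol.to_string(),
        offset,
        module: module.to_string(),
    })
}
```

The `symbol` and `module` fields are the natural keys for grouping or filtering repeated reports.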

### Demangling

Rust symbols come out mangled (`_ZN8...` or `_R...`). Pipe stderr
through [`rustfilt`](https://crates.io/crates/rustfilt):

```sh
cargo install rustfilt
cargo fuzz run example -- -runs=50 2>&1 | rustfilt
```

## Rust-specific caveats

| Construct                                       | Tracked?                                            |
| ----------------------------------------------- | --------------------------------------------------- |
| `static` / `static mut`                         | Yes — lives in `.data` / `.bss`.                    |
| `AtomicU*`, `AtomicBool`, etc.                  | Yes.                                                |
| `Mutex<T>` (lock word), `RwLock` (lock word)    | Yes.                                                |
| `OnceCell`, `OnceLock`, `LazyLock`, `lazy_static!` | Pointer/discriminant in `.bss` only — heap payload is invisible. You will see "this static was first-used in this iteration" but not what value it took. |
| `thread_local!`                                 | No — TLS is not snapshotted.                        |
| `Box`, `Vec`, `String` held in a `static`       | The header in `.bss` is tracked; heap contents are not. |

## Noise and limitations

* The detector skips a small built-in list of noisy system modules
  (`libc.so*`, `ld-linux*`, `libpthread*`, `libstdc++*`,
  `linux-vdso.so*`). Other runtime libraries (libgcc, libssl, custom
  allocators, …) may report expected state; filter those at your own
  discretion.
* The first invocation of any uninitialized lazy static — including
  the Rust standard library's allocator state, panic infrastructure,
  and thread-local fallbacks — will look like new writable state. Use
  `rebaseline` immediately before the target call to mask it.
* Hard cap of `PROBE_MAX_REGIONS = 512` snapshotted segments.
  Processes with extremely large module counts will hit this limit; a
  warning is printed to stderr and further regions are skipped.
* Hard cap of `PROBE_MAX_REPORTS = 32` reported byte-runs per check.
  The change *count* returned by `check` is exact; the dumped detail
  is truncated.
* The page hash is FNV-1a, not collision-resistant. Adversarial
  collisions are possible but irrelevant for fuzzing instability
  detection.

## Thread safety

The underlying C implementation is **not** thread-safe. Internal state
(snapshot table, region list) is shared and unsynchronized. Use this
crate from a single-threaded harness, or add external synchronization
around `init` / `check` / `rebaseline`. Most fuzzer harnesses are
single-threaded by default; multi-threaded targets are fine as long as
the detector itself is only ever invoked from a single thread.
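
One way to add external synchronization is a process-wide lock around every detector call. This is a sketch under assumptions: `DETECTOR_LOCK` and `with_detector_lock` are hypothetical names, and the closure is where the real `init`, `check`, or `rebaseline` call would go.

```rust
use std::sync::Mutex;

// One process-wide lock that serializes every detector call.
static DETECTOR_LOCK: Mutex<()> = Mutex::new(());

/// Runs `call` while holding the detector lock, for example
/// `with_detector_lock(|| global_state_detector::check(true))`.
pub fn with_detector_lock<T>(call: impl FnOnce() -> T) -> T {
    // A poisoned lock only means another thread panicked while holding it.
    // The guard protects no data of its own, so recover and continue.
    let _guard = DETECTOR_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    call()
}
```

Note that the lock is itself a `static`, so per the caveats table its lock word may show up in reports. It also does nothing to stop target threads from writing globals while a snapshot is being taken.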

## How it works

`init` walks every loaded ELF object via `dl_iterate_phdr`, records
every writable non-executable `PT_LOAD` segment, and copies it into a
shadow buffer with a per-page FNV-1a hash. `check` rehashes each page,
and for any mismatch walks the page byte-by-byte to find contiguous
runs of differing bytes, resolves the run's start address with
`dladdr` for symbol attribution, and prints a hex diff. `rebaseline`
just refreshes the shadow without reporting.
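
The steps above can be modeled in a few lines of Rust. This is an illustrative sketch, not the C implementation: it works on a plain byte slice instead of ELF segments, omits `dladdr`, assumes the 64-bit FNV-1a variant, and takes the page size as a parameter (it must be non-zero). The `Shadow` type and its methods are hypothetical names.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte in, then multiply by the prime.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(FNV_OFFSET, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

/// Shadow copy of one region plus a hash per page.
pub struct Shadow {
    page_size: usize,
    copy: Vec<u8>,
    hashes: Vec<u64>,
}

impl Shadow {
    /// Models `init`: copy the region and hash every page.
    pub fn new(live: &[u8], page_size: usize) -> Self {
        let hashes = live.chunks(page_size).map(fnv1a).collect();
        Shadow { page_size, copy: live.to_vec(), hashes }
    }

    /// Models `check`: rehash each page, then scan mismatching pages for
    /// contiguous runs of differing bytes. Returns `(offset, len)` pairs.
    pub fn check(&mut self, live: &[u8], rebaseline: bool) -> Vec<(usize, usize)> {
        assert_eq!(live.len(), self.copy.len(), "region size must not change");
        let mut runs = Vec::new();
        for (page, chunk) in live.chunks(self.page_size).enumerate() {
            if fnv1a(chunk) == self.hashes[page] {
                continue; // fast path: page hash unchanged
            }
            let base = page * self.page_size;
            let mut i = 0;
            while i < chunk.len() {
                if chunk[i] == self.copy[base + i] {
                    i += 1;
                    continue;
                }
                let start = i; // first differing byte of a run
                while i < chunk.len() && chunk[i] != self.copy[base + i] {
                    i += 1;
                }
                runs.push((base + start, i - start));
            }
        }
        if rebaseline {
            // Models `rebaseline`: refresh the shadow without reporting.
            *self = Shadow::new(live, self.page_size);
        }
        runs
    }
}
```

With `rebaseline = false` the shadow is left alone, so repeated checks keep reporting the same cumulative differences. With `true`, the next check starts from the refreshed copy. Because the scan is per page, a change spanning a page boundary appears as two runs in this model.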

The full implementation is ~340 lines of C; see
[`csrc/global_state_detector.c`](csrc/global_state_detector.c).

## License

AGPL-3.0-or-later, matching the upstream C library. See
[`LICENSE`](LICENSE) for the full text.

## Acknowledgements

Upstream C library: [AFLplusplus/global-state-detector](https://github.com/AFLplusplus/global-state-detector)
by Marc "vanHauser" Heuse.