# Formal Verification Roadmap
This roadmap captures the initial strategy for driving the `triblespace` crates
toward comprehensive formal verification. It unifies model checking, symbolic
execution, fuzzing, and deterministic simulation so we can reason about both the
low-level data structures and high-level repository workflows with stronger
correctness guarantees.
## Verification Stack Overview
- **Model checking with Kani** explores bounded but exhaustive state spaces for
invariants that must never be violated.
- **Symbolic execution with Miri** exposes undefined behaviour (UB) and
aliasing issues across regular unit tests without requiring new harnesses.
- **Coverage-guided fuzzing** stresses APIs with randomized input sequences to
uncover emergent behaviours that formal proofs might miss.
- **Deterministic simulations** replay realistic repository workflows so we can
audit higher-level semantics and regression-test subtle interplays between
subsystems.
Each technique complements the others; together they provide layered
assurance that keeps regressions from reaching downstream users.
## Goals
- Protect the fundamental algebraic properties of `TribleSet`, `PATCH`, and the
repository commit graph.
- Exercise serialization, deserialization, and zero-copy data views under
adversarial inputs.
- Detect behavioural regressions in query heuristics, constraint solving, and
workspace merging before they reach downstream users.
- Integrate the tooling into CI so proofs and regression checks run
automatically for every change.
- Preserve a contributor-friendly workflow where verification steps are
discoverable, well documented, and quick to reproduce locally.
## Current Foundation
- `proofs/` already contains Kani harnesses for query, value, and variable-set
behaviour. They provide examples of bounded nondeterministic data generation
(`kani::any`, `Value::new`) and assume/guarantee reasoning that new harnesses
can reuse.
- `./scripts/preflight.sh` is the aggregation point for formatting and tests;
adding verification steps here keeps contributor workflows consistent.
## Invariant Catalogue
The roadmap anchors future work around the following invariants. Each row
tracks the subsystem we care about, the guarantees we want to encode, and a
rough sketch of how to exercise them in Kani, Miri, or fuzzing harnesses.
| `TribleSet` (`src/trible/tribleset.rs`) | Union/intersection/difference maintain canonical ordering across all six PATCH indexes; iterators only yield deduplicated `Trible`s; `insert` never drops an ordering. | Extend the existing `variableset` harnesses with nondeterministic inserts, and add a dedicated `tribleset_harness.rs` validating round-trips across every ordering. |
| `PATCH` & `ByteTable` (`src/patch/*.rs`) | Cuckoo displacement respects `MAX_RETRIES` without losing entries; `Branch::modify_child` grows tables when required and preserves `leaf_count`/`segment_count`; `table_grow` copies every occupant exactly once. | Introduce a `patch_harness.rs` that stress-tests `plan_insert`, `table_insert`, and `Branch::grow`, plus a micro-fuzzer that drives inserts/removals across random table sizes. |
| Value schemas (`src/value/*.rs`) | Schema encoders respect declared byte widths; `TryFromValue` conversions and `ValueSchema::validate` reject truncated buffers; zero-copy views stay aligned. | Reuse `value_harness.rs`, adding per-schema helpers plus a Miri regression suite that loads slices at every alignment. |
| Query engine (`src/query/*.rs`) | Constraint solver never aliases conflicting bindings; planner outputs cover all join permutations referenced by `pattern!`; influence tracking matches selected variables. | Expand `proofs/query_harness.rs` with minimal counterexamples, and fuzz constraint graphs via `cargo fuzz`. |
| Repository & commits (`src/repo/*.rs`, `proofs/commit_harness.rs`) | Branch heads remain append-only; `Workspace::pull` never forgets reachable blobs; selector algebra matches Git semantics. | Add bounded commit DAG generators in `commit_harness.rs` plus deterministic simulation traces covering merges and garbage collection. |
| Storage primitives (`src/blob`, `src/repo`, `src/patch/leaf.rs`) | Blob handles stay reference counted; pile headers remain within reserved capacity; byte slices from archives stay valid for the life of the store. | Combine Miri tests for aliasing with nightly fuzzers that replay repository sync transcripts. |
## Expansion Plan
### Phase 1 – Harden the Existing Kani Coverage
1. Catalogue crate-level invariants and map them to concrete Kani harnesses.
Start with:
- `TribleSet` operations preserving canonical ordering and deduplication.
- Join heuristics in `atreides` ensuring variable bindings never alias
conflicting values.
- Repository merge logic maintaining append-only pile semantics.
2. Extract shared helpers for generating bounded arbitrary data (e.g.
`Vec::bounded_any`) so harnesses remain expressive without exploding the
search space.
3. Adopt a per-module harness layout (`proofs/<module>_harness.rs`) registered
from `proofs/mod.rs` to make maintenance predictable.
4. Configure `scripts/verify.sh` to run targeted `cargo kani --harness <name>`
invocations in parallel, then wire it into CI with caching to keep runtimes
manageable.
### Phase 2 – Symbolic Execution with Miri
1. Enable `cargo miri test` for the default test suite to surface undefined
behaviour (UB) and aliasing bugs that regular tests may miss.
2. Gate flaky or unsupported tests with `cfg(miri)` guards so the suite stays
deterministic under the interpreter.
3. Document the workflow in `scripts/preflight.sh` and optionally expose a
dedicated `scripts/miri.sh` for local runs when developers need deeper
debugging.
### Contributor Workflow
1. Run `./scripts/preflight.sh` before every commit; it aggregates formatting,
testing, and (eventually) targeted verification checks.
2. Use `cargo kani --harness <NAME>` locally when iterating on a new proof.
Start from the harness templates in `proofs/` so generators and assumptions
stay consistent.
3. Execute `cargo miri test` after modifying unsafe code, pointer logic, or
concurrency primitives; it catches UB bugs that normal tests cannot
surface.
4. Kick off fuzz targets with `cargo fuzz run <TARGET>` when touching boundary
code (deserializers, planners, repository sync). Store new corpus inputs in
version control if they expose bugs or tricky behaviours.
5. Record findings, gaps, and future work in `INVENTORY.md` so the roadmap
evolves alongside the implementation effort.
### Phase 3 – Fuzzing and Property Testing
1. Introduce a `cargo fuzz` workspace targeting:
- PATCH encoders/decoders with binary corpus seeds generated from integration
tests.
- Join-order heuristics to explore combinations of constraint graphs and filter
predicates.
- Repository sync workflows by fuzzing sequences of commits, pulls, and
merges.
2. Reuse structured generators from `proptest` where deterministic shrinking is
valuable, and bridge them into fuzz harnesses when possible to keep the state
space constrained.
3. Automate nightly or on-demand fuzz campaigns via CI artifacts, storing any
found counterexamples alongside minimised reproducers.
### Phase 4 – Deterministic Simulation Testing
1. Model repository replication scenarios with deterministic event queues to
explore conflict resolution, garbage collection, and concurrent writers.
2. Encode the simulations as regular unit tests backed by recorded execution
traces so they can double as documentation for expected behaviour.
3. Capture simulation scenarios discovered during fuzzing to prevent
regressions.
### Milestones & Reporting
- Track coverage for each invariant in a shared dashboard (CI summary or
`INVENTORY.md`) so contributors can quickly spot gaps.
- Celebrate major wins—like a new harness landing or a bug found via
verification—in the CHANGELOG to reinforce the value of the effort.
- Review and refresh this roadmap at least once per release cycle to keep the
guidance aligned with the architecture.
## Tooling Integration
- Track verification status in CI badges and documentation so contributors know
which guarantees currently hold.
- Extend `INVENTORY.md` with follow-up work items whenever new invariants or
subsystems are identified.
- Keep verification-specific configuration (Kani property files, fuzz corpora,
deterministic seeds) under version control to make runs reproducible.
## Next Steps
1. Break the invariant catalogue into GitHub issues that reference the planned
harness files (`proofs/tribleset_harness.rs`, `proofs/patch_harness.rs`, etc.).
2. Prototype the PATCH harness that drives `Branch::modify_child` through
insertion/growth cycles so we can assert the displacement planner and
`table_grow` never drop entries; wire the run into `scripts/verify.sh`.
3. Evaluate CI capacity to determine how frequently Kani proofs, `cargo miri`,
and fuzzers can run without blocking contributors, documenting the cadence
directly in `INVENTORY.md`.
This roadmap should evolve alongside the codebase—update it whenever new
verification opportunities or obstacles appear.