
# 10 — Roadmap

> **v0.3 LLD is the authoritative shipped contract**
> ([`specs/SPEC.pgRDF.LLD.v0.3.md`](../specs/SPEC.pgRDF.LLD.v0.3.md) §5).
> Phase numbering on this page tracks the v0.3 phase map verbatim:
> Phase 1 done, Phase 2 (Functional SPARQL Coverage) done through
> sub-steps 2.0 / 2.1 / 2.2, Phase 3 (Storage Performance) steps 1-2
> shipped + step 3 phase A shipped, Phase 4 (Inference) shipped,
> Phase 5 (Validation) stub shipped, Phase 6 (CI + Conformance +
> Release) step 1 shipped.
>
> **Forward-look:**
> [`specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)
> is the canonical scope document for the v0.4 cut (named-graph
> scoping, SPARQL UPDATE, lifecycle UDFs, CONSTRUCT, property paths,
> plus the SPARQL backlog deferred from v0.3 §3). v0.5 + v1.0 forward
> look lives in that doc's §15.

Within each phase, sub-steps track delivery cadence: each one is a
git commit on `main` with both pgrx and regression coverage green.

Status legend:
- ✅ shipped
- 🚧 in progress (sub-step partially delivered)
- ⏳ planned (not yet started)
- ❌ deferred (intentionally out of current scope)

---

## Phase 1 — Core Storage & Build Automation ✅

Outcome: extension registers cleanly in stock `postgres:17.4-bookworm`
and the local build produces a usable `.so` + `.control` + `.sql`.

- ✅ pgrx 0.16 scaffold compiles on PG 14–17. PG 18 support has
      landed upstream in pgrx 0.18.0 (2026-04-17), but adoption is
      deferred to v0.4: 0.18.0 still trips `E0716` in its
      `impl_table_iter` macro on every Rust stable/nightly we tested,
      and its single-pass schema-gen migration (`pgrx_embed` removal,
      `crate-type` change) is a non-trivial breaking edit. See
      `specs/ERRATA.v0.2.md` E-006 (re-checked 2026-05-14).
- ✅ `_pgrdf_dictionary` + `_pgrdf_quads` schema in
      `sql/schema_v0_2_0.sql`, loaded via `extension_sql_file!`.
- ✅ Hexastore SPO/POS/OSP covering indexes
      (`INCLUDE (is_inferred)`).
- ✅ Two-VM build/run split: Colima 200 GB for builds (Linux
      container), podman for the compose stack.
- ✅ BuildKit cache mounts for `cargo` registry + `target/`; builder
      image 7.73 GB → 3.35 GB.
- ✅ `just build-ext` produces the package artifacts in
      `compose/extensions/`.
- ✅ `just compose-up` boots stock `postgres:17.4`, and
      `CREATE EXTENSION pgrdf` works end-to-end.

**Not shipped at this phase boundary** (carried into later phases):
- ⏳ GitHub Actions matrix green on tag push (workflow stubs exist;
      not yet wired to a real release).
- ⏳ Pre-built tarballs on a GitHub release matching INSTALL §3
      layout (Phase 6 step 3).
- ❌ COPY BINARY ingestion (LLD §4.3): Phase 2.2 substituted
      **batched INSERT via `unnest($1::bigint[], …)`** as a
      stepping-stone delivery. COPY BINARY is tracked as a
      performance follow-on under Phase 3 step 3.

---

## Phase 2 — Functional SPARQL Coverage ✅

Outcome: SPARQL SELECT queries cover the practically-useful surface
end-to-end; ingestion is fast enough to load real-world ontologies.
Phase 2 split into three sub-phases (2.0 storage CRUD, 2.1 Turtle
ingest, 2.2 SPARQL parser/executor) plus an extended-surface
deliverable track inside 2.2 that landed steps 1-12 below.

### Phase 2.0 — Storage CRUD UDFs ✅

- ✅ `pgrdf.put_term(value, term_type)` + `pgrdf.get_term(id)` with
      `IS NOT DISTINCT FROM` dedup over (term_type, lexical_value,
      datatype_iri_id, language_tag).
- ✅ `pgrdf.put_quad(s, p, o, g)` + `pgrdf.count_quads(g)`.
- ✅ `pgrdf.add_graph(g)` — idempotent LIST partition creation, so
      `DROP TABLE _pgrdf_quads_<g>` becomes the constant-time
      whole-graph drop the LLD calls for.
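
The NULL-aware dedup described for `pgrdf.put_term` can be pictured with Rust's `Option`: `None == None` is true, which matches what `IS NOT DISTINCT FROM` does for SQL NULLs (plain `=` would yield NULL instead). The following is a minimal in-memory sketch, not the extension's storage code. `TermKey`, `Dictionary`, the `i16` term-type encoding, and the sequential ids are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the dictionary key
/// (term_type, lexical_value, datatype_iri_id, language_tag).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct TermKey {
    term_type: i16,
    lexical_value: String,
    datatype_iri_id: Option<i64>, // None models SQL NULL
    language_tag: Option<String>, // None models SQL NULL
}

/// Toy dictionary: hands out sequential ids and deduplicates on the full key.
#[derive(Default)]
struct Dictionary {
    ids: HashMap<TermKey, i64>,
}

impl Dictionary {
    /// Returns the existing id for `key`, or assigns the next free one.
    fn put_term(&mut self, key: TermKey) -> i64 {
        let next = self.ids.len() as i64 + 1;
        *self.ids.entry(key).or_insert(next)
    }
}
```

Two calls with the same key (including two `None` fields) return the same id, while changing only `language_tag` from `None` to `Some("en")` produces a new entry.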

### Phase 2.1 — Turtle ingest ✅

- ✅ `pgrdf.load_turtle(path, graph_id, base_iri)` and
      `pgrdf.parse_turtle(content, graph_id, base_iri)` via
      `oxttl 0.2`.
- ✅ `put_term_full(value, type, datatype_id, lang)` honours the full
      dictionary key with NULL-aware dedup.
- ✅ 24 W3C / Apache Jena / ValueFlows / ConceptKernel v3.7 ontologies
      smoke-load cleanly via `tests/perf/smoke-ontologies.sh`
      (17,134 triples on the 2026-05-13 fetch). `workflow.ttl` held
      out for non-RFC IRI form (ERRATA E-007).

### Phase 2.2 — Dict cache + batched ingest + SPARQL parser/executor ✅

- ✅ **Per-call HashMap dict cache** + buffered multi-row INSERTs
      via `unnest($1::bigint[], $2::bigint[], $3::bigint[])` with
      BATCH_SIZE = 1000. Reduces SPI calls from ~7/triple to roughly
      `distinct_terms + ceil(triples/1000)`.
- ✅ `pgrdf.load_turtle_verbose` / `parse_turtle_verbose` return
      JSONB stats (triples, dict_cache_hits, dict_db_calls,
      quad_batches, elapsed_ms).
- ✅ `pgrdf.sparql_parse(q TEXT) → JSONB` — spargebra-backed AST
      introspection.
- ✅ `pgrdf.sparql(q TEXT) → SETOF JSONB` — BGP → SQL translator.
      Handles single-triple through N-pattern BGPs; shared variables
      become INNER JOINs anchored on each variable's first occurrence.
- ✅ Three doc tracks split: `specs/` (authoritative) +
      `docs/` (engineering plan) + `guide/` (user docs).
- ✅ 4 client integration guides: Python, Rust, Node/TypeScript, Go.
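
The SPI-call arithmetic in the dict-cache bullet above can be worked through directly. This is a rough sketch of the roadmap's approximation only: the function names are hypothetical, and the distinct-term count used in the note below is an assumed figure, not a measurement.

```rust
/// Rough SPI-call estimate for a batched load, following the roadmap's
/// `distinct_terms + ceil(triples / batch_size)` approximation.
fn estimated_spi_calls(distinct_terms: u64, triples: u64, batch_size: u64) -> u64 {
    assert!(batch_size > 0, "batch size must be positive");
    // Integer ceiling division: one INSERT per full or partial batch.
    let batches = (triples + batch_size - 1) / batch_size;
    distinct_terms + batches
}

/// Pre-batching baseline quoted in the roadmap: roughly 7 SPI calls per triple.
fn unbatched_spi_calls(triples: u64) -> u64 {
    7 * triples
}
```

For the 17,134-triple smoke set and an assumed 5,000 distinct terms, `estimated_spi_calls(5_000, 17_134, 1_000)` gives 5,018 calls, against `unbatched_spi_calls(17_134)` = 119,938.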

(Phase 3 storage-performance gates are tracked under
[Phase 3 — Storage Performance](#phase-3--storage-performance--steps-1-2-shipped-step-3-phase-a-shipped)
below, not here. Phase 2.2 closes with the SPARQL parser / executor
landing; perf work picks up under its own phase per v0.3 LLD §5.)
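
To illustrate the first-occurrence-anchor idea behind the BGP → SQL translator listed in Phase 2.2, here is a simplified sketch. It is an assumption-laden illustration rather than the shipped translator: the column names `s`, `p`, `o` are hypothetical, the `Term` and `Pattern` types are invented for the example, and dictionary ids are inlined for readability instead of being passed as parameters.

```rust
use std::collections::HashMap;

/// A triple-pattern position: a variable or an already-resolved dictionary id.
enum Term {
    Var(&'static str),
    Id(i64),
}

/// One triple pattern of a basic graph pattern (BGP).
struct Pattern {
    s: Term,
    p: Term,
    o: Term,
}

/// Builds a SELECT with one `_pgrdf_quads` alias per pattern.
/// Column names `s`, `p`, `o` are hypothetical placeholders.
fn bgp_to_sql(patterns: &[Pattern]) -> String {
    // Variable name -> "alias.column" of its first occurrence (the anchor).
    let mut anchors: HashMap<&'static str, String> = HashMap::new();
    let mut projection: Vec<String> = Vec::new();
    let mut from = String::new();
    let mut where_conds: Vec<String> = Vec::new();

    for (i, pat) in patterns.iter().enumerate() {
        let alias = format!("q{}", i);
        let mut conds: Vec<String> = Vec::new();
        for (col, term) in [("s", &pat.s), ("p", &pat.p), ("o", &pat.o)] {
            let here = format!("{}.{}", alias, col);
            match term {
                Term::Id(id) => conds.push(format!("{} = {}", here, id)),
                Term::Var(name) => {
                    if let Some(anchor) = anchors.get(name) {
                        // Later occurrence: equate with the anchor column.
                        conds.push(format!("{} = {}", here, anchor));
                    } else {
                        // First occurrence: becomes the anchor and is projected.
                        projection.push(format!("{} AS {}", here, name));
                        anchors.insert(*name, here);
                    }
                }
            }
        }
        if i == 0 {
            from.push_str(&format!("_pgrdf_quads {}", alias));
            where_conds = conds;
        } else {
            let on = if conds.is_empty() { "TRUE".to_string() } else { conds.join(" AND ") };
            from.push_str(&format!(" INNER JOIN _pgrdf_quads {} ON {}", alias, on));
        }
    }

    let cols = if projection.is_empty() { "1".to_string() } else { projection.join(", ") };
    let mut sql = format!("SELECT {} FROM {}", cols, from);
    if !where_conds.is_empty() {
        sql.push_str(&format!(" WHERE {}", where_conds.join(" AND ")));
    }
    sql
}
```

For a two-pattern BGP such as `?person :knows ?friend . ?friend :name ?name`, the second occurrence of `?friend` turns into the join condition `q1.s = q0.o`, while each variable is projected once from its anchor column.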

### Phase 2.2 (extended) — SPARQL surface deliverables ✅

Sub-track inside Phase 2.2 that extended `pgrdf.sparql` from the
v0.2 LLD's minimal "SELECT … WHERE { BGP }" toward a practically-useful
SPARQL 1.1 surface, in tight slices each shipping with pgrx +
regression coverage. (Phase 3 in the v0.3 LLD is **Storage
Performance** — see the next section. The "extended SPARQL surface"
label that previously hung off this table was pre-v0.3 framing and
has been retired.)

| Step | Surface | Commit | pgrx tests (cumulative) | pg_regress files (cumulative) |
|---|---|---|---|---|
| 1 | FILTER — identity (`=`, `!=`, `sameTerm`), boolean (`&&`, `\|\|`, `!`), term-type (`isIRI`, `isLiteral`, `isBlank`), `BOUND` | `1ebeefc` | 28 | 14 |
| 2 | FILTER — numeric ordering (`<`/`>`/`<=`/`>=`), `REGEX`, `IN`, `STR` passthrough | `51b4d56` | 34 | 15 |
| 3 | Solution modifiers — `DISTINCT`, `REDUCED`, `LIMIT`, `OFFSET`, `ORDER BY ASC/DESC ?var` | `4bc9a87` | 40 | 16 |
| 4 | `OPTIONAL { ?s :p ?o }` → `LEFT JOIN` (with inner FILTER and chained blocks) | `6546d80` | 45 | 17 |
| 5 | `UNION` (n-way, branch-local FILTERs and OPTIONALs) | `56b7bca` | 51 | 18 |
| 6 | `MINUS` → `NOT EXISTS` keyed by shared variables | `59ee1b9` | 56 | 19 |
| 7 | Aggregates — `COUNT(*)`, `COUNT(?v)`, `COUNT(DISTINCT)`, `SUM`, `AVG`, `MIN`, `MAX` + `GROUP BY` | `fd40845` | 63 | 20 |
| 8 | `HAVING` (post-aggregate filter) + `GROUP_CONCAT` + `SAMPLE` | `066ce53` | 67 | 21 |
| 9 | Expression richness — arithmetic (`+`/`-`/`*`/`/`), `STRLEN`, `CONTAINS`/`STRSTARTS`/`STRENDS`, `LANG`/`DATATYPE`/`UCASE`/`LCASE` | `78df3a6` | 73 | 22 |
| 10 | `BIND(expr AS ?v)` for projection (Literal/NamedNode/Variable, STR/LANG/DATATYPE/UCASE/LCASE/STRLEN, arithmetic, CONCAT) | `99069a6` | 76 | 23 |
| 11 | Multi-triple MINUS (sub-pattern with N triples joined inside the NOT EXISTS) | `bc6d0a8` | 77 | 24 |
| 12 | `ASK { … }` query form → single JSONB row `{"_ask": "true"\|"false"}` | `fc67285` | 79 | 25 |

**SPARQL surface declared substantively complete with step 12.** The
backlog below (every item deferred to v0.4 per
[`SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md))
does not block Phase 3 (Storage Performance) of the v0.3 LLD:

- ⏳ `GRAPH { … }` named-graph clause — needs a graph IRI → graph_id
      mapping (schema change). v0.4-FUTURE §3.
- ⏳ Multi-triple OPTIONAL — relax the current single-triple
      restriction via a derived-table refactor inside the LEFT JOIN.
      (Multi-triple MINUS shipped step 11.) v0.4-FUTURE §11.
- ⏳ Arithmetic in FILTER (`?a + ?b > 30`), `BIND` inside FILTER,
      `SUBSTR`, aggregates-over-UNION. v0.4-FUTURE §11.
      (`lang(?v)` / `datatype(?v)` and the `STRLEN` / `CONTAINS` /
      `STRSTARTS` / `STRENDS` surface shipped step 9; `BIND (expr AS ?v)`
      for projection shipped step 10; type-aware `MIN`/`MAX` over
      `xsd:numeric` shipped post-step-12 — translator slice
      `7de9c17`.)
- ⏳ Type-aware ORDER BY (sort numeric literals numerically rather
      than as strings). v0.4-FUTURE §11.
- ⏳ `VALUES (?x ?y) { … }`. v0.4-FUTURE §11.
- ⏳ Property paths beyond simple sequence (`*`, `+`, `?`, `^`,
      alternation). Simple sequence already works because spargebra
      desugars `:a/:b` into a BGP chain. v0.4-FUTURE §7.
- ⏳ `CONSTRUCT`, `DESCRIBE`. (`ASK` shipped step 12.) v0.4-FUTURE §6.

---

## Phase 3 — Storage Performance 🚧 (steps 1-2 shipped, step 3 phase A shipped)

Outcome: shmem-resident dictionary cache + prepared-plan cache +
bulk-ingest primitive — tracks v0.3 LLD §5.1 / §4.1 / §4.2 / §4.3.

Gates:
- ✅ **Step 1 — Shmem dictionary cache (LLD §4.1)** —
      `PgLwLock<[Slot; 16_384]>` cross-backend cache with u128
      fingerprint, commit-deferred publish, generation invalidation.
      Per-call `load_turtle_verbose.shmem_cache_hits` and cumulative
      `pgrdf.stats()` counters; regression `50-shmem-dict-cache.sql`
      asserts 100 % shmem hit rate on the second load of
      `synth-100.ttl`. Edge-cases locked by
      `63-shmem-reset-invalidation.sql` (slice #61) — `shmem_reset()`
      generation bump + slot-mismatch read-as-cold contract.
- ✅ **Step 2 — Prepared-plan cache (LLD §4.2)** — parameterised
      SPARQL SQL + per-backend `OwnedPreparedStatement` cache keyed
      by the SQL string. `pgrdf.stats()` exposes
      `plan_cache_hits / misses / inserts / local_size`. Operator
      hook: `pgrdf.plan_cache_clear()`. Regression
      `51-plan-cache.sql` asserts the hit / miss / parametric-reuse
      arithmetic for three workload shapes; edge-cases locked by
      `64-plan-cache-clear.sql` (slice #60) — returned-count
      semantics, idempotent-at-zero, post-clear size invariant.
- 🚧 **Step 3 — COPY BINARY ingestion (LLD §4.3)** —
      - ✅ **Phase A**: prepared `INSERT … unnest(…)` cached
        per-backend, reused across batches and across loads.
        Saves one parse+plan per batch (~100–500 µs each).
        Verified by `52-bulk-ingest-perf.sql` on `synth-10k.ttl`.
      - ⏳ **Phase B** (deferred to v0.4 per
        [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §12`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)):
        the 2× wall-clock target from LLD §4.3 acceptance is not
        met by phase A alone — the per-tuple executor walk
        dominates. Candidate paths: `pg_sys::heap_multi_insert` per
        partition, or `BeginCopyFrom` + binary callback. Both
        FFI-heavy.
- ⏳ W3C SPARQL 1.1 manifest runner wired into CI; coverage target
      `≥ 30 %` pass for the v0.3 Phase 6 step 2 gate (LLD §5.4).
      Hand-authored W3C-shape harness (23 tests, lock-in slice #55)
      stands in until the full TTL-manifest runner lands.
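
The generation-invalidation contract from Step 1 above (a reset bumps a generation counter, and slots written under an older generation read as cold) can be sketched in a few lines. This is a single-process toy under stated assumptions: the `Slot` fields, the modulo slot index, and the `DictCache` type are hypothetical, and it models neither the shared-memory lock nor the commit-deferred publish.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One cache slot, stamped with the generation it was written under.
#[derive(Clone, Copy, Default)]
struct Slot {
    fingerprint: u128,
    term_id: i64,
    generation: u64,
    occupied: bool,
}

/// Toy fixed-size cache; the real one lives in shared memory behind a lock.
struct DictCache {
    slots: Vec<Slot>,
    generation: AtomicU64,
}

impl DictCache {
    fn new(n_slots: usize) -> Self {
        DictCache {
            slots: vec![Slot::default(); n_slots],
            generation: AtomicU64::new(1),
        }
    }

    /// Hypothetical slot choice: fingerprint modulo slot count.
    fn index(&self, fingerprint: u128) -> usize {
        (fingerprint % self.slots.len() as u128) as usize
    }

    /// A slot counts as a hit only if fingerprint and generation both match.
    fn get(&self, fingerprint: u128) -> Option<i64> {
        let slot = self.slots[self.index(fingerprint)];
        let current = self.generation.load(Ordering::Acquire);
        if slot.occupied && slot.fingerprint == fingerprint && slot.generation == current {
            Some(slot.term_id)
        } else {
            None
        }
    }

    /// Writes the entry stamped with the current generation.
    fn insert(&mut self, fingerprint: u128, term_id: i64) {
        let i = self.index(fingerprint);
        let generation = self.generation.load(Ordering::Acquire);
        self.slots[i] = Slot { fingerprint, term_id, generation, occupied: true };
    }

    /// Invalidates every entry at once by bumping the generation.
    fn reset(&self) {
        self.generation.fetch_add(1, Ordering::AcqRel);
    }
}
```

After `reset()`, a lookup for a previously cached fingerprint misses, and the next `insert` overwrites the stale slot with a fresh entry. That is the hit/insert delta that `63-shmem-reset-invalidation.sql` is described as asserting.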

---

## Phase 4 — Inference Engine ✅ (shipped; loader-writeback deferred)

Outcome: materialized OWL 2 RL inference works against real
ontologies; SHACL validation is its own Phase 5. Tracks LLD v0.3
§5.2.

Gates:
- ✅ `pgrdf.materialize(graph_id BIGINT) → JSONB` —
      `src/inference/reasonable.rs` rehydrates base quads via a
      single SPI scan + 3 dict-JOINs, runs `reasonable::Reasoner`
      (OWL 2 RL — see ERRATA E-002), set-diffs against the input,
      and INSERTs the entailed-but-not-asserted triples with
      `is_inferred = TRUE`. Idempotent. Verified by
      `tests/regression/sql/60-materialize-owl-rl.sql`. Round-trip
      to SPARQL locked by `61-materialize-then-sparql.sql`;
      zero-triple edge locked by `62-materialize-empty.sql` (slice
      #62).
- ⏳ Reasoner-coverage fixture (e.g. pizza ontology subset) with a
      golden expected-closure diff. Deferred — current regression
      uses minimal hand-authored TBoxes.
- ⏳ Loader-side writeback via `flush_batch` (depends on Phase 3
      step 3 phase B shipping the bulk-INSERT primitive in v0.4 per
      [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §12`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)).
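
The set-diff step in `pgrdf.materialize` (write only what the reasoner entailed but the graph did not already assert) reduces to a set difference. The sketch below assumes triples are represented as tuples of dictionary ids; the `Triple` alias and function name are hypothetical, and the reasoner itself is out of scope here.

```rust
use std::collections::HashSet;

/// A triple of dictionary ids (subject, predicate, object).
type Triple = (i64, i64, i64);

/// Returns the triples that would be written with `is_inferred = TRUE`:
/// everything in the reasoner's closure that the base graph did not assert.
fn entailed_not_asserted(base: &HashSet<Triple>, closure: &HashSet<Triple>) -> Vec<Triple> {
    let mut out: Vec<Triple> = closure.difference(base).copied().collect();
    out.sort(); // deterministic order, convenient for golden-file comparison
    out
}
```

Because asserted triples are filtered out of the result, running the diff again over the same base and closure yields the same set of inferred triples.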

---

## Phase 5 — Validation Engine 🚧 (stub)

Outcome: SHACL validation works against real shapes graphs. Tracks
LLD v0.3 §5.3.

Gates:
- 🚧 `pgrdf.validate(data BIGINT, shapes BIGINT) → JSONB` —
      surface SHIPPED (`src/validation/shacl.rs`); body returns
      `{"status": "stub", …}` blocked by ERRATA E-009 (upstream
      `iri_s`/`rdf-12` dep conflict between `shacl_validation` and
      `reasonable`). Verified by `70-validate-stub.sql`.
- ⏳ Real `shacl_validation` integration once either upstream
      catches up (see `docs/05-validation.md` for the unblock
      conditions). Targeted at v0.5 per
      [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §9`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)
      (gated on ERRATA E-009).
- ⏳ W3C SHACL conformance manifest runner — paired with Phase 6,
      lands with real SHACL output in v0.5.

---

## Phase 6 — CI + Conformance + Release 🚧 (step 1 shipped)

Outcome: pgRDF is consumable by external operators (CloudNativePG,
StackGres) following INSTALL spec methodology. Benchmarked. Tracks
LLD v0.3 §5.4.

**Step 1 — Regression in CI** ✅
- `.github/workflows/ci.yml` `regression` job runs the
  compose-based pg_regress suite on every PR + push to main.
  Pinned to PG 17 today (compose pin per ERRATA E-006).

**Step 2 — W3C conformance** 🚧 (starter shipped, expanded II)
- ✅ `tests/w3c-sparql/` hand-authored harness: **23 tests** landed in
  five batches (5 starter + 8 expanded + 5 expanded II +
  3 essentials + 2 translator-fix gates), covering BGP, DISTINCT,
  UNION, OPTIONAL, MINUS, FILTER (isIRI/REGEX/IN/numeric),
  aggregates + HAVING, ORDER BY DESC, LIMIT/OFFSET, BIND/CONCAT,
  ASK true/false, STRLEN, LANG, UCASE, BOUND-after-OPTIONAL,
  STR(?iri), inline HAVING-aggregate, type-aware MIN/MAX. Plus
  3 LUBM-shape correctness gates in `tests/perf/lubm-shape/`.
  Bash runner; runs alongside `tests/regression/` in the same CI
  job. Each expected output cites the W3C spec section it exercises.
  Justfile entry points (`just test-w3c`, `just test-lubm`,
  `just test-conformance`) added in slice #55.
- ⏳ Full W3C TTL-manifest runner against `w3c/rdf-tests`. The
  `pgrdf-w3c-sparql` Rust binary placeholder in
  `regression-w3c.yml::sparql11` (gated `if: false`) is the
  destination shape; lands as v0.4.
- ⏳ W3C SHACL manifest runner. Gated on ERRATA E-009 unblocking;
  per [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §9`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)
  the SHACL pair (real output + manifest runner) targets v0.5.
- ⏳ Coverage targets ratchet per release:
  SPARQL `≥ 30 % → ≥ 70 % → ≥ 95 %`; SHACL `≥ 50 % → ≥ 90 %`.

**Step 3 — Release artifacts** ⏳
- `.github/workflows/release.yml` already builds and packages on
  `v*` tags; it cuts the first official release once step 2 lands.
  Matrix is `{14,15,16,17} × {amd64, arm64}` = 8 tarballs per cut
  (PG 18 deferred per ERRATA E-006, slice #36 audit).
- LUBM-100 results in `target/perf-report.json` compared against
  Apache Jena TDB and Apache AGE.
- OCI artifact published at `ghcr.io/styk-tv/pgrdf-bundle:<ver>`
  (INSTALL §11 OQ1).
- INSTALL §12 conformance test in CI against a fresh K8s cluster
  (kind or k3s).
- SHA256SUMS is wired in `release.yml` at both per-tarball and
  aggregate levels (slice #28 audit; supersedes the older slice #36
  "not yet wired" note). The detached GPG signature
  `SHA256SUMS.asc` (INSTALL OQ4) is **deferred to v0.4** — no
  `GPG_PRIVATE_KEY` secret or release-signing key is yet provisioned
  for the workflow. v0.3 ships SHA256SUMS-only integrity; the `.asc`
  follow-up requires sourcing a signing key, publishing the public
  half, and wiring the secret. See `docs/09-release.md` "Aggregate
  checksums" for the consumer-side verification recipe.
- License attribution surface (Apache 2.0 / 2026) declared at
  repo root; NOTICE distribution in the release tarball flagged
  as workflow follow-up (slice #36 adjacent finding).
- MSRV declared `rust-version = "1.91"` in `Cargo.toml` (slice
  #49).
- Target gates: W3C SPARQL 1.1 ≥ 95 % pass; SHACL ≥ 90 % pass
  (the SHACL gate moves with ERRATA E-009 resolution; per
  v0.4-FUTURE §9, real SHACL output is a v0.5 ticket).

---

## v0.4 — next milestone (forward-looking)

v0.4 is the next major cut, drafted in
[`SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md).
What follows summarises the five major tracks — the full contract
lives in the spec. Acceptance criteria, schema deltas, and
translator-level wiring are NOT duplicated here; this section is a
navigation aid only.

### Track 1 — Named-graph scoping + IRI mapping
`GRAPH { … }` SPARQL surface plus a new `_pgrdf_graphs` system table
mapping graph IRIs to the existing integer `graph_id` (LIST-partition
key of `_pgrdf_quads`). `GRAPH ?g { … }` projects `?g` as the IRI,
not the integer. See
[v0.4-FUTURE §3](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#3-named-graph-scoping-and-iri-mapping-new).

### Track 2 — SPARQL UPDATE
`INSERT DATA`, `DELETE DATA`, pattern-driven `INSERT/DELETE … WHERE`,
the atomic `DELETE … INSERT … WHERE` modify, plus `WITH <iri>` and
inline `GRAPH <iri> { … }` graph scope. Overloads `pgrdf.sparql(q)`
to dispatch by query form; UPDATE forms return an `_update` JSONB
summary row. See
[v0.4-FUTURE §4](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#4-sparql-update-new).

### Track 3 — Graph-level lifecycle UDFs
`pgrdf.drop_graph`, `clear_graph`, `copy_graph`, `move_graph` as
partition-level primitives over `_pgrdf_quads` — constant-time
`move_graph` via DETACH/ATTACH metadata swap, `TRUNCATE ONLY` for
`clear_graph`. Also wires the corresponding SPARQL UPDATE forms
(`DROP/CLEAR/CREATE/COPY/MOVE/ADD GRAPH`) to these UDFs. See
[v0.4-FUTURE §5](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#5-graph-level-lifecycle-udfs-new).

### Track 4 — CONSTRUCT
`pgrdf.construct(q TEXT) → SETOF JSONB` returning structured
`{subject, predicate, object}`-shaped rows via the existing term
shaper. Sibling UDF rather than overloading `pgrdf.sparql` — callers
signal intent at the SQL boundary. See
[v0.4-FUTURE §6](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#6-construct-deferred-from-v03-now-in-scope).

### Track 5 — Property paths
`*`, `+`, `?`, `^`, with alternation `p1|p2` as a stretch goal.
Translates to recursive Postgres CTEs with a `pgrdf.path_max_depth`
GUC; falls back to direct BGP match when the predicate's closure is
already materialised. See
[v0.4-FUTURE §7](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#7-property-paths-deferred-from-v03-now-in-scope).
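
As a rough picture of what a recursive-CTE translation for a one-or-more path (`p+`) might look like, the sketch below builds a depth-bounded query string. Everything beyond `_pgrdf_quads` and the idea of a depth limit is an assumption: the column names `s`, `p`, `o` are hypothetical, `max_depth` stands in for the value of the `pgrdf.path_max_depth` GUC, and the materialised-closure fallback is not modelled.

```rust
/// Builds a depth-bounded recursive CTE for the one-or-more path `p+`.
/// `s`, `p`, `o` are hypothetical column names for `_pgrdf_quads`.
fn one_or_more_path_sql(predicate_id: i64, max_depth: u32) -> String {
    format!(
        "WITH RECURSIVE path(s, o, depth) AS (
  SELECT q.s, q.o, 1
  FROM _pgrdf_quads q
  WHERE q.p = {p}
  UNION
  SELECT path.s, q.o, path.depth + 1
  FROM path
  INNER JOIN _pgrdf_quads q ON q.s = path.o
  WHERE q.p = {p} AND path.depth < {d}
)
SELECT DISTINCT s, o FROM path",
        p = predicate_id,
        d = max_depth
    )
}
```

The depth column bounds the recursion even when the data contains cycles, and the final `SELECT DISTINCT` collapses pairs reached at more than one depth.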

### Carried backlog — SPARQL surface gaps from v0.3
Multi-triple `OPTIONAL { BGP }` (LATERAL-style derived-table refactor),
`VALUES` inline tables, `BIND` output usable in later FILTER/BGP,
aggregates over `UNION`, and `DESCRIBE`. Shipped in the same cut
because they share the translator machinery §4 + §6 already require.
See
[v0.4-FUTURE §11](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#11-sparql-surface-backlog-deferred-from-v03-now-in-scope).

### Performance work carried forward from v0.3
Phase 3 step 3 phase B — `heap_multi_insert` / `COPY BINARY` ingest
path — targets v0.4 (the 2× wall-clock target from v0.3 LLD §4.3
acceptance is not met by phase A alone; the per-tuple executor walk
dominates). Postgres custom-scan hooks for specific quad-shape access
patterns are also flagged with v0.4 as the earliest target and may slip
to v0.5 if the refactor cost exceeds the §4 / §6 wins. These do not gate
the surface work in tracks 1-5; they ship in their own slices. See
[v0.4-FUTURE §12](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#12-performance-work-carried-forward-from-v03).

### Conformance runner wiring (v0.4)
The W3C SPARQL 1.1 manifest runner (Phase 6 step 2, gated `if: false`
in v0.3) is wired in v0.4 — it gates the §11 SPARQL backlog
automatically as the deferred forms come online. See
[v0.4-FUTURE §13](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#13-test-policy-continues-v03-6-unchanged-in-spirit).

### Excluded from v0.4 (planned v0.5)
Real SHACL output (ERRATA E-009-gated), the reasoning profile
selector (`pgrdf.materialize(graph_id, profile)` — RDFS vs OWL-RL),
TriG / N-Quads ingest, IRI overloads for the §5 lifecycle UDFs, and
the W3C SHACL manifest runner. See
[v0.4-FUTURE §8](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#8-reasoning-profile-selector-v05--flagged-here-for-planning),
[§9](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#9-shacl-real-integration-v05--gated-on-errata-e-009),
[§10](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#10-trig--n-quads-ingest-v05).

---

## Coverage ratchet — release-by-release targets

Per-release floor for every CI-enforced test layer plus the two
external-standard pass-rate gates (W3C SPARQL 1.1, W3C SHACL) and the
LUBM cross-engine benchmark. Cells anchor to
[`specs/SPEC.pgRDF.LLD.v0.3.md` §6.1](../specs/SPEC.pgRDF.LLD.v0.3.md)
(test-layer matrix),
[`specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md` §13](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#13-test-policy-continues-v03-6-unchanged-in-spirit)
(v0.4 test policy), and
[`docs/08-testing.md`](08-testing.md) (test strategy doc); nothing
here is new contract, only a consolidated view of the targets already
declared in those sources.

| Layer                                 | v0.3 (current) | v0.4 target                                 | v0.5 target                              | v1.0 target                                            |
|---|---|---|---|---|
| pgrx integration (`cargo pgrx test`)  | 93 ✅           | + `heap_multi_insert` tests                 | TBD                                      | TBD                                                    |
| pg_regress golden                     | 39 ✅           | ~60 (§3 + §4 + §5 + §6 + §7 + §11)          | TBD                                      | TBD                                                    |
| W3C-shape SPARQL harness              | 23 ✅           | superseded by TTL-manifest runner outputs   | superseded by TTL-manifest runner        | superseded by TTL-manifest runner                      |
| LUBM-shape correctness harness        | 3 ✅            | superseded by LUBM-1 real benchmark         | superseded by LUBM-10 real benchmark     | superseded by LUBM-100 real benchmark                  |
| W3C SPARQL 1.1 conformance (manifest) | not wired ⏳   | runner wired + ≥ 30 % pass                  | ≥ 70 % pass                              | ≥ 95 % pass                                            |
| W3C SHACL conformance (manifest)      | not wired ⏳ (E-009) | not wired (still E-009)               | ≥ 50 % pass (E-009 cleared, real output) | ≥ 90 % pass                                            |
| LUBM cross-engine benchmark           | scaffold only ⏳ | LUBM-1 smoke                                | LUBM-10 baseline vs Apache Jena TDB / Apache AGE | LUBM-100 vs Apache Jena TDB / Apache AGE       |

**Ratchet enforcement.** Each release's CI must hit at least its
column's targets; once a target is met it becomes a floor and can
never regress (`docs/08-testing.md` "Regression discipline":
"Coverage gates ratchet but never lower."). A build on `main` that
drops below a previously-met floor must fail CI. Cells marked **TBD**
have no published target in the LLD or FUTURE specs yet; they will be
filled in as the v0.5 / v1.0 LLDs are drafted, not fabricated here.
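
The ratchet rule can be expressed as a small check. This is a sketch of the policy as described here, not an existing CI script: the function name and the use of whole-number percentages are assumptions.

```rust
/// Checks one pass-rate gate against the ratchet rule.
/// `floor_pct` is the highest target met by any earlier release,
/// `target_pct` is this release's column target.
/// On success, returns the floor to carry into the next release.
fn check_ratchet(floor_pct: u32, target_pct: u32, measured_pct: u32) -> Result<u32, String> {
    // The gate may rise with the release target but can never drop below the floor.
    let required = floor_pct.max(target_pct);
    if measured_pct >= required {
        Ok(required)
    } else {
        Err(format!(
            "measured {} % is below required {} %",
            measured_pct, required
        ))
    }
}
```

With the SPARQL targets from the table, a v0.4 run measuring 34 % against a 30 % target passes and fixes the floor at 30 %. Any later run is then rejected if it falls under that floor, even if a target were mistakenly lowered.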

---

## Out of scope (v0.x)

(Carries forward unchanged from
[`SPEC.pgRDF.LLD.v0.4-FUTURE.md §14`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md).)

- Streaming replication / logical decoding of RDF state.
- Federated SPARQL `SERVICE` — explicitly deferred to v1.0 per
  v0.4-FUTURE §15.
- Full OWL 2 (EL / QL) reasoning — ERRATA E-002.
- Backup/restore for opaque binary state (tracked by future
  `SPEC.pgRDF.BACKUP.v0.x`, INSTALL §11 OQ5).
- `LOAD <url>` in SPARQL UPDATE — callers fetch externally and
  invoke `pgrdf.load_turtle` / `pgrdf.parse_trig` directly
  (v0.4-FUTURE §14).

---

## Test bar over time

A coarse cumulative view; the precise per-commit count is in the
Phase 2.2 (extended) SPARQL-surface step table above.

(Rows labelled `Phase 3 step N` below this table's first block are
pre-v0.3 framing — they correspond to the Phase 2.2 (extended)
SPARQL surface steps 1-12, not to the v0.3 LLD's Phase 3 Storage
Performance. Test counts are unaffected; the labels are kept here
for git-archaeology fidelity.)

(Once v0.4 work begins, new rows land under `v0.4 cut` labels per
the per-track grouping in the "v0.4 — next milestone" section
above; the v0.3 rows below remain frozen as the shipped baseline.)

| Boundary | pgrx integration | pg_regress files | Notes |
|---|---|---|---|
| Phase 1 done | 0 | 0 | smoke + scaffold only |
| Phase 2.0 done | 7 | 3 | dict + quad CRUD |
| Phase 2.1 done | 11 | 7 | + Turtle ingest, regression fixtures |
| Phase 2.2 done | 21 | 13 | + dict cache, batched ingest, SPARQL parser, BGP-to-SQL, N-pattern BGP joins, user guide |
| Phase 2.2 (extended) step 6 | 56 | 19 | + FILTER, modifiers, OPTIONAL, UNION, MINUS |
| Phase 2.2 (extended) step 7 | 63 | 20 | + aggregates (COUNT/SUM/AVG/MIN/MAX + GROUP BY) |
| Phase 2.2 (extended) steps 8–12 | 79 | 25 | + HAVING, GROUP_CONCAT/SAMPLE, expression richness, BIND, multi-triple MINUS, ASK |
| v0.3 Phase 3 step 1 | 86 | 26 | + shmem dict cache (LLD §4.1), `pgrdf.stats()`, perf regression `50-shmem-dict-cache.sql` |
| v0.3 Phase 3 step 2 | 88 | 27 | + prepared-plan cache (LLD §4.2), parameterised SQL, perf regression `51-plan-cache.sql` |
| v0.3 Phase 3 step 3 phase A | 88 | 28 | + bulk-ingest prepared INSERT (LLD §4.3 phase A), `synth-10k.ttl`, perf regression `52-bulk-ingest-perf.sql`. 2× wall-clock target deferred to phase B / v0.4 |
| v0.3 Phase 4 | 91 | 29 | + `pgrdf.materialize` OWL 2 RL inference via `reasonable` 0.4, set-diff isolation, idempotent re-derivation, regression `60-materialize-owl-rl.sql` |
| v0.3 Phase 5 stub | 93 | 30 | + `pgrdf.validate(data, shapes)` JSONB stub. Real `shacl_validation` integration deferred — ERRATA E-009 (upstream iri_s/rdf-12 dep block). Regression `70-validate-stub.sql` |
| v0.3 Phase 6 step 1 | 93 | 30 | + regression suite wired into CI (`.github/workflows/ci.yml` `regression` job); compose builder + runtime on every PR. W3C runners + LUBM benchmarks remain deferred |
| v0.3 Phase 6 step 2 starter | 93 | 30+5 | + W3C-shape SPARQL harness — 5 starter tests in `tests/w3c-sparql/` wired into the CI regression job. Full W3C TTL-manifest runner deferred to v0.4 |
| v0.3 Phase 6 step 2 expanded | 93 | 30+13 | + 8 more W3C-shape tests covering FILTER, COUNT/HAVING, ORDER BY DESC, LIMIT/OFFSET, BIND/CONCAT, ASK true/false |
| v0.3 Phase 6 step 2 expanded II | 93 | 30+18 | + 5 more W3C-shape tests covering REGEX, IN, STRLEN, LANG, UCASE |
| v0.3 translator-gap signals + step 3 scaffold | 93 | 31+18+3 | + 8 negative regression signals (`80-unsupported-shapes.sql`) locking the error-message contract for unsupported SPARQL shapes; + 3 LUBM-shape correctness gates (`tests/perf/lubm-shape/`) against a hand-authored fixture |
| v0.3 +3 W3C essentials + integration | 93 | 32+21+3 | + 3 more W3C-shape tests (BOUND, STR(?iri), numeric FILTER); + `61-materialize-then-sparql.sql` integration test verifying inferred triples flow back through `pgrdf.sparql` |
| v0.3 stats shape contract | 93 | 33+21+3 | + `82-stats-shape.sql` locks the `pgrdf.stats()` JSONB field set, types, and value-range invariants — schema contract for downstream operator tooling |
| v0.3 translator fix — inline HAVING aggregate | 93 | 33+22+3 | `AggregateSpec.synth_aliases` preserves spargebra's intermediate variable name post-Extend rename; HAVING migration + translation consult both `output_var` and aliases. Negative `gap-1` removed; new positive test `22-having-inline-aggregate` covers `HAVING(SUM(?v) > c)` directly |
| v0.3 translator fix — type-aware MIN/MAX | 93 | 33+23+3 | `MIN`/`MAX` emit `COALESCE(MIN(numeric)::text, MIN(lex))` — numeric ordering on `xsd:numeric` literals, lex fallback for strings. New positive test `23-min-max-numeric` over `xsd:integer` |
| v0.3 error-path signals — #66 | 93 | 34+23+3 | + `81-error-paths.sql` opens a sibling track to `80`: locks the stable error-prefix UDFs emit on invalid input. Helper `_check_error` generalises `_check_gap` via `EXECUTE`. First check: `pgrdf.load_turtle()` against a missing path surfaces `load_turtle: failed to open` |
| v0.3 edge-case signals — #62 | 93 | 35+23+3 | + `62-materialize-empty.sql` opens an edge-case correctness track (slices 62 → forward) below the error-path track (66 → 63): `pgrdf.materialize()` on a zero-triple graph stays non-panicking, returns `base_triples = 0` + non-negative inferred-count, and remains idempotent across two calls (run 2's `previous_inferred_dropped` == run 1's `inferred_triples_written`). Axiomatic OWL 2 RL triple count NOT locked — that's upstream `reasonable` internals |
| v0.3 edge-case signals — #61 | 93 | 36+23+3 | + `63-shmem-reset-invalidation.sql` locks `pgrdf.shmem_reset()`'s shmem-cache invalidation contract: after `reset()` bumps the `GENERATION` atomic, re-parsing terms that were cached pre-reset (a) does NOT advance `shmem_hits` (slot-generation mismatch reads as cold) and (b) DOES advance `shmem_inserts` (fresh inserts replace the invalidated entries). Guards against a refactor of `src/storage/shmem_cache.rs::reset()` that forgets the generation bump and leaves stale dict ids visible across a `DROP EXTENSION; CREATE EXTENSION` cycle. Asserts deltas (not absolute counter values) via `\gset`-captured booleans so the expected output survives upstream churn |
| v0.3 edge-case signals — #60 | 93 | 37+23+3 | + `64-plan-cache-clear.sql` locks the returned-count semantics of `pgrdf.plan_cache_clear()`: fresh backend → 0 dropped, after N structurally distinct queries → N dropped (matches `plan_cache_local_size` snapshot taken pre-clear), `plan_cache_local_size` falls to 0 post-clear, second consecutive clear returns 0 (idempotent at zero). Guards against a refactor of `src/query/plan_cache.rs::plan_cache_clear()` that swaps `m.len()` for a constant, hoists the `len()` after `m.clear()` (always returning 0), or accidentally muddles the per-backend count with the cumulative shmem `plan_cache_inserts` counter. Empirical `size_before` on the current pgrx 0.16 / PG 17 build is 4 (1 ingest-side `flush_batch` INSERT plan + 3 SELECT plans), but the test locks the RELATION `drained = size_before AND size_after = 0 AND idempotent_clear = 0 AND size_before > 0` rather than the literal, so an ingest-path refactor that skips the plan cache leaves the test still passing |
| v0.3 edge-case signals — #59 | 93 | 38+23+3 | + `65-parse-turtle-empty.sql` locks the boundary contract of `pgrdf.parse_turtle()` on triple-free input: empty string, whitespace-only (`E'   \n   \t  '`), comment-only (`E'# c1\n# c2\n'`), and bare `@prefix` declaration all return `0` without panicking; `_pgrdf_quads` for the graph stays empty; `_pgrdf_dictionary` stays empty (interning happens INSIDE the per-triple loop body of `src/storage/loader.rs::ingest_turtle_with_stats`, so directives that emit zero triples emit zero dict writes). Orthogonal correct-path companion to the malformed-input case in `81-error-paths.sql` (which panics with the `load_turtle: turtle parse error: …` prefix): this slice locks that an EMPTY parser iterator is NOT a parse error — it returns `0` cleanly. Guards against a refactor that wraps the loop in a "fast-path" panicking on empty input, that seeds a placeholder dict/quad row, or that mishandles the trailing `flush_batch()` of zero-length arrays |
| v0.3 edge-case signals — #58 | 93 | 38+23+3 | + `tests/perf/smoke-ontologies.expected.tsv` locks the per-ontology triple counts emitted by `tests/perf/smoke-ontologies.sh` across the current 24-ontology W3C/Apache-Jena/ValueFlows/ConceptKernel-v3.7 set (workflow.ttl held out per ERRATA E-007); snapshot today is **24 rows / 17,134 triples total**. New `tests/perf/smoke-ontologies.sh --check` mode re-runs the smoke, regenerates a TSV from the live output, and `diff -u`'s it against the lock-file (exit non-zero on any drift). Catches two regression classes invisible to the bare smoke: an ontology that used to parse stops parsing (row disappears) and the parser silently drops/duplicates triples (count moves). Not gated in CI yet — `fixtures/ontologies/*.ttl` is gitignored, so the smoke can only run locally after `fixtures/ontologies.sh`; a follow-on Phase 6 slice wires `--check` once a CI fetch step lands. Default smoke behaviour (no flag → pretty-print, exit 0) unchanged. NOT a pg_regress file — test bar unchanged at 38+23+3 |
| v0.3 edge-case signals — #57 | 93 | 39+23+3 | + `66-parse-sparql-roundtrip.sql` locks the end-to-end round-trip from `pgrdf.parse_turtle` ingest through `pgrdf.sparql` query: every triple the parser saw MUST be observable via the SPARQL executor across all four object-term kinds plus a blank-node subject. Five `bool_and(EXISTS …)` assertions over a single 5-shape Turtle fragment cover (1) IRI object (`foaf:knows`), (2) plain literal (`foaf:name "Alice"`), (3) typed literal (`ex:age "30"^^xsd:integer`), (4) language-tagged literal (`ex:bio "Engineer"@en`), and (5) blank-node subject, keyed by a sibling-property join `?s foaf:name "Anon" . ?s foaf:name ?n` so the parser-allocated bnode id stays out of the assertion. Sibling to `61-materialize-then-sparql.sql` (which locks the materialize→sparql edge); together they pin both ends of the storage layer's visibility contract to the SPARQL surface. Datatype URI and lang-tag echo policy are NOT pinned by this slice (the SPARQL projection emits the lexical form only); their storage-side contracts are locked by `21-typed-literals.sql` / `22-lang-tags.sql` |
| v0.3 edge-case signals — #56 | 93 | 39+23+3 | extends `82-stats-shape.sql` in-place (no new pg_regress file: the file is explicitly scoped to "schema shape only" and these three new invariants are schema shape too) with the schema-drift tripwire trio: (a) exact field count: `count(*) FROM jsonb_object_keys(stats()) = 10` pins to the literal current key count emitted by `src/storage/stats.rs::stats()` (`shmem_ready`, `shmem_slots`, `shmem_hits`, `shmem_misses`, `shmem_inserts`, `shmem_evictions`, `plan_cache_hits`, `plan_cache_misses`, `plan_cache_inserts`, `plan_cache_local_size`) so any added field forces a deliberate test update; (b) keys-match-canonical: `array_agg(k ORDER BY k) = ARRAY[…literal 10-element list…]` catches both silent additions (array gets longer) and silent renames (one element swaps); (c) no-null-fields: `bool_and(jsonb_typeof(value) != 'null')` catches a refactor that defaults an uninitialised counter to JSON `null` rather than `0`. Complements the existing "fields-that-SHOULD-be-there are there" block with the orthogonal "fields-that-SHOULDN'T-be-there ARE NOT there" guarantee; together they pin the closed-set shape contract that downstream operator tooling (CloudNativePG operators, CI dashboards, telemetry parsers) wires against. Test count unchanged: still 39+23+3, with three new rows in `tests/regression/expected/82-stats-shape.out` |
| v0.3 harness lock-in — #55 | 93 | 39+23+3 | promotes the W3C-shape + LUBM-shape harnesses to first-class Justfile recipes (`just test-w3c`, `just test-lubm`), introduces `just test-conformance` (regression + W3C-shape + LUBM-shape — every compose-based layer) and `just test-everything` (pgrx integration + test-conformance — the broadest sweep), and lands `just smoke-cold` (`compose-down` → `build-ext` → `compose-up` → `CREATE EXTENSION` → test-conformance) as the cold-compose discipline gate. `just test-all` keeps its narrow `test + test-regression` shape for back-compat. `docs/08-testing.md` and `README.md`'s Tests block point at the new entry points. The shift matters because two of the three compose-based harnesses were previously discoverable only by knowing the bash paths — `just --list` showed nothing about them, and `just test-all` silently skipped them. Cold-compose smoke is the verification half: it catches the bug class that passes on a warm compose because some prior `DROP/CREATE` left state behind, and breaks on the next cold boot. Test count unchanged — the new recipes are wrappers, not new tests. Final entry in the 66→1 coverage countdown; the next phase opens the hygiene cycle |
| **v0.3 cut (current)** | **93** | **39 + 23 + 3 = 65** | **Total 158 tests across all five layers** (93 pgrx integration + 39 pg_regress + 23 W3C-shape SPARQL + 3 LUBM-shape). v0.3 LLD §5 phase status: Phase 1 ✅, Phase 2 ✅ (2.0/2.1/2.2 + extended SPARQL surface steps 1-12), Phase 3 🚧 (steps 1-2 ✅, step 3 phase A ✅, phase B → v0.4), Phase 4 ✅, Phase 5 🚧 stub (real impl → v0.5 per v0.4-FUTURE §9), Phase 6 🚧 (step 1 ✅, step 2 starter + expansions + essentials ✅, step 3 ⏳). License attribution (Apache 2.0 / 2026), MSRV (1.91), ERRATA E-006 re-check (2026-05-14), ERRATA E-010 (cargo audit informational). Forward look: [`SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md) is canonical for v0.4 scope |
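
The lock-file check described in the #58 row (regenerate a TSV from live output, diff it against the committed snapshot, exit non-zero on drift) can be illustrated with a minimal Python sketch. This is not the project's `smoke-ontologies.sh` implementation: the two-column `name<TAB>count` layout, the function name `check_against_lock`, and the sample file names are all assumptions made for illustration.

```python
import difflib


def check_against_lock(live_rows, lock_text):
    """Compare live per-ontology counts against a lock-file snapshot.

    live_rows: dict mapping ontology file name -> triple count (hypothetical shape).
    lock_text: contents of the lock file, assumed to be sorted `name<TAB>count` lines.
    Returns (exit_code, diff_text): 0 and "" when nothing drifted, 1 and a
    unified diff otherwise.
    """
    # Regenerate the TSV from live output in a stable (sorted) order.
    live_text = "".join(
        f"{name}\t{count}\n" for name, count in sorted(live_rows.items())
    )
    diff = list(
        difflib.unified_diff(
            lock_text.splitlines(keepends=True),
            live_text.splitlines(keepends=True),
            fromfile="expected.tsv",
            tofile="live.tsv",
        )
    )
    return (1 if diff else 0), "".join(diff)


if __name__ == "__main__":
    lock = "a.ttl\t10\nb.ttl\t5\n"  # illustrative snapshot, not real data
    code, diff_text = check_against_lock({"a.ttl": 10, "b.ttl": 6}, lock)
    print(diff_text, end="")
    raise SystemExit(code)
```

Both regression classes named in that row show up as diff lines: an ontology that stops parsing appears as a removed row, and a changed triple count appears as a removed row paired with an added one.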