pgrdf 0.3.0

Rust-native PostgreSQL extension for RDF, SPARQL, SHACL and OWL reasoning
# 10 — Roadmap

> **v0.3 LLD is the authoritative shipped contract**
> ([`specs/SPEC.pgRDF.LLD.v0.3.md`](../specs/SPEC.pgRDF.LLD.v0.3.md) §5).
> Phase numbering on this page tracks the v0.3 phase map verbatim:
> Phase 1 done, Phase 2 (Functional SPARQL Coverage) done through
> sub-steps 2.0 / 2.1 / 2.2, Phase 3 (Storage Performance) steps 1-2
> shipped + step 3 phase A shipped, Phase 4 (Inference) shipped,
> Phase 5 (Validation) stub shipped, Phase 6 (CI + Conformance +
> Release) step 1 shipped.
>
> **Forward-look:**
> [`specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)
> is the canonical scope document for the v0.4 cut (named-graph
> scoping, SPARQL UPDATE, lifecycle UDFs, CONSTRUCT, property paths,
> plus the SPARQL backlog deferred from v0.3 §3). v0.5 + v1.0 forward
> look lives in that doc's §15.

Within each phase, sub-steps track delivery cadence — each one is a
git commit on `main` with both pgrx + regression coverage green.

Status legend:
- ✅ shipped
- 🚧 in progress (sub-step partially delivered)
- ⏳ planned (not yet started)
- ❌ deferred (intentionally out of current scope)

---

## Phase 1 — Core Storage & Build Automation ✅

Outcome: extension registers cleanly in stock `postgres:17.4-bookworm`
and the local build produces a usable `.so` + `.control` + `.sql`.

- ✅ pgrx 0.16 scaffold compiles on PG 14–17. PG 18 support has
      landed upstream in pgrx 0.18.0 (2026-04-17), but adoption is
      deferred to v0.4: 0.18.0 still trips `E0716` in its
      `impl_table_iter` macro on every Rust stable/nightly we tested,
      and its single-pass schema-gen migration (`pgrx_embed` removal,
      `crate-type` change) is a non-trivial breaking edit. See
      `specs/ERRATA.v0.2.md` E-006 (re-checked 2026-05-14).
- ✅ `_pgrdf_dictionary` + `_pgrdf_quads` schema in
      `sql/schema_v0_2_0.sql`, loaded via `extension_sql_file!`.
- ✅ Hexastore SPO/POS/OSP covering indexes
      (`INCLUDE (is_inferred)`).
- ✅ Two-VM build/run split: a 200 GB Colima VM for builds (Linux
      container), podman for the compose stack.
- ✅ BuildKit cache mounts for `cargo` registry + `target/`; builder
      image 7.73 GB → 3.35 GB.
- ✅ `just build-ext` produces the package artifacts in
      `compose/extensions/`.
- ✅ `just compose-up` boots stock postgres:17.4, and `CREATE EXTENSION
      pgrdf` works end-to-end against it.

**Not shipped at this phase boundary** (carried into later phases):
- ⏳ GitHub Actions matrix green on tag push (workflow stubs exist;
      not yet wired to a real release).
- ⏳ Pre-built tarballs on a GitHub release matching INSTALL §3
      layout, now tracked as Phase 6 step 3.
- ❌ COPY BINARY ingestion (LLD §4.3) — Phase 2.2 substituted
      **batched INSERT via `unnest($1::bigint[], …)`** as a
      stepping-stone delivery. COPY BINARY itself is now tracked as
      Phase 3 step 3 (phase A shipped, phase B deferred to v0.4).

---

## Phase 2 — Functional SPARQL Coverage ✅

Outcome: SPARQL SELECT queries cover the practically-useful surface
end-to-end; ingestion is fast enough to load real-world ontologies.
Phase 2 split into three sub-phases (2.0 storage CRUD, 2.1 Turtle
ingest, 2.2 SPARQL parser/executor) plus an extended-surface
deliverable track inside 2.2 that landed steps 1-12 below.

### Phase 2.0 — Storage CRUD UDFs ✅

- ✅ `pgrdf.put_term(value, term_type)` + `pgrdf.get_term(id)` with
      `IS NOT DISTINCT FROM` dedup over (term_type, lexical_value,
      datatype_iri_id, language_tag).
- ✅ `pgrdf.put_quad(s, p, o, g)` + `pgrdf.count_quads(g)`.
- ✅ `pgrdf.add_graph(g)` — idempotent LIST partition creation, so
      `DROP TABLE _pgrdf_quads_<g>` becomes the constant-time
      whole-graph drop the LLD calls for.
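
The dedup uses `IS NOT DISTINCT FROM` because IRIs and plain terms carry NULL `datatype_iri_id` / `language_tag`. Under SQL `=`, two NULLs compare as NULL, not true, so the lookup would never find the existing row and would insert a duplicate. Rust's `Option` equality treats `None == None` as equal, which gives the semantics the dictionary needs. The sketch below shows this with a minimal in-memory model. `TermKey`, `Dictionary`, and the `i16` term-type encoding are hypothetical stand-ins, not pgrdf's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical model of the four-column dictionary key named above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct TermKey {
    term_type: i16,               // assumed encoding (IRI / literal / blank)
    lexical_value: String,
    datatype_iri_id: Option<i64>, // SQL NULL -> None
    language_tag: Option<String>, // SQL NULL -> None
}

/// Toy dictionary standing in for `_pgrdf_dictionary`.
struct Dictionary {
    ids: HashMap<TermKey, i64>,
    next_id: i64,
}

impl Dictionary {
    fn new() -> Self {
        Dictionary { ids: HashMap::new(), next_id: 1 }
    }

    /// Returns the existing id when an equal key is present, else allocates one.
    /// `None == None` holds for `Option`, matching `IS NOT DISTINCT FROM`;
    /// SQL `=` would yield NULL for two NULLs and miss the existing row.
    fn put_term(&mut self, key: TermKey) -> i64 {
        if let Some(&id) = self.ids.get(&key) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.ids.insert(key, id);
        id
    }
}
```

Re-putting the same IRI returns the same id. Two literals that differ only in language tag get distinct ids.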

### Phase 2.1 — Turtle ingest ✅

- ✅ `pgrdf.load_turtle(path, graph_id, base_iri)` and
      `pgrdf.parse_turtle(content, graph_id, base_iri)` via
      `oxttl 0.2`.
- ✅ `put_term_full(value, type, datatype_id, lang)` honours the full
      dictionary key with NULL-aware dedup.
- ✅ 24 W3C / Apache Jena / ValueFlows / ConceptKernel v3.7 ontologies
      smoke-load cleanly via `tests/perf/smoke-ontologies.sh`
      (17,134 triples on the 2026-05-13 fetch). `workflow.ttl` held
      out for non-RFC IRI form (ERRATA E-007).

### Phase 2.2 — Dict cache + batched ingest + SPARQL parser/executor ✅

- ✅ **Per-call HashMap dict cache** + buffered multi-row INSERTs
      via `unnest($1::bigint[], $2::bigint[], $3::bigint[])` with
      BATCH_SIZE = 1000. Reduces SPI calls from ~7/triple to roughly
      `distinct_terms + ceil(triples/1000)`.
- ✅ `pgrdf.load_turtle_verbose` / `parse_turtle_verbose` return
      JSONB stats (triples, dict_cache_hits, dict_db_calls,
      quad_batches, elapsed_ms).
- ✅ `pgrdf.sparql_parse(q TEXT) → JSONB` — spargebra-backed AST
      introspection.
- ✅ `pgrdf.sparql(q TEXT) → SETOF JSONB` — BGP → SQL translator.
      Covers single-triple patterns up to N-pattern BGPs; shared
      variables become INNER JOIN conditions against each variable's
      first-occurrence anchor.
- ✅ Three doc tracks split: `specs/` (authoritative) +
      `docs/` (engineering plan) + `guide/` (user docs).
- ✅ 4 client integration guides: Python, Rust, Node/TypeScript, Go.
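
The `distinct_terms + ceil(triples/1000)` figure comes from two rules. Every term missing from the per-call cache costs one dictionary round trip, and quads are flushed in batches of 1000. The sketch below simulates one load call and only counts calls. `simulate_ingest` and `IngestStats` are hypothetical. The cache here is keyed on the lexical string alone, whereas the real key also carries term type, datatype, and language tag. No SPI is involved. The counter names mirror the `*_verbose` JSONB fields.

```rust
use std::collections::HashMap;

const BATCH_SIZE: usize = 1000;

/// Counters mirroring the `*_verbose` JSONB fields (`elapsed_ms` omitted).
#[derive(Debug, Default)]
struct IngestStats {
    triples: usize,
    dict_cache_hits: usize,
    dict_db_calls: usize,
    quad_batches: usize,
}

/// Counts the round trips one load call would make; no database involved.
fn simulate_ingest(triples: &[(&str, &str, &str)]) -> IngestStats {
    let mut cache: HashMap<String, i64> = HashMap::new(); // per-call dict cache
    let mut stats = IngestStats::default();
    let mut buffered = 0; // quads waiting for the next batched INSERT

    for &(s, p, o) in triples {
        for term in [s, p, o] {
            if cache.contains_key(term) {
                stats.dict_cache_hits += 1;
            } else {
                stats.dict_db_calls += 1; // stands in for one put_term round trip
                let id = cache.len() as i64 + 1;
                cache.insert(term.to_string(), id);
            }
        }
        stats.triples += 1;
        buffered += 1;
        if buffered == BATCH_SIZE {
            stats.quad_batches += 1; // one INSERT ... unnest(...) per full batch
            buffered = 0;
        }
    }
    if buffered > 0 {
        stats.quad_batches += 1; // trailing partial batch
    }
    stats
}
```

Take 2,500 triples with distinct subjects that share one predicate and one object. The simulation gives 2,502 dictionary calls plus 3 batch INSERTs, 2,505 round trips in total. At ~7 per triple, the same load would cost roughly 17,500.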

(Phase 3 storage-performance gates are tracked under
[Phase 3 — Storage Performance](#phase-3--storage-performance--steps-1-2-shipped-step-3-phase-a-shipped)
below, not here. Phase 2.2 closes with the SPARQL parser / executor
landing; perf work picks up under its own phase per v0.3 LLD §5.)

### Phase 2.2 (extended) — SPARQL surface deliverables ✅

Sub-track inside Phase 2.2 that extended `pgrdf.sparql` from the
v0.2 LLD's minimal "SELECT … WHERE { BGP }" toward a practically-useful
SPARQL 1.1 surface, in tight slices each shipping with pgrx +
regression coverage. (Phase 3 in the v0.3 LLD is **Storage
Performance** — see the next section. The "extended SPARQL surface"
label that previously hung off this table was pre-v0.3 framing and
has been retired.)

| Step | Surface | Commit | pgrx | regression |
|---|---|---|---|---|
| 1 | FILTER — identity (`=`, `!=`, `sameTerm`), boolean (`&&`, `\|\|`, `!`), term-type (`isIRI`, `isLiteral`, `isBlank`), `BOUND` | `1ebeefc` | 28 | 14 |
| 2 | FILTER — numeric ordering (`<`/`>`/`<=`/`>=`), `REGEX`, `IN`, `STR` passthrough | `51b4d56` | 34 | 15 |
| 3 | Solution modifiers — `DISTINCT`, `REDUCED`, `LIMIT`, `OFFSET`, `ORDER BY ASC/DESC ?var` | `4bc9a87` | 40 | 16 |
| 4 | `OPTIONAL { ?s :p ?o }` → `LEFT JOIN` (with inner FILTER and chained blocks) | `6546d80` | 45 | 17 |
| 5 | `UNION` (n-way, branch-local FILTERs and OPTIONALs) | `56b7bca` | 51 | 18 |
| 6 | `MINUS` → `NOT EXISTS` keyed by shared variables | `59ee1b9` | 56 | 19 |
| 7 | Aggregates — `COUNT(*)`, `COUNT(?v)`, `COUNT(DISTINCT)`, `SUM`, `AVG`, `MIN`, `MAX` + `GROUP BY` | `fd40845` | 63 | 20 |
| 8 | `HAVING` (post-aggregate filter) + `GROUP_CONCAT` + `SAMPLE` | `066ce53` | 67 | 21 |
| 9 | Expression richness — arithmetic (`+`/`-`/`*`/`/`), `STRLEN`, `CONTAINS`/`STRSTARTS`/`STRENDS`, `LANG`/`DATATYPE`/`UCASE`/`LCASE` | `78df3a6` | 73 | 22 |
| 10 | `BIND(expr AS ?v)` for projection (Literal/NamedNode/Variable, STR/LANG/DATATYPE/UCASE/LCASE/STRLEN, arithmetic, CONCAT) | `99069a6` | 76 | 23 |
| 11 | Multi-triple MINUS (sub-pattern with N triples joined inside the NOT EXISTS) | `bc6d0a8` | 77 | 24 |
| 12 | `ASK { … }` query form → single JSONB row `{"_ask": "true"\|"false"}` | `fc67285` | 79 | 25 |
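
Each step in the table builds on the Phase 2.2 translator. That translator joins the patterns of a BGP by anchoring every variable at the column where it first appears, then equating later occurrences to that anchor. The sketch below reproduces the idea for constants already resolved to dictionary ids. It is illustrative only. The column names `subject_id` / `predicate_id` / `object_id`, the `q0`, `q1` aliases, and `bgp_to_sql` itself are assumptions. The real translator also parameterises constants, scopes by graph, and joins the dictionary to decode terms.

```rust
use std::collections::HashMap;

/// One position of a triple pattern: a variable or a resolved dictionary id.
enum Term {
    Var(&'static str),
    Const(i64),
}

/// Hypothetical column names for subject / predicate / object ids.
const COLS: [&str; 3] = ["subject_id", "predicate_id", "object_id"];

fn bgp_to_sql(patterns: &[[Term; 3]]) -> String {
    let mut anchors: HashMap<&str, String> = HashMap::new(); // var -> first column
    let mut order: Vec<&str> = Vec::new(); // projection follows first-seen order
    let mut from = String::new();
    let mut where_conds: Vec<String> = Vec::new();

    for (i, pattern) in patterns.iter().enumerate() {
        let alias = format!("q{}", i);
        let mut conds: Vec<String> = Vec::new();
        for (term, col) in pattern.iter().zip(COLS) {
            let column = format!("{}.{}", alias, col);
            match *term {
                Term::Const(id) => conds.push(format!("{} = {}", column, id)),
                Term::Var(name) => match anchors.get(name).cloned() {
                    // Later occurrence: equate with the first-occurrence anchor.
                    Some(anchor) => conds.push(format!("{} = {}", column, anchor)),
                    // First occurrence: this column becomes the anchor.
                    None => {
                        anchors.insert(name, column);
                        order.push(name);
                    }
                },
            }
        }
        if i == 0 {
            from.push_str(&format!("_pgrdf_quads {}", alias));
            where_conds.extend(conds);
        } else {
            let on = if conds.is_empty() { "TRUE".to_string() } else { conds.join(" AND ") };
            from.push_str(&format!(" INNER JOIN _pgrdf_quads {} ON {}", alias, on));
        }
    }

    let select: Vec<String> = order.iter().map(|v| format!("{} AS {}", anchors[v], v)).collect();
    let select_list = if select.is_empty() { "1".to_string() } else { select.join(", ") };
    let mut sql = format!("SELECT {} FROM {}", select_list, from);
    if !where_conds.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&where_conds.join(" AND "));
    }
    sql
}
```

Consider the two-pattern chain `?s <10> ?o . ?o <11> ?name`. The second pattern's subject equates to `q0.object_id`, the anchor of `?o`, so the join condition lands in the `ON` clause.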

**SPARQL surface declared substantively complete with step 12.** The
backlog below (every item deferred to v0.4 per
[`SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md))
does not block Phase 3 (Storage Performance) of the v0.3 LLD:

- ⏳ `GRAPH { … }` named-graph clause — needs a graph IRI → graph_id
      mapping (schema change). v0.4-FUTURE §3.
- ⏳ Multi-triple OPTIONAL — relax the current single-triple
      restriction via a derived-table refactor inside the LEFT JOIN.
      (Multi-triple MINUS shipped step 11.) v0.4-FUTURE §11.
- ⏳ Arithmetic in FILTER (`?a + ?b > 30`), `BIND` inside FILTER,
      `SUBSTR`, aggregates-over-UNION. v0.4-FUTURE §11.
      (`lang(?v)` / `datatype(?v)` and the `STRLEN` / `CONTAINS` /
      `STRSTARTS` / `STRENDS` surface shipped step 9; `BIND (expr AS ?v)`
      for projection shipped step 10; type-aware `MIN`/`MAX` over
      `xsd:numeric` shipped post-step-12 — translator slice
      `7de9c17`.)
- ⏳ Type-aware ORDER BY (sort numeric literals numerically rather
      than as strings). v0.4-FUTURE §11.
- ⏳ `VALUES (?x ?y) { … }`. v0.4-FUTURE §11.
- ⏳ Property paths beyond simple sequence (`*`, `+`, `?`, `^`,
      alternation). Simple sequence already works because spargebra
      desugars `:a/:b` into a BGP chain. v0.4-FUTURE §7.
- ⏳ `CONSTRUCT`, `DESCRIBE`. (`ASK` shipped step 12.) v0.4-FUTURE §6.
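
Type-aware `ORDER BY` stays on the backlog. The shipped `MIN`/`MAX` fix illustrates both the underlying problem and the fallback used so far. Compared as strings, `"10"` sorts before `"9"`. `COALESCE(MIN(numeric)::text, MIN(lex))` takes the numeric minimum whenever any numeric literal is present and otherwise falls back to the lexical minimum. Below is a sketch of that rule. `Literal` and `type_aware_min` are hypothetical, and `f64` approximates Postgres `numeric`, so text formatting can differ (for example, `5.0` prints as `5`).

```rust
/// A stored literal: lexical form plus whether its datatype is numeric.
struct Literal<'a> {
    lex: &'a str,
    is_numeric: bool,
}

/// Mirrors COALESCE(MIN(numeric)::text, MIN(lex)): the numeric minimum wins
/// when any numeric literal is present; otherwise use the string minimum.
fn type_aware_min(values: &[Literal]) -> Option<String> {
    let numeric_min = values
        .iter()
        .filter(|v| v.is_numeric)
        .filter_map(|v| v.lex.parse::<f64>().ok()) // f64 stands in for numeric
        .reduce(f64::min);
    match numeric_min {
        Some(m) => Some(m.to_string()),
        None => values.iter().map(|v| v.lex).min().map(|s| s.to_string()),
    }
}
```

Over the integers `10`, `9`, `100` this returns `"9"`. A plain string `MIN` would have returned `"10"`, since `'1'` sorts before `'9'`.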

---

## Phase 3 — Storage Performance 🚧 (steps 1-2 shipped, step 3 phase A shipped)

Outcome: shmem-resident dictionary cache + prepared-plan cache +
bulk-ingest primitive — tracks v0.3 LLD §5.1 / §4.1 / §4.2 / §4.3.

Gates:
- ✅ **Step 1 — Shmem dictionary cache (LLD §4.1)** —
      `PgLwLock<[Slot; 16_384]>` cross-backend cache with u128
      fingerprint, commit-deferred publish, generation invalidation.
      Per-call `load_turtle_verbose.shmem_cache_hits` and cumulative
      `pgrdf.stats()` counters; regression `50-shmem-dict-cache.sql`
      asserts 100 % shmem hit rate on the second load of
      `synth-100.ttl`. Edge-cases locked by
      `63-shmem-reset-invalidation.sql` (slice #61) — `shmem_reset()`
      generation bump + slot-mismatch read-as-cold contract.
- ✅ **Step 2 — Prepared-plan cache (LLD §4.2)** — parameterised
      SPARQL SQL + per-backend `OwnedPreparedStatement` cache keyed
      by the SQL string. `pgrdf.stats()` exposes
      `plan_cache_hits / misses / inserts / local_size`. Operator
      hook: `pgrdf.plan_cache_clear()`. Regression
      `51-plan-cache.sql` asserts the hit / miss / parametric-reuse
      arithmetic for three workload shapes; edge-cases locked by
      `64-plan-cache-clear.sql` (slice #60) — returned-count
      semantics, idempotent-at-zero, post-clear size invariant.
- 🚧 **Step 3 — COPY BINARY ingestion (LLD §4.3)** —
  - ✅ **Phase A**: prepared `INSERT … unnest(…)` cached
    per-backend, reused across batches and across loads.
    Saves one parse+plan per batch (~100–500 µs each).
    Verified by `52-bulk-ingest-perf.sql` on `synth-10k.ttl`.
  - ⏳ **Phase B** (deferred to v0.4 per
    [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §12`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)):
    the 2× wall-clock target from LLD §4.3 acceptance is not
    met by phase A alone, because the per-tuple executor walk
    dominates. Candidate paths: `pg_sys::heap_multi_insert` per
    partition, or `BeginCopyFrom` + binary callback; both are
    FFI-heavy.
- ⏳ W3C SPARQL 1.1 manifest runner wired into CI; coverage target
      `≥ 30 %` pass, originally the v0.3 Phase 6 step 2 gate (LLD §5.4)
      and now deferred to v0.4 together with the full runner.
      Hand-authored W3C-shape harness (23 tests, lock-in slice #55)
      stands in until the full TTL-manifest runner lands.
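
The generation-invalidation contract locked by `63-shmem-reset-invalidation.sql` can be sketched as a single-process, direct-mapped cache. A slot hits only when both its u128 fingerprint and its generation match the current ones, and `reset` simply bumps the generation so every existing slot reads as cold. This is a simplified model, not pgrdf's `shmem_cache.rs`. It has no `PgLwLock`, no shared memory, and no commit-deferred publish. The two-pass `DefaultHasher` fingerprint and the modulo slot index are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const SLOTS: usize = 16_384;

#[derive(Clone, Copy)]
struct Slot {
    fingerprint: u128,
    term_id: i64,
    generation: u64,
}

struct DictCache {
    slots: Vec<Option<Slot>>, // fixed-size, direct-mapped
    generation: u64,
    hits: u64,
    inserts: u64,
}

/// Assumed 128-bit fingerprint: two seeded 64-bit SipHash passes.
fn fingerprint(term: &str) -> u128 {
    let mut halves = [0u64; 2];
    for (seed, half) in halves.iter_mut().enumerate() {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h); // different seed per half
        term.hash(&mut h);
        *half = h.finish();
    }
    ((halves[0] as u128) << 64) | halves[1] as u128
}

impl DictCache {
    fn new() -> Self {
        DictCache { slots: vec![None; SLOTS], generation: 1, hits: 0, inserts: 0 }
    }

    fn slot_index(fp: u128) -> usize {
        (fp % SLOTS as u128) as usize
    }

    fn get(&mut self, term: &str) -> Option<i64> {
        let fp = fingerprint(term);
        let slot = self.slots[Self::slot_index(fp)];
        match slot {
            Some(s) if s.fingerprint == fp && s.generation == self.generation => {
                self.hits += 1;
                Some(s.term_id)
            }
            // Empty, owned by another term, or from an older generation: cold.
            _ => None,
        }
    }

    fn publish(&mut self, term: &str, term_id: i64) {
        let fp = fingerprint(term);
        let idx = Self::slot_index(fp);
        self.slots[idx] = Some(Slot { fingerprint: fp, term_id, generation: self.generation });
        self.inserts += 1;
    }

    /// O(1) invalidation: every slot written earlier now mismatches.
    fn reset(&mut self) {
        self.generation += 1;
    }
}
```

After `reset`, a lookup of a previously cached term misses without advancing `hits`, and re-publishing it advances `inserts`. These are the two deltas the regression asserts.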

---

## Phase 4 — Inference Engine ✅ (shipped; loader-writeback deferred)

Outcome: materialized OWL 2 RL inference works against real
ontologies; SHACL validation is its own Phase 5. Tracks LLD v0.3
§5.2.

Gates:
- ✅ `pgrdf.materialize(graph_id BIGINT) → JSONB` —
      `src/inference/reasonable.rs` rehydrates base quads via a
      single SPI scan + 3 dict-JOINs, runs `reasonable::Reasoner`
      (OWL 2 RL — see ERRATA E-002), set-diffs against the input,
      and INSERTs the entailed-but-not-asserted triples with
      `is_inferred = TRUE`. Idempotent. Verified by
      `tests/regression/sql/60-materialize-owl-rl.sql`. Round-trip
      to SPARQL locked by `61-materialize-then-sparql.sql`;
      zero-triple edge locked by `62-materialize-empty.sql` (slice
      #62).
- ⏳ Reasoner-coverage fixture (e.g. pizza ontology subset) with a
      golden expected-closure diff. Deferred — current regression
      uses minimal hand-authored TBoxes.
- ⏳ Loader-side writeback via `flush_batch` (depends on Phase 3
      step 3 phase B shipping the bulk-INSERT primitive in v0.4 per
      [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §12`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)).
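
The rehydrate / reason / set-diff / insert cycle can be sketched in memory, and the sketch shows why re-running it is idempotent. Assumptions: a `Vec<Quad>` stands in for `_pgrdf_quads`. A naive fixpoint over just two OWL 2 RL rules, `scm-sco` (subclass transitivity) and `cax-sco` (type propagation), stands in for `reasonable::Reasoner`. Dropping the previous inferred rows before re-deriving is modelled on the `previous_inferred_dropped` / `inferred_triples_written` fields cited in the test-bar table, not taken from pgrdf's code.

```rust
use std::collections::HashSet;

type Triple = (String, String, String);

const TYPE: &str = "rdf:type";
const SUB: &str = "rdfs:subClassOf";

/// Stand-in for a row of `_pgrdf_quads` (graph id omitted).
struct Quad {
    triple: Triple,
    is_inferred: bool,
}

/// Naive fixpoint over two OWL 2 RL rules only (a toy, not a full reasoner).
fn closure(base: &HashSet<Triple>) -> HashSet<Triple> {
    let mut all = base.clone();
    loop {
        let mut derived = Vec::new();
        for (a, p1, b) in &all {
            for (c, p2, d) in &all {
                if b != c || p2 != SUB {
                    continue;
                }
                if p1 == SUB {
                    // scm-sco: A subClassOf B, B subClassOf D => A subClassOf D
                    derived.push((a.clone(), SUB.to_string(), d.clone()));
                } else if p1 == TYPE {
                    // cax-sco: x type B, B subClassOf D => x type D
                    derived.push((a.clone(), TYPE.to_string(), d.clone()));
                }
            }
        }
        let before = all.len();
        all.extend(derived);
        if all.len() == before {
            return all;
        }
    }
}

/// Returns (previous_inferred_dropped, inferred_triples_written).
fn materialize(store: &mut Vec<Quad>) -> (usize, usize) {
    let before = store.len();
    store.retain(|q| !q.is_inferred); // drop the previous run's output
    let dropped = before - store.len();
    let base: HashSet<Triple> = store.iter().map(|q| q.triple.clone()).collect();
    // Keep only entailed-but-not-asserted triples.
    let entailed: Vec<Triple> = closure(&base).difference(&base).cloned().collect();
    let written = entailed.len();
    store.extend(entailed.into_iter().map(|triple| Quad { triple, is_inferred: true }));
    (dropped, written)
}
```

Take a three-triple base: `Cat ⊑ Mammal`, `Mammal ⊑ Animal`, `tom a Cat`. The first run writes 3 inferred triples. The second run drops those 3 and writes the same 3 again, leaving the store unchanged.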

---

## Phase 5 — Validation Engine 🚧 (stub)

Outcome: SHACL validation works against real shapes graphs. Tracks
LLD v0.3 §5.3.

Gates:
- 🚧 `pgrdf.validate(data BIGINT, shapes BIGINT) → JSONB` —
      surface SHIPPED (`src/validation/shacl.rs`); body returns
      `{"status": "stub", …}` blocked by ERRATA E-009 (upstream
      `iri_s`/`rdf-12` dep conflict between `shacl_validation` and
      `reasonable`). Verified by `70-validate-stub.sql`.
- ⏳ Real `shacl_validation` integration once either upstream
      catches up (see `docs/05-validation.md` for the unblock
      conditions). Targeted at v0.5 per
      [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §9`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)
      (gated on ERRATA E-009).
- ⏳ W3C SHACL conformance manifest runner — paired with Phase 6,
      lands with real SHACL output in v0.5.

---

## Phase 6 — CI + Conformance + Release 🚧 (step 1 shipped)

Outcome: pgRDF is consumable by external operators (CloudNativePG,
StackGres) following the INSTALL spec methodology, and benchmarked
via LUBM against Apache Jena TDB and Apache AGE. Tracks LLD v0.3 §5.4.

**Step 1 — Regression in CI** ✅
- `.github/workflows/ci.yml` `regression` job runs the
  compose-based pg_regress suite on every PR + push to main.
  Pinned to PG 17 today (compose pin per ERRATA E-006).

**Step 2 — W3C conformance** 🚧 (starter harness shipped, expanded twice)
- ✅ `tests/w3c-sparql/` hand-authored harness: **23 tests** in
  five batches (5 starter + 8 expanded + 5 expanded II +
  3 essentials + 2 translator-fix gates), covering BGP, DISTINCT,
  UNION, OPTIONAL, MINUS, FILTER (isIRI/REGEX/IN/numeric),
  aggregates + HAVING, ORDER BY DESC, LIMIT/OFFSET, BIND/CONCAT,
  ASK true/false, STRLEN, LANG, UCASE, BOUND-after-OPTIONAL,
  STR(?iri), inline HAVING-aggregate, type-aware MIN/MAX. Plus
  3 LUBM-shape correctness gates in `tests/perf/lubm-shape/`.
  Bash runner; runs alongside `tests/regression/` in the same CI
  job. Each expected output cites the W3C spec section it exercises.
  Justfile entry points (`just test-w3c`, `just test-lubm`,
  `just test-conformance`) added in slice #55.
- ⏳ Full W3C TTL-manifest runner against `w3c/rdf-tests`. The
  `pgrdf-w3c-sparql` Rust binary placeholder in
  `regression-w3c.yml::sparql11` (gated `if: false`) is the
  destination shape; lands as v0.4.
- ⏳ W3C SHACL manifest runner. Gated on ERRATA E-009 unblocking;
  per [`SPEC.pgRDF.LLD.v0.4-FUTURE.md §9`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md)
  the SHACL pair (real output + manifest runner) targets v0.5.
- ⏳ Coverage targets ratchet per release:
  SPARQL `≥ 30 % → ≥ 70 % → ≥ 95 %`; SHACL `≥ 50 % → ≥ 90 %`.

**Step 3 — Release artifacts** ⏳
- `.github/workflows/release.yml` already builds and packages on
  `v*` tags; the first official release is cut once step 2 lands.
  Matrix is `{14,15,16,17} × {amd64, arm64}` = 8 tarballs per cut
  (PG 18 deferred per ERRATA E-006, slice #36 audit).
- LUBM-100 results in `target/perf-report.json` compared against
  Apache Jena TDB and Apache AGE.
- OCI artifact published at `ghcr.io/styk-tv/pgrdf-bundle:<ver>`
  (INSTALL §11 OQ1).
- INSTALL §12 conformance test in CI against a fresh K8s cluster
  (kind or k3s).
- SHA256SUMS is wired in `release.yml` at both per-tarball and
  aggregate levels (slice #28 audit; supersedes the older slice #36
  "not yet wired" note). The detached GPG signature
  `SHA256SUMS.asc` (INSTALL OQ4) is **deferred to v0.4** — no
  `GPG_PRIVATE_KEY` secret or release-signing key is yet provisioned
  for the workflow. v0.3 ships SHA256SUMS-only integrity; the `.asc`
  follow-up requires sourcing a signing key, publishing the public
  half, and wiring the secret. See `docs/09-release.md` "Aggregate
  checksums" for the consumer-side verification recipe.
- License attribution surface (Apache 2.0 / 2026) declared at
  repo root; NOTICE distribution in the release tarball flagged
  as workflow follow-up (slice #36 adjacent finding).
- MSRV declared `rust-version = "1.91"` in `Cargo.toml` (slice
  #49).
- Target gates: W3C SPARQL 1.1 ≥ 95 % pass; SHACL ≥ 90 % pass
  (the SHACL gate moves with ERRATA E-009 resolution; per
  v0.4-FUTURE §9, real SHACL output is a v0.5 ticket).

---

## v0.4 — next milestone (forward-looking)

v0.4 is the next major cut, drafted in
[`SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md).
What follows summarises the five major tracks — the full contract
lives in the spec. Acceptance criteria, schema deltas, and
translator-level wiring are NOT duplicated here; this section is a
navigation aid only.

### Track 1 — Named-graph scoping + IRI mapping
`GRAPH { … }` SPARQL surface plus a new `_pgrdf_graphs` system table
mapping graph IRIs to the existing integer `graph_id` (LIST-partition
key of `_pgrdf_quads`). `GRAPH ?g { … }` projects `?g` as the IRI,
not the integer. See
[v0.4-FUTURE §3](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#3-named-graph-scoping-and-iri-mapping-new).

### Track 2 — SPARQL UPDATE
`INSERT DATA`, `DELETE DATA`, pattern-driven `INSERT/DELETE … WHERE`,
the atomic `DELETE … INSERT … WHERE` modify, plus `WITH <iri>` and
inline `GRAPH <iri> { … }` graph scope. Overloads `pgrdf.sparql(q)`
to dispatch by query form; UPDATE forms return an `_update` JSONB
summary row. See
[v0.4-FUTURE §4](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#4-sparql-update-new).

### Track 3 — Graph-level lifecycle UDFs
`pgrdf.drop_graph`, `clear_graph`, `copy_graph`, `move_graph` as
partition-level primitives over `_pgrdf_quads` — constant-time
`move_graph` via DETACH/ATTACH metadata swap, `TRUNCATE ONLY` for
`clear_graph`. Also wires the corresponding SPARQL UPDATE forms
(`DROP/CLEAR/CREATE/COPY/MOVE/ADD GRAPH`) to these UDFs. See
[v0.4-FUTURE §5](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#5-graph-level-lifecycle-udfs-new).

### Track 4 — CONSTRUCT
`pgrdf.construct(q TEXT) → SETOF JSONB` returning structured
`{subject, predicate, object}`-shaped rows via the existing term
shaper. Sibling UDF rather than overloading `pgrdf.sparql` — callers
signal intent at the SQL boundary. See
[v0.4-FUTURE §6](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#6-construct-deferred-from-v03-now-in-scope).

### Track 5 — Property paths
`*`, `+`, `?`, `^`, with alternation `p1|p2` as a stretch goal.
Translates to recursive Postgres CTEs with a `pgrdf.path_max_depth`
GUC; falls back to direct BGP match when the predicate's closure is
already materialised. See
[v0.4-FUTURE §7](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#7-property-paths-deferred-from-v03-now-in-scope).
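
Track 5 plans to compile `p+` and `p*` into recursive CTEs capped by `pgrdf.path_max_depth`. An in-memory breadth-first search shows the intended semantics: one or more hops for `+`, the start node added for `*`, a hop cap, and a visited set so cycles terminate. `path_closure` is a hypothetical helper. The spec summary here does not say whether the real GUC truncates or errors at the cap; this sketch simply stops expanding.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Nodes reachable from `start` over `pred` edges in 1..=max_depth hops
/// (`pred+`), plus `start` itself when `include_start` is true (`pred*`).
fn path_closure(
    edges: &[(&str, &str, &str)],
    pred: &str,
    start: &str,
    max_depth: usize,
    include_start: bool,
) -> HashSet<String> {
    // Adjacency list restricted to the path predicate.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(s, p, o) in edges {
        if p == pred {
            adj.entry(s).or_default().push(o);
        }
    }

    let mut reached: HashSet<String> = HashSet::new();
    if include_start {
        reached.insert(start.to_string()); // zero-length path for `*`
    }
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut frontier: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);
    while let Some((node, depth)) = frontier.pop_front() {
        if depth == max_depth {
            continue; // hop cap reached: do not expand further
        }
        for &next in adj.get(node).into_iter().flatten() {
            reached.insert(next.to_string());
            if seen.insert(next) {
                frontier.push_back((next, depth + 1)); // expand each node once
            }
        }
    }
    reached
}
```

On the chain `a → b → c → d`, a cap of 2 returns `{b, c}` for `+`. On the cycle `a → b → a`, `+` from `a` returns `{a, b}` and terminates.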

### Carried backlog — SPARQL surface gaps from v0.3
Multi-triple `OPTIONAL { BGP }` (LATERAL-style derived-table refactor),
`VALUES` inline tables, `BIND` output usable in later FILTER/BGP,
aggregates over `UNION`, and `DESCRIBE`. These are scheduled for the
same cut because they reuse translator machinery that §4 and §6 already require.
See
[v0.4-FUTURE §11](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#11-sparql-surface-backlog-deferred-from-v03-now-in-scope).

### Performance work carried forward from v0.3
Phase 3 step 3 phase B — `heap_multi_insert` / `COPY BINARY` ingest
path — targets v0.4 (the 2× wall-clock target from v0.3 LLD §4.3
acceptance is not met by phase A alone; the per-tuple executor walk
dominates). Postgres custom-scan hooks for specific quad-shape access
patterns are also flagged with v0.4 as the earliest target, but may slip to
v0.5 if the refactor cost exceeds the §4 / §6 wins. These do not gate
the surface work in tracks 1-5; they ship in their own slices. See
[v0.4-FUTURE §12](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#12-performance-work-carried-forward-from-v03).

### Conformance runner wiring (v0.4)
The W3C SPARQL 1.1 manifest runner (Phase 6 step 2, gated `if: false`
in v0.3) is wired in v0.4 — it gates the §11 SPARQL backlog
automatically as the deferred forms come online. See
[v0.4-FUTURE §13](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#13-test-policy-continues-v03-6-unchanged-in-spirit).

### Excluded from v0.4 (planned v0.5)
Real SHACL output (ERRATA E-009-gated), the reasoning profile
selector (`pgrdf.materialize(graph_id, profile)` — RDFS vs OWL-RL),
TriG / N-Quads ingest, IRI overloads for the §5 lifecycle UDFs, and
the W3C SHACL manifest runner. See
[v0.4-FUTURE §8](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#8-reasoning-profile-selector-v05--flagged-here-for-planning),
[§9](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#9-shacl-real-integration-v05--gated-on-errata-e-009),
[§10](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#10-trig--n-quads-ingest-v05).

---

## Coverage ratchet — release-by-release targets

Per-release floor for every CI-enforced test layer plus the two
external-standard pass-rate gates (W3C SPARQL 1.1, W3C SHACL) and the
LUBM cross-engine benchmark. Cells anchor to
[`specs/SPEC.pgRDF.LLD.v0.3.md` §6.1](../specs/SPEC.pgRDF.LLD.v0.3.md)
(test-layer matrix),
[`specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md` §13](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md#13-test-policy-continues-v03-6-unchanged-in-spirit)
(v0.4 test policy), and
[`docs/08-testing.md`](08-testing.md) (test strategy doc); nothing
here is new contract, only a consolidated view of the targets already
declared in those sources.

| Layer                                 | v0.3 (current) | v0.4 target                                 | v0.5 target                              | v1.0 target                                            |
|---|---|---|---|---|
| pgrx integration (`cargo pgrx test`)  | 93 ✅           | + `heap_multi_insert` tests                 | TBD                                      | TBD                                                    |
| pg_regress golden                     | 39 ✅           | ~60 (§3 + §4 + §5 + §6 + §7 + §11)          | TBD                                      | TBD                                                    |
| W3C-shape SPARQL harness              | 23 ✅           | superseded by TTL-manifest runner outputs   | superseded by TTL-manifest runner        | superseded by TTL-manifest runner                      |
| LUBM-shape correctness harness        | 3 ✅            | superseded by LUBM-1 real benchmark         | superseded by LUBM-10 real benchmark     | superseded by LUBM-100 real benchmark                  |
| W3C SPARQL 1.1 conformance (manifest) | not wired ⏳   | runner wired + ≥ 30 % pass                  | ≥ 70 % pass                              | ≥ 95 % pass                                            |
| W3C SHACL conformance (manifest)      | not wired ⏳ (E-009) | not wired (still E-009)               | ≥ 50 % pass (E-009 cleared, real output) | ≥ 90 % pass                                            |
| LUBM cross-engine benchmark           | scaffold only ⏳ | LUBM-1 smoke                                | LUBM-10 baseline vs Apache Jena TDB / Apache AGE | LUBM-100 vs Apache Jena TDB / Apache AGE       |

**Ratchet enforcement.** Each release's CI must hit at least its
column's targets; once a target is met it becomes a floor and can
never regress (`docs/08-testing.md` "Regression discipline":
"Coverage gates ratchet but never lower."). A green build on `main`
that drops below a previously-met floor is a CI failure. Cells
marked **TBD** have no published target in the LLD or FUTURE specs
yet — they'll get filled in as v0.5 / v1.0 LLDs draft, not
fabricated here.
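
The ratchet rule reduces to a three-way check per gate. Below is a sketch. `Gate`, `Verdict`, and the choice to record the met target (rather than the measured value) as the new floor are assumptions about how such a check could be written; this is not pgrdf's CI code.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Regressed,   // below an already-met floor: CI failure on `main`
    BelowTarget, // at or above the floor, short of this release's target
    Met,         // target reached; it becomes the new floor
}

/// One CI-enforced layer, e.g. W3C SPARQL 1.1 pass rate in percent.
struct Gate {
    floor: f64,  // highest target already met; must never be undercut
    target: f64, // this release's column in the table above
}

impl Gate {
    fn check(&mut self, measured: f64) -> Verdict {
        if measured < self.floor {
            Verdict::Regressed
        } else if measured < self.target {
            Verdict::BelowTarget
        } else {
            self.floor = self.floor.max(self.target); // ratchet up, never down
            Verdict::Met
        }
    }
}
```

With the SPARQL column's `≥ 30 % → ≥ 70 %` sequence, a build at 31.5 % meets the v0.4 target and locks 30 % as the floor. A later build at 29 % then fails as a regression, even though the v0.5 target is 70 %.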

---

## Out of scope (v0.x)

(Carries forward unchanged from
[`SPEC.pgRDF.LLD.v0.4-FUTURE.md §14`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md).)

- Streaming replication / logical decoding of RDF state.
- Federated SPARQL `SERVICE` — explicitly deferred to v1.0 per
  v0.4-FUTURE §15.
- Full OWL 2 (EL / QL) reasoning — ERRATA E-002.
- Backup/restore for opaque binary state (tracked by future
  `SPEC.pgRDF.BACKUP.v0.x`, INSTALL §11 OQ5).
- `LOAD <url>` in SPARQL UPDATE — callers fetch externally and
  invoke `pgrdf.load_turtle` / `pgrdf.parse_trig` directly
  (v0.4-FUTURE §14).

---

## Test bar over time

A coarse cumulative view; the precise per-commit count is in the
Phase 2.2 (extended) SPARQL-surface step table above.

(Rows labelled `Phase 2.2 (extended) step N` carried `Phase 3 step N`
labels in the pre-v0.3 framing. They correspond to the SPARQL surface
steps 1-12, not to the v0.3 LLD's Phase 3 Storage Performance, whose
rows appear as `v0.3 Phase 3 step N`. Test counts are unaffected.)

(Once v0.4 work begins, new rows land under `v0.4 cut` labels per
the per-track grouping in the "v0.4 — next milestone" section
above; the v0.3 rows below remain frozen as the shipped baseline.)

| Boundary | pgrx integration tests | pg_regress files (+ W3C-shape + LUBM-shape tests) | Notes |
|---|---|---|---|
| Phase 1 done | 0 | 0 | smoke + scaffold only |
| Phase 2.0 done | 7 | 3 | dict + quad CRUD |
| Phase 2.1 done | 11 | 7 | + Turtle ingest, regression fixtures |
| Phase 2.2 done | 21 | 13 | + dict cache, batched ingest, SPARQL parser, BGP-to-SQL, N-pattern BGP joins, user guide |
| Phase 2.2 (extended) step 6 | 56 | 19 | + FILTER, modifiers, OPTIONAL, UNION, MINUS |
| Phase 2.2 (extended) step 7 | 63 | 20 | + aggregates (COUNT/SUM/AVG/MIN/MAX + GROUP BY) |
| Phase 2.2 (extended) steps 8–12 | 79 | 25 | + HAVING, GROUP_CONCAT/SAMPLE, expression richness, BIND, multi-triple MINUS, ASK |
| v0.3 Phase 3 step 1 | 86 | 26 | + shmem dict cache (LLD §4.1), `pgrdf.stats()`, perf regression `50-shmem-dict-cache.sql` |
| v0.3 Phase 3 step 2 | 88 | 27 | + prepared-plan cache (LLD §4.2), parameterised SQL, perf regression `51-plan-cache.sql` |
| v0.3 Phase 3 step 3 phase A | 88 | 28 | + bulk-ingest prepared INSERT (LLD §4.3 phase A), `synth-10k.ttl`, perf regression `52-bulk-ingest-perf.sql`. 2× wall-clock target deferred to phase B / v0.4 |
| v0.3 Phase 4 | 91 | 29 | + `pgrdf.materialize` OWL 2 RL inference via `reasonable` 0.4, set-diff isolation, idempotent re-derivation, regression `60-materialize-owl-rl.sql` |
| v0.3 Phase 5 stub | 93 | 30 | + `pgrdf.validate(data, shapes)` JSONB stub. Real `shacl_validation` integration deferred — ERRATA E-009 (upstream iri_s/rdf-12 dep block). Regression `70-validate-stub.sql` |
| v0.3 Phase 6 step 1 | 93 | 30 | + regression suite wired into CI (`.github/workflows/ci.yml` `regression` job); compose builder + runtime on every PR. W3C runners + LUBM benchmarks remain deferred |
| v0.3 Phase 6 step 2 starter | 93 | 30+5 | + W3C-shape SPARQL harness — 5 starter tests in `tests/w3c-sparql/` wired into the CI regression job. Full W3C TTL-manifest runner deferred to v0.4 |
| v0.3 Phase 6 step 2 expanded | 93 | 30+13 | + 8 more W3C-shape tests covering FILTER, COUNT/HAVING, ORDER BY DESC, LIMIT/OFFSET, BIND/CONCAT, ASK true/false |
| v0.3 Phase 6 step 2 expanded II | 93 | 30+18 | + 5 more W3C-shape tests covering REGEX, IN, STRLEN, LANG, UCASE |
| v0.3 translator-gap signals + step 3 scaffold | 93 | 31+18+3 | + 8 negative regression signals (`80-unsupported-shapes.sql`) locking the error-message contract for unsupported SPARQL shapes; + 3 LUBM-shape correctness gates (`tests/perf/lubm-shape/`) against a hand-authored fixture |
| v0.3 +3 W3C essentials + integration | 93 | 32+21+3 | + 3 more W3C-shape tests (BOUND, STR(?iri), numeric FILTER); + `61-materialize-then-sparql.sql` integration test verifying inferred triples flow back through `pgrdf.sparql` |
| v0.3 stats shape contract | 93 | 33+21+3 | + `82-stats-shape.sql` locks the `pgrdf.stats()` JSONB field set, types, and value-range invariants — schema contract for downstream operator tooling |
| v0.3 translator fix — inline HAVING aggregate | 93 | 33+22+3 | `AggregateSpec.synth_aliases` preserves spargebra's intermediate variable name post-Extend rename; HAVING migration + translation consult both `output_var` and aliases. Negative `gap-1` removed; new positive test `22-having-inline-aggregate` covers `HAVING(SUM(?v) > c)` directly |
| v0.3 translator fix — type-aware MIN/MAX | 93 | 33+23+3 | `MIN`/`MAX` emit `COALESCE(MIN(numeric)::text, MIN(lex))` — numeric ordering on `xsd:numeric` literals, lex fallback for strings. New positive test `23-min-max-numeric` over `xsd:integer` |
| v0.3 error-path signals — #66 | 93 | 34+23+3 | + `81-error-paths.sql` opens a sibling track to `80`: locks the stable error-prefix UDFs emit on invalid input. Helper `_check_error` generalises `_check_gap` via `EXECUTE`. First check: `pgrdf.load_turtle()` against a missing path surfaces `load_turtle: failed to open` |
| v0.3 edge-case signals — #62 | 93 | 35+23+3 | + `62-materialize-empty.sql` opens an edge-case correctness track (slices 62 → forward) below the error-path track (66 → 63): `pgrdf.materialize()` on a zero-triple graph stays non-panicking, returns `base_triples = 0` + non-negative inferred-count, and remains idempotent across two calls (run 2's `previous_inferred_dropped` == run 1's `inferred_triples_written`). Axiomatic OWL 2 RL triple count NOT locked — that's upstream `reasonable` internals |
| v0.3 edge-case signals — #61 | 93 | 36+23+3 | + `63-shmem-reset-invalidation.sql` locks `pgrdf.shmem_reset()`'s shmem-cache invalidation contract: after `reset()` bumps the `GENERATION` atomic, re-parsing terms that were cached pre-reset (a) does NOT advance `shmem_hits` (slot-generation mismatch reads as cold) and (b) DOES advance `shmem_inserts` (fresh inserts replace the invalidated entries). Guards against a refactor of `src/storage/shmem_cache.rs::reset()` that forgets the generation bump and leaves stale dict ids visible across a `DROP EXTENSION; CREATE EXTENSION` cycle. Asserts deltas (not absolute counter values) via `\gset`-captured booleans so the expected output survives upstream churn |
| v0.3 edge-case signals — #60 | 93 | 37+23+3 | + `64-plan-cache-clear.sql` locks the returned-count semantics of `pgrdf.plan_cache_clear()`: fresh backend → 0 dropped, after N structurally distinct queries → N dropped (matches `plan_cache_local_size` snapshot taken pre-clear), `plan_cache_local_size` falls to 0 post-clear, second consecutive clear returns 0 (idempotent at zero). Guards against a refactor of `src/query/plan_cache.rs::plan_cache_clear()` that swaps `m.len()` for a constant, hoists the `len()` after `m.clear()` (always returning 0), or accidentally muddles the per-backend count with the cumulative shmem `plan_cache_inserts` counter. Empirical `size_before` on the current pgrx 0.16 / PG 17 build is 4 (1 ingest-side `flush_batch` INSERT plan + 3 SELECT plans), but the test locks the RELATION `drained = size_before AND size_after = 0 AND idempotent_clear = 0 AND size_before > 0` rather than the literal, so an ingest-path refactor that skips the plan cache leaves the test still passing |
| v0.3 edge-case signals — #59 | 93 | 38+23+3 | + `65-parse-turtle-empty.sql` locks the boundary contract of `pgrdf.parse_turtle()` on triple-free input: an empty string, whitespace-only input (`E'   \n   \t  '`), comment-only input (`E'# c1\n# c2\n'`), and a bare `@prefix` declaration all return `0` without panicking; `_pgrdf_quads` for the graph stays empty; and `_pgrdf_dictionary` stays empty (interning happens INSIDE the per-triple loop body of `src/storage/loader.rs::ingest_turtle_with_stats`, so directives that emit zero triples emit zero dict writes). It is the orthogonal correct-path companion to the malformed-input case in `81-error-paths.sql` (which panics with the `load_turtle: turtle parse error: …` prefix): an EMPTY parser iterator is NOT a parse error and returns `0` cleanly. Guards against a refactor that adds a "fast path" panicking on empty input, seeds a placeholder dict/quad row, or mishandles the trailing `flush_batch()` of zero-length arrays |
| v0.3 edge-case signals — #58 | 93 | 38+23+3 | + `tests/perf/smoke-ontologies.expected.tsv` locks the per-ontology triple counts emitted by `tests/perf/smoke-ontologies.sh` across the current 24-ontology W3C/Apache-Jena/ValueFlows/ConceptKernel-v3.7 set (workflow.ttl held out per ERRATA E-007); today's snapshot is **24 rows / 17,134 triples total**. A new `tests/perf/smoke-ontologies.sh --check` mode re-runs the smoke, regenerates a TSV from the live output, and compares it against the lock-file with `diff -u`, exiting non-zero on any drift. This catches two regression classes invisible to the bare smoke: an ontology that used to parse stops parsing (its row disappears), and the parser silently drops or duplicates triples (its count moves). Not gated in CI yet: `fixtures/ontologies/*.ttl` is gitignored, so the smoke can only run locally after `fixtures/ontologies.sh`; a follow-on Phase 6 slice will wire in `--check` once a CI fetch step lands. Default smoke behaviour (no flag → pretty-print, exit 0) is unchanged. NOT a pg_regress file, so the test bar stays at 38+23+3 |
| v0.3 edge-case signals — #57 | 93 | 39+23+3 | + `66-parse-sparql-roundtrip.sql` locks the end-to-end round-trip from `pgrdf.parse_turtle` ingest through `pgrdf.sparql` query: every triple the parser saw MUST be observable via the SPARQL executor, across all four object-term kinds plus a blank-node subject. Five `bool_and(EXISTS …)` assertions over a single 5-shape Turtle fragment cover (1) an IRI object (`foaf:knows`), (2) a plain literal (`foaf:name "Alice"`), (3) a typed literal (`ex:age "30"^^xsd:integer`), (4) a language-tagged literal (`ex:bio "Engineer"@en`), and (5) a blank-node subject, keyed by a sibling-property join `?s foaf:name "Anon" . ?s foaf:name ?n` so the parser-allocated bnode id stays out of the assertion. Sibling to `61-materialize-then-sparql.sql` (which locks the materialize→sparql edge); together they pin both ends of the storage layer's visibility contract to the SPARQL surface. Datatype URI and lang-tag echo policy are NOT pinned by this slice (the SPARQL projection emits only the lexical form); their storage-side contracts are locked by `21-typed-literals.sql` / `22-lang-tags.sql` |
| v0.3 edge-case signals — #56 | 93 | 39+23+3 | extends `82-stats-shape.sql` in place (no new pg_regress file: the file is explicitly scoped to "schema shape only", and these three new invariants are schema shape too) with the schema-drift tripwire trio. (a) Exact field count (`count(*) FROM jsonb_object_keys(stats()) = 10`) pins the literal current key count emitted by `src/storage/stats.rs::stats()` (`shmem_ready`, `shmem_slots`, `shmem_hits`, `shmem_misses`, `shmem_inserts`, `shmem_evictions`, `plan_cache_hits`, `plan_cache_misses`, `plan_cache_inserts`, `plan_cache_local_size`), so any added field forces a deliberate test update. (b) Keys match canonical (`array_agg(k ORDER BY k) = ARRAY[…literal 10-element list…]`) catches both silent additions (the array gets longer) and silent renames (one element swaps). (c) No null fields (`bool_and(jsonb_typeof(value) != 'null')`) catches a refactor that defaults an uninitialised counter to JSON `null` rather than `0`. Complements the existing "fields-that-SHOULD-be-there are there" block with the orthogonal "fields-that-SHOULDN'T-be-there ARE NOT there" guarantee; together they pin the closed-set shape contract that downstream operator tooling (CloudNativePG operators, CI dashboards, telemetry parsers) wires against. Test count unchanged at 39+23+3: the change is three new rows in `tests/regression/expected/82-stats-shape.out` |
| v0.3 harness lock-in — #55 | 93 | 39+23+3 | promotes the W3C-shape + LUBM-shape harnesses to first-class Justfile recipes (`just test-w3c`, `just test-lubm`); introduces `just test-conformance` (regression + W3C-shape + LUBM-shape, i.e. every compose-based layer) and `just test-everything` (pgrx integration + test-conformance, the broadest sweep); and lands `just smoke-cold` (`compose-down` → `build-ext` → `compose-up` → `CREATE EXTENSION` → test-conformance) as the cold-compose discipline gate. `just test-all` keeps its narrow `test + test-regression` shape for back-compat. `docs/08-testing.md` and the Tests block in `README.md` point at the new entry points. The shift matters because two of the three compose-based harnesses were previously discoverable only by knowing their bash paths: `just --list` showed nothing about them, and `just test-all` silently skipped them. The cold-compose smoke is the verification half: it catches bugs that pass on a warm compose because some prior `DROP/CREATE` left state behind, then break on the next cold boot. Test count unchanged: the new recipes are wrappers, not new tests. Final entry in the 66→1 coverage countdown; the next phase opens the hygiene cycle |
| **v0.3 cut (current)** | **93** | **39 + 23 + 3 = 65** | **Total 158 tests across all five layers** (93 pgrx integration + 39 pg_regress + 23 W3C-shape SPARQL + 3 LUBM-shape). v0.3 LLD §5 phase status: Phase 1 ✅, Phase 2 ✅ (2.0/2.1/2.2 + extended SPARQL surface steps 1-12), Phase 3 🚧 (steps 1-2 ✅, step 3 phase A ✅, phase B → v0.4), Phase 4 ✅, Phase 5 🚧 stub (real impl → v0.5 per v0.4-FUTURE §9), Phase 6 🚧 (step 1 ✅, step 2 starter + expansions + essentials ✅, step 3 ⏳). License attribution (Apache 2.0 / 2026), MSRV (1.91), ERRATA E-006 re-check (2026-05-14), ERRATA E-010 (cargo audit informational). Forward look: [`SPEC.pgRDF.LLD.v0.4-FUTURE.md`](../specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md) is canonical for v0.4 scope |
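
The generation-bump contract that `63-shmem-reset-invalidation.sql` locks (row #61) can be illustrated with a minimal Rust sketch. The sketch assumes a fixed-size slot array in which every slot records the generation it was written under. The names `GenCache` and `get_or_intern`, and the slot layout itself, are hypothetical illustrations. They are not the actual `src/storage/shmem_cache.rs` API. `reset()` invalidates every slot in O(1) by bumping the generation, so a lookup for a term cached before the reset reads as a miss and triggers a fresh insert.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

/// One cache slot, stamped with the generation it was written under.
/// (Hypothetical layout, for illustration only.)
#[derive(Clone, Default)]
struct Slot {
    generation: u64,
    key: Option<String>,
    dict_id: i64,
}

/// Hypothetical generation-stamped term cache; not pgrdf's real shmem cache.
pub struct GenCache {
    generation: AtomicU64,
    slots: Vec<Slot>,
    pub hits: u64,
    pub misses: u64,
    pub inserts: u64,
}

impl GenCache {
    pub fn new(n_slots: usize) -> Self {
        GenCache {
            // Start at 1 so default-initialised slots (generation 0) are never valid.
            generation: AtomicU64::new(1),
            slots: vec![Slot::default(); n_slots],
            hits: 0,
            misses: 0,
            inserts: 0,
        }
    }

    fn slot_index(&self, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.slots.len()
    }

    /// Return the cached id for `key`, or call `intern` and cache its result.
    pub fn get_or_intern(&mut self, key: &str, intern: impl FnOnce() -> i64) -> i64 {
        let gen = self.generation.load(Ordering::Acquire);
        let idx = self.slot_index(key);
        let slot = &self.slots[idx];
        // A slot written under an older generation reads as cold.
        let is_hit = slot.generation == gen && slot.key.as_deref() == Some(key);
        let cached_id = slot.dict_id;
        if is_hit {
            self.hits += 1;
            return cached_id;
        }
        self.misses += 1;
        let id = intern();
        self.slots[idx] = Slot { generation: gen, key: Some(key.to_string()), dict_id: id };
        self.inserts += 1;
        id
    }

    /// O(1) invalidation: bump the generation instead of wiping every slot.
    pub fn reset(&self) {
        self.generation.fetch_add(1, Ordering::AcqRel);
    }
}
```

Interning a term and then looking it up again produces one hit. After calling `reset()`, a third lookup leaves `hits` unchanged while `inserts` advances by one, which mirrors assertions (a) and (b) in that slice. The counters are plain fields here for brevity. A cache actually shared across backends would need atomic or lock-protected counters.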
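
The returned-count semantics that `64-plan-cache-clear.sql` locks (row #60) come down to reading the map's length before clearing it. The sketch below assumes a per-backend `HashMap` keyed by a normalized query shape. `PlanCache`, `prepare`, and the string stand-ins for prepared plans are hypothetical. They are not the actual `src/query/plan_cache.rs` types.

```rust
use std::collections::HashMap;

/// Hypothetical per-backend plan cache: query shape -> stand-in for a prepared plan.
#[derive(Default)]
pub struct PlanCache {
    plans: HashMap<String, String>,
}

impl PlanCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Cache a plan for `shape`; returns true only if a new entry was created.
    pub fn prepare(&mut self, shape: &str) -> bool {
        if self.plans.contains_key(shape) {
            return false;
        }
        self.plans.insert(shape.to_string(), format!("plan for {shape}"));
        true
    }

    /// Per-backend size, analogous to `plan_cache_local_size`.
    pub fn local_size(&self) -> usize {
        self.plans.len()
    }

    /// Drop every cached plan and report how many were dropped.
    /// `len()` must be read BEFORE `clear()`; reading it afterwards always yields 0.
    pub fn clear(&mut self) -> usize {
        let dropped = self.plans.len();
        self.plans.clear();
        dropped
    }
}
```

If `len()` were read after `clear()`, every call would return 0. That is exactly the refactor the slice's `drained = size_before` relation is meant to catch, while the final `clear()` returning 0 covers the idempotent-at-zero case.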