# Changelog
All notable changes to pgRDF are tracked here. Format follows
[Keep a Changelog](https://keepachangelog.com/). Versioning is SemVer
once we cut v1.0; pre-1.0 minor bumps may include breaking changes.
## [Unreleased]
(No entries yet β v0.4 work begins after this point.)
## [0.3.0] β 2026-05-14
The first official pgRDF release. Ships the v0.3 engine surface
feature-complete state: storage (dictionary-encoded terms,
LIST-partitioned quads, hexastore indexes), SPARQL SELECT/ASK
surface, OWL 2 RL inference, SHACL validation stub, storage
performance (shmem dict cache + prepared-plan cache + prepared
bulk-insert), 158 automated tests + 24-ontology smoke, License
attribution + MSRV declared, and the full release pipeline
exercised end-to-end. PG 14-17 across {amd64, arm64} = 8
prebuilt tarballs.
### Release pre-flight β final CI sweep (slices #10-#5)
Last verification gate before tagging v0.3.0. Ran every test/lint
layer locally against the post-version-bump tree (HEAD was `ac514fe`
at start of sweep) to confirm the codebase is release-ready.
Per-layer results:
- **Slice #10 β `cargo fmt --check`**: drift in
`src/inference/reasonable.rs` (40 lines reflowed; long `.expect()`
chains broken across multiple lines by rustfmt). Applied
`cargo fmt --all` and committed as a style-only commit.
- **Slice #9 β `cargo clippy -D warnings`** (in builder container,
rustc 1.91): one error surfaced β unused import
`use pgrx::prelude::*;` at `src/storage/shmem_cache.rs:349`. The
`use super::*;` on the line above already brings `pgrx::prelude::*`
into scope through the parent module, so the line was genuinely
redundant. Removed it. Clippy now exits 0.
- **Slice #8 β `just test`** (pgrx integration in Linux builder):
`test result: ok. 93 passed; 0 failed; 0 ignored`. Matches the
93-count in README and `docs/08-testing.md`; no doc drift.
- **Slice #7 β `just test-regression`** (pg_regress harness against
compose Postgres): `39 pass, 0 fail, 0 new baselines`.
- **Slice #6 β `just test-w3c`** (W3C-shape SPARQL harness):
`23 pass, 0 fail, 0 new baselines`.
- **Slice #5 β `just test-lubm`** (LUBM-shape harness):
`3 pass, 0 fail, 0 new baselines`.
Final aggregate: **93 (pgrx) + 39 (pg_regress) + 23 (W3C) + 3 (LUBM)
= 158 tests, all green**. Matches the 158 figure cited across
README.md, `docs/08-testing.md`, `docs/10-roadmap.md`,
`docs/09-release.md`, and `RELEASE_NOTES.md` β no test-count doc
updates needed.
Code changes from this sweep:
- `src/inference/reasonable.rs` β rustfmt-only reflow (no semantic
change), landed in its own commit.
- `src/storage/shmem_cache.rs` β removed redundant
`use pgrx::prelude::*;` from `mod tests`.
### Release pre-flight β version bump to 0.3.0 (slices #18-#11)
Mechanical version bump landing slices #18 through #11 of the 66β1
release countdown. Touches every surface that pins the package
version. The `pgrdf.version()` UDF (which returns
`env!("CARGO_PKG_VERSION")`) and the extension's `extversion` now
both report `0.3.0`.
Files touched:
- `Cargo.toml` β `version = "0.2.0"` β `version = "0.3.0"` (slice #18).
- `pgrdf.control` β `default_version = '0.2.0'` β `'0.3.0'` (slice #17).
- `compose/compose.yml` β bind-mount path
`pgrdf--0.2.0.sql` β `pgrdf--0.3.0.sql` (slice #16).
- `compose/README.md` β bind-mount table + `pgrdf.version()` worked
example output β `"0.3.0"` (slice #16).
- `README.md` β `pgrdf.version()` example output β `0.3.0`
(slices #15/#14; status pill at `v0.3` was already correct).
- `guide/01-install.md` β `pgrdf.version()` example outputs (Path A
compose flow, Verify section) β `0.3.0`; Path C manual-install
worked-example tarball URL β `v0.3.0/pgrdf-0.3.0-...tar.gz`
(slice #13).
- `docs/02-storage.md` β "No PostgreSQL custom scan hooks at v0.3.0"
current-version reference (slice #13).
- `docs/06-installation.md` β `pgrdf.version()` example output β
`'0.3.0'` (slice #13).
- `docs/09-release.md` β preamble reframed to reflect that Cargo.toml
now reads `version = "0.3.0"` (bump landed), instead of "still
reads `version = "0.2.0"`; bump-to-0.3.0 happens as part of the
cut" (slice #13).
- `tests/regression/expected/00-smoke.out` β pgrdf.version() and
extversion lines `0.2.0` β `0.3.0` (slice #13).
- `Cargo.lock` β `pgrdf 0.2.0 β 0.3.0` via `cargo update -p pgrdf`
(slice #11).
Build artifact verification (slice #12):
- `just build-ext` produces
`compose/extensions/share/extension/pgrdf--0.3.0.sql` and
`pgrdf.control` reads `default_version = '0.3.0'`. The cached
`pgrdf--0.2.0.sql` left over in the build output was removed (v0.x
doesn't support `ALTER EXTENSION pgrdf UPDATE` per the slice #21
upgrade policy, so the legacy migration file isn't needed).
- `just test-regression` reports `39 pass, 0 fail, 0 new baselines`.
Historical references to `0.2.0` are intentionally preserved:
- All `CHANGELOG.md` `[Unreleased]` entries from slices #24-#27 that
document past pre-flight verifications (manual repack test,
cargo pgrx package dry-run, etc.) β they accurately describe the
state at the time those slices ran.
- `sql/schema_v0_2_0.sql` β historical bootstrap-schema filename
(still referenced by `extension_sql_file!` in `src/lib.rs`); v0.3
doesn't change the schema layout.
- `specs/SPEC.pgRDF.LLD.v0.2.md` β the v0.2 LLD contract.
CHANGELOG `[Unreleased]` β `[0.3.0]` block conversion stays deferred
to slices #4-#1 (the cut itself).
### Release notes β v0.4 deferral list audit (slice #19)
Bi-directional audit of the "Deferred to v0.4" lists in
`RELEASE_NOTES.md` and `docs/09-release.md` against
`specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md` Β§2 (canonical v0.4 scope). The
spec lists five major tracks (Β§3 named-graph + IRI mapping; Β§4 SPARQL
UPDATE; Β§5 graph-level lifecycle UDFs; Β§6 CONSTRUCT; Β§7 property
paths) plus the carried SPARQL backlog (Β§11: multi-triple OPTIONAL,
VALUES, BIND-downstream, aggregates over UNION, DESCRIBE) and the
ingest-performance carry (Β§12: `heap_multi_insert` 2Γ target).
Drift findings, both directions:
- **`docs/09-release.md` was missing the entire Β§11 SPARQL backlog** β
multi-triple OPTIONAL, VALUES, BIND-downstream, aggregates over
UNION, and DESCRIBE were absent from its "Deferred to v0.4" list
even though they are listed in `RELEASE_NOTES.md` and named
explicitly in the spec as in-scope for v0.4. Fixed by adding a
dedicated bullet covering all five, with the LLD Β§11 cross-link.
- **`RELEASE_NOTES.md` mentioned `GRAPH { β¦ }` without the IRI
mapping** β the IRI β `graph_id` mapping table is the hard
prerequisite for the SPARQL surface (LLD Β§3.1), and `docs/09-release.md`
already names it. Fixed by adding "with IRI β `graph_id` mapping"
to the named-graph entry plus the property-path operator set, and a
`Β§2` anchor on the LLD cross-link for parity with `docs/09-release.md`.
- **No v0.5/v1.0 items mislabeled as v0.4** β the LLD's Β§8
(reasoning profile selector), Β§9 (real SHACL output, gated on
E-009), Β§10 (TriG / N-Quads ingest), and Β§15 (incremental
materialisation, RDF 1.2 triple terms) are correctly absent from
both consumer-facing files. A short pointer paragraph added to
`docs/09-release.md` so readers know where the v0.5/v1.0 forward
look lives (LLD Β§8-Β§10 and Β§15) rather than guessing those items
were forgotten.
- **Items in consumer-facing lists not in LLD Β§2**: `SHA256SUMS.asc`
GPG signature and pgrx 0.18 / PG 18 migration are both legitimate
v0.4 work items but are release-engineering and toolchain concerns
outside the LLD scope (tracked under INSTALL OQ4 / roadmap Phase 6
step 3, and ERRATA E-006 respectively). Annotated as such in
`docs/09-release.md`; left in place in both files because they're
user-visible v0.4 deliverables consumers should see in release
notes.
- **Cross-link anchors**: both files link to the LLD doc top (no
in-document anchor); the file resolves on disk. Added an explicit
`Β§2` reference in both prose pointers so a future reader can find
the canonical scope section without scrolling.
Outcome: drift closed in both directions. `RELEASE_NOTES.md`'s
"Deferred to v0.4" line now reads parallel to the LLD Β§2 capability
matrix; `docs/09-release.md` no longer drops the SPARQL backlog. No
spec edits; the LLD remains source of truth. This audit slice is
documentation-only β no code, no tests, no other docs touched.
### Release notes β known issues block consolidated (slice #20)
Cross-file audit of the Known Issues surface across `RELEASE_NOTES.md`,
`docs/09-release.md`, and `specs/ERRATA.v0.2.md` to make sure
consumer-facing release docs cite the v0.3.0-era errata consistently
with the authoritative ERRATA table. Triggered by the E-007
"workflow.ttl" mis-cite caught in slice #22 (corrected in `52c13bf`);
this pass verifies nothing else drifted across the remaining v0.3.0
errata entries (E-006, E-007, E-008, E-009, E-010) and that the
pre-v0.3 entries (E-001, E-002, E-003, E-004, E-005) are correctly
omitted from the v0.3.0 release surface as resolved-by-design or
out-of-scope-for-consumers.
Findings, per E-NNN:
- **E-001** (`shacl-rust` β `shacl_validation` supersession), **E-002**
(OWL 2 RL only, EL/QL out-of-scope), **E-003** (PG 18 GUC path,
effectively rolled into E-006), **E-004** (init-script-on-PG18+ no
longer needed; compose has no init script), **E-005** (repo URL
`styk-tv/pgRDF` placeholder fix) β **all pre-v0.3 spec corrections
folded into the v0.3 LLD body or otherwise resolved-by-design.**
Correctly omitted from v0.3.0 release notes in both files; no
consumer-facing impact on the v0.3.0 tarball.
- **E-006** (pgrx 0.18 / PG 18 deferred) β cited in both files,
consistent: `RELEASE_NOTES.md` "pgrx 0.18 / Postgres 18 deferred to
v0.4."; `docs/09-release.md` "pgrx held at 0.16.1; PG 18 deferred
to v0.4." Both match ERRATA's "Hold pgrx 0.16.1 for v0.3. Support
matrix: PG 14β17."
- **E-007** (`extension_control_path` GUC forward path blocked by
E-006) β cited in both files, consistent: both call out INSTALL Β§7,
the E-006 blocker, and the per-file bind-mount workaround. The
earlier "workflow.ttl" mis-cite from slice #22 is gone (fixed in
`52c13bf`).
- **E-008** (Linux builder container instead of native macOS) β
**correctly omitted from both consumer-facing files.** This is a
contributor / build-environment fact, not a tarball-consumer fact;
end-users of the release artifacts never encounter the dev-only
macOS β Linux builder routing. Listed in ERRATA for the dev path.
- **E-009** (SHACL real integration blocked upstream) β cited in
both files, consistent: `RELEASE_NOTES.md` "SHACL real integration
blocked by upstream dep conflict."; `docs/09-release.md`
"`pgrdf.validate` ships as a stub; real SHACL execution blocked by
upstream `shacl_validation` / `reasonable` feature unification."
Both match ERRATA's `iri_s` migration + `rdf-12` feature-unification
story.
- **E-010** (cargo audit informational advisories) β cited in both
files, consistent: `RELEASE_NOTES.md` "cargo audit advisories β all
informational, no security impact."; `docs/09-release.md` "4
informational `cargo audit` advisories accepted for v0.3 (all in
subtrees of pgrx 0.16.1 / `reasonable 0.4.1` and clear automatically
when E-006 / E-009 resolve)." Both match ERRATA's "Accept the 4
informational warnings for v0.3."
Cross-link verification: `specs/ERRATA.v0.2.md` resolves on disk from
both consumer-facing files (`RELEASE_NOTES.md` root-relative;
`docs/09-release.md` via `../specs/ERRATA.v0.2.md`).
**Outcome: zero drift. No edits to `RELEASE_NOTES.md` or
`docs/09-release.md` required.** The two files cite exactly the same
set (E-006, E-007, E-009, E-010), describe each at appropriate
granularity for their audience (marketing-style summary vs engineering
release note), and classify all four as known-issues with no
security-blocking impact. ERRATA remains the source of truth; both
consumer-facing docs are aligned to it. This audit slice is
documentation-only β no code, no spec, no other doc touched.
### Release notes β upgrade policy documented (slice #21)
The v0.x upgrade discipline written down as consumer-facing contract.
Headline: pgRDF v0.x reserves the right to break schema and UDF
signatures between minor releases, `ALTER EXTENSION pgrdf UPDATE` is
not supported until v1.0, and the supported upgrade path is dump-via-SQL
(decode `pgrdf._pgrdf_quads` against `pgrdf._pgrdf_dictionary` per
graph, serialise to Turtle externally), `DROP EXTENSION pgrdf CASCADE`,
install the new version, then `CREATE EXTENSION` + re-load. v1.0 is
flagged as the boundary where proper `ALTER EXTENSION pgrdf UPDATE`
migrations land alongside a frozen on-disk schema; no date committed.
The detailed policy lives in `docs/06-installation.md` as a new
`## Upgrade between v0.x versions` section (procedure with SQL dump
template, "why no in-place upgrade?" rationale calling out fluid
pre-1.0 schema / non-stable dict id space / `is_inferred` flux, and
cluster-managed-installation guidance pointing CloudNativePG /
StackGres / Apache AGE operators at planned maintenance windows +
volume snapshots + staging verification). Two short cross-link
summaries land alongside: `docs/09-release.md` v0.3.0 section gets a
new `### Upgrade policy` subsection above `### Known issues`, and
`RELEASE_NOTES.md` gets a new `## Upgrading` section above
`## License`. Both summaries point back at the canonical
`docs/06-installation.md` anchor. SQL examples schema-qualify the
internal tables (`pgrdf._pgrdf_quads` / `pgrdf._pgrdf_dictionary`)
matching the actual extension schema. No fabricated dates for v0.4 /
v1.0.
### Release notes β RELEASE_NOTES.md drafted + release.yml body_path wired (slice #22)
The GitHub Release body for v0.3.0 β the marketing-style summary that
consumers see in the GH UI when they land on the release page. Lives at
the repo root as `RELEASE_NOTES.md` (Option A per the slice brief: simple,
conventional, rewritten each release). Wired into the workflow via
`body_path: RELEASE_NOTES.md` on the `softprops/action-gh-release@v2`
step alongside the existing `generate_release_notes: true` (GH appends
the auto-generated PR-title commit list under the curated body).
Content is consumer-facing, ~370 words: a one-line elevator pitch,
the feature surface (storage / Turtle / SPARQL SELECT-ASK / OWL 2 RL /
SHACL stub / performance), the consolidated test bar (158 automated +
24 manual smoke matching `docs/09-release.md`), a drop-in install
recipe with the exact `curl` / `sha256sum -c` / `cp` flow, the docker
compose pointer, the {pg14..pg17}Γ{amd64, arm64} support matrix,
known issues (E-006 / E-007 / E-009 / E-010), the v0.4 deferral list,
and Apache 2.0 attribution. Every relative path (`specs/SPEC.pgRDF.INSTALL.v0.2.md`,
`guide/01-install.md`, `specs/ERRATA.v0.2.md`,
`specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md`, `CHANGELOG.md`) resolves on disk.
Numbers cross-checked against the engineering release note in
`docs/09-release.md` (slice #23); no new claims.
### Release notes β docs/09-release v0.3.0 section drafted (slice #23)
Replaces the "No release cut yet" preamble in `docs/09-release.md`
with a `## v0.3.0 β 2026-05-14 (planned)` section: engine surface
recap (storage / SPARQL / Phase 3 perf / Phase 4 inference / Phase 5
stub / Phase 6 CI), the consolidated test bar (93 pgrx + 39 pg_regress
+ 23 W3C-shape + 3 LUBM-shape + 24 ontology smoke / 17 134 triples),
performance characteristics (sub-Β΅s dict cache hit, prepared-plan
reuse, phase A bulk ingest with the 2Γ wall-clock target carried to
v0.4 phase B), the {pg14..pg17}Γ{amd64, arm64} matrix with PG 18
deferred per E-006, license + attribution (Apache 2.0, Copyright 2026
Peter Styk, LICENSE + NOTICE inside every tarball per Β§4(d)), MSRV
1.91, tarball layout per INSTALL Β§3, known issues (E-006 / E-007 /
E-009 / E-010), and the v0.4 deferral list. Content sourced from
CHANGELOG `[Unreleased]` and LLD v0.3 Β§2βΒ§7; no new claims, no
fabricated numbers.
The CHANGELOG `[Unreleased]` block is **not yet cut** to a
`[0.3.0] β YYYY-MM-DD` block β that move lands in a later slice
(group 4-1, the actual tag commit). This slice only drafts the
engineering-side narrative inside `docs/09-release.md`; the
GitHub Release body itself is a separate slice (#22).
### Release pre-flight β smoke-install verification (slice #24)
End-to-end install rehearsal of the slice #25 tarball
(`pgrdf-0.2.0-pg17-glibc-arm64.tar.gz`, 872,067 B) against a clean
`postgres:17.4-bookworm` container. This exercises the real consumer
install path the GH Release will ask users to perform: extract tarball,
drop artifacts into PG library paths, `CREATE EXTENSION pgrdf`, parse
Turtle, run SPARQL. Goal: confirm the v0.3.0 release artifact is
ship-ready.
**Procedure (Option A from the slice brief β fresh container, no
compose, bind-mount FROM staged tarball contents):**
```bash
STAGING=$PWD/.smoke-install-test # podman-visible
rm -rf "${STAGING}" && mkdir -p "${STAGING}"
tar -xzf /tmp/pgrdf-repack-test/pgrdf-0.2.0-pg17-glibc-arm64.tar.gz \
-C "${STAGING}"
podman run -d --name pgrdf-smoke \
-e POSTGRES_USER=pgrdf -e POSTGRES_PASSWORD=pgrdf -e POSTGRES_DB=pgrdf \
-v "${STAGING}/pgrdf-0.2.0-pg17-glibc-arm64/lib/pgrdf.so:/usr/lib/postgresql/17/lib/pgrdf.so:ro" \
-v "${STAGING}/pgrdf-0.2.0-pg17-glibc-arm64/share/extension/pgrdf.control:/usr/share/postgresql/17/extension/pgrdf.control:ro" \
-v "${STAGING}/pgrdf-0.2.0-pg17-glibc-arm64/share/extension/pgrdf--0.2.0.sql:/usr/share/postgresql/17/extension/pgrdf--0.2.0.sql:ro" \
-p 5433:5432 docker.io/library/postgres:17.4-bookworm \
-c shared_preload_libraries=pgrdf
```
Container `pgrdf-smoke` runs on port 5433 to avoid colliding with the
regular compose container on 5432. Postgres `pg_isready` returned ok
after 2s. Bind-mounts are read-only β exercises the real distribution
shape (immutable artifact, mutated only at boot via Postgres config
args).
**Initial environment note (caught by the smoke):** the slice brief's
`podman run` snippet omitted `-c shared_preload_libraries=pgrdf`. On
first run, `CREATE EXTENSION` + `pgrdf.version()` succeeded but the
first stateful call (`parse_turtle`) returned `ERROR: PgAtomic was
not initialized` β the canonical signature of pgRDF not being loaded
via `shared_preload_libraries`. This is documented in
[`SPEC.pgRDF.INSTALL.v0.2 Β§6 + Β§7`](specs/SPEC.pgRDF.INSTALL.v0.2.md),
[`guide/01-install.md Β§3`](guide/01-install.md), and
[`docs/06-installation.md Β§1.2`](docs/06-installation.md), so the
diagnostic chain held: error β check `SHOW shared_preload_libraries`
(empty) β re-launch with `-c shared_preload_libraries=pgrdf` β SHOW
returns `pgrdf` β everything works. **The tarball + install docs are
both correct; only the smoke brief's `podman run` command was
incomplete.**
**Smoke test results (after relaunching with the preload arg):**
| Step | Command | Output | Verdict |
|---|---|---|---|
| 1 | `CREATE EXTENSION pgrdf;` | `CREATE EXTENSION` | OK |
| 2 | `SELECT pgrdf.version();` | `0.2.0` (1 row) | OK |
| 3 | `SELECT pgrdf.add_graph(1);` | `t` (1 row) | OK |
| 4 | `SELECT pgrdf.parse_turtle('@prefix ex: <http://example.org/> . ex:a ex:b ex:c .', 1);` | `1` (1 triple inserted) | OK |
| 5 | `SELECT * FROM pgrdf.sparql('SELECT ?o WHERE { ?s ?p ?o }');` | `{"o": "http://example.org/c"}` (1 row) | OK |
End-to-end round-trip: tarball β bind-mount β `CREATE EXTENSION` β
parse Turtle β SPARQL SELECT, all on a stock upstream Postgres image
with zero source build. The 872 KiB artifact is sufficient.
**Total elapsed (stage β CREATE EXTENSION β final SPARQL β teardown):
~50s** on M-series darwin, podman 4-ARM hypervisor.
**Bind-mount caveats discovered:**
- Podman on darwin runs in a VM (applehv) that does NOT auto-mount
`/tmp`. First attempt staged the tarball under `/tmp/pgrdf-install-test`
per the slice brief and `podman run` returned
`statfs /tmp/.../pgrdf.so: no such file or directory`. Restaging
under `$PWD/.smoke-install-test` (under `/Users`, auto-mounted by
podman's machine config) resolved it. **Implication for the public
install guide:** the existing guide already tells users to write
artifacts to PG's actual `pkglibdir` + `sharedir` (via
`pg_config --pkglibdir`), not to bind-mount from `/tmp`, so this
is a smoke-test infrastructure quirk only β not a documentation gap.
- glibc version mismatch was a worry going in (tarball built on glibc
2.36 from the slice #26 manylinux container, smoke runs against
bookworm's glibc 2.36) β and it was a non-event, since the build +
smoke happen to use the same glibc minor. Cross-glibc verification
is still owed once aarch64 + amd64 release builds land on the real
GH runner.
**Teardown:**
```bash
podman rm -f pgrdf-smoke
rm -rf "${STAGING}" # $PWD/.smoke-install-test
```
`pgrdf-smoke` is a one-shot container, never persisted; the regular
`pgrdf-postgres` (from `just compose-up`) is unaffected β it was
stopped at start of the smoke and can be restarted by the user with
`just compose-up` at any time.
**Slice outcome: PASS.** The v0.3.0 tarball-shaped release artifact
installs cleanly into a stock Postgres image. The pre-flight group
(slices #66 β #24, 43 entries) closes here. Remaining slices #23 β #1
shift focus to feature work in the v0.3.0 / v0.4.0 scope (per the
roadmap in slice #29).
### Release pre-flight β manual tarball repack verification (slice #25)
Manually executed the `release.yml` repack step (lines 38-53) on the
slice #26 build artifacts to confirm the staging β tarball pipeline
produces the exact INSTALL Β§3 layout the GH Release would publish.
Goal: catch any tar/find/sha256sum corner case *before* the tagged
release runs across 4 PG majors x 2 arches.
**Procedure (aarch64 darwin, slice #26 artifacts re-used β no rebuild):**
```bash
VER=0.2.0; PG=17; ARCH=arm64
STAGING=/tmp/pgrdf-repack-test
OUT="${STAGING}/pgrdf-${VER}-pg${PG}-glibc-${ARCH}"
rm -rf "${STAGING}"
mkdir -p "${OUT}/lib" "${OUT}/share/extension"
cp compose/extensions/lib/pgrdf.so "${OUT}/lib/"
cp compose/extensions/share/extension/pgrdf.control "${OUT}/share/extension/"
cp compose/extensions/share/extension/pgrdf--${VER}.sql "${OUT}/share/extension/"
cp LICENSE NOTICE "${OUT}/"
( cd "${OUT}" && find . -type f ! -name SHA256SUMS -print0 \
| xargs -0 sha256sum > SHA256SUMS )
tar -czf "${STAGING}/$(basename ${OUT}).tar.gz" -C "${STAGING}" "$(basename ${OUT})"
```
Mirrors `release.yml` L41-52 byte-for-byte (substituting
`compose/extensions/{lib,share}` for `target/release/pgrdf-pg17/usr/{lib,share}/postgresql/17/`,
since slice #26 already copied those out β same files, different path
prefix). GNU `sha256sum` from coreutils 9.7 is installed on darwin via
brew so the `sha256sum > SHA256SUMS && sha256sum -c SHA256SUMS` flow
matches Ubuntu runner behaviour exactly.
**Tarball produced:**
| Field | Value |
|---|---|
| Name | `pgrdf-0.2.0-pg17-glibc-arm64.tar.gz` |
| Size | 872,067 B (852 KiB) |
| Contents | 6 files + 4 dirs |
| Compression ratio | ~2.6:1 vs `pgrdf.so` (2.2 MB β 870 KB) |
**`tar -tzf | sort` (every entry, in lexicographic order):**
```
pgrdf-0.2.0-pg17-glibc-arm64/
pgrdf-0.2.0-pg17-glibc-arm64/LICENSE
pgrdf-0.2.0-pg17-glibc-arm64/NOTICE
pgrdf-0.2.0-pg17-glibc-arm64/SHA256SUMS
pgrdf-0.2.0-pg17-glibc-arm64/lib/
pgrdf-0.2.0-pg17-glibc-arm64/lib/pgrdf.so
pgrdf-0.2.0-pg17-glibc-arm64/share/
pgrdf-0.2.0-pg17-glibc-arm64/share/extension/
pgrdf-0.2.0-pg17-glibc-arm64/share/extension/pgrdf--0.2.0.sql
pgrdf-0.2.0-pg17-glibc-arm64/share/extension/pgrdf.control
```
**SHA256SUMS contents (5 lines, SHA256SUMS itself absent β self-exclude OK):**
```
a6dc47dea368e1cb479f456538144939060fa72bb2a96c4eabf23477d1a5ece8 ./LICENSE
7ee0daa51a51f29729f80e96192b6df4874b02a39f131c34f486c5365b3726c8 ./NOTICE
c8c661eada2255fa85e441a50240c3eaad4e1c12197a2102a1554bf5574ab90c ./lib/pgrdf.so
7584c499464333b53dc7bd106aafd37ffa5071cb33980bd5214e6da8c72284b4 ./share/extension/pgrdf.control
3a785b2b483bd510ecf810af029bb47cd8dab032071c548f3735e759564e7f69 ./share/extension/pgrdf--0.2.0.sql
```
`pgrdf.so` SHA matches slice #26's recorded `c8c661eaβ¦ab90c` β same
binary, no rebuild, by construction.
**Round-trip verify (extract fresh, `sha256sum -c`):**
```
./LICENSE: OK
./NOTICE: OK
./lib/pgrdf.so: OK
./share/extension/pgrdf.control: OK
./share/extension/pgrdf--0.2.0.sql: OK
```
5 of 5 OK. SHA256SUMS does not appear in its own manifest β the
`find -type f ! -name SHA256SUMS` predicate works as intended.
**INSTALL Β§3 layout conformance:**
| INSTALL Β§3 entry | tarball entry | match |
|---|---|---|
| `lib/pgrdf.so` | `lib/pgrdf.so` | exact |
| `share/extension/pgrdf.control` | `share/extension/pgrdf.control` | exact |
| `share/extension/pgrdf--<version>.sql` | `share/extension/pgrdf--0.2.0.sql` | exact |
| `share/extension/pgrdf--<prev>--<version>.sql` (zero or more) | (none β 0.2.0 is the first cut) | n/a |
| `LICENSE` | `LICENSE` | exact |
| `SHA256SUMS` | `SHA256SUMS` | exact |
| (none) | `NOTICE` | **spec gap β see below** |
Byte-for-byte conformant with INSTALL Β§3 except for `NOTICE`, which
landed in the tarball via slice #28 (Apache 2.0 Β§4(d) compliance) but
the corresponding INSTALL Β§3 file list was not updated then. Adding
`NOTICE` to the spec's enumerated layout is a one-line surface edit
deferred to a separate spec-grooming slice β the tarball mechanics
themselves are correct.
**Aggregate SHA256SUMS (release.yml L67-72) β not verified this slice:**
The aggregate step runs over `pgrdf-*.tar.gz` produced by all
`(pg, arch)` matrix legs and lives in the `release` job. With only
one local tarball it'd be a single-line file; the multi-leg
aggregation is fundamentally a GH Actions concern (artifact upload +
`download-artifact merge-multiple: true`). Single-tarball spot check
mirrors the same `sha256sum pgrdf-*.tar.gz > SHA256SUMS` invocation β
no behavioural surprise expected.
**Why this matters:**
`release.yml` is a single-shot, tag-triggered workflow. A failure
inside the `Repack to INSTALL-spec layout` step would leave a
half-published release on GitHub with no artifacts attached. Dry-running
the repack locally on the same artifact tree the workflow consumes
catches `cp` glob mismatches, `find -print0` portability surprises,
and tar layout regressions before they cost a re-tag + force-push.
Combined with slice #26 (path mapping verified) and slice #27 (verify
docs + GPG defer): the v0.2.0 cut is end-to-end traced.
Status: repack mechanics verified end-to-end on aarch64 darwin. The
GH Actions runner uses identical GNU coreutils + GNU tar, so the
behaviour transfers. The lone surface gap is `NOTICE` missing from
INSTALL Β§3's file list β flagged for a follow-up spec edit, no
runtime impact.
### Release pre-flight β cargo pgrx package dry-run (slice #26)
Verified `cargo pgrx package` produces the artifact tree the
`release.yml` repack step (lines 38-53) consumes. Goal: catch any
path mismatch *before* `git push --tags` triggers the real workflow
across 4 PG majors x 2 arches.
**Procedure (Colima docker builder, aarch64):**
1. `rm -rf compose/extensions/{lib,share}` to clean state.
2. `DOCKER_BUILDKIT=1 docker build --target builder --no-cache
-t pgrdf-builder-rust:pg17 -f compose/builder.Containerfile .`
β forces a fresh `cargo pgrx package` run (busts cargo cache).
3. `just build-ext` β completes export-stage and copies artifacts
to `compose/extensions/`.
**Live `cargo pgrx package` output (from step 2, builder stage 7/7):**
```
Installing extension
Copying control file to target/release/pgrdf-pg17/usr/share/postgresql/17/extension/pgrdf.control
Copying shared library to target/release/pgrdf-pg17/usr/lib/postgresql/17/lib/pgrdf.so
Writing SQL entities to /work/target/release/pgrdf-pg17/usr/share/postgresql/17/extension/pgrdf--0.2.0.sql
Finished installing pgrdf
```
**Path mapping (cargo pgrx package output β release.yml expectation):**
| cargo pgrx package emits | release.yml repack reads | match |
|---|---|---|
| `target/release/pgrdf-pg17/usr/lib/postgresql/17/lib/pgrdf.so` | `${PKG}/usr/lib/postgresql/${pg}/lib/*.so` (L46) | exact |
| `target/release/pgrdf-pg17/usr/share/postgresql/17/extension/pgrdf.control` | `${PKG}/usr/share/postgresql/${pg}/extension/*.control` (L47) | exact |
| `target/release/pgrdf-pg17/usr/share/postgresql/17/extension/pgrdf--0.2.0.sql` | `${PKG}/usr/share/postgresql/${pg}/extension/*.sql` (L48) | exact |
`PKG=target/release/pgrdf-pg${pg}` at release.yml L43; the inner
`usr/{lib,share}` prefix is what cargo pgrx package writes by
default and matches what release.yml then `cp`s out of.
**Artifact set produced (aarch64, glibc-bookworm, pg17):**
| File | Size | Notes |
|---|---|---|
| `lib/pgrdf.so` | 2,214,040 B | ELF 64 LSB aarch64, BuildID `80021f2cβ¦`, not stripped |
| `share/extension/pgrdf.control` | 216 B | `default_version = '0.2.0'`, `module_pathname = '$libdir/pgrdf'` |
| `share/extension/pgrdf--0.2.0.sql` | 7,220 B | 234 lines, auto-generated by pgrx |
`pgrdf.so` SHA-256 `c8c661eaβ¦ab90c`. Sizes will differ slightly on
amd64 vs aarch64 and across pg14/15/16/17 (different pgrx-generated
FFI shim sets), but the **path layout is invariant** β that's all
the release workflow's repack step needs.
**Build time (fresh `cargo pgrx package`, single PG major, single
arch, all cargo deps cached in BuildKit mount):**
- `--no-cache` builder rebuild: 175.30s real (~3 min)
- of which `cargo pgrx package` proper: ~80s (deps mostly
warm-cached in `/usr/local/cargo/registry` mount, fresh
`pgrdf` compile + SQL extraction)
- subsequent `just build-ext` (all cached, just re-runs export
container): 1.46s
The release workflow runs each `(pg, arch)` cell on a clean GitHub
runner with no cargo cache, so 8 cells x ~10-15min cold-cache build
= ~80-120 min total wall time. Within tolerance.
**Warnings worth flagging:** none. `cargo pgrx package` exits 0,
both `pgrdf-builder-rust:pg17` (~3.35 GB) and `pgrdf-builder:pg17`
(~99 MB) images build clean, export container copies the three
artifacts into `compose/extensions/` with no errors.
**Verdict:** layout matches. Release workflow's repack step
(release.yml lines 46-48) will find every path it expects. No
follow-up needed for slice #26; downstream slice #25 (manual
tarball repack dry-run) will exercise the LICENSE + NOTICE +
SHA256SUMS aggregation end-to-end.
### Release pre-flight β SHA256SUMS verify + GPG signing defer (slice #27)
Follow-up to slice #28's `release.yml` audit. Slice #28 confirmed
SHA256SUMS coverage is already wired at **both** levels (per-tarball
internal manifest + aggregate top-level over all 8 tarballs). This
slice surfaces the orthogonal piece β the detached GPG signature
`SHA256SUMS.asc` mentioned in INSTALL Β§3 / LLD Β§5.4 step 3 β and
decides scope for it.
**SHA256SUMS state (confirmed):** the `Repack to INSTALL-spec layout`
step in the `build` job emits per-tarball internal `SHA256SUMS`
(line 51 of `release.yml`) covering `lib/pgrdf.so`,
`share/extension/*`, `LICENSE`, `NOTICE`. The downstream `release`
job's `Generate aggregate SHA256SUMS` step emits a top-level
`SHA256SUMS` covering every `pgrdf-*.tar.gz` and attaches it as a
release asset (lines 67-77). No release.yml change needed.
**GPG signing decision: defer to v0.4.** Rationale:
- No `GPG_PRIVATE_KEY` secret or release-signing key is provisioned
for the workflow today β `grep -rn "GPG_PRIVATE_KEY\|secrets\."
.github/` returns zero matches.
- No public-half signing key is published anywhere visible
(keyserver, release page, repo).
- SHA256SUMS itself is the primary integrity check most extension
consumers verify (`sha256sum -c SHA256SUMS`); the `.asc` signature
layer is a downstream supply-chain hardening, not a v0.3-cut
blocker.
- Wiring `.asc` properly requires (a) sourcing a real signing key
(Peter Styk maintainer key β not in repo), (b) publishing the
public half on a keyserver or release page, (c) adding the GitHub
secret + a `gpg --detach-sign` step to the workflow's `release`
job. All out of scope for a verification-and-defer slice.
**Docs edits applied:**
- `docs/09-release.md` "Aggregate checksums" section: rewrote to
confirm SHA256SUMS is wired at both levels (per-tarball +
aggregate) and to flag `.asc` GPG signing as v0.4 follow-up
(previously said "not yet wired in `release.yml`" which conflated
SHA256SUMS itself with the `.asc` signing). Added a new
"Verification (consumer side)" subsection showing the `curl` β
`sha256sum -c SHA256SUMS --ignore-missing` recipe plus the
in-tarball verification path; closes with a one-liner pointing
at what changes when `.asc` lands in v0.4
(`gpg --verify SHA256SUMS.asc SHA256SUMS`).
- `docs/10-roadmap.md` Phase 6 step 3 bullet: split the single
conflated bullet into a positive confirmation (SHA256SUMS wired
per slice #28) plus an explicit v0.4 defer for `.asc` listing
the three prerequisites (signing key, public-half publication,
secret wiring).
**No `.github/workflows/release.yml` change.** This is a
verify-and-document slice; the workflow already does what slice
#27's original plan would have done. The actual `.asc` wiring lands
in a v0.4 slice once a signing key is sourced.
Test bar unchanged: still 93 pgrx + 39 pg_regress + 23 W3C-shape +
3 LUBM-shape = 158 across all five layers.
### Release pre-flight β release.yml audit + NOTICE inclusion fix (slice #28)
End-to-end audit of `.github/workflows/release.yml` ahead of v0.3 cut.
Verified workflow shape:
- Trigger `on: push: tags: ["v*"]`, matrix `pg14/15/16/17 Γ {amd64, arm64}`
(8 tarballs), GH Release job gated on `needs: build`.
- Action pins: `actions/checkout@v4`, `dtolnay/rust-toolchain@stable`,
`actions/upload-artifact@v4`, `actions/download-artifact@v4`,
`softprops/action-gh-release@v2` (major-pin policy preserved).
- Auth: relies on default `GITHUB_TOKEN` with top-level
`permissions: contents: write`. No third-party secrets referenced.
- Pre-release detection: implicit via `softprops/action-gh-release@v2`'s
SemVer pre-release tag heuristic (e.g. `v1.0.0-rc1`); no explicit
`prerelease:` flag β relying on action default.
- SHA256SUMS: already wired in **both** per-tarball form (inside each
`pgrdf-<ver>-pg<PG>-glibc-<arch>.tar.gz`) and aggregate form (top-level
`SHA256SUMS` over all 8 tarballs, attached to the GH Release).
Supersedes the slice #36 audit note that flagged this as "not yet
wired"; no TODO needed.
**Bug fixed (Apache 2.0 Β§4(d) compliance):** the `Repack to INSTALL-spec
layout` step previously copied only `LICENSE` into the staging directory.
Apache 2.0 Β§4(d) requires that where a `NOTICE` file exists, its
attribution notices MUST be included in distributed derivative works.
Added `cp NOTICE "${OUT}/"` directly after the existing `cp LICENSE`
line, mirroring the LICENSE pattern exactly. Also updated the layout
comment block to list `NOTICE` between `LICENSE` and `SHA256SUMS`.
Net effect: each of the 8 published tarballs now ships `LICENSE` +
`NOTICE` + `SHA256SUMS` alongside the extension binaries, satisfying
the upstream license terms inherited from `oxigraph`, `spargebra`,
`sophia`, and other Apache-2.0 dependencies whose attribution flows
through pgRDF's own `NOTICE`.
### Roadmap β v0.4 scope cohesion check (slice #29)
Bi-directional cohesion audit between `specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md`
(the source of truth for v0.4 scope) and the
`## v0.4 β next milestone (forward-looking)` section in
`docs/10-roadmap.md` (added by slice #31). The LLD wins on disagreement;
this slice fixes drift in the roadmap.
Coverage table (LLD Β§2 + ancillary v0.4 items β roadmap section):
| LLD item | Roadmap before | Roadmap after | Match? |
|---|---|---|---|
| Β§3 named-graph + IRI mapping | Track 1 | Track 1 | β
already |
| Β§4 SPARQL UPDATE | Track 2 | Track 2 | β
already |
| Β§5 graph-level lifecycle UDFs | Track 3 | Track 3 | β
already |
| Β§6 CONSTRUCT | Track 4 | Track 4 | β
already |
| Β§7 property paths | Track 5 | Track 5 | β
already |
| Β§11 SPARQL surface backlog | "Carried backlog" | "Carried backlog" | β
already |
| Β§12 perf work (heap_multi_insert + scans) | absent | NEW: "Performance work carried forward" | β
added |
| Β§13 W3C SPARQL 1.1 manifest runner wired v0.4 | absent | NEW: "Conformance runner wiring (v0.4)" | β
added |
| Β§8 reasoning profile selector β v0.5 | "Excluded from v0.4" | "Excluded from v0.4" | β
already |
| Β§9 real SHACL β v0.5 (E-009) | "Excluded from v0.4" | "Excluded from v0.4" | β
already |
| Β§10 TriG / N-Quads β v0.5 | "Excluded from v0.4" | "Excluded from v0.4" | β
already |
Reverse direction (every roadmap v0.4 subsection β LLD anchor) β every
existing subsection (Track 1-5, Carried backlog, Excluded) maps cleanly
to a numbered LLD section. No orphans in the roadmap.
Drift entries fixed:
- **Missing-in-roadmap: LLD Β§12 (Performance work carried forward)** β
the LLD explicitly says "v0.4 targets shipping this" for Phase 3 step
3 phase B (`heap_multi_insert` / `COPY BINARY`) and "v0.4 is the
earliest target" for Postgres custom-scan hooks. Both were absent from
the roadmap's v0.4 milestone section even though the roadmap's
pre-existing Phase 3 narrative already refers to phase B as v0.4 work.
Added a "Performance work carried forward from v0.3" subsection
pointing at v0.4-FUTURE Β§12.
- **Missing-in-roadmap: LLD Β§13 (W3C SPARQL 1.1 manifest runner wired
in v0.4)** β the LLD Β§13 test-policy paragraph says the manifest
runner "is wired in v0.4 β it gates Β§11's SPARQL backlog automatically
as the deferred forms come online". The roadmap's v0.4 milestone
section had no entry for this; the Phase 6 narrative covers the v0.3
state but the v0.4 wiring was unsurfaced in the forward-look. Added
a "Conformance runner wiring (v0.4)" subsection.
Framing checks (LLD wording β roadmap wording):
- LLD Β§2: "v0.4 ships five major tracks" β roadmap: "five major tracks
β the full contract lives in the spec". β
consistent.
- LLD Β§11: "ship together for economy" (translator machinery shared
with Β§4 + Β§6) β roadmap: "Shipped in the same cut because they share
the translator machinery Β§4 + Β§6 already require". β
consistent.
- LLD Β§8/Β§9/Β§10: framed as v0.5 work, "v0.4 keeps the v0.3 surface
unchanged" / "v0.4 does not attempt" / "v0.4 does not ship this; v0.5
does" β roadmap: "Excluded from v0.4 (planned v0.5)". β
consistent.
Total fixes applied: 2 new subsections added to
`docs/10-roadmap.md` v0.4 milestone section.
Test bar unchanged: docs-only slice. The LLD was not edited β only the
navigation aid.
### Roadmap β coverage ratchet table (slice #30)
Added a `## Coverage ratchet β release-by-release targets` section to
`docs/10-roadmap.md`, placed between the new
`## v0.4 β next milestone (forward-looking)` H2 (slice #31) and the
pre-existing `## Out of scope (v0.x)` H2 so the reader's eye flows
shipped-phases β next-milestone β ratchet-trajectory β
out-of-scope.
The new section consolidates targets already declared in scattered
prose across `specs/SPEC.pgRDF.LLD.v0.3.md` Β§5.4 + Β§6.1,
`specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md` Β§13, and `docs/08-testing.md`
("What we don't test (yet)") into a single 7-row Γ 5-column table:
- Rows: pgrx integration, pg_regress golden, W3C-shape SPARQL
harness, LUBM-shape correctness harness, W3C SPARQL 1.1
conformance manifest, W3C SHACL conformance manifest, LUBM
cross-engine benchmark.
- Columns: v0.3 (current) shipped baselines, v0.4 target, v0.5
target, v1.0 target.
Every cell anchors to a documented source β none are fabricated.
Cells without a published target carry `TBD` rather than a guess
(the pgrx and pg_regress columns for v0.5 / v1.0 are TBD because
the v0.5 / v1.0 LLDs aren't drafted yet; `v0.4-FUTURE` Β§13 only
gives counts for v0.4).
A one-paragraph explainer below the table pins the ratchet
enforcement rule: each release's CI must hit at least that
release's column, once a target is met it becomes a floor and can
never regress, citing `docs/08-testing.md`'s "Coverage gates
ratchet but never lower" line.
Sub-edits:
- `docs/10-roadmap.md` β new H2 + table + enforcement paragraph,
inserted between `## v0.4 β next milestone` and
`## Out of scope (v0.x)`. Caption cross-links
`specs/SPEC.pgRDF.LLD.v0.3.md` Β§6.1, `v0.4-FUTURE` Β§13, and
`docs/08-testing.md`.
Test bar unchanged: no new pg_regress or pgrx fixtures, this is a
docs-only slice.
Source-citation per cell (so future contributors can verify before
ratchet'ing a column):
- pgrx v0.3 = `93` from `docs/08-testing.md` line 25 +
`docs/10-roadmap.md` v0.3 cut row.
- pgrx v0.4 = `+ heap_multi_insert tests` from
`docs/08-testing.md` line 25.
- pg_regress v0.3 = `39` from `docs/08-testing.md` line 26 +
v0.3 cut row.
- pg_regress v0.4 = `~60` from `v0.4-FUTURE` Β§13 breakdown
(Β§3 6-8 + Β§4 8-10 + Β§5 4 + Β§6 3-4 + Β§7 5-6 + Β§11 5-6).
- W3C-shape harness v0.3 = `23` from `docs/08-testing.md` line 27.
- W3C-shape harness v0.4+ = "superseded by TTL-manifest runner"
from `docs/08-testing.md` line 27.
- LUBM-shape v0.3 = `3` from `docs/08-testing.md` line 28.
- LUBM-shape v0.4+ = "superseded by LUBM-1/10/100 real benchmarks"
from `docs/08-testing.md` line 28.
- SPARQL conformance v0.3 = `not wired` from
`docs/08-testing.md` line 30 + `LLD v0.3` Β§5.4.
- SPARQL conformance v0.4 = `runner wired + β₯ 30 %` from
`LLD v0.3` Β§5.4 line 389 + `docs/08-testing.md` line 30, 182 +
`v0.4-FUTURE` Β§13 (runner wired in v0.4).
- SPARQL conformance v0.5 = `β₯ 70 %` from `LLD v0.3` Β§6.1
Phase 4 column line 415.
- SPARQL conformance v1.0 = `β₯ 95 %` from `LLD v0.3` Β§5.4
line 389 + Β§6.1 Phase 6 column line 415.
- SHACL conformance v0.3 = `not wired (E-009)` from
`LLD v0.3` Β§5.4 line 392-394 + `docs/08-testing.md` line 31.
- SHACL conformance v0.4 = `not wired (still E-009)` from
`v0.4-FUTURE` Β§9 (E-009 still gates SHACL real integration to
v0.5).
- SHACL conformance v0.5 = `β₯ 50 %` from `LLD v0.3` Β§6.1
Phase 4 column line 415 + `docs/08-testing.md` line 184.
- SHACL conformance v1.0 = `β₯ 90 %` from `LLD v0.3` Β§5.4
line 392 + Β§6.1 Phase 6 column line 415.
- LUBM benchmark v0.3 = `scaffold only` from
`docs/08-testing.md` line 32.
- LUBM benchmark v0.4 = `LUBM-1 smoke` from `LLD v0.3` Β§6.1
Phase 3 column line 416.
- LUBM benchmark v0.5 = `LUBM-10 baseline vs Apache Jena TDB /
Apache AGE` from `LLD v0.3` Β§6.1 Phase 4 column line 416 +
Β§5.4 line 395 + `docs/08-testing.md` line 32.
- LUBM benchmark v1.0 = `LUBM-100 vs Apache Jena TDB /
Apache AGE` from `LLD v0.3` Β§6.1 Phase 6 column line 416 +
`docs/10-roadmap.md` Phase 6 step 3 line 301.
### Roadmap β v0.4 milestone section (slice #31)
Added an explicit `## v0.4 β next milestone (forward-looking)` section
to `docs/10-roadmap.md`, placed between Phase 6 and "Out of scope" so
the reader's eye flows shipped-phases β next-milestone β
out-of-scope. The new section surfaces the five v0.4 tracks at H3
heading granularity (named-graph + IRI mapping, SPARQL UPDATE,
graph-level lifecycle UDFs, CONSTRUCT, property paths) plus the
carried SPARQL-surface backlog from v0.3 and the explicit "excluded
from v0.4 (planned v0.5)" list, with each H3 cross-linking the
specific anchor in `specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md`.
The intent is navigation, not new contract material: each H3 is 2β4
lines pointing at v0.4-FUTURE for detail β the v0.4-FUTURE spec
remains the single source of truth, this section is a section-TOC
entry so readers can land on "what comes next" without spelunking
the Phase 1β6 bullets.
Sub-edits:
- `docs/10-roadmap.md` β new H2 + 7 H3s (5 tracks + carried backlog +
excluded-from-v0.4), inserted at line ~321 just above the existing
`## Out of scope (v0.x)` H2.
- `docs/10-roadmap.md` β "Test bar over time" preamble gains a
one-paragraph forward note that future v0.4 rows land under
`v0.4 cut` labels per the new section's track grouping; existing
v0.3 rows remain frozen as the shipped baseline.
Test bar unchanged: no new pg_regress or pgrx fixtures, this is a
docs-only slice.
Anchor verification: each cross-link uses the GitHub heading-slug
rules (lowercase, spaces β `-`, drop non-alphanumeric-or-hyphen,
em-dash β empty leaving a double-hyphen at its position). Targets
are Β§3 (`#3-named-graph-scoping-and-iri-mapping-new`), Β§4
(`#4-sparql-update-new`), Β§5 (`#5-graph-level-lifecycle-udfs-new`),
Β§6 (`#6-construct-deferred-from-v03-now-in-scope`), Β§7
(`#7-property-paths-deferred-from-v03-now-in-scope`), Β§11
(`#11-sparql-surface-backlog-deferred-from-v03-now-in-scope`), Β§8
(`#8-reasoning-profile-selector-v05--flagged-here-for-planning`),
Β§9 (`#9-shacl-real-integration-v05--gated-on-errata-e-009`), Β§10
(`#10-trig--n-quads-ingest-v05`).
### Docs β markdown link verification (slice #32)
Final docs-group lockdown pass before release-pre-flight: a full,
mechanical sweep of every internal markdown link across the repo to
confirm zero broken targets going into v0.3 release prep.
Scope and method:
| Surface | Count | Result |
| --- | --- | --- |
| Markdown files scanned (excl. `target/`, `node_modules/`, `fixtures/ontologies/`) | 61 | inventory complete |
| Total markdown links extracted (incl. external) | 153 | parsed via `\[β¦\]\(β¦\)` with code-fence stripping |
| External links (`http`/`https`/`mailto`/etc.) | 18 | not verified (out of scope) |
| Internal relative links (the audit surface) | 135 | every target resolved on disk |
| Same-file `#anchor` links | 1 | resolves to a real H2 in `docs/10-roadmap.md` |
| Cross-file `path.md#anchor` links | 0 | none in the repo |
| Directory-style links (e.g. `guide/`, `docs/`) | 4 | every target is a real directory |
| Non-markdown internal targets (`LICENSE`, `NOTICE`, `.sql`, `.rs`, `.tsv`) | 14 distinct | every file exists on disk |
Audit table β broken links found:
| File:line | Bad target | Type of break | Fix applied |
| --- | --- | --- | --- |
| _(none)_ | _(none)_ | _(none)_ | _(none)_ |
Counts: **broken 0 / fixed 0 / left-as-flagged 0**.
Verification approach (recorded for future slices):
- Resolver walks every `[text](path)` link, splits off `#anchor` and
`?query`, resolves `path` relative to the source file, then
`path.exists()` against the filesystem.
- For `#anchor` suffixes on `.md` targets, the anchor is matched
against GitHub's heading-slug rules: strip inline markdown
(`**`, `*`, `\``), lowercase, spaces β hyphens, drop any char that
isn't `[a-z0-9-_]`. Em-dashes and emoji collapse to nothing
(yielding double-hyphen runs) which is exactly how GitHub renders
them.
- Code fences (`\`\`\``) and inline-code spans (`\`β¦\``) are stripped
before link extraction to avoid false-positives on documented
example links.
- Excluded paths: `target/`, `node_modules/`, `.git/`,
`fixtures/ontologies/` (per task spec β the W3C ontology fixtures
carry their own internal links unrelated to repo docs).
Surprising findings: none. The repo is already link-clean. This
reflects the cumulative discipline of slices #33β#66 (each prior
slice has fixed its own anchor / path drift in-flight rather than
deferring), so the final lockdown pass finds nothing to fix. If a
permanent CI gate is wanted for v0.3 GA, the audit logic (61 files β
153 links β 0 broken in under a second) is small enough to land as
a `just check-links` recipe in a follow-up slice.
### Docs β guide intro + install audit (slice #44)
Companion pass to slice #45: walked the three user-guide entry-point files
(`guide/README.md`, `guide/00-intro.md`, `guide/01-install.md`) end-to-end
against current shipped reality β same discipline as the README audit, no
restructuring, only drift correction.
Audit scope and results:
| File | Surface | Result |
| --- | --- | --- |
| `guide/README.md` | All internal + external links, page-row blurbs, client-page targets, GitHub-issues URL | clean β every link resolves; the four pages and four client guides exist on disk |
| `guide/00-intro.md` | Status block ("Alpha, v0.3 engine feature-complete"), SPARQL feature row vs LLD Β§3 capability matrix, deferred-to-v0.4 list vs LLD Β§3 β³ entries, ERRATA E-009 link, naming + conventions claims | one fix β RDF-star out-of-scope citation pointed at "SPEC.pgRDF.LLD Β§2" but neither LLD v0.2 Β§2 ("High-Level Architecture") nor LLD v0.3 Β§2 ("What's shipped") addresses RDF-star / quoted triples. The citation is unfounded; re-pointed to ERRATA E-009 (the actual upstream feature-unification block on RDF 1.2 triple-term support, same root cause that gates the real SHACL impl) |
| `guide/00-intro.md` | SPARQL feature line: SELECT/ASK with BGP + FILTER + DISTINCT/LIMIT/OFFSET/ORDER BY + OPTIONAL + UNION + MINUS + aggregates (COUNT, SUM, AVG, type-aware MIN/MAX, GROUP_CONCAT, SAMPLE) + HAVING (alias **and** inline aggregate) + BIND | clean β matches LLD Β§3 verbatim and the README status pill |
| `guide/00-intro.md` | Code-block UDF signatures (`load_turtle`, `count_quads`, `sparql`, `materialize`, `add_graph`) | clean β every example matches `src/storage/`, `src/inference/`, `src/query/` current arity |
| `guide/00-intro.md` | "What's NOT" list (federated SPARQL, full OWL 2 reasoner, RDF-star, replacement for graph DB) | clean |
| `guide/01-install.md` | Path A compose flow (`build-ext`, `compose-up`, `psql`, `CREATE EXTENSION`, `pgrdf.version() β 0.2.0`) | clean β every recipe is in the `Justfile`; `compose/.env.example` exists; `pgrdf.version()` returns `env!("CARGO_PKG_VERSION") = "0.2.0"` |
| `guide/01-install.md` | PG-version range claim (`postgres:14..postgres:17`) | clean β matches ERRATA E-006 hold (pgrx 0.16.1, PG 14-17) for v0.3; PG 18 deferred to v0.4 |
| `guide/01-install.md` | Path B Kubernetes ref to `specs/SPEC.pgRDF.INSTALL.v0.2.md` | clean β spec is unchanged in v0.3 per LLD Β§0 |
| `guide/01-install.md` | Path C manual-install URL (`releases/download/v0.2.0/pgrdf-0.2.0-pg17-glibc-amd64.tar.gz`) | acknowledged as illustrative β no `v0.2.0` GitHub release exists yet (no git tag in the repo), but the section header is "If you have a Postgres server you control" with a "Download the matching tarball" comment that reads as a worked example for the post-release case. INSTALL spec Β§3 uses the same placeholder pattern with `0.4.1`. Left as-is per the conservative rule. |
| `guide/01-install.md` | Verify-install snippet (`SHOW shared_preload_libraries`, `pgrdf.stats() -> 'shmem_ready'`) | clean β `shmem_ready` is the documented field in `src/storage/stats.rs:45` |
| All three files | "Phase 3 β Extended SPARQL surface" stale-label check (slice #45 adjacent finding) | not present β the guide files don't carry the conflicting roadmap label |
| All three files | `just test-all` / new recipe coverage | not referenced β the guide intentionally points users at `compose-up` + `psql`, not the test harness |
**Result: one citation correction.** The Β§2 reference for the RDF-star
out-of-scope policy was unfounded β the policy stands as a project
decision rooted in ERRATA E-009 (`oxrdf` `rdf-12` feature surface
conflicting with `reasonable 0.4.1`), not in any LLD section. Re-pointed
the citation; no other text moved.
Drift sources surveyed for completeness:
- v0.3 vs v0.4 target labels β every deferred bullet in the guide files
already says `β³ v0.4` correctly.
- "Alpha"/"unstable"/"experimental" framing β the guide says
"Alpha, v0.3 engine feature-complete" which is the same status the
README pill carries. No "experimental" / "unstable" wording leaked in.
- Test counts (93/39/23/3 = 158) β guide files don't cite specific
counts (those live in the README + `docs/08-testing.md`); no drift
surface here.
This completes the three-file user-guide entry-point sweep.
### Docs β README audit (slice #45)
Final pre-release pass over `README.md` β every badge, every link, every
code block, every test-count claim, every status pill walked against
current shipped reality.
Audit scope and results:
| Surface | Check | Result |
| --- | --- | --- |
| Top-of-file badges (12) | URL resolves, value matches reality | clean |
| Status row pill | "v0.3 engine surface feature-complete", SPARQL feature list, Phase 3/4/5/6 labels, PG version list, deferred-to-v0.4 list | clean β Phase numbering matches `specs/SPEC.pgRDF.LLD.v0.3.md` Β§5; SPARQL feature list matches `docs/10-roadmap.md` Β§Phase 3 steps 1β12 plus the brought-forward HAVING-inline-aggregate and type-aware MIN/MAX |
| Local-file link targets (~25) | Each path exists on disk | clean β every `LICENSE`, `NOTICE`, `docs/*`, `guide/*`, `specs/*`, `tests/perf/*`, `tests/w3c-sparql/`, `TEST.ONTOLOGY-SET.md` link resolves |
| Code-block UDF signatures (`load_turtle`, `load_turtle_verbose`, `parse_turtle`, `add_graph`, `count_quads`, `materialize`, `sparql`, `sparql_parse`, `version`) | Signature in README matches `src/` | clean β every example matches current arity and return type |
| `just` recipes referenced (`build-ext`, `compose-up`, `psql`, `test`, `test-regression`, `test-w3c`, `test-lubm`, `test-all`, `test-conformance`, `test-everything`, `smoke-cold`) | Recipe exists in `Justfile` | clean |
| Test counts (93 pgrx + 39 pg_regress + 23 W3C-shape + 3 LUBM-shape = 158) | Match disk reality | clean β pgrx `#[pg_test]` grep returns 93; `tests/regression/sql/*.sql` count is 39; `tests/w3c-sparql/` has 23 test dirs; `tests/perf/lubm-shape/` has 3 query dirs |
| Smoke claim (24 ontologies, 17,134 triples) | Matches `tests/perf/smoke-ontologies.expected.tsv` | clean β 24 rows, sum of triple-count column is 17,134 |
| License section | Matches `LICENSE` + `NOTICE` (Copyright 2026 Peter Styk, Apache-2.0) | clean |
| ERRATA E-006 re-check date (2026-05-14) | Matches `specs/ERRATA.v0.2.md` | clean |
| `pgrdf.version()` return ("0.2.0") | Matches `Cargo.toml` `version` field | clean |
| CI W3C-shape + LUBM-shape wiring | Workflow runs `tests/w3c-sparql/run.sh` and `tests/perf/lubm-shape/run.sh` | clean β `.github/workflows/ci.yml` lines 119 + 128 |
**Result: zero drift.** No facts required correction; no links required
re-targeting; no signatures required updating. README is consistent with
the v0.3 LLD, the current Justfile, the current test fixtures, the
current Cargo.toml, and the current ERRATA. The audit re-establishes the
baseline before the v0.3 tag.
One adjacent finding β out of scope for this audit but noted for the
follow-on: `docs/10-roadmap.md` carries **two overlapping Phase 3
labels** (a `### Phase 3 β Extended SPARQL surface` heading at line 130
inside the `## Phase 2` section, AND a "Phase 3 storage performance" use
at the intro line 5 and the Β§270 test-bar-over-time table). Both are
internally consistent with the LLD Β§5 phase numbering (`Phase 3 =
Storage Performance`), but the in-line heading creates a local
ambiguity. Not corrected here β roadmap surgery is its own slice β and
the README correctly tracks the LLD scheme. Filed mentally for the
roadmap maintenance group.
### Hygiene β Cargo.lock freshness audit (slice #46)
Final entry in the hygiene group (54 β 46, sixty-six β forty-six). Verified
`Cargo.lock` is committed (`git ls-files Cargo.lock` returns the path) and
matches `Cargo.toml`: `cargo metadata --format-version 1` resolves clean on
the online index. **Reproducibility check** β captured the lock's MD5
(`1627cb986cfb73ca300550854b9564d5`), ran `cargo build --no-default-features
--features pg17` (via the rustup-managed `stable-aarch64-apple-darwin`
toolchain at rustc 1.95, since Homebrew's PATH-first rustc 1.88 is below
the declared MSRV); link step fails on the workstation as expected (pgrx
final link wants `pg_config` on PATH, not in scope here), but resolution
completes and the post-build MD5 is byte-for-byte identical. The lock is
stable β Cargo did not touch it during a fresh resolution pass.
**`cargo update --dry-run --verbose`** (online):
```
Updating crates.io index
Locking 1 package to latest compatible version
Unchanged pgrx v0.16.1 (available: v0.18.0)
Unchanged pgrx-tests v0.16.1 (available: v0.18.0)
Updating winnow v1.0.2 -> v1.0.3
warning: not updating lockfile due to dry run
```
| Crate | Bump | Classification | Root |
| --- | --- | --- | --- |
| `winnow` | 1.0.2 β 1.0.3 | Safe patch of a build-time transitive | `pgrx-pg-config β cargo_toml β toml 0.9.12+spec-1.1.0 β toml_parser 1.1.2 β winnow 1.0` |
| `pgrx` | 0.16.1 β 0.18.0 (held) | Pinned root (E-006) β not eligible under current `Cargo.toml` constraint `0.16` | direct |
| `pgrx-tests` | 0.16.1 β 0.18.0 (held) | Pinned root (E-006) β same | dev-dep |
The single eligible bump (`winnow` 1.0.2 β 1.0.3) is a patch on the
inner parser used by `cargo_toml` at build time β no runtime crate
touched, no `serde_json`/`serde_core`/`tokio`/-sys edge in scope, no
pinned root moves. Under a v0.4-cycle policy this would land
automatically alongside a `just test-regression` re-run.
**Decision: skip and defer to the v0.4 hygiene cycle.** Rationale:
the lock is reproducing cleanly, the only eligible bump is a single
transitive patch with zero behavioural surface, and the v0.3 tag is
imminent. Intentional churn against `Cargo.lock` this close to a
release tag adds risk (new regression-test pass required, new
artifact hash, new container layer) without proportionate benefit.
The held bumps on `pgrx` 0.16 β 0.18 are gated by ERRATA E-006 and
will move only when E-006 resolves; that's a v0.4 work item already
on the roadmap, and `winnow` will ride along on the same `cargo
update` invocation at that point.
This closes the hygiene group (slices #54 β #46, 9 entries). Lock
is fresh, reproducible, and audited; the next intentional refresh
is owed in the v0.4 cycle.
### Hygiene β lints allowlist review (slice #47)
Audited every `#![allow(...)]` / `#[allow(...)]` attribute in `src/`
plus the `[lints.rust]` block in `Cargo.toml`. Procedure: comment each
entry, run `cargo check --no-default-features --features pg17 --tests`
(rustc 1.95.0 via rustup-installed stable, since the Homebrew rustc on
this workstation is 1.88 β below the declared MSRV), then restore.
For each entry, classify the lint as still firing (keep), masking a
single site only (narrow candidate), or no longer firing (trim
candidate).
| Allow | Location | Scope | Lint still fires? | Disposition |
| --- | --- | --- | --- | --- |
| `unreachable_patterns` | `src/lib.rs:14` | crate | Yes β 6 sites (`reasonable.rs:247`, `loader.rs:161`, `executor.rs:1861`, `executor.rs:1902`, `parser.rs:161`, `parser.rs:212`) | Keep. Rationale ("future-proof against upstream `#[non_exhaustive]` variant additions") is crate-wide design intent. |
| `clippy::doc_lazy_continuation` | `src/lib.rs:19` | crate | Yes β 4 doc sites (`lib.rs:35`, `lib.rs:36`, `executor.rs:450`, `executor.rs:451`) | Keep. Rationale ("vertically-aligned ASCII continuation lines"). |
| `clippy::useless_conversion` | `src/lib.rs:25` | crate | Yes β 1 site (`executor.rs:156` `SetOfIterator::new(rows.into_iter())`) | Keep. Single-site narrowing would invert the rationale ("don't litter call sites with annotations"). |
| `unreachable_patterns` | `src/inference/reasonable.rs:246` | item | Redundant under crate-level allow above, but documents intent at the call site | Keep. |
| `unreachable_patterns` | `src/storage/loader.rs:160` | item | Same as above. | Keep. |
| `[lints.rust] unexpected_cfgs check-cfg = ["cfg(feature, values(\"pg13\", \"pg18\"))"]` | `Cargo.toml:53` | crate | Yes β 9 `pg13` + 9 `pg18` sites under rustc 1.95 (pgrx 0.16.1's `pg_shmem_init!` / per-PG `pg_guard` shims expand cfg branches for every PG major they know about regardless of which `feature = "pgN"` we select) | Keep. |
**Result:** zero trims. Every allow on disk currently suppresses a
real lint that fires under rustc 1.95. The two item-level
`unreachable_patterns` allows are redundant under the crate-level one
but document intent at the call site, so they stay. Cargo accepts
`[lints.rust]` as written (the IDE schema linter flags it as invalid
under its TOML schema; `cargo check` parses it without complaint).
This audit re-establishes the baseline: a future slice that drops or
narrows an allow can point back here as the prior-state record.
### Hygiene β ERRATA E-006 pgrx-upstream re-check (slice #48)
Refreshed `specs/ERRATA.v0.2.md` E-006 against today's upstream state.
`crates.io` reports `pgrx.max_stable_version = "0.18.0"` (unchanged
since 2026-04-17); `develop` is one commit ahead (PR #2280, an
aarch64 `-Wl,--no-gc-sections` link-flag fix). Upstream README now
documents "pgrx supports Postgres 13 through Postgres 18" β PG 18
support has officially landed at the 0.18.0 line. Local-compile
blockers from the 2026-05-13 saga are unchanged: 0.17.0's
`non_null_from_ref` E0658 and 0.18.0's `impl_table_iter` E0716 still
reproduce on every Rust stable/nightly we tested, and `develop` has
not touched the relevant macro since the release. Additionally,
0.18.0 carries a hard breaking migration (PR #2264 /
`v18.0-MIGRATION.md`): `pgrx_embed` binary removed, `crate-type` must
drop `"lib"`, manual `SqlTranslatable` impls move from methods to
associated `const`s. pgRDF still ships `src/bin/pgrx_embed.rs` and
`crate-type = ["cdylib", "lib"]`, so the bump is non-trivial.
**Disposition:** E-006 stays open. Classification B β partially
resolved at the upstream layer (PG 18 support exists), still blocked
at the consumption layer (E0716 + breaking-migration scope). Hold
pgrx 0.16.1 + PG 14β17 matrix for v0.3; defer pgrx-0.18 migration to
v0.4 as a planned work item. README + `docs/10-roadmap.md` updated
to reflect the partial-resolution framing. Next re-check trigger:
any pgrx publish above 0.18.0 OR an E0716 fix landing on `develop`.
### Hygiene β MSRV declared (slice #49)
Added `rust-version = "1.91"` to `[package]` in `Cargo.toml`. The value
matches the CI build container (`compose/builder.Containerfile` β
`FROM rust:1.91-bookworm`), which is the only Rust version pgRDF
artifacts are actually produced against. The existing `[lints.rust]`
block already assumes Rust 1.91+'s strict `check-cfg` behavior (see
the inline comment introduced in an earlier slice), so declaring a
lower MSRV would misadvertise support. pgrx 0.16.1's
`resolver = "3"` independently imposes a 1.84 floor; this declaration
tightens that to the value CI verifies. `rust-toolchain.toml` stays
on `channel = "stable"` β pinning a specific minor for an active
project trades health for false stability.
Verification: `cargo check --no-default-features --features pg17
--ignore-rust-version` clean on the dev workstation (rustc 1.88.0
Homebrew); the `--ignore-rust-version` is required only because the
workstation toolchain is older than the declared MSRV β CI's 1.91
container is unaffected. Bump the `rust-version` in lockstep with
the Containerfile when upgrading the build floor.
### Hygiene β cargo tree duplicate-version audit (slice #50)
Ran `cargo tree --duplicates --no-default-features --features pg17`
against `Cargo.lock`. Workspace currently resolves to **182 crates**
(normal + build edges); first-order direct deps are seven: `oxrdf`,
`oxttl`, `pgrx`, `reasonable`, `serde_json`, `spargebra` (normal)
plus `pgrx-tests` (dev). Nine crates appear at two distinct versions:
| Crate | Versions | Sources | Fix attempted |
| --- | --- | --- | --- |
| `byteorder` | 0.5.3 / 1.5.0 | `reasonable β roaring 0.5.2` (0.5) vs `pgrx-tests β tokio-postgres β postgres-protocol` (1.5) | No. `reasonable` is pinned (E-009); `roaring 0.5.2`'s old `byteorder 0.5` is structural until `reasonable` bumps. |
| `getrandom` | 0.3.4 / 0.4.2 | `oxrdf/oxttl/spargebra β rand 0.9 β rand_core 0.9 β getrandom 0.3` vs `pgrx β uuid + tempfile + rand 0.10 β getrandom 0.4` | No. Both roots pinned (oxrdf/oxttl/spargebra semantic-stability; pgrx 0.16 E-006). |
| `hashbrown` | 0.15.5 / 0.17.1 | `pgrx-sql-entity-graph β petgraph 0.8.3 β hashbrown 0.15` AND same `petgraph 0.8.3 β indexmap 2.14 β hashbrown 0.17`. `petgraph 0.8.3` itself pulls two hashbrowns. | No. Internal to `petgraph 0.8.3`; not fixable downstream. |
| `itertools` | 0.8.2 / 0.13.0 | `reasonable` (0.8) vs `pgrx-bindgen β bindgen 0.71.1` (0.13) | No. Both roots pinned. |
| `rand` | 0.9.4 / 0.10.1 | `oxrdf/oxttl/spargebra` (0.9) vs `pgrx-tests β tokio-postgres β postgres-protocol` + `pgrx β uuid + tempfile` (0.10) | No. Both roots pinned. |
| `rand_core` | 0.9.5 / 0.10.1 | Follows the `rand` split. | No. Same as `rand`. |
| `thiserror` | 1.0.69 / 2.0.18 | `reasonable` + `cargo_metadata 0.18.1 (via clap-cargo via pgrx-tests)` (1.x) vs `oxrdf/oxttl/spargebra/pgrx/pgrx-pg-config/pgrx-sql-entity-graph/pgrx-tests` (2.x) | No. `thiserror` 1β2 is an intentional major; reasonable+cargo_metadata cannot move to 2 without their own bumps. |
| `thiserror-impl` | 1.0.69 / 2.0.18 | Mirrors `thiserror`. | No. Same as `thiserror`. |
| `winnow` | 0.7.15 / 1.0.2 | `pgrx-pg-config β cargo_toml β toml 0.9.12+spec-1.1.0` β same `toml` crate uses `winnow 0.7` (top-level parser) AND `winnow 1.0` (via the inner `toml_parser` 1.1.2 helper crate). | No. Internal to `toml 0.9.12`; not fixable downstream. |
Plus six crates that `cargo tree --duplicates` flags but Cargo.lock
shows at exactly one version (`bitflags 2.11.1`, `memchr 2.8.0`,
`peg-runtime 0.8.6`, `percent-encoding 2.3.2`, `serde_core 1.0.228`,
`serde_json 1.0.149`) β these are single-version crates pulled in
through multiple distinct dep chains, which the `--duplicates` view
also surfaces. Nothing to fix on those.
**Disposition:** zero code or `Cargo.lock` changes. Every actual
duplicate roots in a pinned dep (`reasonable` 0.4.1 / `pgrx` 0.16 /
`oxttl`+`oxrdf`+`spargebra` semantic-stability) or in a transitive
internal split (`petgraph` 0.8.3, `toml` 0.9.12). No SemVer-safe
`cargo update --precise` collapse exists today. The duplicate budget
is in line with what a Rust workspace of this composition produces
and is purely informational β recorded so a future audit can diff
against the same picture once `reasonable` and `pgrx 0.16` are
unpinned (E-006, E-009).
### Hygiene β cargo audit (slice #51)
Ran `cargo audit` (v0.22.1, advisory-db 1088 advisories loaded
2026-05-14) against `Cargo.lock` (287 crate deps). **Zero security
vulnerabilities.** Four informational warnings, all in pinned-dep
subtrees (`pgrx 0.16` / `reasonable 0.4.1`) β none has a
SemVer-compatible fix without violating the pinned-core-dep
constraint (E-006 / pgrx 0.16, `reasonable` 0.4.1 RDF 1.2 saga in
E-009). Deferred to the cuts that bump those upstreams.
| ID | Kind | Crate | Source | Disposition |
| --- | --- | --- | --- | --- |
| RUSTSEC-2024-0375 | unmaintained | `atty 0.2.14` | `reasonable 0.4.1 β env_logger 0.7.1 β atty` | Defer. Fix requires `reasonable` to bump `env_logger` past 0.7; `reasonable` is pinned (see ERRATA E-009). |
| RUSTSEC-2021-0145 | unsound | `atty 0.2.14` | same path as above | Defer. Same root cause; unaligned-read CVE in `atty`'s Windows path. Unreachable on the Linux/macOS targets pgRDF builds for, but the advisory still trips on `Cargo.lock`. |
| RUSTSEC-2024-0436 | unmaintained | `paste 1.0.15` | `pgrx-tests 0.16.1 β paste` (dev-dep only) | Defer. Test-only proc-macro dep of pgrx-tests. Resolves when pgrx is unpinned (E-006). |
| RUSTSEC-2021-0127 | unmaintained | `serde_cbor 0.11.2` | `pgrx 0.16.1 β serde_cbor` | Defer. Hard transitive of pgrx 0.16. Resolves when pgrx is unpinned (E-006). |
Counts: Critical 0 / High 0 / Medium 0 / Low 0 / Yanked 0 /
Informational 4. No code or `Cargo.lock` changes β the advisories
are real but structurally unfixable in v0.3 without breaking the
pinned-dep contract. New ERRATA entry **E-010** records the
pinned-dep advisory ledger so future audits can diff cleanly.
### Hygiene β stale-docstring sweep
Audited every public `#[pg_extern]` and module-level docstring under
`src/` against actual signatures and behavior, looking for prose that
lied about current code (wrong return types, missing JSONB fields,
"Phase N backlog" claims for features that have since shipped, "still
unsupported" lists that the executor now handles). Eleven category-A
fixes landed across 7 files:
| file:line | drift | fix |
| --- | --- | --- |
| `src/storage/hexastore.rs:46` | `add_graph` SQL surface doc claimed `β VOID` | corrected to `β BOOLEAN`, documented return semantics (TRUE = created) |
| `src/storage/loader.rs:320-321` | `load_turtle_verbose` JSONB-field list missed `shmem_cache_hits` | added the field to the doc |
| `src/storage/stats.rs:17-29` | `stats()` JSON example showed only `shmem_*` keys; `plan_cache_*` keys (added in Phase 3 step 2) were missing | extended the example with all four `plan_cache_*` fields + explanatory note |
| `src/storage/mod.rs:3-5` | module doc said "Implementation status: skeleton" | replaced with a submodule index reflecting the v0.3 reality |
| `src/query/parser.rs:9-21` | "Today's scope" claimed OPTIONAL/UNION/non-BGP get flagged in `unsupported_algebra`; code walks through them | rewrote scope list to reflect the actual `unsupported_algebra` rejection set + added v0.4 cross-refs |
| `src/query/plan_cache.rs:103` | `plan_cache_clear` SQL surface doc said `β integer` | corrected to `β BIGINT` (matches `i64` return) |
| `src/query/executor.rs:14-17` | module doc claimed "dynamic SQL only carries integer constants" β incorrect post-Phase-3-step-2 (placeholders, not inlined constants) | rewrote to reflect `$N` positional parameter binding |
| `src/query/executor.rs:20-21` | "Scope today: SELECT only (no CONSTRUCT/ASK/DESCRIBE)" | ASK ships β fixed; CONSTRUCT/DESCRIBE marked with v0.4-FUTURE Β§6 pointer |
| `src/query/executor.rs:61-69` | scope said "HAVING and `GROUP_CONCAT` / `SAMPLE` are Phase 3 backlog" and "BIND remain unsupported" β all three landed | rewrote aggregates + BIND blocks to reflect implementation; v0.4-FUTURE pointers on the still-unsupported set |
| `src/query/executor.rs:498-499` | `parse_aggregate` doc listed only COUNT/SUM/AVG/MIN/MAX | added GROUP_CONCAT and SAMPLE; noted Custom IRI panic |
| `src/query/executor.rs:1391-1404` | `translate_filter` doc said numeric ordering, IN, REGEX were "not yet supported" β they are | rewrote both lists; left `EXISTS` + conditional `IF` as still-unsupported |
| `src/query/executor.rs:1535-1539` | `expr_to_id_sql` doc said constants β "inlined integer literal" | corrected to "`$N` parameter placeholder bound to the resolved dict id" |
| `src/query/executor.rs:2107-2109` | test docstring `sparql_unknown_predicate_returns_zero_rows` said translator "inlines `-1`" | corrected to "binds `-1` as the parameterised dict id sentinel" |
| `src/validation/shacl.rs:38-52` | `validate` JSONB schema doc listed `data_graph_exists` + `shapes_graph_exists` fields the body does not emit | removed the two fields from the doc (the stub doesn't emit existence flags β only counts) |
No behavior changes. `cargo check --features pg17` is green. Test bar
unchanged (39 pg_regress + 93 pgrx + 23 W3C + 3 LUBM = 158).
### Licensing β explicit attribution surface
`LICENSE` carries the resolved Apache 2.0 copyright notice
("Copyright 2026 Peter Styk <peter@styk.tv>" + project URL) in
place of the upstream `[yyyy] [name of copyright owner]`
placeholders. A new `NOTICE` file at the repo root carries the
Apache convention header β distributions that bundle pgRDF
should preserve it per Apache 2.0 Β§4(d). `Cargo.toml` gains an
`authors = ["Peter Styk <peter@styk.tv>"]` field and a
`homepage` mirror of the `repository` URL. `README.md`'s License
section is fleshed out to name the copyright holder and link
both `LICENSE` and `NOTICE`. No code or test changes.
### Spec β SPEC.pgRDF.LLD.v0.4-FUTURE draft landed (forward-looking)
New `specs/SPEC.pgRDF.LLD.v0.4-FUTURE.md` is a draft, forward-looking
target spec for the next cut; v0.3 remains the authoritative
shipped contract until v0.4 actually lands. The draft scopes five
new substantive tracks β named-graph scoping with an IRI β graph_id
mapping table (Β§3), SPARQL UPDATE including the graph-scoped
variants (Β§4), graph-level lifecycle UDFs over the LIST-partitioned
quads table (Β§5), CONSTRUCT returning triple-shaped JSONB rows
(Β§6), and property paths `*` / `+` / `?` / `^` with
materialised-closure-aware translation (Β§7) β plus the v0.3-deferred
SPARQL surface backlog (multi-triple OPTIONAL, VALUES,
BIND-downstream, aggregates over UNION, DESCRIBE) which shares
enough translator machinery with Β§4 and Β§6 to ship in the same cut.
v0.3 Β§0 gains a one-line cross-link to the new draft.
### Coverage β error-path regression signals
New `tests/regression/sql/81-error-paths.sql` opens a sibling track
to `80-unsupported-shapes.sql`: instead of locking the failure-mode
of SPARQL translator gaps, it locks the stable error-prefix each
UDF emits when given an invalid input. The helper `_check_error`
generalises `80`'s `_check_gap` to run arbitrary SQL via `EXECUTE`
inside a plpgsql try/catch, capturing the boolean signal (`t` =
expected substring present in SQLERRM) without pinning the
volatile tail (OS-level `os error N` numbers, path strings, etc.).
This commit locks #66 of the 66 β 1 countdown toward v0.3.0:
`pgrdf.load_turtle()` against a missing path must surface the
prefix `load_turtle: failed to open` (from
`src/storage/loader.rs:315`). Downstream tooling matches that
prefix to decide retry-vs-escalate; a silent rename would break
those callers without any pgRDF-side test firing.
Locks #65 of the countdown: a syntactically invalid `base_iri`
argument must surface the prefix `load_turtle: invalid base IRI`
(from `src/storage/loader.rs::ingest_turtle_with_stats`'s
`with_base_iri().unwrap_or_else(...)`). The check fires through
`pgrdf.parse_turtle('...', 9982, 'not an iri at all')` β using
`parse_turtle` keeps the regression file fixture-free while
exercising the same shared ingest path. The panic message is
prefixed `load_turtle:` even when triggered via `parse_turtle`;
that cross-UDF prefix invariance is itself part of the contract
(downstream callers route on one substring regardless of which
UDF parses the Turtle). Empty-string `base_iri` continues to
short-circuit before `with_base_iri()` runs, so callers can
safely pass `''` to mean "no base"; only a non-empty value that
fails oxiri's IRI grammar trips the prefix.
Locks #64 of the countdown: syntactically malformed Turtle bytes
must surface the prefix `load_turtle: turtle parse error` (from
`src/storage/loader.rs:256`'s `triple_result.expect(...)` inside
the parser-iterator loop). The check fires through
`pgrdf.parse_turtle(':alice :name "Alice"', 9964)` β the
fragment uses the default `:` prefix without declaring it, so
oxttl rejects at byte 0 with `The prefix : has not been
declared`; that specific complaint is tail / volatile, the
locked substring is just the `load_turtle: turtle parse error`
prefix. The same cross-UDF prefix invariance as error-65
applies: the panic text says `load_turtle:` regardless of
whether bytes entered via `load_turtle()` or `parse_turtle()`,
so downstream tooling routes on one substring. Any malformed
Turtle variant (missing trailing dot, undeclared prefix, bad
IRI ref, RDF-star in default mode) trips the same prefix.
Locks #63 of the countdown: a syntactically malformed SPARQL
query handed to `pgrdf.sparql()` must surface the prefix
`sparql: parse error:` (from `src/query/executor.rs:142`'s
`SparqlParser::new().parse_query(sql).unwrap_or_else(...)`).
The check fires through
`SELECT * FROM pgrdf.sparql('this is not sparql at all')` β
spargebra rejects at byte 10 with `expected CONSTRUCT`; that
specific complaint plus the line:col coordinates are tail /
volatile across spargebra versions, the locked substring is
just the `sparql: parse error:` prefix. This is the user-facing
contract surface for query-parse failure, distinct from the
translator-gap prefix locked across `80-unsupported-shapes`
(`sparql: β¦not supported yet`, `sparql: aggregates on top of
UNIONβ¦`, etc.) and from the RDF-ingest prefixes locked in
error-66/65/64. The sibling introspection UDF
`pgrdf.sparql_parse()` routes through its own panic site with
prefix `sparql_parse:` instead β a deliberate distinction so
callers can tell which entry point the bytes came in through;
that path is covered by the `#[pg_test]`
`sparql_parse_syntax_error_panics` in `src/query/parser.rs`
and is not pinned by this regression slice.
Test bar: **93 pgrx + 34 pg_regress + 23 W3C-shape + 3 LUBM-shape
= 153 tests**, green locally.
### Coverage β edge-case correctness regression signals
New `tests/regression/sql/62-materialize-empty.sql` opens a sibling
track to the error-path file (`81-error-paths.sql`): instead of
locking the prefix a UDF emits when given an *invalid* input, it
locks the *correctness contract* on **edge-case but valid** inputs
the engine must handle without surprise. The countdown shifts from
66β63 (error-path locks) into 62βonward (edge-case locks).
Locks #55 β the final entry in the 66β1 coverage countdown β promotes
the W3C-shape and LUBM-shape harnesses to first-class Justfile
recipes and adds a cold-compose smoke that exercises every
compose-based test layer end-to-end. New recipes: `just test-w3c`
(wraps `bash tests/w3c-sparql/run.sh`), `just test-lubm` (wraps
`bash tests/perf/lubm-shape/run.sh`), `just test-conformance` (the
three compose-based layers: regression + W3C-shape + LUBM-shape),
`just test-everything` (pgrx integration + test-conformance β the
broadest sweep), and `just smoke-cold` (`compose-down` β
`build-ext` β `compose-up` β `CREATE EXTENSION` β test-conformance,
the cold-compose discipline gate). `just test-all` keeps its
original narrow shape (`test` + `test-regression`) for back-compat;
`docs/08-testing.md` and `README.md`'s Tests block point at
`test-everything` and `smoke-cold` as the new entry points. The
shift matters because two of the three compose-based harnesses
(W3C-shape, LUBM-shape) were previously discoverable only by
knowing the bash paths β a contributor running `just --list`
saw nothing about them, and `just test-all` silently skipped
them. Cold-compose smoke is the verification half: it tears the
compose stack down with `compose-down` first (no shortcuts to a
warm `compose-up`), rebuilds the extension artefacts, brings the
stack back, recreates `CREATE EXTENSION pgrdf`, and runs all
three compose-based layers against the fresh state β catching the
class of bugs that pass on a warm compose because some prior
DROP/CREATE left state behind, and break on the next cold boot.
This is the final coverage-countdown slice before the hygiene
phase opens.
Locks #62 of the 66β1 countdown: `pgrdf.materialize(N)` on a graph
with zero base triples MUST NOT panic and MUST return a JSONB stats
object with `base_triples = 0`. The UDF still emits OWL 2 RL
**axiomatic triples** (per `reasonable 0.4`, four self-statements
over `owl:Thing` / `rdfs:Class` / etc. on the empty input) β that
count is upstream-defined and NOT pinned by this slice; only
`inferred_triples_written β₯ 0` is part of the locked contract.
Idempotency carries across the empty case: a second
`materialize(N)` call wipes its own prior `is_inferred=TRUE` rows
before re-deriving, so run 2's `previous_inferred_dropped` equals
run 1's `inferred_triples_written` exactly. Both invariants
project as booleans (`base_is_zero`, `inferred_nonneg`,
`first_run_dropped_zero`, `idempotent`) so the expected output
stays `t` regardless of axiomatic-set churn from upstream
`reasonable` releases.
Locks #61 of the countdown: `pgrdf.shmem_reset()` MUST actually
invalidate the process-wide shmem dict cache. The implementation
in `src/storage/shmem_cache.rs::reset()` bumps a single
`PgAtomic<AtomicU64>` `GENERATION` counter; `lookup()` reads slots
as cold whenever `slot.generation != current`. A refactor that
drops the generation bump would silently leave stale dict ids
visible across a `DROP EXTENSION; CREATE EXTENSION` cycle (where
the dict id space resets), so the regression must catch the
omission. New `tests/regression/sql/63-shmem-reset-invalidation.sql`
warms shmem with three terms, snapshots `(shmem_hits,
shmem_inserts)` via `\gset`, re-parses the same Turtle and asserts
hits went up (sanity β cache is hot), then calls `shmem_reset()`,
re-parses one more time, and asserts (a) `shmem_hits` stayed flat
across the post-reset parse and (b) `shmem_inserts` strictly
increased. Counter VALUES are not pinned β each assertion projects
a single boolean comparing deltas, so the expected output stays
`t`-flat across cumulative-counter drift from prior tests in the
same psql session.
Locks #60 of the countdown: `pgrdf.plan_cache_clear()` MUST return
the literal count of prepared statements drained from THIS
backend's `thread_local!` plan-cache HashMap β NOT zero, NOT a
constant, NOT the cumulative shmem `plan_cache_inserts` counter.
The implementation in `src/query/plan_cache.rs::plan_cache_clear`
reads `m.len()` BEFORE calling `m.clear()` and returns that as
`i64`; a refactor that swaps `m.len()` for a constant, or hoists
the `len()` call to AFTER `m.clear()` (always returning 0), would
corrupt operator-facing telemetry. New
`tests/regression/sql/64-plan-cache-clear.sql` locks four
invariants: (a) fresh backend β `clear()` returns 0 (nothing to
drop); (b) after one `parse_turtle` + three structurally distinct
SPARQL shapes, the drained count matches the pre-clear
`plan_cache_local_size` snapshot; (c) `plan_cache_local_size = 0`
immediately after the clear; (d) a second consecutive clear
returns 0 (idempotent at zero). Empirically `size_before = 4` on
the current pgrx 0.16 / PG 17 build (1 ingest-side `flush_batch`
INSERT plan + 3 SELECT plans), but the test locks the RELATION
`drained = size_before AND size_after = 0 AND idempotent_clear =
0 AND size_before > 0` rather than the literal β an ingest-path
refactor that takes `flush_batch` off the plan-cache path leaves
the test still passing as long as the contract holds. Bare-row
`SELECT count(*) FROM pgrdf.sparql(...)` calls are the cleanest
way to drive distinct plans into the cache; `\gset` captures the
snapshots without polluting the expected-output stream.
Locks #59 of the countdown: `pgrdf.parse_turtle()` MUST accept
*triple-free* Turtle input without panicking and MUST return `0`
as the inserted-triple count. The parser path in
`src/storage/loader.rs::ingest_turtle_with_stats` drives an oxttl
`TurtleParser` iterator whose for-loop body β the only site that
interns dict ids and pushes onto `batch_s/p/o` β runs ONCE PER
TRIPLE. Inputs that contain no triples (empty string,
whitespace-only, Turtle comment lines, bare `@prefix` declaration)
yield zero iterator items: the loop body never executes,
`stats.triples` stays `0`, the trailing `flush_batch()` flushes
empty vectors (no SQL is emitted to `_pgrdf_quads`), and the
function returns `0`. New `tests/regression/sql/65-parse-turtle-empty.sql`
locks six invariants in one go: each of the four
zero-triple inputs returns `0` (four booleans), `_pgrdf_quads`
for the test graph stays empty (one boolean), and
`_pgrdf_dictionary` stays empty across all four parses (one
boolean β interning is loop-body-only, so the `@prefix` IRI in
case 4 is parser-scope state, not a dict write). This is the
orthogonal correct-path companion to the malformed-input case
noted in `81-error-paths.sql` (where the parser panics with the
literal `load_turtle: turtle parse error: β¦` prefix): an EMPTY
parser iterator is NOT a parse error β it returns `0` cleanly.
Guards against a refactor that wraps the loop in a "fast-path"
panicking on empty input, that seeds a placeholder dict/quad row,
or that mishandles `flush_batch()` of zero-length arrays. The
whitespace-only case uses an `E''` extended-string literal so
`\n` / `\t` reach the parser as actual whitespace rather than
literal backslash-n (which the Turtle grammar correctly rejects).
Locks #58 of the countdown: the smoke-ontologies set MUST keep
parsing AND each ontology's triple count MUST stay stable. The
existing `tests/perf/smoke-ontologies.sh` loads every `*.ttl`
under `fixtures/ontologies/` through `pgrdf.load_turtle` into its
own graph and prints `<filename>: <triples>` per file; today's
snapshot is **24 ontologies, 17,134 triples** (workflow.ttl held
out per ERRATA E-007). Slice #58 captures that snapshot as
`tests/perf/smoke-ontologies.expected.tsv` (alphabetically-sorted
`filename<TAB>triples` rows) and adds a `--check` mode to the
smoke script: it re-runs the smoke, regenerates the TSV from the
live output, and `diff -u`'s it against the lock-file, exiting
non-zero on any drift. The diff catches two regression classes
the bare smoke can't: (a) an ontology that used to parse stops
parsing β the row disappears from the actual side; (b) the
parser silently drops or duplicates triples and the count moves
even though parsing nominally succeeds. The check is NOT yet
wired into CI (the fetched ontology payloads under
`fixtures/ontologies/*.ttl` are gitignored, so CI can't run the
smoke without a fetch step that doesn't yet exist in the
workflow). Landing the lock-file + the opt-in `--check` mode
now means a future Phase 6 slice can wire `--check` once the
ontology-fetch step is added to CI. The default behaviour
(no flag β pretty-print results, exit 0) is unchanged so
existing manual runs still work. Updating the lock-file is a
deliberate maintenance step β when an upstream ontology updates
and the new count is intentional, regenerate the TSV from a
fresh smoke run and commit the delta as one explicit move; no
`--accept`-style automatic refresh.
Locks #57 of the countdown: the end-to-end round-trip from
`pgrdf.parse_turtle` ingest through `pgrdf.sparql` query MUST
preserve every triple the parser saw, across all four
object-term kinds AND the blank-node-subject case. New
`tests/regression/sql/66-parse-sparql-roundtrip.sql` parses a
single 5-shape Turtle fragment and asserts five
`bool_and(EXISTS (SELECT 1 FROM pgrdf.sparql(β¦) WHERE β¦))`
booleans, one per shape: (1) IRI object β
`ex:alice foaf:knows ex:bob` resolves with the bob IRI as the
lexical projection of `?o`; (2) plain literal β
`foaf:name "Alice"`; (3) typed literal β
`ex:age "30"^^xsd:integer` projects `"30"`; (4) lang-tagged
literal β `ex:bio "Engineer"@en` projects `"Engineer"`; (5)
blank-node subject β the anonymous `[ a foaf:Person ;
foaf:name "Anon" ]` is keyed via a sibling-property join
`?s foaf:name "Anon" . ?s foaf:name ?n` so the
parser-allocated bnode id stays out of the assertion and the
contract is "queryable via sibling property", not "this
specific bnode id". Sibling to `61-materialize-then-sparql.sql`
which locks the materializeβsparql edge; together they pin
both ends of the storage layer's visibility contract to the
SPARQL surface. Datatype URI and lang-tag echo policy are
NOT pinned by this slice β the `pgrdf.sparql` projection
emits the lexical value only; the storage-side datatype-URI
contract is locked separately by `21-typed-literals.sql` and
the lang-tag contract by `22-lang-tags.sql`. Guards against a
refactor that loses a triple from the dictβquads write path
for any one of these five term kinds.
Locks #56 of the countdown: the `pgrdf.stats()` JSONB shape MUST
NOT silently gain, lose, rename, or `null` a field β the canonical
key set is closed at the 10 keys emitted by
`src/storage/stats.rs::stats()` today (`shmem_ready`, `shmem_slots`,
`shmem_hits`, `shmem_misses`, `shmem_inserts`, `shmem_evictions`,
`plan_cache_hits`, `plan_cache_misses`, `plan_cache_inserts`,
`plan_cache_local_size`). Extends the existing `82-stats-shape.sql`
in-place (no new pg_regress file β the file is explicitly scoped to
schema-shape contract and these three new invariants are schema
shape too) with three appended assertion blocks: (a) exact field
count β `count(*) FROM jsonb_object_keys(stats()) = 10`, the
deliberate-update tripwire that fires the moment any new field
lands without a corresponding test update; (b) keys-match-canonical
β `array_agg(k ORDER BY k) = ARRAY[β¦literal 10-element listβ¦]`,
catches both silent additions and silent renames in one assertion
(an addition makes the array longer; a rename swaps an element);
(c) no-null-fields β `bool_and(jsonb_typeof(value) != 'null')`,
catches a refactor that defaults an uninitialised counter to JSON
`null` rather than `0` (the type-contract block above would not
fire on a null since `jsonb_typeof(null) = 'null'` is checked
positively only on the seven existing-key assertions, not on
unknown keys). The existing "fields-that-should-be-there are
there" assertions are sibling to these "fields-that-shouldn't-be-
there ARE NOT there" assertions; together they pin the closed-set
shape contract that downstream operator tooling (CloudNativePG
operators, CI dashboards, client telemetry parsers) wires against.
Test bar: **93 pgrx + 39 pg_regress + 23 W3C-shape + 3 LUBM-shape
= 158 tests**, green locally. Slice #58 doesn't add a pg_regress
file β the smoke is a separate harness, so its lock-file (24
rows / 17,134 triples) lives alongside the script and is
enforced by `tests/perf/smoke-ontologies.sh --check`. Slice #57
adds the 39th pg_regress file (`66-parse-sparql-roundtrip.sql`).
Slice #56 extends `82-stats-shape.sql` in-place β three new
assertion blocks, three new rows in the expected baseline β no
test count bump (still 39 pg_regress files).
### Translator fix β type-aware `MIN` / `MAX`
`src/query/executor.rs::translate_aggregate` for `MIN` / `MAX`
previously emitted
MIN(lexical_value)
which sorts lexicographically β so over the four `xsd:integer`
literals `10, 2, 100, 20` it returned `"10"` (since
`"10" < "100" < "2" < "20"` as strings). Now emits
COALESCE(MIN(numeric_cast_subselect)::text, MIN(lexical_value))
so when any row in the group has an `xsd:numeric` datatype the
numeric MIN/MAX wins (matches the SUM/AVG path that has been
type-aware since Phase 2.2). Pure-string groups fall back to
lexicographic ordering. Mixed-type groups prefer numeric β the
SPARQL spec (Β§17.4) leaves mixed-type ordering
implementation-defined.
Coverage: new `tests/w3c-sparql/23-min-max-numeric/` β fixture's
`xsd:integer` literals `10/2/100/20` produce `MIN=2, MAX=100`
(would have been `MIN="10", MAX="20"` lexicographically).
Test bar: **93 pgrx + 33 pg_regress + 23 W3C-shape + 3 LUBM-shape
= 152 tests**, green locally. v0.4 deferred SPARQL surface
shrinks by one entry.
### Translator fix β inline `HAVING(SUM(?v) > c)` now supported
`src/query/executor.rs::AggregateSpec` gains a `synth_aliases:
Vec<String>` field that preserves spargebra's synthetic
intermediate-variable name even after `Extend` renames
`output_var` to the user's AS-alias.
Why: spargebra emits algebra of the form
```
Project { Filter(HAVING) { Extend(?total = $synth) {
Group(aggregates=[($synth, SUM(?p))]) } } }
```
The `Extend` visitor previously rewrote `output_var` from `$synth`
to `?total` and dropped the `$synth` mapping. The HAVING filter
`Filter(Greater(Variable($synth), Literal(15)))` then couldn't
find its aggregate during the filter-migration step and fell
through to the non-aggregate-aware FILTER translator, producing
`sparql: FILTER expression not translatable`.
The fix:
- `AggregateSpec.synth_aliases` is initialised by
`parse_aggregate` with the original `$synth` name and never
modified by `Extend`.
- The filter-migration step's `agg_names` is now the union of
every aggregate's `output_var` AND its `synth_aliases`.
- `translate_filter_with_aggregates`'s lookup helper (`find_agg`)
consults both fields.
Effects:
- `tests/regression/sql/80-unsupported-shapes.sql::gap-1` removed
(was negative-locked; no longer a gap).
- New positive coverage:
`tests/w3c-sparql/22-having-inline-aggregate/` β same shape as
`08-aggregates-having` but with the inline `HAVING(SUM(?p)>15)`
form. Hand-computed expected output verified.
- Both forms are now first-class. `08`'s description.md updated
to note the companion test.
- v0.4 SPARQL-surface deferred list shrinks by one entry.
Test bar: **93 pgrx + 33 pg_regress + 22 W3C-shape + 3 LUBM-shape
= 151 tests**, green locally.
### Translator-gap regression signals + Phase 6 step 3 scaffolding
Two adjacent additions, motivated by a real translator gap I hit
while expanding the W3C-shape harness (W3C 08 β inline `HAVING(SUM
(?v) > c)` falls through with `FILTER expression not translatable`
when spargebra synthesises a fresh aggregate node for the HAVING).
**1. `tests/regression/sql/80-unsupported-shapes.sql`** β
regression signals locking the failure-mode contract for every
known unsupported SPARQL shape. Each gap drives a query that MUST
fail, and asserts via plpgsql `EXCEPTION WHEN OTHERS` that
`SQLERRM` contains a stable error-prefix substring. The check
helper outputs a clean boolean (`t` = expected substring present)
rather than the raw error message β so the baseline isn't pinned
to spargebra's algebra-dump format, synthetic variable hashes, or
upstream `dataset` / `base_iri` internals.
Gaps locked in:
- `gap-1` β `HAVING(SUM(?v) > c)` inline (vs the supported
alias form `HAVING(?total > c)`).
- `gap-2` β multi-triple OPTIONAL.
- `gap-3` β VALUES inline data block.
- `gap-4` β GRAPH named-graph clause.
- `gap-5` β CONSTRUCT query form.
- `gap-6` β DESCRIBE query form.
- `gap-7` β property path with `*` repetition.
- `gap-8` β aggregates over UNION.
If pgRDF accidentally starts producing wrong results for any of
these shapes (translator regression), the baseline diff fires
with `unexpected success`. If we genuinely add support for a
shape, this file is the single place to flip the assertion to a
positive test.
**2. `tests/perf/lubm-shape/`** β Phase 6 step 3 scaffolding.
Three hand-authored LUBM-shape queries (`Q1` class membership,
`Q2` `teacherOf`, `Q3` `takesCourse` aggregate) against a small
LUBM-shape fixture. Same directory-per-test layout + bash runner
shape as `tests/w3c-sparql/`; runs alongside the W3C harness in
the CI `regression` job. Real LUBM-1/10/100 with the Java
generator + cross-engine comparison vs Apache Jena TDB and
Apache AGE remains v0.4 work (see `tests/perf/README.md`).
Test bar: **93 pgrx + 31 pg_regress + 18 W3C-shape + 3 LUBM-shape
= 145 tests**, green locally.
### Phase 6 step 2 starter β W3C-shape SPARQL harness
- `tests/w3c-sparql/` ships a directory-per-test harness with **13
hand-authored W3C-shape conformance tests** covering common
spec patterns:
- `01-basic-bgp` β Β§5 Basic Graph Pattern.
- `02-distinct` β Β§15.4 `SELECT DISTINCT` multiset β set.
- `03-union-disjoint` β Β§18.2.4 `UNION` with disjoint variables
(unbound β null in cross-branch rows).
- `04-optional-chain` β Β§6 `OPTIONAL` keeps the row when the
optional pattern fails to match.
- `05-minus-no-shared` β Β§8.3.2 `MINUS` with no shared variables
is a no-op (the translator elides the WHERE NOT EXISTS).
- `06-filter-isiri` β Β§17.4.2.1 `isIRI` term-type filter.
- `07-aggregates-count` β Β§11 `COUNT(?v) GROUP BY ?s`.
- `08-aggregates-having` β Β§11.5 `HAVING(?alias > c)` after `SUM`.
- `09-order-by-desc` β Β§15.1 `ORDER BY DESC(?v)`.
- `10-limit-offset` β Β§15.2 / Β§15.3 `LIMIT 2 OFFSET 2`.
- `11-bind-concat` β Β§10.1 `BIND` + Β§17.4.3.2 `CONCAT(...)`.
- `12-ask-true` β Β§16.2 `ASK` returning `true`.
- `13-ask-false` β Β§16.2 `ASK` returning `false`.
- `14-filter-regex` β Β§17.4.3.14 `REGEX(?v, "^A")`.
- `15-filter-in` β Β§17.4.1.9 `FILTER(?v IN (...))`.
- `16-strlen` β Β§17.4.3.3 `STRLEN(?v)`.
- `17-lang-tag` β Β§17.4.2.4 `LANG(?v)` over language-tagged literals.
- `18-ucase` β Β§17.4.3.8 `UCASE(?v)`.
- `tests/w3c-sparql/run.sh` is a bash runner: for each test it
drops + recreates the extension, loads `data.ttl`, runs
`query.rq` via `pgrdf.sparql`, sorts both sides
lexicographically (bag-equivalent comparison; SPARQL solutions
are unordered absent ORDER BY), and `diff -u`s against
`expected.jsonl`. `ACCEPT=1` regenerates expected; every
baseline must be hand-verified against the W3C spec.
- Each test ships a `description.md` quoting the spec section
exercised + the hand-computed expected JSONL β load-bearing for
reviewers and for the "never ACCEPT=1 blind" rule from v0.3 Β§6.2.
- Wired into CI's `regression` job (right after the pg_regress
suite, using the same compose Postgres). The W3C harness is
gated PR-on / push-on like the rest of the regression suite.
- `regression-w3c.yml` nightly workflow stays gated `if: false`
β it's the destination shape for the **full W3C TTL-manifest
runner** (`pgrdf-w3c-sparql` Rust binary parsing
`w3c/rdf-tests/sparql/sparql11/manifest.ttl` against the
ratcheting coverage targets `β₯ 30 % β β₯ 70 % β β₯ 95 %`).
v0.4 work item; not blocking the v0.3 release.
### Phase 6 step 1 β regression suite in CI
- `.github/workflows/ci.yml`: new `regression` job runs the
compose-based pg_regress suite on every PR + push to main. The
job:
- Builds `pgrdf.so` via `compose/builder.Containerfile`
(BuildKit, same path as the local dev loop).
- Boots `postgres:17.4-bookworm` via `docker compose up -d` with
the artifacts bind-mounted at the canonical paths.
- Waits on the compose healthcheck, then drives
`tests/regression/sql/NN-*.sql` via
`PGRDF_RUNTIME=docker bash tests/regression/run.sh`.
- Captures `docker logs pgrdf-postgres` on failure for triage.
- Tears the stack down with `compose down -v` on `always()`.
- Pinned to PG 17 today (compose pin per ERRATA E-006). Widens to
the full matrix when the PG-18 / pgrx issue clears.
- `tests/regression/run.sh` already honoured `PGRDF_RUNTIME` so no
runner changes needed.
**Deferred (still placeholders):**
- W3C SPARQL 1.1 + SHACL conformance runners live in
`.github/workflows/regression-w3c.yml` gated `if: false`. Need a
Rust runner binary that reads the manifest TTL, materialises each
test's data graph, runs the query, and diffs against the expected
result. v0.4 work item.
- LUBM-10 / LUBM-100 perf comparison vs Jena TDB and Apache AGE.
Needs `tests/perf/run-lubm.sh` + a normalised reporting layer.
v0.4 work item.
- Release workflow (`release.yml`) is wired but only fires on
`v*` tags. Tag the first release once Phase 6 step 2 (the
conformance runners) lands.
### Phase 5 β SHACL `pgrdf.validate` ships as a STUB
- `src/validation/shacl.rs`: `pgrdf.validate(data_graph_id,
shapes_graph_id) β JSONB` is wired with a stable response shape
but a `{"status": "stub", "reason": "...", β¦}` body. The UDF
echoes both graph IDs and reports the actual triple count in
each β enough for downstream tooling (CloudNativePG operators,
client libraries, CI jobs) to integrate the SQL surface today.
- 2 new pgrx tests (`validate_stub_shape`,
`validate_stub_unknown_graphs`) lock the JSONB schema.
- New regression `70-validate-stub.sql` asserts: status = "stub",
`data_graph_id` / `shapes_graph_id` echoed, triple counts
matched, `conforms` is `null`, `results` is an empty array,
`reason` field present. Hand-computed; never ACCEPT=1 baselined.
- Test bar: **91 β 93 pgrx + 29 β 30 regression**, green.
**Why a stub, not a real impl.** New ERRATA entry
[`E-009`](specs/ERRATA.v0.2.md). Briefly:
- `shacl_validation 0.2.x` (latest 0.2.12) ships an unfinished
`iri_s` β `rudof_iri` migration; `shacl_ast 0.2.9` fails to
compile against the resolved tree
(`expected rudof_iri::IriS, found iri_s::IriS`).
- `shacl_validation 0.1.149` compiles in isolation but its
transitives turn on `oxrdf`'s `rdf-12` feature, which adds
`TermRef::Triple(_)` β a variant `reasonable 0.4.1`'s pattern
match doesn't handle. Cargo feature unification means we can't
have both crates in one workspace until either upstream catches
up.
- We chose to ship Phase 4 (inference) first because it's
load-bearing; Phase 5's real implementation is a v0.4 follow-up
the moment upstream unblocks. The stub keeps the surface
available so nothing downstream gets blocked on a missing UDF.
- `Cargo.toml` carries the `shacl_validation = "0.2"` line
commented out with the full reason inline.
### Phase 4 β OWL 2 RL materialization via `reasonable`
- `Cargo.toml`: `reasonable = "0.4"` (0.4.1, 2026-05-10 publish).
Pulls in `datafrog 2`, `disjoint-sets 0.4`, `roaring 0.5`,
`rio_api / rio_turtle 0.7`, `farmhash 1`, `serde_sexpr 0.1`. The
oxrdf version requirement (`^0.3.3`) matches our existing pin so
triple types unify cleanly across the codebase.
- `src/inference/reasonable.rs`: full implementation of
`pgrdf.materialize(graph_id BIGINT) β JSONB`. Flow:
1. Idempotency β wipe every `is_inferred = TRUE` row in this
graph via a single `DELETE β¦ RETURNING 1` + count aggregate.
2. Bulk-rehydrate base triples β one `SELECT β¦ JOIN
_pgrdf_dictionary Γ 3 + LEFT JOIN dt` round-trip builds
`Vec<oxrdf::Triple>` directly. Datatype + language tag both
carried; blank-node subjects + object IRIs / literals all
supported.
3. `Reasoner::new().load_triples(base).reason()` β OWL 2 RL
forward chain.
4. Set-diff against the base `HashSet<Triple>` to isolate
entailed-but-not-asserted triples (filters out the base AND
the OWL 2 RL axiomatic triples that match the input).
5. Each new triple's terms intern via `put_term_full` (shmem-
warm path from Phase 3 step 1) and INSERT with
`is_inferred = TRUE`.
- Stats JSONB:
`base_triples / inferred_triples_written /
previous_inferred_dropped / reasoner_errors[] / elapsed_ms`.
- 3 new pgrx tests:
`materialize_subclass_chain` (verifies
`?a a :Engineer β ?a a :Person`),
`materialize_is_idempotent` (two calls produce the same row count
and drop the prior output),
`materialize_pure_data_preserves_input` (base survives).
- New regression `60-materialize-owl-rl.sql` covers:
- 2-hop subClassOf chain
(`Engineer β Person β Agent` plus assertions β
`alice a Person`, `alice a Agent`, `bob a Agent`).
- Idempotence β `previous_inferred_dropped` equals the prior
`inferred_triples_written`.
- `owl:inverseOf` entailment
(`:owner :owns :store` β `:store :ownedBy :owner`).
- Test bar: **88 β 91 pgrx + 28 β 29 regression**, green.
Scope honest. `reasonable` implements OWL 2 RL only. OWL 2 EL/QL
and arbitrary Datalog beyond RL are NOT covered. Pre-existing
ERRATA E-002 (LLD Β§2 β "reasonable Datalog reasoner") remains
correct; v0.3 LLD Β§5.2 already restricts the slice to RL.
### Phase 3 step 3 β bulk-ingest prepared INSERT (LLD Β§4.3 phase A)
- `src/storage/loader.rs`: the batch-flush SQL is a constant
string; the per-backend `plan_cache` from Phase 3 step 2 stashes
the prepared `INSERT β¦ SELECT FROM unnest(β¦)` exactly once.
Every flush across every load in the same backend reuses the
cached `OwnedPreparedStatement`. Saves one parse+plan per batch
(typically ~100β500 Β΅s each on PG 17).
- `flush_batch` now runs inside `Spi::connect_mut(|c| {β¦})` and
binds arguments as `Vec<DatumWithOid>` (three `INT8ARRAY` + one
`INT8`), driving the cached plan via `client.update`.
- `tests/regression/sql/52-bulk-ingest-perf.sql` + new
`fixtures/regression/synth-10k.{sh,ttl}` fixture (10 000
triples = β₯ 10 flushes per load). Asserts:
- Load 1 produces exactly one `plan_cache_misses` += 1 and one
`plan_cache_inserts` += 1 (the cold prepare).
- Two loads together produce β₯ 19 `plan_cache_hits` (the other
flushes all hit).
- Load 3 produces zero new inserts (cache fully warm).
Hand-computed; never `ACCEPT=1` baselined.
- Test bar: **88 β 88 pgrx + 27 β 28 regression**, green.
**Honest framing β wall-clock target.** LLD Β§4.3 calls for *"ingest
throughput at least 2Γ the current batched-INSERT baseline"*. The
prepared-INSERT cache saves a few hundred Β΅s per batch but the
batched-INSERT executor walk (`SELECT β¦ FROM unnest(β¦)` per-tuple
construction + partition routing) still dominates per-batch wall
clock. Observed: synth-100 unchanged within noise; synth-10k
~85 ms steady-state on both before/after.
To hit the 2Γ bar the next slice has to bypass the executor β
either `pg_sys::heap_multi_insert` directly (skips per-tuple
projection and the partition tuple-router uses heap-bulk paths) or
the proper `BeginCopyFrom` + binary COPY-protocol feed. Both are
FFI-heavy. Tracked as **Phase 3 step 3b (deferred)** β does NOT
block Phase 4 (Inference) start.
### Phase 3 step 2 β prepared-plan cache (LLD Β§4.2)
- `src/query/plan_cache.rs`: per-backend `thread_local!`
`HashMap<String, OwnedPreparedStatement>`. Cumulative
`plan_cache_hits / misses / inserts` counters live in shmem
(alongside the dict-cache counters) so a multi-backend view is
available through `pgrdf.stats()`. Per-backend cache size is
surfaced as `plan_cache_local_size`.
- `src/query/executor.rs`: every dict-id constant that used to be
inlined into the dynamic SQL (`bind_subject/predicate/object`,
`expr_to_id_sql`, `translate_in`, `numeric_datatype_id_list`,
β¦) now becomes a `$N` positional placeholder. A thread-local
`PARAM_BUF` collects the resolved i64s in declaration order;
`translate()` snapshots it into `ExecPlan { sql, params }`.
The SQL string itself is the canonical cache key β identical
algebra shape β identical key by construction.
- `execute()` consults the per-backend cache before paying for
parse + plan. Miss path uses `client.prepare(sql, &[INT8OID; n])`
followed by `.keep()` to promote to `'static`-lifetime
`OwnedPreparedStatement`. Hit path reuses the stashed statement
with a fresh `Vec<DatumWithOid>` built from `plan.params`.
- `pgrdf.plan_cache_clear() -> bigint` returns the number of
plans dropped from THIS backend's cache. Useful for diagnostics
and tear-down; production workloads never need it.
- `pgrdf.stats()` JSONB now includes four new fields:
`plan_cache_hits`, `plan_cache_misses`, `plan_cache_inserts`,
`plan_cache_local_size`.
- **Perf regression**: new `tests/regression/sql/51-plan-cache.sql`
exercises three blocks:
- 5 identical queries β 1 miss + 4 hits (single shape, repeat).
- 2 queries with same parametric shape but different IRI
constants β 1 miss + 1 hit (parameterisation works β SQL
string stays byte-identical despite constant change).
- 1 structurally distinct query (FILTER added) β 1 miss + 0 hits.
Also asserts `plan_cache_clear() >= 2` and
`plan_cache_local_size == 0` post-clear. **All deltas
hand-computed**.
- 2 new pgrx integration tests: `plan_cache_repeats_hit`,
`plan_cache_clear_returns_count`.
- Test bar: **86 β 88 pgrx + 26 β 27 regression** tests, green.
### Phase 3 step 1 β shmem dict cache (LLD Β§4.1)
- `src/storage/shmem_cache.rs`: process-wide, cross-backend
dictionary cache backed by `pgrx::PgLwLock<[Slot; 16_384]>` (~512
KiB shmem). Slot carries a u128 fingerprint (two SipHash variants)
plus dict_id + generation. Open-addressed with 8-deep linear
probing; canonical-slot eviction on full streak.
- `_PG_init` gates `pg_shmem_init!` on
`pg_sys::process_shared_preload_libraries_in_progress` so hook
registration only happens in the postmaster scan. Lazy-loaded
backends short-circuit every lookup and fall back to the per-call
HashMap path. Compose already sets
`shared_preload_libraries=pgrdf`; the pgrx-test harness's
`postgresql_conf_options` now does too.
- `put_term_full` consults shmem before SELECT. On both SELECT-hit
and INSERT it **stages** the (key β dict_id) mapping in a
per-backend pending list; pgrx's `register_xact_callback` flushes
to shmem on `XACT_EVENT_COMMIT` and discards on
`XACT_EVENT_ABORT`. The deferred publish keeps shmem in lockstep
with the dictionary table β a rolled-back INSERT never leaves an
orphan id in the cache.
- `pgrdf.stats() -> JSONB` exposes cumulative shmem counters
(`shmem_ready`, `shmem_slots`, `shmem_hits`, `shmem_misses`,
`shmem_inserts`, `shmem_evictions`) β observability target for
the LLD Β§4.1 acceptance criterion.
- `pgrdf.shmem_reset() -> void` atomically bumps a shmem
generation counter so every previously-cached entry reads as
cold on next lookup. Required after
`DROP EXTENSION pgrdf; CREATE EXTENSION pgrdf;` (the dict id
space resets but the cache survives) β also useful in regression
setup. Slot generation is part of the slot record; mismatch on
lookup is silent and equivalent to a miss.
- Per-call `load_turtle_verbose` stats gain `shmem_cache_hits` β
the count of term references that fell through the per-call
HashMap and were satisfied by the cross-backend shmem cache
without touching `_pgrdf_dictionary`. Loader snapshots the
global HITS counter around each `put_term_full` call to
attribute hits.
- **Perf regression**: new `tests/regression/sql/50-shmem-dict-cache.sql`
loads `fixtures/regression/synth-100.ttl` (100 triples, 115
distinct terms) three times into successive graphs and asserts:
load 1 has 115 db calls + 0 shmem hits; loads 2β3 have 0 db
calls + 115 shmem hits each. Cumulative counter deltas asserted
via `pgrdf.stats()` β₯ 230 shmem hits / β₯ 115 inserts vs pre-test
snapshot. **All expected values hand-computed**, never
autobaselined.
- 6 new pgrx integration tests cover the cache primitive
(`shmem_ready_in_test`, `shmem_roundtrip_via_committed`,
`shmem_disambiguates_keys`, `shmem_datatype_in_key`,
`shmem_counters_advance`, `shmem_reset_invalidates_slots`).
- Test bar: **85 β 86 pgrx + 25 β 26 regression** tests, all green.
### LLD v0.3 β Refocus
- [`specs/SPEC.pgRDF.LLD.v0.3.md`](specs/SPEC.pgRDF.LLD.v0.3.md)
shipped. Supersedes v0.2 at the contract level; v0.2 LLD is now
historical (still referenced for Β§4.1β4.3 internals that haven't
changed). INSTALL spec (`SPEC.pgRDF.INSTALL.v0.2.md`) unchanged.
- The v0.3 LLD acknowledges Phase 3 steps 1β12 (SPARQL surface)
as substantively complete (BGP + FILTER + OPTIONAL + UNION +
MINUS + DISTINCT/LIMIT/OFFSET/ORDER BY + aggregates + HAVING +
GROUP_CONCAT/SAMPLE + expression richness + BIND + multi-triple
MINUS + ASK) and re-bins forward work:
- **Phase 3 (NEW): Storage Performance** β shmem dict cache
(v0.2 LLD Β§4.1), prepared-plan cache (Β§4.2), COPY BINARY
ingestion (Β§4.3). The single biggest remaining LLD gap.
- **Phase 4**: Inference engine (OWL 2 RL via `reasonable`).
- **Phase 5**: Validation engine (SHACL via `shacl_validation`).
- **Phase 6**: W3C SPARQL/SHACL conformance + LUBM + release
artifacts + CI matrix.
- v0.4 deferral list (none block Phase 3):
- GRAPH `{ β¦ }` named-graph clause (needs storage schema work)
- VALUES inline tables
- Property paths beyond simple sequence (`*`, `+`, `?`, `^`)
- Multi-triple OPTIONAL (needs LATERAL refactor)
- CONSTRUCT, DESCRIBE (different output shape)
- Aggregates over UNION
- BIND output referenced in later FILTER / BGP
- Type-aware ORDER BY / MIN / MAX
- v0.3 also formalises the **empirical-verification rule**: new
regression fixtures hand-compute their expected output; no
`ACCEPT=1` autobaselining of new query coverage.
- Per-call `pgrdf.load_turtle_verbose` stats will gain
`shmem_cache_hits` + `plan_cache_hits` in Phase 3 to support
perf regression tests on the synth-100 fixture.
- Cross-references updated: `README.md`, `docs/README.md`,
`docs/10-roadmap.md`, `specs/ERRATA.v0.2.md`.
### Phase 3 step 12 β ASK query form
- `pgrdf.sparql('ASK { β¦ }')` now works. Returns a single JSONB
row `{"_ask": "true"}` or `{"_ask": "false"}` reflecting whether
the pattern has at least one solution.
- The pattern walk reuses `parse_select` so ASK transparently
supports FILTER, OPTIONAL, UNION, MINUS, and any combination
the SELECT executor handles. `build_ask_probe_sql` emits a
`SELECT 1 FROM β¦` probe wrapped in `EXISTS(β¦)` in the outer
query.
- `pgrdf.sparql_parse` now reports `form: "ASK"` with the same
`bgp_pattern_count` / `bgp_patterns` / `unsupported_algebra`
shape it gives SELECT, rather than `supported: false`.
- 2 new pg_tests: ASK match/no-match, ASK with FILTER.
- `tests/regression/sql/44-sparql-ask.sql` covers 6 query shapes:
match, no-match, FILTER pass/fail, ASK with OPTIONAL, ASK with
UNION.
- `README.md` pills: 77+24 β 79+25.
- `CONSTRUCT` and `DESCRIBE` change the output shape (triples
instead of solutions) and are **deferred to v0.4**.
Test bar:
pg_test: 79 passed; 0 failed (was 77)
regression: 25 passed; 0 failed (was 24)
### Phase 3 step 11 β Multi-triple MINUS
- `MINUS { ?s :p ?o . ?s :q ?r . β¦ }` now accepts arbitrary
N-triple sub-patterns. `ParsedSelect.minuses` changed from
`Vec<TriplePattern>` to `Vec<Vec<TriplePattern>>`; same for
`UnionBranch.minuses`.
- `translate_minus` rewrites to emit one `NOT EXISTS (SELECT 1
FROM q_min_1, q_min_2, β¦ WHERE β¦)` per MINUS block. Each
triple in the sub-pattern gets its own quad alias; shared
variables with the outer query AND shared-inside-the-MINUS
emit equality predicates automatically via `pattern_clauses`.
- SPARQL spec's "no shared variables β MINUS is identity" rule
still applies: the translator unions all variables in the
sub-pattern, checks intersection with outer anchors, and
elides the block if empty.
- Single-triple MINUS continues to work (it's the
`triples.len() == 1` case of the multi-triple path).
- 1 new pg_test: `sparql_minus_multi_triple` (alice+eve have
both mbox+age β dropped, bob/carol/dave survive).
- `tests/regression/sql/43-sparql-minus-multi.sql` covers 4
query shapes: 2-triple AND, 3-triple AND, chained multi-triple
MINUSes, single-triple back-compat.
- `README.md` pills: 76+23 β 77+24.
- Multi-triple OPTIONAL is **deferred to v0.4** β the LATERAL
refactor it needs is bigger than the MINUS rewrite (OPTIONAL
has to EXPOSE its new bindings to the outer query, while MINUS
is just a boolean check). Workaround: chain single-triple
OPTIONALs.
Test bar:
pg_test: 77 passed; 0 failed (was 76)
regression: 24 passed; 0 failed (was 23)
### Phase 3 step 10 β BIND (non-aggregate)
- `BIND(expr AS ?v)` (and the equivalent `SELECT (expr AS ?v)` form
on non-aggregate expressions) now adds a virtual column. `walk_select`'s
Extend handler falls through to a `BindSpec` when the expression
isn't a Variable-rename of an existing aggregate.
- Projection in `build_single_branch_outer` checks `ps.binds` before
falling back to the BGP anchor lookup, emitting the translated
expression with the BIND var as the column alias.
- `translate_bind_expression` covers Literal / NamedNode / Variable,
STR / LANG / DATATYPE / UCASE / LCASE, arithmetic, STRLEN, and
`CONCAT(?a, ?b, β¦)` via Postgres `concat`. All values surface as
text in the JSONB row.
- Today's restriction: a BIND output variable referenced in a later
FILTER / BGP isn't yet supported (would need expression substitution
during translation). Filtering on BIND output is Phase 3 backlog.
- 3 new pg_tests + `tests/regression/sql/42-sparql-bind.sql`
(6 query shapes: UCASE, arithmetic, CONCAT, literal-constant,
STRLEN, two-BINDs in one query).
- `README.md` pills: 73+22 β 76+23.
Test bar:
pg_test: 76 passed; 0 failed (was 73)
regression: 23 passed; 0 failed (was 22)
### Phase 3 step 9 β Expression richness in FILTER
- `pgrdf.sparql` FILTER translator gains a much wider expression
surface:
- **Arithmetic**: `?a + ?b`, `?a - ?b`, `?a * ?b`, `?a / ?b`
(with NULLIF-guarded divide-by-zero), unary `-`, unary `+`.
All built on top of `expr_to_numeric_sql`'s CASE-cast so
non-numeric operands NULL-propagate instead of erroring.
- **String predicates**: `CONTAINS`, `STRSTARTS`, `STRENDS` β
Postgres `strpos`, `left`, `right` against `lexical_value`.
- **String-valued functions** usable inside other expressions:
`LANG(?v)`, `DATATYPE(?v)`, `UCASE(?v)`, `LCASE(?v)`,
`STR(?v)` (was passthrough, formalised). LANG / DATATYPE use
chained dict lookups (datatype IRI ids β IRI lexical).
- **`STRLEN(?v)`** is numeric-valued, plugged into
`expr_to_numeric_sql`.
- Equality fallback: when either side of `=` / `sameTerm` is a
function call (or otherwise can't resolve to a dict id), the
translator falls back to lexical comparison. Lets `STR(?v) =
"x"`, `LANG(?v) = "en"`, `DATATYPE(?v) = xsd:integer` etc.
translate cleanly.
- `expr_to_lexical_sql` learned to emit a SQL string for
`NamedNode` (the IRI's lexical form), making the fallback work
for IRI constants on the right of equality.
- 6 new pg_tests: arithmetic add, mul/div, STRLEN, CONTAINS/
STRSTARTS/STRENDS, LANG/DATATYPE equality, UCASE/LCASE case
folding.
- `tests/regression/sql/41-sparql-expressions.sql` covers 11
query shapes (4 arithmetic, STRLEN, 4 string predicates,
LANG, DATATYPE).
- `README.md` pills: 67+21 β 73+22.
Test bar:
pg_test: 73 passed; 0 failed (was 67)
regression: 22 passed; 0 failed (was 21)
### Phase 3 step 8 β HAVING + GROUP_CONCAT + SAMPLE
- `pgrdf.sparql` now translates `HAVING (expr)` clauses on
aggregate queries. `parse_select` post-processes the collected
filters: any filter referencing an aggregate output variable
becomes a HAVING predicate (the rest stay as WHERE).
- `translate_filter_with_aggregates` is the HAVING-aware translator:
variable references resolve to (a) the underlying SQL aggregate
function for aggregate-output vars, (b) the group-by expression
for group vars, (c) literals are used directly. Supports
identity, numeric ordering (`<`/`>`/`<=`/`>=`), boolean composition.
- `GROUP_CONCAT(?v [; SEPARATOR = "β¦"])` β Postgres `STRING_AGG`,
default separator a single space per SPARQL spec.
- `SAMPLE(?v)` β `MIN(lexical_value)` as a deterministic surrogate
(SPARQL spec says "implementation-defined element"; MIN is one
conformant choice).
- 4 new pg_tests: HAVING with COUNT, HAVING with SUM, GROUP_CONCAT
with custom separator, SAMPLE.
- `tests/regression/sql/40-sparql-having.sql` covers 9 query
shapes (HAVING > N, HAVING = 1, HAVING composite, GROUP_CONCAT
custom + default separator, SAMPLE, SUM-HAVING on non-numeric
strings β demonstrates the numeric-awareness rule β and
SUM-HAVING on real numeric data across two graphs).
- `README.md` pills: 63+20 β 67+21.
Test bar:
pg_test: 67 passed; 0 failed (was 63)
regression: 21 passed; 0 failed (was 20)
### Phase 3 step 7 β Aggregates + GROUP BY
- `pgrdf.sparql` handles SPARQL aggregates with or without
`GROUP BY`:
- `COUNT(*)`, `COUNT(?v)`, `COUNT(DISTINCT ?v)`.
- `SUM(?v)`, `AVG(?v)` β numeric-aware via the same XSD-numeric
CASE cast as FILTER ordering. Non-numeric values contribute
`NULL` (skipped by SUM/AVG per SQL semantics, no Postgres
cast error).
- `MIN(?v)`, `MAX(?v)` β lexicographic on the term's
`lexical_value`. Type-aware MIN/MAX queued.
- `GROUP BY ?vars` translates to SQL `GROUP BY` using the same
dict-lookup expressions that drive the SELECT clause. Multiple
aggregates per group supported.
- Aggregate output values come back as **JSON strings** in the
`pgrdf.sparql` row, consistent with the rest of the surface.
Callers cast with `(j ->> 'n')::int`/`::numeric` etc.
- Algebra layout: spargebra lowers `SELECT (EXPR AS ?v)` to
`Project β Extend β Group β BGP`. `walk_select` now handles
Extend (renames the synthesised `$agg_N` to `?v`) and Group
(captures group_vars + AggregateSpecs). Walk order: descend
into inner first so Group's aggregates are populated before
Extend tries to rename them.
- Parser walks `GraphPattern::Group` and `GraphPattern::Extend`
rather than flagging them; tests adjusted.
- 7 new pg_tests: COUNT(*), COUNT(DISTINCT), GROUP BY counting,
SUM numeric, AVG numeric, MIN/MAX lex, multiple aggregates
per group.
- `tests/regression/sql/39-sparql-aggregates.sql` covers 10
query shapes: count_all, count_o, count_distinct, sum_age,
avg_age (rounded), min/max names, group_by predicates,
multi-aggregate, ORDER-BY-aggregate + LIMIT.
- `README.md` pills: 56+19 β 63+20; SPARQL pill adds AGGREGATES.
- `guide/03-querying.md` gains a full "Aggregates and GROUP BY"
section covering the JSON-string output rule, the SUM/AVG
numeric-awareness rule, the MIN/MAX lex caveat, and the
HAVING/GROUP_CONCAT/BIND restrictions.
Today's restrictions:
- HAVING not yet translated β post-process with regular SQL.
- BIND outside aggregate aliasing not supported.
- Aggregates on top of UNION not supported (panic with clear msg).
- `GROUP_CONCAT` / `SAMPLE` not supported.
Test bar:
pg_test: 63 passed; 0 failed (was 56)
regression: 20 passed; 0 failed (was 19)
### Phase 3 step 6 β MINUS
- `pgrdf.sparql` handles `MINUS { ?s :p ?o }` and chained MINUSes.
Each block becomes a `WHERE NOT EXISTS (SELECT 1 FROM
pgrdf._pgrdf_quads qMIN_K WHERE β¦)` sub-SELECT, keyed on shared
variables between the outer query and the MINUS triple.
- Per SPARQL spec, MINUS with no shared variables is a no-op β
the translator detects this at translation time and emits no
SQL for that block (different from OPTIONAL, which always
emits a LEFT JOIN).
- Restriction: each MINUS block must be a single triple pattern
(mirrors OPTIONAL's current restriction).
- Inside UNION branches, MINUS works the same way (scoped to the
branch's anchor map).
- 4 new pg_tests: basic MINUS, no-shared-vars no-op, chained
MINUSes, MINUS + outer FILTER + REGEX.
- Parser walks `GraphPattern::Minus` rather than flagging it.
New parser pg_test for the new state + a Path-still-flagged
test taking its place (transitive `:a*`, not simple `:a/:b`
which spargebra desugars to BGP).
- `tests/regression/sql/38-sparql-minus.sql` covers 6 query
shapes (basic, no-op, chained, with-FILTER, ordered survivor,
shared-non-subject-var).
- `30-sparql-parse.sql` baseline updated: MINUS supported, Path
(quantified) is the new unsupported representative.
- `README.md` pills: 51+18 β 56+19, SPARQL pill adds MINUS.
- `guide/03-querying.md` gains a MINUS section covering the
shared-vars-vs-no-op rule and the OPTIONAL-asymmetry note.
Test bar:
pg_test: 56 passed; 0 failed (was 51)
regression: 19 passed; 0 failed (was 18)
### Phase 3 step 5 β UNION
- `pgrdf.sparql` handles `{ A } UNION { B }` and chained
`A UNION B UNION C`. Each branch is its own complete sub-SELECT
(own BGP / FILTERs / OPTIONALs / per-branch dict-id anchors).
Branches are combined with SQL `UNION ALL`; the outer SELECT
layers `DISTINCT` / `ORDER BY` / `LIMIT` / `OFFSET`.
- Variables bound in only some branches come back as `null` from
the other branches (each branch SELECTs `NULL::TEXT` for vars
it doesn't bind, so row shapes line up across `UNION ALL`).
- ORDER BY on UNION may only reference projected variables β the
outer SELECT can't see branch-local alias columns. Executor
panics with a clear message otherwise.
- Refactor: extracted `build_from_and_where` (shared by both the
single-branch and per-UNION-branch paths) + `build_branch_sql`
+ `build_union_sql`. The original `build_bgp_sql` is now a
dispatcher over `ps.union_branches.is_empty()`.
- 5 new pg_tests: basic UNION over same var, different-var
UNION with NULL pad, three-way chain, UNION + DISTINCT,
UNION + ORDER BY + LIMIT.
- Parser walks `GraphPattern::Union` rather than flagging it.
New parser pg_test for the new state + a new MINUS-still-flagged
test taking its place.
- `tests/regression/sql/37-sparql-union.sql` covers 9 query shapes
(basic, DISTINCT, different-var, two NULL-discriminator checks,
three-way chain, ORDER BY first, LIMIT, branch-local FILTER).
- `30-sparql-parse.sql` baseline refreshed: UNION supported,
MINUS now the unsupported representative.
- `README.md` pills: 45+17 β 51+18, SPARQL pill adds UNION.
- `guide/03-querying.md` gains a full UNION section covering
the cross-branch null padding, ORDER-BY-must-be-projected
rule, and the no-nesting restriction for this slice.
Test bar:
pg_test: 51 passed; 0 failed (was 45)
regression: 18 passed; 0 failed (was 17)
### Phase 3 step 4 β OPTIONAL (LeftJoin) translation
- `pgrdf.sparql` now handles `OPTIONAL { ?s :p ?o }`. Each OPTIONAL
block emits a `LEFT JOIN pgrdf._pgrdf_quads qOPT_i ON (β¦)`. Variables
introduced inside an OPTIONAL surface as NULL (JSONB `null`) when
the LEFT JOIN didn't match.
- `OPTIONAL { β¦ FILTER(...) }` β the inner filter lands in the LEFT
JOIN's ON clause, so rejected matches keep the optional variable
NULL rather than pruning the whole row.
- Multiple chained OPTIONALs each get their own LEFT JOIN, in
left-to-right order. Per SPARQL semantics, variables introduced
by one OPTIONAL aren't visible to another OPTIONAL's ON clause.
- `BOUND(?v)` translation tightened: now emits `qN.col IS NOT NULL`
regardless of whether ?v is mandatory or OPTIONAL. Mandatory
anchors are non-NULL so it's trivially TRUE there; OPTIONAL anchors
can be NULL so this is the spec-correct semantics.
- Internal refactor: `build_bgp_sql` switched from comma-style FROM
(`q1, q2, q3 WHERE β¦`) to explicit JOIN syntax
(`q1 INNER JOIN q2 ON β¦ INNER JOIN q3 ON β¦`). Same semantics for
INNER joins; necessary for OPTIONAL's LEFT JOIN to compose.
- Parser updated: `LeftJoin` no longer flagged in
`unsupported_algebra` β the parser walks both arms.
- 4 new pg_tests: simple OPTIONAL, OPTIONAL with inner FILTER,
multiple chained OPTIONALs, outer FILTER(BOUND) pruning.
- `tests/regression/sql/36-sparql-optional.sql` covers 8 query
shapes (LEFT JOIN counts, NULL/not-NULL discrimination, inner
filter, multi-chain, outer BOUND prune, OPTIONAL + ORDER BY).
- `30-sparql-parse.sql` baseline updated: OPTIONAL no longer
flagged; new UNION assertion replaces it.
- `README.md` pills: 40+16 β 45+17; SPARQL pill adds OPTIONAL.
- `guide/03-querying.md` gains a full OPTIONAL section covering
inner-FILTER semantics, chained OPTIONALs, BOUND-pruning, and
the single-triple restriction for this slice.
Test bar:
pg_test: 45 passed; 0 failed (was 40)
regression: 17 passed; 0 failed (was 16)
### Phase 3 step 3 β Solution modifiers (DISTINCT / LIMIT / OFFSET / ORDER BY)
- The four classic SPARQL solution modifiers now land in the
generated SQL instead of being silently stripped from the AST:
- `SELECT DISTINCT ?vars` β `SELECT DISTINCT` in SQL.
- `SELECT REDUCED ?vars` β also `SELECT DISTINCT` (REDUCED is a
"dups may or may not be removed" hint per spec; over-approxing
with DISTINCT is conformant).
- `LIMIT N` / `OFFSET N` β `LIMIT N` / `OFFSET N`.
- `ORDER BY ?var`, `ORDER BY ASC(?var)`, `ORDER BY DESC(?var)`,
multi-key β sorted by the term's `lexical_value` with
`NULLS LAST`. If the var is projected the existing column is
reused; otherwise an extra hidden column is appended and ORDER
BY references it by ordinal (so the JSONB output stays clean).
- ORDER BY today is **lexicographic on string form**, not SPARQL's
full type-aware ordering. Numeric ordering through ORDER BY lands
in step 4+; for now use FILTER for numeric range + post-SQL
`ORDER BY (sparql->>'n')::numeric`.
- Refactor: `unwrap_select` β `parse_select` returning a richer
`ParsedSelect` struct (projected, bgp, filters, distinct,
order_by, limit, offset). Single recursive walk replaces the
old two-pass extract_bgp_and_filters / unwrap_select split.
- 6 new pg_tests: distinct dedups, LIMIT caps, OFFSET skips,
ORDER BY ASC + DESC, DISTINCT + ORDER BY interaction.
- `tests/regression/sql/35-sparql-modifiers.sql` covers 10 query
shapes (raw count, DISTINCT, REDUCED, LIMIT 2, ORDER ASC first,
ORDER DESC first, OFFSET 3 LIMIT 2 window, DISTINCT + ORDER,
ORDER BY on non-projected var, LIMIT 0).
- `README.md` pills: 34+15 β 40+16, SPARQL pill adds
DISTINCT/ORDER/LIMIT.
- `guide/03-querying.md` gains a full "Solution modifiers" section
covering ORDER BY's lexicographic-vs-type-aware caveat, the
DISTINCT-with-non-projected-order-by panic case, and a worked
example.
Test bar:
pg_test: 40 passed; 0 failed (was 34)
regression: 16 passed; 0 failed (was 15)
### Phase 3 step 2 β FILTER numeric ordering + REGEX + IN
- `pgrdf.sparql` FILTER translator gains three new shapes:
- **Numeric ordering** (`<`, `>`, `<=`, `>=`): operand resolves to
`NUMERIC` via a CASE-guarded subselect on `_pgrdf_dictionary`.
Only XSD numeric datatypes (integer, decimal, double, float,
sized + unsigned + constraint subtypes β 16 IRIs total)
contribute; everything else compares NULL β row dropped. This
matches SPARQL's "type error β unbound" semantics without ever
raising a Postgres cast error.
- **`REGEX(?v, "pat" [, "flags"])`**: Postgres `~` (case-sensitive)
or `~*` (with `i` flag) against the term's `lexical_value`.
Pattern + flags are SPARQL literals at translation time;
single quotes in the pattern are escaped. `STR(?v)` inside
REGEX is a passthrough.
- **`?term IN (e1, e2, β¦)`**: dict-id set membership.
- 6 new pg_tests: numeric `>` / range / non-numeric drop, regex
case-sensitive / case-insensitive with STR(), and IN.
- `tests/regression/sql/34-sparql-filter-advanced.sql` covers 10
query shapes (numeric `>`, range, `<` with non-numeric mixed in,
`>= 0` over a typed-decimal row, regex `^A`, regex `ar` case-i,
regex+STR wrap, IN over IRIs, IN over a literal, and a cross-BGP
composition).
- `README.md` pills: tests 28+14 β 34+15, SPARQL pill adds REGEX.
- `guide/03-querying.md` gains full sections for numeric ordering,
REGEX (with the POSIX-vs-PCRE caveat), and IN. Capability matrix
refreshed.
Test bar:
pg_test: 34 passed; 0 failed (was 28)
regression: 15 passed; 0 failed (was 14)
### Phase 3 step 1 β FILTER expressions over BGPs
- `pgrdf.sparql` now walks `GraphPattern::Filter { expr, inner }`
and translates a useful subset of `Expression` into SQL WHERE
predicates appended after the BGP joins:
- **Identity**: `=`, `!=`, `sameTerm` β both operands resolved to
dictionary ids, compared as BIGINT. Sound because the dictionary
deduplicates by `(term_type, lexical, datatype, language)`.
- **Boolean**: `&&`, `||`, `!`.
- **Term-type predicates**: `isIRI`, `isLiteral`, `isBlank` β emit
a correlated subselect on `_pgrdf_dictionary.term_type`.
- **`BOUND`**: trivially `TRUE` for any anchored BGP variable.
- Untranslatable shapes (numeric `<`/`>`/`<=`/`>=`, `regex`, `str`,
`lang`, arithmetic, `IN`, `EXISTS`) panic with a clear message
rather than silently dropping the filter.
- `pgrdf.sparql_parse` no longer flags `Filter` in
`unsupported_algebra` β it walks into the inner BGP. OPTIONAL,
UNION, MINUS, Group, Path, Values, Extend (BIND), Service still
flagged.
- 6 new pg_tests: literal equality, `!=`, `isIRI`, boolean AND
composition, var-equals-var (self-loop), `BOUND` trivially-true.
- 1 new parser pg_test: OPTIONAL replaces the FILTER-flagged baseline.
- `tests/regression/sql/33-sparql-filter.sql` covers 9 query shapes
end-to-end (literal eq, neg, isIRI, isLiteral, self-loop,
boolean AND, negated isIRI, BOUND, unknown-literal-zero-rows).
- `tests/regression/sql/30-sparql-parse.sql` baseline updated: Filter
no longer reported as unsupported; new OPTIONAL assertion added.
- `guide/03-querying.md` adds a full FILTER section with examples,
including the `=` β sameTerm-vs-value-equality caveat and how
filters interact with multi-pattern BGPs.
- `README.md`: status pill β `phase 3 start`, test pill 21+13 β 28+14,
SPARQL pill `SELECT/BGP` β `SELECT/BGP/FILTER`.
Test bar:
pg_test: 28 passed; 0 failed (was 21)
regression: 14 passed; 0 failed (was 13)
### Phase 2.2 step 8 β Node.js + Go client guides
- `guide/clients/typescript.md` β `pg` (node-postgres) + `postgres.js`
+ `pg-cursor` streaming + strongly-typed binding helpers. Covers
`load_turtle`, `parse_turtle`, `load_turtle_verbose`, and the
full `pgrdf.sparql` JSONB result shape with type narrowing.
- `guide/clients/go.md` β `pgx` v5 + `pgxpool` + sqlc integration
+ bulk-ingest pattern + the constant-time graph-drop idiom.
- `guide/README.md` index lists both new client pages.
- `README.md` clients section now points at all 4 supported clients
(Python, Rust, TypeScript, Go).
### Phase 2.2 step 7 β User guide for SPARQL surface
- New `guide/03-querying.md`: full walkthrough of `pgrdf.sparql`
(single + multi-pattern BGPs, constants in any position, JSONB
output, combining with regular SQL, `pgrdf.sparql_parse` for
introspection) plus what works / doesn't / why, and a worked
example of the SQL translation.
- `README.md` promoted the SPARQL surface from "coming soon" to a
live code example, bumped the test pill from 9+10 to 21+13,
added a SPARQL pill, refreshed the status row.
- `guide/README.md` index entry for `03-querying.md`.
### Phase 2.2 step 6 β Multi-pattern BGP joins
- `pgrdf.sparql` now handles N-pattern Basic Graph Patterns. Each
pattern becomes a `_pgrdf_quads qN` clause; shared variables across
patterns are tracked by first-occurrence anchors and emit equality
predicates (`q2.subject_id = q1.subject_id`) that fold into INNER
joins.
- 2 new pg_tests: two-pattern shared-subject BGP (Alice + Carol have
both `foaf:name` and `foaf:mbox`, Bob doesn't), three-pattern chain
following `foaf:knows`.
- `tests/regression/sql/32-sparql-multipattern.sql` covers 5 shapes:
shared-subject BGP, three-pattern chain, self-loop pattern (?s ?p ?s),
bound-subject multi-pattern, and bound-predicate + bound-literal.
Test bar:
pg_test: 21 passed; 0 failed (was 19)
regression: 13 passed; 0 failed (was 12)
### Phase 2.2 step 5 β SPARQL execution: BGP β SQL
- `pgrdf.sparql(q TEXT) β SETOF JSONB` β first user-visible SPARQL
surface. Parses via spargebra, translates a single Basic Graph
Pattern into a dynamic SQL SELECT over `_pgrdf_quads` joined to
`_pgrdf_dictionary`, returns one JSONB row per solution keyed by
the projected variable names.
```sql
SELECT * FROM pgrdf.sparql(
'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?n WHERE { ?s foaf:name ?n }'
);
-- β {"s": "http://example.com/alice", "n": "Alice"}
-- β {"s": "http://example.com/bob", "n": "Bob"}
```
Scope today (intentionally narrow β multi-pattern joins land in
step 6):
- SELECT only.
- Exactly one BGP triple per query.
- Constants in any position (subject IRI, predicate IRI, object
IRI or literal). Unknown constants resolve to `-1` so the query
correctly returns zero rows rather than erroring.
- Variables in any position.
- Distinct / Reduced / Slice / OrderBy wrappers are passed through.
- 4 new pg_tests covering all-three-vars BGP, bound-predicate filter,
bound-subject filter, and unknown-predicate-returns-empty.
- `tests/regression/sql/31-sparql-bgp.sql` exercises 7 query shapes
end-to-end through the compose Postgres.
Infrastructure:
- `compose/builder.Containerfile` rewritten with BuildKit cache
mounts. The builder image dropped from 7.73 GB β 3.35 GB; cargo
registry + target/ now live in build-scoped cache volumes that
persist across rebuilds without bloating image layers.
- `Justfile build-ext` now invokes `DOCKER_BUILDKIT=1 docker build`
so the `# syntax=docker/dockerfile:1.4` directive activates.
- `.dockerignore` excludes `target/`, `.target-linux/`,
`compose/pg-data/`, `compose/extensions/lib|share`,
`fixtures/ontologies/`, `.git/`. Build context dropped accordingly.
### Phase 2.2 step 4 β SPARQL parser surface
- `spargebra = "0.4"` (0.4.6 resolved). Pins `oxrdf = "=0.3.3"`, the
same version oxttl 0.2.3 uses, so no graph split.
- New module `src/query/parser.rs`.
- `pgrdf.sparql_parse(q TEXT) -> JSONB` parses a SPARQL query via
`spargebra::SparqlParser` and returns the high-level shape:
- `form` β SELECT / CONSTRUCT / ASK / DESCRIBE
- `variables` β projected vars (SELECT only)
- `bgp_pattern_count`, `bgp_patterns` β BGP triples with
s/p/o each rendered as `{var: β¦}`, `{iri: β¦}`, `{bnode: β¦}`,
or `{literal: β¦, datatype/lang: β¦}`
- `unsupported_algebra` β flags Filter / Union / OPTIONAL /
Property paths / Aggregates / VALUES / SERVICE / etc., so
callers see the AST has shape the translator doesn't yet
cover.
- 5 new pg_tests covering basic SELECT, predicate-as-IRI BGP,
two-pattern BGP, FILTER detection, and a syntax-error panic path.
- New regression `tests/regression/sql/30-sparql-parse.sql` asserts
the JSONB extraction over 6 query forms.
### Phase 2.2 step 3 β Batched ingestion
(landed alongside docs split + README pills.)
- `src/storage/loader.rs`: per-call HashMap dict cache + buffered
multi-row INSERTs via `unnest($1::bigint[], $2::bigint[], $3::bigint[])`.
BATCH_SIZE = 1000. Reduces SPI calls from ~7/triple to roughly
`distinct_terms + ceil(triples/1000)`.
- `pgrdf.load_turtle_verbose(path, graph_id, base_iri)` and the
matching `pgrdf.parse_turtle_verbose(content, graph_id, base_iri)`
return JSONB stats: `triples`, `dict_cache_hits`, `dict_db_calls`,
`quad_batches`, `elapsed_ms`. Used to assert the cache is firing.
- `fixtures/regression/synth-100.sh` + `synth-100.ttl`: deterministic
100-triple synthetic fixture (10 subjects Γ 5 predicates Γ 100
objects). 115 distinct terms, 185 expected cache hits.
- `tests/regression/sql/25-bulk-ingest.sql` asserts exact stat values
on the synth-100 fixture and verifies dict dedup across two graphs.
- One new pg_test (`parse_turtle_verbose_cache_fires`) asserts cache
behavior at the Rust level.
- `serde_json = "1"` added as a direct dependency for the verbose UDFs.
### Phase 2.1 β Turtle ingest
- `pgrdf.load_turtle(path, graph_id, base_iri)` and
`pgrdf.parse_turtle(content, graph_id, base_iri)` parse Turtle via
`oxttl 0.2` and stream triples through the dictionary +
partitioned hexastore. `base_iri` resolves relative IRIs like
`<#>` (needed for W3C PROV).
- Internal `put_term_full(value, type, datatype_id, lang)` honours
the full dictionary key with `IS NOT DISTINCT FROM` lookups so
NULL datatype + language columns participate in dedup.
- Compose: read-only `./fixtures:/fixtures:ro,z` bind mount so the
postgres process can reach test + ontology fixtures by path.
- 24 W3C / Apache Jena / ConceptKernel / ValueFlows ontologies fetch
cleanly via `fixtures/ontologies.sh`; `tests/perf/smoke-ontologies.sh`
loads each one through `pgrdf.load_turtle` and prints triple
counts. 17,134 triples across the set on the 2026-05-13 fetch.
- Four checked-in regression fixtures (`typed-literals.ttl`,
`lang-tags.ttl`, `blank-nodes.ttl`, `rdf-list.ttl`) under
`fixtures/regression/` exercise XSD datatypes, language tags,
blank-node dedup, and `rdf:List` desugaring. All assertions are
scoped strictly by graph_id so prior smoke loads don't pollute
results.
- `workflow.ttl` excluded from the iteration set: source uses
`<ckp://Name:v0.1>` IRI form (colon in path segment, not RFC 3986
compliant). To be re-added when the CKP source is fixed.
### Phase 2.0 β Storage CRUD UDFs
- `pgrdf.put_term(value, term_type)`,
`pgrdf.get_term(id)`,
`pgrdf.put_quad(s, p, o, g)`,
`pgrdf.count_quads(g)`,
`pgrdf.add_graph(g)` β all backed by SPI against the
`_pgrdf_dictionary` + `_pgrdf_quads` schema declared in
`sql/schema_v0_2_0.sql`.
- 7 `#[pg_test]` integration tests + 3 regression files.
- Justfile: `just test` runs `cargo pgrx test` inside the linux
builder container; `just test-regression` runs pg_regress-style
SQL fixtures against the compose Postgres. Both gate the same
thing CI will.
### Phase 1 β Scaffold + runtime
- pgrx 0.16 extension scaffolding (PG 14-17 feature matrix,
`pgrx_embed` bin target for schema generation).
- Compose-based local runtime: stock `postgres:17.4-bookworm` with
per-file bind mounts at `$libdir` / `$sharedir/extension`. No init
script, no entrypoint wrapper.
- Linux builder container (`compose/builder.Containerfile`) that
produces glibc-bookworm artifacts on macOS hosts. Two-VM topology:
Colima for builds (100 GB), podman for the compose stack (avoids
filling the user's other container state).
- 10-doc engineering set under `docs/` (architecture, storage, query,
inference, validation, install, dev, testing, release, roadmap).
- `specs/SPEC.pgRDF.LLD.v0.2.md` + `specs/SPEC.pgRDF.INSTALL.v0.2.md`
captured verbatim alongside `specs/ERRATA.v0.2.md` cataloguing
deltas found during implementation.
- CI / release workflow placeholders for the
{pg14..pg17}Γ{amd64, arm64} matrix.
### Errata against v0.2 specs
- `shacl-rust` β `shacl_validation` (E-001).
- `reasonable` is OWL 2 RL only, not arbitrary Datalog (E-002).
- PG 18 forward path blocked on pgrx 0.17/0.18 not building on
current Rust (E-006). Compose targets PG 17 until upstream lands
a fix.
- See [`specs/ERRATA.v0.2.md`](specs/ERRATA.v0.2.md) for the full set.