pgrdf 0.3.0

Rust-native PostgreSQL extension for RDF, SPARQL, SHACL and OWL reasoning
# 03 — Querying with SPARQL

`pgrdf.sparql(q TEXT) → SETOF JSONB` runs a SPARQL SELECT against
everything in the database and returns one JSON row per solution.

```sql
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?person ?name WHERE { ?person foaf:name ?name }'
);
--  → {"person": "http://example.com/alice", "name": "Alice"}
--  → {"person": "http://example.com/bob",   "name": "Bob"}
```

Each row is a `JSONB` object keyed by the SELECT-clause variable
names. Lexical values come back as strings. The `pgrdf.sparql`
function is set-returning, so you can use it anywhere a normal
SETOF Postgres function would go — `FROM`, `LATERAL`, CTEs, etc.

## What works today

| Form | Status |
|---|---|
| `SELECT ?vars WHERE { BGP }` with 1 or more triple patterns | ✅ |
| Constants in subject, predicate, or object position (IRIs, literals) | ✅ |
| Multi-pattern BGPs with shared variables → INNER joins | ✅ |
| `DISTINCT`, `REDUCED` (treated as `SELECT DISTINCT`) | ✅ |
| `LIMIT N`, `OFFSET N` | ✅ |
| `ORDER BY ?var`, `ORDER BY ASC(?var)`, `ORDER BY DESC(?var)` (lexicographic on `lexical_value`) | ✅ |
| `ORDER BY <complex expression>` | ⏳ v0.4 |
| `FILTER`: identity (`=`, `!=`, `sameTerm`), boolean (`&&`, `\|\|`, `!`), term-type (`isIRI`, `isLiteral`, `isBlank`), `BOUND` | ✅ |
| `FILTER`: numeric ordering (`<`/`>`/`<=`/`>=`), `REGEX`, `IN`, `STR` passthrough | ✅ |
| `FILTER`: arithmetic (`+`/`-`/`*`/`/`), `LANG`, `DATATYPE`, `STRLEN`, `UCASE`, `LCASE`, `CONTAINS`, `STRSTARTS`, `STRENDS` | ✅ |
| `OPTIONAL { single-triple BGP }` → LEFT JOIN (with inner FILTER honoured) | ✅ |
| `OPTIONAL { multi-pattern BGP }`, nested OPTIONALs | ⏳ v0.4 |
| `UNION` (n-way, branches may bind different vars) | ✅ |
| `MINUS { multi-pattern }` keyed by shared vars (no-op when no shared vars per spec) | ✅ |
| Aggregates: `COUNT(*)`, `COUNT(?v)`, `COUNT(DISTINCT ?v)`, `SUM`, `AVG`, type-aware `MIN`/`MAX`, `GROUP_CONCAT`, `SAMPLE` with `GROUP BY` | ✅ |
| `HAVING(?alias > c)` (after AS-alias) **and** `HAVING(SUM(?v) > c)` (inline aggregate) | ✅ |
| `BIND(expr AS ?v)` for projection (Literal / NamedNode / Variable, STR / LANG / DATATYPE / UCASE / LCASE / STRLEN, arithmetic, CONCAT) | ✅ |
| `ASK { … }` query form | ✅ |
| `CONSTRUCT`, `DESCRIBE` | ⏳ v0.4 |
| Property paths beyond simple sequence (`*`, `+`, `?`, `^`, `\|`) | ⏳ v0.4 |
| `VALUES (?x) { … }` inline data | ⏳ v0.4 |
| Named-graph `GRAPH { … }` clauses | ⏳ v0.4 |
| Aggregates over `UNION` | ⏳ v0.4 |
| BIND output referenced in later FILTER / BGP | ⏳ v0.4 |
| `SERVICE` (federated SPARQL) | Out of scope for v0.x |

`pgrdf.sparql_parse(q)` reports the parsed shape as JSONB and flags
`unsupported_algebra` for everything not yet translated — use it to
preview whether the translator will handle your query (see further down).

## Examples

### Single-pattern BGP

```sql
-- Every triple in the database
SELECT * FROM pgrdf.sparql('SELECT ?s ?p ?o WHERE { ?s ?p ?o }');

-- All FOAF names
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?name WHERE { ?_ foaf:name ?name }'
);

-- What does this specific subject have?
SELECT * FROM pgrdf.sparql(
  'SELECT ?p ?o WHERE { <http://example.com/alice> ?p ?o }'
);
```

### Multi-pattern BGP — shared variables become joins

```sql
-- People who have BOTH name and mbox
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?p ?n ?m
     WHERE { ?p foaf:name ?n .
             ?p foaf:mbox ?m }'
);
--  → {"p": "http://example.com/alice", "n": "Alice", "m": "mailto:a@x"}
--  → {"p": "http://example.com/carol", "n": "Carol", "m": "mailto:c@x"}
--  (Bob excluded — no mbox.)

-- Three-pattern chain: "name of A, name of someone A knows"
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?an ?bn
     WHERE { ?a foaf:knows ?b .
             ?a foaf:name  ?an .
             ?b foaf:name  ?bn }'
);
--  → {"an": "Alice", "bn": "Bob"}
```

### Constants in any position

```sql
-- Bound predicate
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s a foaf:Person }'
);

-- Bound subject
SELECT * FROM pgrdf.sparql(
  'SELECT ?p ?o WHERE { <http://example.com/alice> ?p ?o }'
);

-- Bound literal object — exact value + datatype match
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?p WHERE { ?p foaf:name "Alice" }'
);

-- Typed literal
SELECT * FROM pgrdf.sparql(
  'PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX ex:  <http://example.com/>
   SELECT ?p WHERE { ?p ex:age "30"^^xsd:integer }'
);
```

### FILTER expressions

```sql
-- Identity: literal equality (compared as dict ids — sameTerm semantics)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name ?n FILTER(?n = "Alice") }'
);

-- Identity: IRI equality (also against ?vars)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?o
     WHERE { ?s ?p ?o FILTER(?p = foaf:knows) }'
);

-- Negation
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name ?n FILTER(?n != "Alice") }'
);

-- Term-type predicates
SELECT * FROM pgrdf.sparql(
  'SELECT ?s ?o WHERE { ?s ?p ?o FILTER(isIRI(?o)) }'
);
SELECT * FROM pgrdf.sparql(
  'SELECT ?s ?o WHERE { ?s ?p ?o FILTER(isLiteral(?o)) }'
);
SELECT * FROM pgrdf.sparql(
  'SELECT ?s WHERE { ?s ?p ?o FILTER(isBlank(?s)) }'
);

-- Boolean composition
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?o
     WHERE { ?s ?p ?o FILTER(isIRI(?o) && ?p = foaf:knows) }'
);

-- Self-loop detection via ?s = ?o
SELECT * FROM pgrdf.sparql('SELECT ?s WHERE { ?s ?p ?o FILTER(?s = ?o) }');
```

#### What `=` actually means here

pgRDF's FILTER `=` is implemented by comparing **dictionary ids**.
Two terms compare equal iff their `(term_type, lexical, datatype,
language)` quadruple matches exactly — that's RDF `sameTerm`
semantics, which is also what SPARQL's `=` reduces to for IRIs and
blank nodes, and matches `=` for strings of the same datatype.

The XSD value-equality cases (`"1"^^xsd:integer = "01"^^xsd:integer`,
`"a" = "a"^^xsd:string`) currently compare as *not equal*: the terms
differ in lexical form or datatype, so they get distinct dictionary
ids, and `=` compares ids only. Use the numeric ordering operators
(`<`/`>`/`<=`/`>=`) for value-aware comparison of XSD numeric
literals.
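
As a hedged sketch of that workaround, a closed numeric range can stand in for value equality, assuming the ordering operators cast both sides to `NUMERIC` as this guide describes. The `ex:count` predicate is hypothetical and only there for illustration.

```sql
-- Hypothetical predicate ex:count. The range >= 1 AND <= 1 should match
-- numeric literals whose value is 1 regardless of lexical form
-- ("1" or "01"), because the comparison runs on the NUMERIC cast.
SELECT * FROM pgrdf.sparql(
  'PREFIX ex: <http://example.com/>
   SELECT ?s ?c
     WHERE { ?s ex:count ?c FILTER(?c >= 1 && ?c <= 1) }'
);
```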

#### `BOUND` in a BGP context

`BOUND(?v)` is trivially `TRUE` for any variable `?v` that's used in
the mandatory BGP (every mandatory BGP variable is bound on every
result row) and `FALSE` for any variable that isn't. It earns its
keep against `OPTIONAL`-introduced variables — `BOUND(?v)` translates
to `qN.col IS NOT NULL`, which correctly returns FALSE for OPTIONAL
vars that didn't match (see the OPTIONAL section below).

#### Combining FILTER with multi-pattern BGPs

```sql
-- All people with both name + mbox, excluding Alice
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?p ?n ?m
     WHERE { ?p foaf:name ?n .
             ?p foaf:mbox ?m
             FILTER(?n != "Alice") }'
);
```

Filters apply after the BGP joins — they're appended to the
`WHERE` clause of the generated SQL.

### Numeric ordering

```sql
-- Adults only
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?age
     WHERE { ?s foaf:age ?age FILTER(?age >= 18) }'
);

-- Age range
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s
     WHERE { ?s foaf:age ?age FILTER(?age >= 30 && ?age < 65) }'
);
```

Both sides are cast to Postgres `NUMERIC` if and only if their
dictionary entry's datatype is one of the XSD numeric IRIs
(`xsd:integer`, `xsd:decimal`, `xsd:double`, `xsd:float`, the
sized variants and unsigned variants, and the constraint subtypes).
Anything else (`xsd:string`, untyped, IRI, blank node) evaluates to
NULL, so the comparison is NULL and the row is dropped. This mirrors
SPARQL's rule that a FILTER raising a type error eliminates the
solution. Comparing two strings with these operators does not raise
an error; it just yields no rows.
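
The translator's real SQL isn't reproduced here. The toy below, in plain Postgres, only illustrates the idea of a datatype-guarded cast. The `toy_terms` table and its `datatype` column are made up for the example and are not pgRDF's schema.

```sql
-- Toy stand-in for dictionary entries (hypothetical columns).
CREATE TEMP TABLE toy_terms (lexical_value text, datatype text);
INSERT INTO toy_terms VALUES
  ('42',    'http://www.w3.org/2001/XMLSchema#integer'),
  ('17.5',  'http://www.w3.org/2001/XMLSchema#decimal'),
  ('30',    'http://www.w3.org/2001/XMLSchema#string'),
  ('Alice', NULL);

-- The CASE yields NULL for non-numeric datatypes, so the cast never
-- sees 'Alice' and the NULL comparison simply drops those rows.
SELECT lexical_value
  FROM toy_terms
 WHERE CASE WHEN datatype IN ('http://www.w3.org/2001/XMLSchema#integer',
                              'http://www.w3.org/2001/XMLSchema#decimal')
            THEN lexical_value::numeric
       END >= 18;
--  → 42
```

The string-typed `'30'` is excluded even though it looks numeric, which is the same behaviour described above for `FILTER(?age >= 18)`.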

The SPARQL surface only applies `<`/`>` to XSD numeric literals. If
you need lexicographic string ordering, sort outside the SPARQL UDF
in plain SQL:

```sql
SELECT j ->> 's' AS s, j ->> 'n' AS n
  FROM pgrdf.sparql(
    'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?s ?n WHERE { ?s foaf:name ?n }'
  ) AS j
 ORDER BY j ->> 'n';
```

### REGEX

```sql
-- Case-sensitive (Postgres ~ operator)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name ?n FILTER(REGEX(?n, "^A")) }'
);

-- Case-insensitive (i flag → Postgres ~* operator)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name ?n FILTER(REGEX(?n, "^a", "i")) }'
);

-- STR() wrapper is a no-op (every term's lexical form IS its string)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name ?n FILTER(REGEX(STR(?n), "ar", "i")) }'
);
```

The regex pattern is read as a SPARQL literal at translation time and
embedded as a Postgres regex literal (single quotes are escaped).
Anchors (`^`, `$`), character classes, and quantifiers all work, as
does anything else Postgres POSIX regex supports. The `i` flag makes
the match case-insensitive; other flags are accepted but currently
ignored (Postgres POSIX doesn't have a direct PCRE-flag equivalent
for `x`/`m`/`s`).

### IN — set membership

```sql
-- Find persons in a named set
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE {
     ?s foaf:name ?n
     FILTER(?s IN (<http://example.com/alice>,
                   <http://example.com/carol>,
                   <http://example.com/dave>))
   }'
);

-- Literal membership
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name ?n FILTER(?n IN ("Alice", "Bob")) }'
);
```

`IN` is dict-id set membership — emits `qN.col IN (id_1, id_2, …)`
where each id is resolved upfront. Unknown terms resolve to `-1`
so they can never match, matching SPARQL's "not in the set" outcome.

### OPTIONAL

`OPTIONAL { ?s :p ?o }` translates to a `LEFT JOIN` against the
mandatory BGP. Variables introduced inside the OPTIONAL come back
NULL (as `JSON null` in the JSONB output) for rows where the
optional pattern didn't match.

```sql
-- Names + mbox if available
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n ?m
     WHERE { ?s foaf:name ?n
             OPTIONAL { ?s foaf:mbox ?m } }'
);
--  → {"s": "...alice", "n": "Alice", "m": "mailto:a@x"}
--  → {"s": "...bob",   "n": "Bob",   "m": null}
--  → {"s": "...carol", "n": "Carol", "m": "mailto:c@x"}
```

#### OPTIONAL with an inner FILTER

```sql
-- Bring back age only if >= 18; otherwise the row still surfaces
-- with ?a = null (filter rejects the optional match, not the row)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n ?a
     WHERE { ?s foaf:name ?n
             OPTIONAL { ?s foaf:age ?a FILTER(?a >= 18) } }'
);
```

The OPTIONAL's filter lands in the LEFT JOIN's `ON` clause, so when
it rejects a candidate match, `?a` comes back as `null` (rather
than the whole row being pruned).
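
The difference between a condition in `ON` and one in `WHERE` can be seen in plain Postgres. This is a toy sketch: `toy_names` and `toy_ages` are invented stand-ins, not the tables pgRDF queries.

```sql
-- Toy stand-ins; not pgRDF's real tables.
CREATE TEMP TABLE toy_names (s text, n text);
CREATE TEMP TABLE toy_ages  (s text, age numeric);
INSERT INTO toy_names VALUES ('alice', 'Alice'), ('bob', 'Bob'), ('carol', 'Carol');
INSERT INTO toy_ages  VALUES ('alice', 30), ('bob', 12);

-- Condition inside ON (the OPTIONAL-with-inner-FILTER shape):
-- all three names come back; bob's age of 12 is rejected, so his
-- age column is NULL, and carol has no age row at all.
SELECT n.s, n.n, a.age
  FROM toy_names n
  LEFT JOIN toy_ages a ON a.s = n.s AND a.age >= 18
 ORDER BY n.s;

-- Condition in WHERE (the outer-FILTER shape): only alice survives.
SELECT n.s, n.n, a.age
  FROM toy_names n
  LEFT JOIN toy_ages a ON a.s = n.s
 WHERE a.age >= 18;
```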

#### Multiple chained OPTIONALs

```sql
-- name (mandatory), mbox + age both OPTIONAL
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n ?m ?a
     WHERE { ?s foaf:name ?n
             OPTIONAL { ?s foaf:mbox ?m }
             OPTIONAL { ?s foaf:age  ?a } }'
);
```

Each OPTIONAL becomes its own LEFT JOIN. Variables introduced in
one OPTIONAL aren't visible to another OPTIONAL's join condition
(per SPARQL semantics).

#### Pruning with outer FILTER(BOUND(?v))

```sql
-- Persons who DO have an mbox — outer FILTER removes the unbound rows
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?m
     WHERE { ?s foaf:name ?n
             OPTIONAL { ?s foaf:mbox ?m }
             FILTER(BOUND(?m)) }'
);
```

`BOUND(?v)` translates to `qN.col IS NOT NULL`, so it correctly
returns FALSE for OPTIONAL vars that didn't match. (For mandatory
vars it's always TRUE since INNER joins guarantee non-null.)

#### Today's restrictions

- **Each OPTIONAL block must hold exactly one triple pattern.**
  Multi-pattern OPTIONALs require a derived-table refactor that
  lands in the next slice. The executor panics with a clear
  message if you give it `OPTIONAL { a . b . }`.
- **Nested OPTIONAL inside OPTIONAL** isn't supported yet — only
  flat chains at the same level.
- **OPTIONAL's inner FILTER** sees only that OPTIONAL's variables
  and the mandatory anchors, not other OPTIONAL groups' variables.

### UNION

`{ A } UNION { B }` combines two branches with SQL `UNION ALL`.
Each branch is a complete sub-SELECT — its own BGP, FILTERs, and
OPTIONALs. Variables only bound in one branch come back as
`null` in the JSONB rows from the other branch.

```sql
-- Same projected var across branches (names from either property)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n
     WHERE { { ?s foaf:name ?n }
             UNION
             { ?s foaf:nick ?n } }'
);

-- Different vars per branch
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n ?m
     WHERE { { ?s foaf:name ?n }
             UNION
             { ?s foaf:mbox ?m } }'
);
--  → {"s": "...alice", "n": "Alice", "m": null}
--  → {"s": "...bob",   "n": null,    "m": "mailto:b@x"}

-- N-way chain: A UNION B UNION C flattens to 3 branches
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?o
     WHERE { { ?s foaf:name ?o }
             UNION
             { ?s foaf:nick ?o }
             UNION
             { ?s foaf:mbox ?o } }'
);
```
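
To see why branch-only variables surface as `null`, here is a toy `UNION ALL` in plain Postgres. The two temp tables are invented for the example; the real translation works on the quad and dictionary tables instead.

```sql
-- Toy stand-ins for the two branches.
CREATE TEMP TABLE toy_union_name (s text, n text);
CREATE TEMP TABLE toy_union_mbox (s text, m text);
INSERT INTO toy_union_name VALUES ('alice', 'Alice');
INSERT INTO toy_union_mbox VALUES ('bob', 'mailto:b@x');

-- Each branch projects the same column list; the column a branch
-- does not bind is padded with NULL so the row shapes line up.
SELECT s, n, NULL::text AS m FROM toy_union_name
UNION ALL
SELECT s, NULL::text AS n, m FROM toy_union_mbox;
```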

#### How UNION composes with the rest

- **FILTER inside a branch** is branch-local — it only prunes that
  branch's rows.
- **OPTIONAL inside a branch** works the same as in a non-UNION
  query, scoped to that branch.
- **DISTINCT / ORDER BY / LIMIT / OFFSET** apply to the union
  result as a whole. ORDER BY on UNION may only reference
  **projected** variables (the outer SELECT can't see a branch's
  internal alias columns); the executor panics with a clear
  message if you try.
- Each branch is translated independently with its own `q1, q2, …`
  alias namespace — there's no cross-branch join.

#### Today's restriction

- Each UNION branch is one of: BGP, FILTERed BGP, BGP with
  OPTIONALs. Nested UNION inside a branch, or UNION inside an
  OPTIONAL, isn't supported in this slice.

### MINUS

`{ A } MINUS { B }` removes rows of `A` whose shared variables are
compatible with some row of `B`. The translator emits a
`WHERE NOT EXISTS (SELECT 1 FROM pgrdf._pgrdf_quads qMIN WHERE …)`
sub-SELECT keyed on those shared variables.

```sql
-- Persons who DON'T have an mbox
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n
     WHERE { ?s foaf:name ?n
             MINUS { ?s foaf:mbox ?m } }'
);

-- Persons with neither mbox nor age (chained MINUSes)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s
     WHERE { ?s foaf:name ?n
             MINUS { ?s foaf:mbox ?m }
             MINUS { ?s foaf:age  ?a } }'
);
```
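
The `NOT EXISTS` shape can be tried on a toy table in plain Postgres. This is a sketch only: `toy_quads` borrows the column names used elsewhere in this guide, and the ids (200 for a name predicate, 201 for an mbox predicate) are arbitrary.

```sql
-- Toy quad table; ids are arbitrary placeholders.
CREATE TEMP TABLE toy_quads (subject_id bigint, predicate_id bigint, object_id bigint);
INSERT INTO toy_quads VALUES
  (1, 200, 10),   -- subject 1 has a name
  (1, 201, 20),   -- subject 1 has an mbox
  (2, 200, 11);   -- subject 2 has a name only

-- "name MINUS mbox": keep subjects with a name for which no mbox row
-- exists on the shared variable (subject_id).
SELECT q1.subject_id
  FROM toy_quads q1
 WHERE q1.predicate_id = 200
   AND NOT EXISTS (
         SELECT 1
           FROM toy_quads qmin
          WHERE qmin.predicate_id = 201
            AND qmin.subject_id   = q1.subject_id
       );
--  → 2
```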

#### The shared-variables rule

Per SPARQL spec, MINUS only filters when the two arms share at
least one variable. If `MINUS { ?x ex:foo ?y }` shares no variable
with the outer query, it's a no-op — every row of the outer
pattern survives. The translator detects this case at translation
time and emits no SQL at all for that MINUS block.

That's different from how OPTIONAL behaves with disjoint variables
(OPTIONAL does emit a LEFT JOIN regardless). The asymmetry is
inherited from the SPARQL semantics: MINUS without shared vars
is defined to be the identity; OPTIONAL without shared vars is a
cross product.

#### Today's restrictions

- **Nested MINUS inside MINUS** isn't supported — only flat chains.

(Multi-triple MINUS sub-patterns are supported, keyed on shared
variables with the outer query — see the surface table at the top.)

### Aggregates and GROUP BY

`pgrdf.sparql` supports the SPARQL set functions `COUNT` (with or
without `DISTINCT`), `SUM`, `AVG`, `MIN`, `MAX`, optionally with
`GROUP BY`. Each aggregate is bound to a SPARQL variable via the
`(EXPR AS ?var)` syntax in the SELECT clause.

```sql
-- Total triples in the database
SELECT * FROM pgrdf.sparql(
  'SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'
);
--  → {"n": "9"}

-- Distinct subjects
SELECT * FROM pgrdf.sparql(
  'SELECT (COUNT(DISTINCT ?s) AS ?subjects) WHERE { ?s ?p ?o }'
);

-- Sum / Avg over numeric values (non-numeric literals are skipped)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT (SUM(?age) AS ?total) (AVG(?age) AS ?mean)
     WHERE { ?s foaf:age ?age }'
);

-- Type-aware MIN/MAX: numeric path on xsd:numeric, lex fallback otherwise
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT (MIN(?n) AS ?lo) (MAX(?n) AS ?hi)
     WHERE { ?s foaf:name ?n }'
);

-- GROUP BY: count of triples per predicate
SELECT * FROM pgrdf.sparql(
  'SELECT ?p (COUNT(?o) AS ?n)
     WHERE { ?s ?p ?o }
   GROUP BY ?p'
);
--  → {"p": "http://xmlns.com/foaf/0.1/name", "n": "4"}
--  → {"p": "http://xmlns.com/foaf/0.1/age",  "n": "3"}
--  → {"p": "http://xmlns.com/foaf/0.1/mbox", "n": "2"}

-- GROUP BY + ORDER BY on the aggregate, then LIMIT
SELECT * FROM pgrdf.sparql(
  'SELECT ?p (COUNT(?o) AS ?n)
     WHERE { ?s ?p ?o }
   GROUP BY ?p
   ORDER BY DESC(?n) LIMIT 1'
);
```

#### How values come back

All aggregate values are emitted as JSON **strings** in the row's
JSONB output, consistent with the rest of `pgrdf.sparql`. For
numeric results, parse them on the caller side
(`CAST(j ->> 'total' AS NUMERIC)` in SQL, `int(row.sparql["n"])`
in Python, etc.).
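
On the SQL side, that caller-side parse might look like the following sketch. It reuses the `SUM`/`AVG` query from above and relies only on the standard `->>` operator and a cast.

```sql
-- Pull the aggregate strings out of the JSONB row and cast them.
-- A JSON null (for example, no numeric ages at all) becomes SQL NULL.
SELECT (j ->> 'total')::numeric AS total,
       (j ->> 'mean')::numeric  AS mean
  FROM pgrdf.sparql(
    'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT (SUM(?age) AS ?total) (AVG(?age) AS ?mean)
       WHERE { ?s foaf:age ?age }'
  ) AS j;
```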

#### SUM / AVG numeric awareness

`SUM(?v)` and `AVG(?v)` cast `?v` to `NUMERIC` if and only if
its dictionary entry's datatype is one of the XSD numeric IRIs
(`xsd:integer`, `xsd:decimal`, `xsd:double`, `xsd:float`, plus
the sized + unsigned + constraint subtypes). Non-numeric values
contribute `NULL` and are ignored by the aggregate per SQL
semantics — no Postgres cast error is raised. This matches the
FILTER ordering semantics.

If your data mixes string-encoded numbers (`"30"^^xsd:string`)
with proper numeric literals, only the latter contribute. Re-load
with explicit XSD datatype annotations to fix this in the
fixture rather than working around it in the query.

#### MIN / MAX — type-aware

`MIN(?v)` and `MAX(?v)` are type-aware: when `?v` resolves to an
`xsd:numeric` literal (any of the XSD numeric IRIs, including the
sized + unsigned + constraint subtypes) the aggregate runs on the
`NUMERIC` cast, so `MAX("10", "2") = "10"`. Non-numeric values
contribute NULL on the numeric path; the implementation falls back
to lexicographic `MIN`/`MAX` on the term's string form when the
numeric path yields no rows. For string-typed literals and IRIs
the lex fallback is the intuitive answer.
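
The gap between the two paths is easy to reproduce in plain Postgres, independent of pgRDF:

```sql
-- Same two values, compared as text and as numbers.
SELECT max(v COLLATE "C") AS lexicographic_max,  -- '2'
       max(v::numeric)    AS numeric_max         -- 10
  FROM (VALUES ('10'), ('2')) AS t(v);
```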

#### `HAVING`, `GROUP_CONCAT`, `SAMPLE`, `BIND`

`HAVING` ships in both forms: the AS-alias form
(`SELECT (COUNT(?o) AS ?n) … GROUP BY ?p HAVING(?n > 5)`) and the
inline-aggregate form (`HAVING(SUM(?v) > 100)`). `GROUP_CONCAT(?v
[; SEPARATOR = "…"])` lowers to Postgres `string_agg`; `SAMPLE(?v)`
uses `MIN(...)` as a deterministic surrogate. `BIND(expr AS ?v)`
is supported for projection — Literal / NamedNode / Variable,
`STR` / `LANG` / `DATATYPE` / `UCASE` / `LCASE` / `STRLEN`,
arithmetic, `CONCAT`.
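
Two short sketches of those forms, using the syntax quoted above. The choice of `foaf:nick` data is an assumption about what has been loaded.

```sql
-- AS-alias HAVING: predicates that appear on more than one triple
SELECT * FROM pgrdf.sparql(
  'SELECT ?p (COUNT(?o) AS ?n)
     WHERE { ?s ?p ?o }
   GROUP BY ?p
   HAVING(?n > 1)'
);

-- GROUP_CONCAT with an explicit separator: nicknames per subject
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s (GROUP_CONCAT(?nick; SEPARATOR = ", ") AS ?nicks)
     WHERE { ?s foaf:nick ?nick }
   GROUP BY ?s'
);
```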

#### Today's restrictions

- **Aggregates on top of UNION** aren't supported. Aggregates over
  a UNION result require a derived-table refactor that lands in
  a later slice (v0.4).
- **Filtering on a BIND output** (referencing `?v` from
  `BIND(expr AS ?v)` in a later FILTER or BGP) isn't supported yet
  — queued for v0.4.

### Solution modifiers — DISTINCT / LIMIT / OFFSET / ORDER BY

The four classic SPARQL modifiers all land in the generated SQL:

```sql
-- DISTINCT — dedup on the projected variables
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT DISTINCT ?n WHERE { ?s foaf:name ?n }'
);

-- REDUCED — treated as DISTINCT (safe over-approximation per spec)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT REDUCED ?n WHERE { ?s foaf:name ?n }'
);

-- LIMIT — cap the number of returned rows
SELECT * FROM pgrdf.sparql(
  'SELECT ?s ?o WHERE { ?s ?p ?o } LIMIT 10'
);

-- OFFSET — skip rows from the start
SELECT * FROM pgrdf.sparql(
  'SELECT ?s ?o WHERE { ?s ?p ?o } OFFSET 10 LIMIT 10'
);

-- ORDER BY ?var — ascending lexicographic on lexical_value
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?n WHERE { ?s foaf:name ?n } ORDER BY ?n'
);

-- ORDER BY DESC(?var)
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?n WHERE { ?s foaf:name ?n } ORDER BY DESC(?n)'
);

-- ORDER BY ASC(?var), DESC(?other) — multiple sort keys
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n
     WHERE { ?s foaf:name ?n }
   ORDER BY ASC(?n) DESC(?s)'
);
```

#### How ORDER BY works under the hood

For each `ORDER BY ?var`, the translator emits

```sql
ORDER BY (SELECT lexical_value FROM pgrdf._pgrdf_dictionary
            WHERE id = qN.<col>) [ASC|DESC] NULLS LAST
```

If `?var` is in the SELECT list, the existing projected column is
reused (no extra subselect). If `?var` is bound in the BGP but
NOT projected, an extra hidden column is appended to the SELECT
list and ORDER BY references it by ordinal position. The
`execute` layer only emits the projected columns into JSONB, so
those hidden columns are invisible to callers.

This is **lexicographic order on the term's string form**, not
SPARQL's full type-aware ordering. For string-typed literals and
IRIs that's the same answer; for numeric literals it sorts as
strings (`"10"` < `"2"`), which is wrong. Use numeric FILTER plus
a Postgres `ORDER BY (sparql->>'n')::numeric` wrapping the
`pgrdf.sparql` call when you need numeric ordering today. Full
type-aware `ORDER BY ?n` over `xsd:numeric` literals lands in
v0.4. (Note: aggregate `MIN`/`MAX` already use the type-aware
path — see the aggregates section above.)
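
A sketch of that interim workaround follows. The `FILTER(?age >= 0)` is there only to drop non-numeric values before the cast; it also drops negative numbers, so adjust the bound to your data. It assumes the surviving lexical forms are ones Postgres can cast to `numeric`.

```sql
-- Numeric FILTER inside SPARQL, numeric ORDER BY outside it.
SELECT j ->> 's' AS s,
       (j ->> 'age')::numeric AS age
  FROM pgrdf.sparql(
    'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?s ?age
       WHERE { ?s foaf:age ?age FILTER(?age >= 0) }'
  ) AS j
 ORDER BY (j ->> 'age')::numeric;
```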

#### DISTINCT + ORDER BY interaction

If `ORDER BY` references a variable that's NOT in the SELECT list,
DISTINCT can't be applied — Postgres requires ORDER BY expressions
to appear in the select list when DISTINCT is used. pgRDF panics
with a clear message in that case rather than silently dropping
DISTINCT or the ORDER BY. Pull the variable into the SELECT clause
or remove DISTINCT.
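
For instance, a query of the rejected shape and one possible rewrite. Note that projecting `?s` changes what DISTINCT deduplicates on, so the rewrite can return more rows than the original intent.

```sql
-- Rejected shape: ?s is ordered on but not projected
--   SELECT DISTINCT ?n WHERE { ?s foaf:name ?n } ORDER BY ?s

-- Rewrite: pull ?s into the SELECT list
SELECT * FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT DISTINCT ?s ?n
     WHERE { ?s foaf:name ?n }
   ORDER BY ?s'
);
```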

### Combining with regular SQL

`pgrdf.sparql` is a SETOF function, so you can join its results with
relational tables, filter them with WHERE, aggregate them, anything:

```sql
-- Find FOAF persons whose name matches a regex
SELECT j->>'p' AS person
  FROM pgrdf.sparql(
    'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?p ?n WHERE { ?p foaf:name ?n }'
  ) AS j
 WHERE j->>'n' ~* '^a';
--  → http://example.com/alice

-- Join SPARQL output to your relational data
WITH foaf AS (
  SELECT j->>'p' AS person_iri, j->>'n' AS name
    FROM pgrdf.sparql(
      'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
       SELECT ?p ?n WHERE { ?p foaf:name ?n }'
    ) AS j
)
SELECT customers.email, foaf.name
  FROM customers
  JOIN foaf ON customers.uri = foaf.person_iri;
```

## Inspecting queries before running them

`pgrdf.sparql_parse(q) → JSONB` returns the parsed shape without
executing. Use it when you want to know whether the translator can
handle a query, or to extract structure for code that builds queries:

```sql
SELECT pgrdf.sparql_parse(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s ?n WHERE { ?s foaf:name ?n }'
);
-- {
--   "form": "SELECT",
--   "variables": ["s", "n"],
--   "bgp_pattern_count": 1,
--   "bgp_patterns": [
--     {"s": {"var": "s"},
--      "p": {"iri": "http://xmlns.com/foaf/0.1/name"},
--      "o": {"var": "n"}}
--   ],
--   "unsupported_algebra": []
-- }
```

If your query uses OPTIONAL / aggregates / property paths / etc.,
`unsupported_algebra` lists what the translator can't yet handle.
The query itself parses fine (spargebra is feature-complete) —
`pgrdf.sparql` just won't execute those forms yet:

```sql
SELECT pgrdf.sparql_parse(
  'SELECT ?s ?n WHERE { ?s ?p ?o OPTIONAL { ?s <http://x/n> ?n } }'
);
--  → {…, "unsupported_algebra": ["LeftJoin (OPTIONAL)"]}
```
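
Code that builds queries could turn this into a guard. The sketch below assumes `unsupported_algebra` is always present and always a JSON array, as in the outputs shown above; `translator_ready` is just an illustrative column name.

```sql
-- True when sparql_parse reports nothing it cannot translate.
SELECT jsonb_array_length(
         pgrdf.sparql_parse(
           'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
            SELECT ?s ?n WHERE { ?s foaf:name ?n }'
         ) -> 'unsupported_algebra'
       ) = 0 AS translator_ready;
```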

The FILTER surface is broad — identity, boolean, term-type,
`BOUND`, numeric ordering, `REGEX`, `IN`, `STR`, `LANG`,
`DATATYPE`, `UCASE`, `LCASE`, `STRLEN`, `CONTAINS`, `STRSTARTS`,
`STRENDS`, and arithmetic — but if the executor encounters a
shape it doesn't yet translate, it errors with a clear message
rather than silently dropping the predicate.

## How the translation works

For the curious / debugging — the translator generates one
`_pgrdf_quads` alias per BGP pattern, joins shared variables via
equality predicates, and resolves constants to dictionary ids
*before* building the dynamic SQL. Worked example for

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?p ?n ?m
  WHERE { ?p foaf:name ?n .
          ?p foaf:mbox ?m }
```

becomes approximately

```sql
SELECT
  (SELECT lexical_value FROM pgrdf._pgrdf_dictionary WHERE id = q1.subject_id) AS "p",
  (SELECT lexical_value FROM pgrdf._pgrdf_dictionary WHERE id = q1.object_id)  AS "n",
  (SELECT lexical_value FROM pgrdf._pgrdf_dictionary WHERE id = q2.object_id)  AS "m"
FROM pgrdf._pgrdf_quads q1,
     pgrdf._pgrdf_quads q2
WHERE q1.predicate_id = 200    -- foaf:name's dict id
  AND q2.predicate_id = 201    -- foaf:mbox's dict id
  AND q2.subject_id   = q1.subject_id;   -- shared ?p anchor
```

Predicate / subject / object indexes on `_pgrdf_quads` (SPO, POS, OSP
covering indexes per the hexastore design) make those equality
lookups index-only scans. Dict resolution for the projected
variables uses a scalar subquery so any missing term ids come back
as NULL rather than dropping the row.

### Unknown terms yield zero rows, not an error

If a constant in the query (predicate IRI, literal value, etc.) isn't
in the dictionary, the translator inlines `-1` as the dict id, which
matches no quad row → the query returns zero results. This is the
correct SPARQL semantics ("no solutions exist") rather than an
error condition:

```sql
SELECT count(*) FROM pgrdf.sparql(
  'SELECT ?s ?o WHERE { ?s <http://nope.example/never-loaded> ?o }'
);
--  → 0
```

## Performance posture (today)

| Cost | Where it shows up |
|---|---|
| 1× SPI lookup per **constant** in the BGP | At translation time, before the dynamic SQL runs. |
| Dynamic SQL via SPI executes against the partitioned hexastore | One PostgreSQL plan + execute per `pgrdf.sparql` call. |
| Dict round-trip for each projected variable in each output row | Scalar subquery on `_pgrdf_dictionary` (index-only scan on PK). |

For typical "100s of rows out" queries this is sub-millisecond on
local data. For "millions of rows out" the dict round-trips become
the dominant cost — a future optimisation is to hash-join the
dictionary upfront instead of per-row scalar subqueries; tracked
as a v0.4 candidate.

The Postgres prepared-statement cache (LLD §4.2) **shipped in
Phase 3 step 2**: dict-id constants in the dynamic SQL are now
`$N` parameters and a per-backend
`thread_local!<RefCell<HashMap<String, OwnedPreparedStatement>>>`
keeps the prepared plan around — so repeated `pgrdf.sparql`
calls with the same BGP shape (including parametric variations
on IRI / literal constants) reuse the same SPI plan. Counters
live in `pgrdf.stats()` (`plan_cache_hits` / `misses` /
`inserts` / `local_size`). The cross-backend shmem dict cache
from LLD §4.1 also lives in `pgrdf.stats()`
(`shmem_hits` / `misses` / `inserts` / `evictions`).
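
One way to watch the plan cache at work, as a rough sketch: run two queries with the same BGP shape and different constants, then read the counters. The return shape of `pgrdf.stats()` isn't specified in this guide, so the last statement just selects whatever it returns.

```sql
-- Same BGP shape twice; only the literal constant differs.
SELECT count(*) FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name "Alice" }'
);
SELECT count(*) FROM pgrdf.sparql(
  'PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?s WHERE { ?s foaf:name "Bob" }'
);

-- Inspect the counters afterwards (same backend/session).
SELECT * FROM pgrdf.stats();
```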

## Limits / gotchas

- **Blank nodes in queries are rejected.** In SPARQL, a blank node
  `_:b` in a pattern acts like a variable with different scoping
  rules from `?b`; pgRDF refuses blank-node terms in patterns to
  keep semantics unambiguous.
- **RDF-star quoted triples** are out of scope (LLD §2).
- **Cross-graph queries**: today every `pgrdf.sparql` call searches
  ALL graphs. Per-graph scoping (`GRAPH <g> { … }` and the dataset
  clause) lands in v0.4 — see the deferral list above.
- **No SPARQL 1.2** anything yet — base SPARQL 1.1 only.

## Next

- [clients/python.md](clients/python.md): calling `pgrdf.sparql`
  from Python.
- [clients/rust.md](clients/rust.md): same from Rust.
- The engineering side: [`docs/03-query.md`](../docs/03-query.md)
  for the translator's algebra walk, the prepared-plan cache, and
  the v0.4 deferred-surface notes.