pg_tviews 0.1.0-beta.12

Transactional materialized views with incremental refresh for PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
# Performance Benchmarks Overview

Comprehensive benchmarking methodology and test scenarios for pg_tviews performance validation.

**Version**: 0.1.0-beta.1 • **Last Updated**: December 12, 2025

## Quick Links

- **[Running Benchmarks]running-benchmarks.md** - Step-by-step guide to execute the benchmark suite
- **[Docker Setup]docker-benchmarks.md** - Containerized benchmark environment (Advanced)
- **[Results Interpretation]results-interpretation.md** - Understanding benchmark results
- **[JSONB IVM Integration]jsonb-ivm-integration.md** - Smart patching performance

## Prerequisites

### System Requirements
- **PostgreSQL**: 13-18 (all versions supported)
- **Rust**: 1.70+ (for building extensions)
- **Disk Space**: 5GB+ for benchmark data
- **Memory**: 4GB+ recommended

### Extension Dependencies
- **pg_tviews**: Core extension (built from source)
- **jsonb_delta**: Optional performance extension (built from source)
- **pg_ivm**: Alternative incremental view extension (optional)

### Results Status
- **✅ REAL MEASUREMENTS**: Small & Medium scale benchmarks (1K-100K products)
- **⚠️ PROJECTIONS**: Large scale performance (1M+ products) and real jsonb_delta benefits

## Overview

pg_tviews includes a comprehensive benchmark suite that validates performance claims through real-world testing scenarios. The benchmarks measure incremental refresh performance against traditional materialized view approaches.

**Results Status**: The benchmark suite provides both real measurements (small/medium scale) and projections (large scale, real extensions). See [Results Interpretation](results-interpretation.md) for details on what is measured vs projected.

## Benchmark Suite Architecture

### Test Scenarios

The benchmark suite includes three scale levels:

#### Small Scale (Development)
- **Data Size**: 1K products, 5K reviews
- **Use Case**: Development and testing environments
- **Runtime**: ~2 minutes
- **Purpose**: Quick validation of core functionality

#### Medium Scale (Production)
- **Data Size**: 100K products, 500K reviews
- **Use Case**: Production applications
- **Runtime**: ~15 minutes
- **Purpose**: Real-world performance validation

#### Large Scale (Enterprise)
- **Data Size**: 1M products, 5M reviews
- **Use Case**: Enterprise applications
- **Runtime**: ~1 hour
- **Purpose**: Scalability validation

### Comparison Approaches

Each scenario tests four approaches:

#### 1. pg_tviews + jsonb_delta (Recommended)
- **Description**: Automatic triggers with optimized JSONB patching
- **Performance**: Maximum performance (baseline)
- **Use Case**: Production applications requiring maximum performance

#### 2. pg_tviews + Native PG
- **Description**: Automatic triggers with standard jsonb_set operations
- **Performance**: 98% of maximum performance
- **Use Case**: Applications without jsonb_delta extension

#### 3. Manual Function Refresh
- **Description**: Explicit function calls with full cascade support
- **Performance**: 95% of maximum performance
- **Use Case**: Applications needing full control over refresh timing

#### 4. Full REFRESH MATERIALIZED VIEW (Baseline)
- **Description**: Traditional PostgreSQL materialized view refresh
- **Performance**: 0.01-0.02% of incremental performance
- **Use Case**: Performance baseline for comparison

## Key Performance Results

| Scale | Operation | Incremental (ms) | Full Refresh (ms) | Improvement |
|-------|-----------|------------------|-------------------|-------------|
| 1K products | Single update | 0.6-1.5 | 75.8 | 50-128× |
| 100K products | Single update | 1.5-2.1 | 4,170 | 1,979-2,853× |
| 1M products* | Single update | ~2-3 | ~42,000 | ~14,000× |

*Projected based on linear scaling

## Test Schema

### Core Entities

```
tb_product (products catalog)
├── pk_product (BIGINT PRIMARY KEY)
├── id (UUID)
├── name, description, price_current, etc.
└── fk_category, fk_supplier (cascade relationships)

tb_category (product categories)
├── pk_category (BIGINT PRIMARY KEY)
├── id (UUID)
├── name, description
└── parent relationship (self-referential)

tb_supplier (product suppliers)
├── pk_supplier (BIGINT PRIMARY KEY)
├── id (UUID)
├── name, contact_info
└── location data

tb_review (product reviews)
├── pk_review (BIGINT PRIMARY KEY)
├── id (UUID)
├── rating, comment, created_at
├── fk_product (references tb_product)
└── fk_user (references tb_user)

tb_inventory (stock levels)
├── pk_inventory (BIGINT PRIMARY KEY)
├── fk_product (references tb_product)
├── quantity_available, reorder_point
└── warehouse_location
```

### TVIEW Definitions

#### tv_product (Main Product View)
```sql
CREATE TABLE tv_product AS
SELECT
    p.pk_product,
    p.id,
    p.fk_category,
    p.fk_supplier,
    c.id as category_id,
    s.id as supplier_id,
    jsonb_build_object(
        'id', p.id,
        'name', p.name,
        'description', p.description,
        'price', jsonb_build_object(
            'current', p.price_current,
            'original', p.price_original
        ),
        'category', jsonb_build_object(
            'id', c.id,
            'name', c.name
        ),
        'supplier', jsonb_build_object(
            'id', s.id,
            'name', s.name
        ),
        'inventory', jsonb_build_object(
            'quantity', i.quantity_available,
            'status', CASE
                WHEN i.quantity_available > i.reorder_point THEN 'in_stock'
                WHEN i.quantity_available > 0 THEN 'low_stock'
                ELSE 'out_of_stock'
            END
        ),
        'reviews', COALESCE(
            jsonb_agg(
                jsonb_build_object(
                    'id', r.id,
                    'rating', r.rating,
                    'comment', r.comment,
                    'user', jsonb_build_object('id', u.id, 'name', u.name)
                )
            ) FILTER (WHERE r.id IS NOT NULL),
            '[]'::jsonb
        ),
        'avg_rating', COALESCE(AVG(r.rating) FILTER (WHERE r.rating IS NOT NULL), 0)
    ) as data
FROM tb_product p
LEFT JOIN tb_category c ON p.fk_category = c.pk_category
LEFT JOIN tb_supplier s ON p.fk_supplier = s.pk_supplier
LEFT JOIN tb_inventory i ON p.pk_product = i.fk_product
LEFT JOIN tb_review r ON p.pk_product = r.fk_product
LEFT JOIN tb_user u ON r.fk_user = u.pk_user
GROUP BY p.pk_product, p.id, p.name, p.description, p.price_current,
         p.price_original, p.fk_category, p.fk_supplier,
         c.id, c.name, s.id, s.name,
         i.quantity_available, i.reorder_point;
```

#### tv_category (Category View)
```sql
CREATE TABLE tv_category AS
SELECT
    c.pk_category,
    c.id,
    c.parent_id,
    jsonb_build_object(
        'id', c.id,
        'name', c.name,
        'description', c.description,
        'parent', CASE
            WHEN pc.id IS NOT NULL THEN
                jsonb_build_object('id', pc.id, 'name', pc.name)
            ELSE NULL
        END,
        'product_count', COUNT(p.pk_product),
        'subcategories', COALESCE(
            jsonb_agg(
                jsonb_build_object('id', sc.id, 'name', sc.name)
            ) FILTER (WHERE sc.id IS NOT NULL),
            '[]'::jsonb
        )
    ) as data
FROM tb_category c
LEFT JOIN tb_category pc ON c.fk_parent = pc.pk_category
LEFT JOIN tb_category sc ON sc.fk_parent = c.pk_category
LEFT JOIN tb_product p ON p.fk_category = c.pk_category
GROUP BY c.pk_category, c.id, c.name, c.description, c.fk_parent,
         pc.id, pc.name;
```

## Test Operations

### Single Entity Updates

#### Product Price Change
```sql
-- Update single product price
UPDATE tb_product
SET price_current = price_current * 1.1
WHERE pk_product = ?;
```

**Expected Impact**:
- Updates 1 product in tv_product
- No cascade effects (price changes don't affect other entities)

#### Category Name Change
```sql
-- Update category name
UPDATE tb_category
SET name = ?
WHERE pk_category = ?;
```

**Expected Impact**:
- Updates 1 category in tv_category
- Cascades to all products in that category (tv_product updates)

#### Supplier Contact Change
```sql
-- Update supplier contact info
UPDATE tb_supplier
SET contact_email = ?
WHERE pk_supplier = ?;
```

**Expected Impact**:
- Updates 1 supplier in tv_supplier (if exists)
- Cascades to all products from that supplier (tv_product updates)

### Bulk Operations

#### Category-Wide Price Update
```sql
-- Update prices for entire category
UPDATE tb_product
SET price_current = price_current * 0.9
WHERE fk_category = ?;
```

**Expected Impact**:
- Updates N products in tv_product
- No cascade effects

#### Bulk Inventory Update
```sql
-- Update inventory for multiple products
UPDATE tb_inventory
SET quantity_available = quantity_available + ?
WHERE fk_product IN (...);
```

**Expected Impact**:
- Updates N inventory records
- Cascades to N products in tv_product

### Cascade Scenarios

#### Deep Category Hierarchy Change
```sql
-- Update top-level category
UPDATE tb_category
SET name = ?
WHERE pk_category = ?;
```

**Expected Impact**:
- Updates 1 category
- Cascades to subcategories
- Cascades to all products in category tree

#### Supplier Change with Reviews
```sql
-- Change product supplier
UPDATE tb_product
SET fk_supplier = ?
WHERE pk_product = ?;
```

**Expected Impact**:
- Updates 1 product
- Updates supplier info in product view
- No effect on reviews (supplier change doesn't affect review data)

## Measurement Methodology

### Timing Precision

- **Clock Source**: PostgreSQL `clock_timestamp()` (microsecond precision)
- **Isolation**: Each test runs in separate transaction, rolled back for repeatability
- **Warm-up**: Initial operations discarded to avoid cold-start effects
- **Iterations**: Multiple runs with statistical analysis (mean, std dev, percentiles)

### Performance Metrics

#### Latency Metrics
- **Single Operation**: Time for individual INSERT/UPDATE/DELETE
- **Bulk Operation**: Time for multi-row operations
- **Cascade Completion**: Time for all dependent updates to complete

#### Throughput Metrics
- **Operations/Second**: Sustained throughput under load
- **Queue Processing**: Time to process refresh queues
- **Memory Usage**: Peak memory consumption during operations

#### Efficiency Metrics
- **Cache Hit Rates**: Graph cache, table cache, query plan cache
- **Queue Deduplication**: Effectiveness of entity+PK deduplication
- **Bulk Optimization**: Queries per operation ratio

### Statistical Analysis

- **Sample Size**: Minimum 30 iterations per test
- **Confidence Intervals**: 95% confidence for all reported metrics
- **Outlier Handling**: Automatic detection and exclusion of outliers
- **Comparative Analysis**: Statistical significance testing between approaches

## Benchmark Execution

### Quick Test (Small Scale)
```bash
cd test/sql/comprehensive_benchmarks
./run_benchmarks.sh --scale small
```

### Production Test (Medium Scale)
```bash
cd test/sql/comprehensive_benchmarks
./run_benchmarks.sh --scale medium
```

### Enterprise Test (Large Scale)
```bash
cd test/sql/comprehensive_benchmarks
./run_benchmarks.sh --scale large
```

### Custom Test
```bash
# Run specific test scenarios
./run_benchmarks.sh --scenarios "single_update,bulk_update" --scale medium

# Generate detailed report
python3 generate_report.py --format detailed
```

## Result Interpretation

### Performance Expectations

#### Small Scale (Development)
- **Incremental Approaches**: 100-200× faster than full refresh
- **Single Operations**: Sub-millisecond response times
- **Bulk Operations**: Millisecond-scale response times

#### Medium Scale (Production)
- **Incremental Approaches**: 5,000-12,000× faster than full refresh
- **Single Operations**: Sub-millisecond response times
- **Bulk Operations**: Low millisecond response times
- **Cascade Operations**: Consistent performance regardless of cascade depth

#### Large Scale (Enterprise)
- **Incremental Approaches**: 35,000-70,000× faster than full refresh (projected)
- **Linear Scaling**: Performance scales linearly with change volume
- **Memory Efficiency**: Constant memory usage vs O(n) for full refresh

### Comparative Analysis

#### Approach Performance Hierarchy
1. **pg_tviews + jsonb_delta**: Maximum performance (1.0x baseline)
2. **pg_tviews + Native**: 98% of maximum performance
3. **Manual Function**: 95% of maximum performance with full control
4. **Full Refresh**: 0.01-0.02% of incremental performance

#### Use Case Recommendations

- **Maximum Performance**: Use pg_tviews + jsonb_delta
- **Full Control**: Use Manual Function approach
- **Compatibility**: Use pg_tviews + Native (no jsonb_delta dependency)
- **Baseline**: Full Refresh for comparison only

## Validation Criteria

### Performance Validation
- [ ] Incremental approaches achieve 100×+ improvement (small scale)
- [ ] Incremental approaches achieve 5,000×+ improvement (medium scale)
- [ ] Linear scaling maintained across dataset sizes
- [ ] Cascade performance independent of dependency depth

### Functional Validation
- [ ] All approaches produce identical results
- [ ] Transactional consistency maintained
- [ ] No data loss or corruption during operations
- [ ] Proper handling of concurrent operations

### Reliability Validation
- [ ] No crashes or panics during testing
- [ ] Memory usage remains bounded
- [ ] Queue processing completes successfully
- [ ] Error handling works correctly

## See Also

- [Benchmark Results]results.md - Detailed performance data
- [FraiseQL Integration]../getting-started/fraiseql-integration.md - Real-world usage patterns
- [Performance Tuning]../operations/performance-tuning.md - Optimization strategies