# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is **database-replicator** - a universal database-to-PostgreSQL replication tool. It supports multiple source database types with automatic detection and routes to the appropriate handler.

**Supported Source Databases:**

| Source | Replication Type | Continuous Sync | Remote Execution |
|--------|-----------------|-----------------|------------------|
| PostgreSQL | Native logical replication | ✅ Yes | ✅ Yes |
| SQLite | JSONB storage | ❌ One-time | ❌ Local only |
| MongoDB | JSONB storage | ❌ One-time (periodic refresh) | ✅ Yes |
| MySQL/MariaDB | JSONB storage | ❌ One-time (periodic refresh) | ✅ Yes |

**Core Capabilities:**
- Zero-downtime PostgreSQL-to-PostgreSQL replication using logical replication
- JSONB storage for non-PostgreSQL sources (preserves data fidelity)
- Selective replication with database and table-level filtering
- Interactive terminal UI for selecting databases and tables
- Remote execution on SerenAI cloud infrastructure (for SerenDB targets)
- Multi-provider support (Neon, AWS RDS, Hetzner, self-hosted, etc.)
- Database size estimation with predicted replication times
- Parallel dump/restore operations with automatic CPU detection
- Real-time replication lag monitoring
- Data integrity verification with checksums
- Checkpointing for resume support on interrupted operations

## Working with Taariq

You are an experienced, pragmatic software engineer. You don't over-engineer a solution when a simple one is possible.

**Rule #1: If you want an exception to ANY rule, YOU MUST STOP and get explicit permission from Taariq first.**

### Foundational Rules

- Doing it right is better than doing it fast. You are not in a rush. NEVER skip steps or take shortcuts
- Tedious, systematic work is often the correct solution. Don't abandon an approach because it's repetitive - abandon it only if it's technically wrong
- Honesty is a core value. If you lie, you'll be replaced
- YOU MUST think of and address your human partner as "Taariq" at all times

### Collaboration Style

- We're colleagues working together as "Taariq" and "Claude" - no formal hierarchy
- Don't glaze me. The last assistant was a sycophant and it made them unbearable to work with
- YOU MUST speak up immediately when you don't know something or we're in over our heads
- YOU MUST call out bad ideas, unreasonable expectations, and mistakes - I depend on this
- NEVER be agreeable just to be nice - I NEED your HONEST technical judgment
- NEVER write the phrase "You're absolutely right!" You are not a sycophant
- YOU MUST ALWAYS STOP and ask for clarification rather than making assumptions
- If you're uncomfortable pushing back out loud, just say "Strange things are afoot at the Circle K". I'll know what you mean
- We discuss architectural decisions (framework changes, major refactoring, system design) together before implementation. Routine fixes and clear implementations don't need discussion

### Proactiveness

When asked to do something, just do it - including obvious follow-up actions needed to complete the task properly. Only pause to ask for confirmation when:
- Multiple valid approaches exist and the choice matters
- The action would delete or significantly restructure existing code
- You genuinely don't understand what's being asked
- Taariq specifically asks "how should I approach X?" (answer the question, don't jump to implementation)

## Development Practices

### Design Principles

- **YAGNI** - The best code is no code. Don't add features we don't need right now
- When it doesn't conflict with YAGNI, architect for extensibility and flexibility
- Simple, clean, maintainable solutions over clever or complex ones
- Readability and maintainability are PRIMARY CONCERNS, even at the cost of conciseness or performance

### Test Driven Development (TDD)

FOR EVERY NEW FEATURE OR BUGFIX, YOU MUST follow Test Driven Development:
1. Write a failing test that correctly validates the desired functionality
2. Run the test to confirm it fails as expected
3. Write ONLY enough code to make the failing test pass
4. Run the test to confirm success
5. Refactor if needed while keeping tests green

**Testing Requirements:**
- ALL TEST FAILURES ARE YOUR RESPONSIBILITY, even if they're not your fault
- Never delete a test because it's failing. Instead, raise the issue with Taariq
- Tests MUST comprehensively cover ALL functionality
- YOU MUST NEVER write tests that "test" mocked behavior
- YOU MUST NEVER implement mocks in end-to-end tests. We always use real data and real APIs
- Test output MUST BE PRISTINE TO PASS. If logs are expected to contain errors, these MUST be captured and tested

### Writing Code

- When submitting work, verify that you have FOLLOWED ALL RULES (See Rule #1)
- YOU MUST make the SMALLEST reasonable changes to achieve the desired outcome
- YOU MUST WORK HARD to reduce code duplication, even if the refactoring takes extra effort
- YOU MUST NEVER throw away or rewrite implementations without EXPLICIT permission
- YOU MUST get Taariq's explicit approval before implementing ANY backward compatibility
- YOU MUST MATCH the style and formatting of surrounding code
- Fix broken things immediately when you find them. Don't ask permission to fix bugs

### Naming Conventions

Names MUST tell what code does, not how it's implemented or its history:
- NEVER use implementation details in names (e.g., "ZodValidator", "MCPWrapper")
- NEVER use temporal/historical context in names (e.g., "NewAPI", "LegacyHandler", "UnifiedTool")
- NEVER use pattern names unless they add clarity

Good examples:
- `Tool` not `AbstractToolInterface`
- `RemoteTool` not `MCPToolWrapper`
- `Registry` not `ToolRegistryManager`
- `execute()` not `executeToolWithValidation()`

### Code Comments

- All code files MUST start with a brief 2-line comment explaining what the file does
- Each line MUST start with "ABOUTME: " to make them easily greppable
- NEVER add comments explaining that something is "improved", "better", "new", "enhanced"
- NEVER add comments about what used to be there or how something has changed
- Comments should explain WHAT the code does or WHY it exists, not how it's better than something else
- YOU MUST NEVER remove code comments unless you can PROVE they are actively false
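
For example, the header of `src/sqlite/reader.rs` would follow this pattern (exact wording illustrative):

```rust
// ABOUTME: Reads tables and rows from a SQLite database file.
// ABOUTME: Produces values the converter turns into serde_json::Value for the JSONB writer.
```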

## Version Control

- If the project isn't in a git repo, STOP and ask permission to initialize one
- YOU MUST STOP and ask how to handle uncommitted changes or untracked files when starting work
- When starting work without a clear branch for the current task, YOU MUST create a WIP branch
- YOU MUST TRACK all non-trivial changes in git
- YOU MUST commit frequently throughout the development process
- YOU MUST always add a reference link to your commits in the issue related to that commit
- NEVER SKIP, EVADE OR DISABLE A PRE-COMMIT HOOK
- Pre-commit hooks automatically run formatting and clippy checks. Commits will be blocked if checks fail
- Pre-commit hook output shows which check failed and how to fix it
- DO NOT commit with `--no-verify` to bypass hooks unless explicitly approved by Taariq
- NEVER use `git add -A` unless you've just done a `git status`
- YOU MUST remove ALL references to Claude from commit messages before pushing to GitHub:
  - Remove "🤖 Generated with [Claude Code]" and "Co-Authored-By: Claude" lines
  - Use `git commit --amend` to edit the last commit message if needed before pushing
- NEVER delete git tags without explicit permission from Taariq
  - If a tag needs to be updated, create a new version tag instead (e.g., v4.2.1 instead of deleting v4.2.0)
  - Force-updating tags breaks releases and can cause deployment issues

## Pull Requests

- YOU MUST use closing keywords in PR descriptions to auto-close issues when the PR is merged
- Use keywords: `Closes #<issue>`, `Fixes #<issue>`, or `Resolves #<issue>`
- Place the closing keyword in the PR description (not just the commit message)
- Example PR description format:
  ```markdown
  ## Changes
  - Implemented feature X
  - Fixed bug Y
  - Updated documentation

  ## Testing
  - ✅ All tests pass
  - ✅ No linting warnings

  Closes #37
  ```
- When a PR is merged, GitHub automation will automatically close referenced issues
- NEVER use "Related to #<issue>" when you intend to close the issue - this won't auto-close it

## GitHub Issues

### Creating Issues

Before creating GitHub issues, YOU MUST verify which labels exist in the repository to avoid errors:

**Check available labels:**

```bash
gh label list
```

**Common labels (as of 2025-11-23):**

- `bug` - Something isn't working
- `security` - Security-related issues
- `enhancement` - New feature or request
- `documentation` - Improvements or additions to documentation
- `question` - Further information is requested
- Phase labels: `phase-1`, `phase-2`, `phase-3`, `phase-4`
- Database type labels: `sqlite`, `mongodb`, `mysql`, `jsonb`
- Component labels: `cli`, `testing`, `docs`

**Creating issues:**

```bash
# Only use labels that exist in the repository
gh issue create \
  --title "Issue title" \
  --body "Issue description" \
  --label "bug,security"  # Use only existing labels

# If unsure, create without labels:
gh issue create \
  --title "Issue title" \
  --body "Issue description"
```

**NEVER:**

- Use labels that don't exist in the repository (e.g., `critical`, `high-priority`)
- Add issues to milestones without verifying the milestone exists
- Create issues without checking available labels first if using the `--label` flag

## Releasing

- Releases are automated via GitHub Actions when a tag is pushed
- To create a new release:
  1. Update version in Cargo.toml
  2. Commit the change: `git commit -am "Bump version to X.Y.Z"`
  3. Create and push a tag: `git tag vX.Y.Z && git push origin vX.Y.Z`
- The CI will automatically build binaries for:
  - Linux (x64): `seren-replicator-linux-x64-binary`
  - macOS Intel: `seren-replicator-macos-x64-binary`
  - macOS ARM: `seren-replicator-macos-arm64-binary`
- Update the release notes in `.github/workflows/release.yml` before creating the tag

## Debugging Process

YOU MUST ALWAYS find the root cause of any issue you are debugging. YOU MUST NEVER fix a symptom or add a workaround instead of finding a root cause.

### Phase 0: MANDATORY Pre-Fix Checklist

Before proposing ANY fix, YOU MUST gather and show Taariq:

1. **Exact Error Details**
   - What is the EXACT error message or symptom?
   - What is the EXACT URL/request that's failing?
   - What is the EXACT response code and response body?

2. **Test Each Layer**
   - Does the underlying service/API work directly?
   - Does it work through each intermediate layer?
   - Which specific layer is failing?

3. **Check Configuration**
   - Are all required configurations in place?
   - Are environment variables set correctly?
   - Are external services/domains whitelisted/configured?

4. **Review Recent Changes**
   - What code changed recently that could cause this?
   - Was this ever working? If yes, when did it break?

5. **State Your Hypothesis**
   - What do you believe is the ROOT CAUSE (not symptom)?
   - What evidence supports this hypothesis?
   - How will you verify this hypothesis before fixing?

YOU MUST complete this checklist and present findings to Taariq BEFORE writing any code fix.

### Debugging Implementation

- Read error messages carefully - they often contain the exact solution
- Reproduce consistently before investigating
- Find working examples and compare against references
- Form single hypothesis, test minimally, verify before continuing
- ALWAYS have the simplest possible failing test case
- NEVER add multiple fixes at once
- IF your first fix doesn't work, STOP and re-analyze

## Troubleshooting Production Issues

### Connection Timeouts During Long Operations

**Symptom:** Operations fail after 20-30 minutes with "connection closed" errors during `init` filtered copy.

**Root Cause:** When the target database is behind an AWS Elastic Load Balancer (ELB), the load balancer enforces idle connection timeouts (typically 60 seconds to 10 minutes). During long-running COPY operations, if data isn't flowing continuously, the ELB sees the connection as idle and closes it.

**Solution:** Increase the ELB idle timeout:

```bash
# Using AWS CLI
aws elbv2 modify-load-balancer-attributes \
  --region us-east-1 \
  --load-balancer-arn <ARN> \
  --attributes Key=idle_timeout.timeout_seconds,Value=1800

# Or via Kubernetes service annotation
kubectl annotate service <postgres-service> \
  service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout=1800
```

**Diagnosis Steps:**
1. Check if target is behind a load balancer (hostname contains `elb.amazonaws.com`)
2. Test basic connectivity: `timeout 10 psql <target-url> -c "SELECT version();"`
3. Check PostgreSQL timeout settings (should be `statement_timeout = 0`)
4. Check how much data is being copied to estimate operation duration
5. If target is responsive but operations timeout after predictable intervals, it's likely an ELB/proxy timeout

### Database Hangs or Degradation

**Symptom:** Connections succeed but queries hang indefinitely. Even simple queries like `SELECT version()` don't respond.

**Diagnosis:**
```bash
# Test with timeout
timeout 10 psql <target-url> -c "SELECT version();"

# If that hangs, check pod/container status
kubectl get pods -l app=postgres
kubectl logs <postgres-pod> --tail=100

# Check for locked queries (if you can connect)
psql <url> -c "SELECT pid, state, query FROM pg_stat_activity WHERE state != 'idle';"
```

**Solution:** Restart the PostgreSQL instance or container. Check resource usage (CPU, memory, disk).

## Task Management

- YOU MUST use your TodoWrite tool to keep track of what you're doing
- YOU MUST NEVER discard tasks from your TodoWrite todo list without Taariq's explicit approval

## Security

### Secrets and Sensitive Data
- YOU MUST NEVER commit secrets, API keys, passwords, tokens, or credentials to version control
- Before ANY commit, YOU MUST scan staged files for potential secrets
- YOU MUST STOP and ask before committing .env files or config files containing sensitive data
- If you discover committed secrets, YOU MUST STOP IMMEDIATELY and alert Taariq

### Code Security
- YOU MUST validate and sanitize all external inputs
- YOU MUST use parameterized queries for database operations (never string concatenation)
- YOU MUST avoid eval() or similar dynamic code execution with user input
- YOU MUST implement proper error handling that doesn't leak sensitive information

### Credential Handling Implementation

**`.pgpass` File Management:**

The tool uses temporary `.pgpass` files to pass credentials to external PostgreSQL tools without exposing them in process arguments:

**Implementation Details:**

- Credentials extracted from connection URLs via URL parsing
- Temporary `.pgpass` files created in system temp directory
- Format: `hostname:port:database:username:password`
- Permissions: 0600 (owner read/write only) on Unix systems
- Cleanup: RAII pattern (Drop trait) ensures removal even on panic/interrupt

**Functions:**

- `migration::dump::with_pgpass_temp()` - Wraps pg_dump/pg_dumpall calls
- `migration::restore::with_pgpass_temp()` - Wraps pg_restore calls
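
A minimal sketch of the RAII pattern these helpers rely on (struct and function names here are illustrative, not the actual implementation):

```rust
use std::fs;
use std::io::Write;
use std::path::PathBuf;

/// Illustrative guard: writes a temporary .pgpass file and removes it on Drop,
/// even if the wrapped pg_dump/pg_restore call panics or is interrupted.
struct PgpassGuard {
    path: PathBuf,
}

impl PgpassGuard {
    fn create(host: &str, port: u16, db: &str, user: &str, password: &str) -> std::io::Result<Self> {
        let path = std::env::temp_dir().join(format!("seren-pgpass-{}", std::process::id()));
        let mut file = fs::File::create(&path)?;
        // .pgpass line format: hostname:port:database:username:password
        writeln!(file, "{host}:{port}:{db}:{user}:{password}")?;
        #[cfg(unix)]
        {
            use std::os::unix::fs::PermissionsExt;
            // Owner read/write only, as required by libpq for .pgpass files.
            fs::set_permissions(&path, fs::Permissions::from_mode(0o600))?;
        }
        Ok(Self { path })
    }
}

impl Drop for PgpassGuard {
    fn drop(&mut self) {
        // Best-effort cleanup of the temporary credential file.
        let _ = fs::remove_file(&self.path);
    }
}
```

A wrapper like this would point `PGPASSFILE` at the guard's path when spawning pg_dump or pg_restore, so the password never appears in process arguments.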

**Security Benefits:**

- Credentials don't appear in `ps` output or shell history
- No command injection vectors (separate host/port/db/user parameters)
- Automatic cleanup prevents credential leakage
- Follows PostgreSQL's recommended security practices

### Remote Execution Security

**Overview:**

The remote execution feature allows users to run replication jobs on SerenAI-managed AWS infrastructure. This section documents the security model, current protections, and areas requiring attention.

**✅ Current Security Measures:**

1. **API Authentication**:
   - API key authentication via `x-api-key` header
   - Keys stored in AWS SSM Parameter Store (SecureString)
   - Lambda caches keys for performance (container lifecycle)
   - Current production key: Stored in SSM `/seren-replication/api-key`

2. **Credential Encryption**:
   - Database credentials encrypted with AWS KMS before storage in DynamoDB
   - Credentials never logged or stored in plaintext
   - Workers decrypt credentials only when needed
   - User-data contains only `job_id` (no credentials)

3. **Job Spec Validation** (Issue #138 - Implemented):
   - Schema versioning (current: v1.0)
   - URL format validation with injection prevention
   - Command whitelist (init, validate, sync, status, verify)
   - Size limits (15KB max to fit in 16KB EC2 user-data)
   - Type checking and sanitization for all fields

4. **IAM Least Privilege**:
   - Separate roles for Lambda coordinator, Lambda provisioner, and EC2 workers
   - Each role has minimal permissions for its function
   - No wildcards in permission policies
   - KMS key policy restricts encryption/decryption

5. **Network Security**:
   - Workers use security groups allowing only HTTPS and PostgreSQL egress
   - No inbound connections accepted
   - Optional VPC deployment for additional isolation

6. **Observability** (Issue #136 - Implemented):
   - Trace IDs for end-to-end request correlation
   - CloudWatch logs with sensitive data redaction
   - CloudWatch metrics for job tracking
   - Log URLs included in status responses

7. **Reliability Controls** (Issue #135 - Implemented):
   - SQS queue decouples job submission from EC2 provisioning
   - Exponential backoff for transient failures
   - Concurrency limits prevent resource exhaustion
   - Dead letter queue for failed provisioning attempts

8. **Target Database Verification** (Issue #175 - Implemented):
   - Domain allowlist restricts remote execution to SerenDB targets only
   - Validates target hostname ends with `serendb.com` or `console.serendb.com`
   - Prevents abuse of SerenAI infrastructure for non-SerenDB replications
   - Users can use the `--local` flag to replicate to any PostgreSQL database on their own hardware
   - Error messages guide users to SerenDB signup page
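
A rough sketch of that hostname check (function name and exact matching rules are illustrative):

```rust
/// Illustrative allowlist check: accepts serendb.com and its subdomains
/// (including console.serendb.com); any other target must use --local.
fn is_serendb_target(host: &str) -> bool {
    host == "serendb.com" || host.ends_with(".serendb.com")
}
```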

**⚠️ Security Gaps and Limitations:**

1. **Single Shared API Key**:
   - **Current**: One API key shared by all users
   - **Risk**: Key compromise affects all users, no per-user rate limiting
   - **Mitigation Needed**: Implement per-user API keys or OAuth2/JWT authentication
   - **Timeline**: Required before public GA release

2. **No Request Rate Limiting**:
   - **Current**: API Gateway has burst limits (1000 req/sec) but no per-client limits
   - **Risk**: Single user can exhaust service capacity
   - **Mitigation Needed**: Per-API-key rate limiting, usage quotas
   - **Timeline**: Required before public GA release

3. **Subscription Passwords Visible in Catalog**:
   - **Current**: PostgreSQL stores subscription connection strings (including passwords) in `pg_subscription` catalog
   - **Risk**: Users with `pg_read_all_settings` can view source database passwords
   - **Mitigation**: Document `.pgpass` configuration on target server (already documented in README)
   - **Alternative**: Implement credential rotation for subscriptions
   - **Status**: Documented, user-configurable workaround available

4. **No Job Isolation Between Users**:
   - **Current**: All jobs share the same AWS account and network space
   - **Risk**: Malicious user could potentially access another user's job data
   - **Partial Mitigation**: Target verification (Issue #175) restricts remote execution to SerenDB targets only
   - **Mitigation Needed**: VPC per tenant, network isolation, or separate AWS accounts
   - **Timeline**: Consider for enterprise/multi-tenant deployment

5. **CloudWatch Logs Retention**:
   - **Current**: 7-day retention, logs contain redacted URLs but full SQL queries
   - **Risk**: Logs may contain sensitive data in SQL predicates or table names
   - **Mitigation**: Review log content, implement PII scrubbing, adjust retention
   - **Status**: Acceptable for current SerenAI-managed service

6. **No Audit Trail for Admin Actions**:
   - **Current**: No tracking of who deployed infrastructure or modified Terraform
   - **Risk**: Lack of accountability for infrastructure changes
   - **Mitigation Needed**: Enable AWS CloudTrail, track Terraform state changes in version control
   - **Timeline**: Implement as service matures

**Required Before Public Release:**

- [ ] **Multi-user authentication**: Per-user API keys or OAuth2/JWT
- [ ] **Rate limiting**: Per-user quotas and burst protection
- [ ] **Security audit**: Third-party security review
- [ ] **Incident response plan**: Procedures for key rotation, breach response
- [ ] **Compliance review**: GDPR, SOC 2, or other relevant standards
- [ ] **Penetration testing**: Validate security controls under attack
- [ ] **Customer data isolation**: VPC or account-level separation
- [ ] **API versioning**: Backward-compatible API changes

**Security Checklist for Production:**

```bash
# Rotate API keys every 90 days
aws ssm put-parameter --name /seren-replication/api-key \
  --value "new-key-here" --type SecureString --overwrite

# Enable CloudTrail for audit logging
aws cloudtrail create-trail --name seren-replication-audit \
  --s3-bucket-name seren-audit-logs

# Set CloudWatch log retention
aws logs put-retention-policy \
  --log-group-name /aws/lambda/seren-replication-coordinator \
  --retention-in-days 7

# Enable KMS key rotation
aws kms enable-key-rotation --key-id <kms-key-id>

# Review IAM policies quarterly
aws iam get-role-policy --role-name seren-replication-worker \
  --policy-name worker-policy

# Monitor failed authentication attempts
aws cloudwatch put-metric-alarm \
  --alarm-name seren-replication-auth-failures \
  --metric-name 4XXError \
  --namespace AWS/ApiGateway \
  --threshold 100
```

**Security Testing:**

Run security-focused tests before major releases:

```bash
# Test authentication failure
curl -X POST "$API_ENDPOINT/jobs" \
  -H "Content-Type: application/json" \
  -d '{...}'  # No API key - should return 401

# Test SQL injection in URLs
curl -X POST "$API_ENDPOINT/jobs" \
  -H "x-api-key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema_version": "1.0",
    "command": "init",
    "source_url": "postgresql://host/db; DROP TABLE users;",
    "target_url": "postgresql://host/db"
  }'  # Should return 400 with validation error

# Test oversized payloads
dd if=/dev/zero bs=1024 count=20 | base64 > large_payload.txt
# Create JSON with 20KB+ payload - should return 400

# Verify credentials not in logs
aws logs filter-log-events \
  --log-group-name /aws/lambda/seren-replication-coordinator \
  --filter-pattern "password" \
  --start-time $(date -u -d '1 hour ago' +%s)000
# Should return no results
```

**Incident Response Contacts:**

- **Security Incidents**: <security@seren.ai>
- **On-Call Engineer**: PagerDuty rotation
- **AWS Support**: Enterprise support contract

**Related Documentation:**

- [API Schema & Validation](docs/api-schema.md) - Job spec security
- [AWS Setup Guide](docs/aws-setup.md) - Infrastructure security
- [CI/CD Guide](docs/cicd.md) - Deployment security

## Development Commands

### Building

```bash
# Build debug binary
cargo build

# Build release binary (optimized)
cargo build --release

# Build for specific target
cargo build --release --target x86_64-unknown-linux-gnu
```

The binary will be at:

- Debug: `target/debug/seren-replicator`
- Release: `target/release/seren-replicator`

### Testing

```bash
# Run unit tests
cargo test

# Run unit tests with output
cargo test -- --nocapture

# Run doc tests
cargo test --doc

# Run integration tests (requires TEST_SOURCE_URL and TEST_TARGET_URL)
export TEST_SOURCE_URL="postgresql://user:pass@source-host:5432/db"
export TEST_TARGET_URL="postgresql://user:pass@target-host:5432/db"
cargo test --test integration_test -- --ignored

# Run specific integration test
cargo test --test integration_test test_validate_command_integration -- --ignored
```

**Integration Test Notes:**

- Integration tests are marked with `#[ignore]` and require real database connections
- Some tests (init, sync) perform destructive operations - use with caution
- Tests validate that commands run without panicking, not necessarily that they succeed

#### Setting Up Test Environment

For local integration testing, use Docker to run PostgreSQL instances:

```bash
# Start source database
docker run -d --name pg-source \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  postgres:17

# Start target database
docker run -d --name pg-target \
  -e POSTGRES_PASSWORD=postgres \
  -p 5433:5432 \
  postgres:17

# Configure test environment
export TEST_SOURCE_URL="postgresql://postgres:postgres@localhost:5432/postgres"
export TEST_TARGET_URL="postgresql://postgres:postgres@localhost:5433/postgres"

# Run integration tests
cargo test --test integration_test -- --ignored
```

**Cleanup:**

```bash
docker stop pg-source pg-target
docker rm pg-source pg-target
```

#### Continuous Integration

The CI pipeline (`.github/workflows/ci.yml`) runs on every push and PR:

- **Tests**: Unit and doc tests on Ubuntu and macOS
- **Linting**: `cargo fmt --check` and `cargo clippy`
- **Security Audit**: Automated dependency vulnerability scanning with `cargo audit`
- **Multi-platform Builds**: Validates builds on Linux x64, macOS x64, and macOS ARM64

All checks must pass before merging to main.

#### CI/CD Pipeline Troubleshooting

The CI pipeline runs on every push and PR. Common failures:

**Formatting Failures:**
```bash
# Error: "Diff in src/commands/init.rs"
# Fix: Run cargo fmt locally
cargo fmt

# Verify formatting before committing
cargo fmt -- --check
```

**Clippy Failures:**
```bash
# Fix linting issues
cargo clippy --all-targets --all-features --fix

# Check for remaining issues
cargo clippy --all-targets --all-features -- -D warnings
```

**Failed CI After Merge:**
If main branch CI is failing, fix it immediately before any other work:
1. Check the failing job: `gh run view <run-id> --log-failed`
2. Fix the issue locally
3. Run pre-commit checks: formatting, clippy, tests
4. Commit and push the fix
5. Verify CI passes: `gh run watch <run-id>`

### Linting

```bash
# Check code formatting
cargo fmt -- --check

# Auto-format code
cargo fmt

# Run clippy (linter)
cargo clippy --all-targets --all-features -- -D warnings
```

### Running the Tool

```bash
# Validate databases
./target/release/seren-replicator validate \
  --source "postgresql://..." \
  --target "postgresql://..."

# Initialize replication (with size estimation)
./target/release/seren-replicator init \
  --source "postgresql://..." \
  --target "postgresql://..."

# Initialize with auto-confirm (for scripts)
./target/release/seren-replicator init \
  --source "postgresql://..." \
  --target "postgresql://..." \
  --yes

# Set up continuous sync
./target/release/seren-replicator sync \
  --source "postgresql://..." \
  --target "postgresql://..."

# Monitor replication status
./target/release/seren-replicator status \
  --source "postgresql://..." \
  --target "postgresql://..."

# Verify data integrity
./target/release/seren-replicator verify \
  --source "postgresql://..." \
  --target "postgresql://..."
```

## Environment Variables

### Integration Test Configuration

Integration tests require database connection URLs:

```bash
export TEST_SOURCE_URL="postgresql://user:pass@source-host:5432/db"
export TEST_TARGET_URL="postgresql://user:pass@target-host:5432/db"
```

### TCP Keepalive for External Tools

PostgreSQL client tools (pg_dump, pg_restore, psql, pg_dumpall) automatically receive TCP keepalive environment variables to prevent load balancer idle timeouts:

- `PGKEEPALIVES=1`: Enable TCP keepalives
- `PGKEEPALIVESIDLE=60`: First keepalive after 60 seconds idle
- `PGKEEPALIVESINTERVAL=10`: Subsequent keepalives every 10 seconds

These are automatically set by the tool via `utils::get_keepalive_env_vars()` for all subprocess commands. No manual configuration needed.

### Connection String Keepalives

For direct database connections (tokio-postgres), keepalive parameters are automatically added to connection URLs via `postgres::connection::add_keepalive_params()`:

- `keepalives=1`
- `keepalives_idle=60`
- `keepalives_interval=10`

Both mechanisms work together to prevent connection timeouts during long operations when connecting through AWS ELB or other load balancers.
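
As a rough illustration of both mechanisms (assumed shapes; the real helpers are `utils::get_keepalive_env_vars()` and `postgres::connection::add_keepalive_params()`):

```rust
/// Illustrative: keepalive env vars injected into pg_dump/pg_restore/psql subprocesses.
fn get_keepalive_env_vars() -> Vec<(&'static str, &'static str)> {
    vec![
        ("PGKEEPALIVES", "1"),
        ("PGKEEPALIVESIDLE", "60"),
        ("PGKEEPALIVESINTERVAL", "10"),
    ]
}

/// Illustrative: append keepalive parameters to a connection URL,
/// preserving any query string that is already present.
fn add_keepalive_params(url: &str) -> String {
    let params = "keepalives=1&keepalives_idle=60&keepalives_interval=10";
    if url.contains('?') {
        format!("{url}&{params}")
    } else {
        format!("{url}?{params}")
    }
}
```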

## Architecture

### Module Structure

```text
src/
├── main.rs              # CLI entry point with clap argument parsing
├── lib.rs               # Library root, exports modules + source type detection
├── commands/            # Command implementations (one per subcommand)
│   ├── validate.rs      # Validate prerequisites for replication
│   ├── init.rs          # Initial snapshot replication (dispatches by source type)
│   ├── sync.rs          # Set up logical replication (PostgreSQL only)
│   ├── status.rs        # Monitor replication lag and health
│   └── verify.rs        # Data integrity verification
├── postgres/            # PostgreSQL connection and utilities
│   ├── connection.rs    # Database connection management with retry
│   ├── privileges.rs    # Permission checking for source/target
│   └── extensions.rs    # PostgreSQL extension management
├── migration/           # PostgreSQL data migration operations
│   ├── schema.rs        # Schema introspection (list databases/tables)
│   ├── dump.rs          # pg_dump wrapper (schema, data, globals)
│   ├── restore.rs       # pg_restore wrapper (parallel operations)
│   ├── filtered.rs      # Filtered table copy with WHERE predicates
│   ├── estimation.rs    # Database size estimation and time prediction
│   └── checksum.rs      # Data integrity verification with checksums
├── replication/         # Logical replication management (PostgreSQL only)
│   ├── publication.rs   # Create/manage publications on source
│   ├── subscription.rs  # Create/manage subscriptions on target
│   └── monitor.rs       # Replication lag monitoring and statistics
├── sqlite/              # SQLite source support
│   ├── reader.rs        # Read tables and data from SQLite files
│   └── converter.rs     # Convert SQLite types to JSONB
├── mongodb/             # MongoDB source support
│   ├── reader.rs        # Read collections from MongoDB
│   └── converter.rs     # Convert BSON documents to JSONB
├── mysql/               # MySQL/MariaDB source support
│   ├── reader.rs        # Read tables from MySQL
│   └── converter.rs     # Convert MySQL types to JSONB
├── jsonb/               # Shared JSONB utilities
│   └── writer.rs        # Write JSONB data to PostgreSQL target
├── remote/              # Remote execution on SerenAI cloud
│   ├── client.rs        # API client for job submission/status
│   └── models.rs        # Request/response models for remote API
├── filters.rs           # Selective replication filtering logic
├── table_rules.rs       # Schema-aware table rules (QualifiedTable, TableRules)
├── config.rs            # TOML configuration file parsing
├── checkpoint.rs        # Checkpoint system for resume support
├── interactive.rs       # Interactive terminal UI for database/table selection
└── utils.rs             # Shared utilities (retry, keepalive, subprocess)
```

### Replication Workflow

The tool implements a 5-phase replication workflow:

1. **Validate** - Check that both databases meet prerequisites:

   - Source: REPLICATION privilege, can create publications
   - Target: Superuser or owner privileges, can create subscriptions
   - PostgreSQL 12+ on both sides

2. **Init** - Perform initial snapshot:

   - Estimate database sizes and show predicted times
   - Dump roles/permissions with `pg_dumpall --globals-only`
   - Dump schema with `pg_dump --schema-only`
   - Dump data with `pg_dump --data-only` (directory format, parallel, compressed)
   - Restore in order: globals, schema, data (all with parallel operations)

3. **Sync** - Set up continuous replication:

   - Create publication on source (all tables)
   - Create subscription on target (connects to source)
   - Wait for initial sync to complete
   - PostgreSQL's logical replication keeps databases in sync

4. **Status** - Monitor replication health:

   - Check subscription state (streaming, syncing, etc.)
   - Measure replication lag in bytes and time
   - Report statistics from source and target

5. **Verify** - Validate data integrity:

   - Compute checksums for all tables on both sides
   - Compare checksums to detect any discrepancies
   - Report detailed results per table

### Source Type Detection and Routing

The `lib.rs` module contains the central dispatch logic that routes to appropriate handlers based on source type:

```rust
pub enum SourceType {
    PostgreSQL,  // postgresql:// or postgres://
    SQLite,      // .db, .sqlite, .sqlite3 file extensions
    MongoDB,     // mongodb:// or mongodb+srv://
    MySQL,       // mysql://
}

pub fn detect_source_type(source: &str) -> Result<SourceType>
```
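
The detection itself is a straightforward prefix/extension check; a sketch consistent with the comments above (the real function may differ in details):

```rust
pub fn detect_source_type(source: &str) -> anyhow::Result<SourceType> {
    let lower = source.to_lowercase();
    if lower.starts_with("postgresql://") || lower.starts_with("postgres://") {
        Ok(SourceType::PostgreSQL)
    } else if lower.starts_with("mongodb://") || lower.starts_with("mongodb+srv://") {
        Ok(SourceType::MongoDB)
    } else if lower.starts_with("mysql://") {
        Ok(SourceType::MySQL)
    } else if lower.ends_with(".db") || lower.ends_with(".sqlite") || lower.ends_with(".sqlite3") {
        Ok(SourceType::SQLite)
    } else {
        anyhow::bail!("Unable to detect source type for: {source}")
    }
}
```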

**Dispatch Pattern in `commands/init.rs`:**

```rust
match detect_source_type(&source)? {
    SourceType::PostgreSQL => {
        // Use migration/dump.rs + migration/restore.rs
        // Supports continuous sync via replication/
    }
    SourceType::SQLite => {
        // Use sqlite/reader.rs → jsonb/writer.rs
        // One-time replication only
    }
    SourceType::MongoDB => {
        // Use mongodb/reader.rs → jsonb/writer.rs
        // Supports periodic refresh
    }
    SourceType::MySQL => {
        // Use mysql/reader.rs → jsonb/writer.rs
        // Supports periodic refresh
    }
}
```

### JSONB Storage Pattern (Non-PostgreSQL Sources)

Non-PostgreSQL sources (SQLite, MongoDB, MySQL) use a JSONB storage pattern that preserves data fidelity while enabling flexible querying in PostgreSQL.

**Architecture:**

```text
Source DB → Reader Module → Converter → JSONB Writer → PostgreSQL Target
                ↓              ↓             ↓
           Read rows      Convert to    INSERT INTO table (id, data)
           as native      serde_json    VALUES ($1, $2::jsonb)
           types          Value
```

**Table Schema on Target:**

Each source table becomes a PostgreSQL table with this structure:
```sql
CREATE TABLE table_name (
    id SERIAL PRIMARY KEY,
    data JSONB NOT NULL
);
CREATE INDEX ON table_name USING GIN (data);
```

**Type Preservation:**

Complex types are encoded with special markers to preserve type information:

- MongoDB ObjectId: `{"$oid": "..."}`
- MongoDB Date: `{"$date": "..."}`
- Binary data: `{"$binary": {"base64": "...", "subType": "..."}}`
- SQLite BLOB: Base64-encoded strings

**Reader/Converter/Writer Pattern:**

Each source type has a reader and converter module:

- `sqlite/reader.rs` + `sqlite/converter.rs`
- `mongodb/reader.rs` + `mongodb/converter.rs`
- `mysql/reader.rs` + `mysql/converter.rs`

All converters produce `serde_json::Value` which the shared `jsonb/writer.rs` inserts into PostgreSQL.
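
A minimal sketch of the writer's insert step (assumed function shape; requires tokio-postgres built with its serde_json support, and the real `jsonb/writer.rs` also handles table creation and batching):

```rust
use tokio_postgres::Client;

/// Illustrative: insert one converted row into the target's (id, data) table.
async fn insert_row(client: &Client, table: &str, row: serde_json::Value) -> anyhow::Result<()> {
    // The table name cannot be a bind parameter, so the caller must have
    // validated/quoted it already; the row itself is bound as JSONB.
    let sql = format!(r#"INSERT INTO "{table}" (data) VALUES ($1::jsonb)"#);
    client.execute(sql.as_str(), &[&row]).await?;
    Ok(())
}
```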

### Key Design Decisions

**PostgreSQL Client Tools:**

- Uses native `pg_dump`, `pg_dumpall`, and `pg_restore` commands via `std::process::Command`
- Ensures PostgreSQL tools are installed and accessible before operations
- Leverages PostgreSQL's optimized, well-tested dump/restore implementations

**Parallel Operations:**

- Auto-detects CPU cores (up to 8 parallel workers)
- Uses PostgreSQL directory format to enable parallel dump/restore
- Significantly faster for large databases with many tables

**Logical Replication:**

- Uses PostgreSQL's native logical replication (publications/subscriptions)
- Enables zero-downtime migration - databases stay in sync after initial copy
- Requires REPLICATION privilege on source, subscription privileges on target

**Connection Management:**

- Uses `tokio-postgres` for async database operations
- TLS support via `postgres-native-tls` for secure connections
- Connection strings follow standard PostgreSQL URI format

**Error Handling:**

- Uses `anyhow` for error propagation and context
- Fail-fast approach - validates prerequisites before destructive operations
- Clear error messages guide users to fix permission/configuration issues

**Connection Retry with Exponential Backoff:**

Two complementary retry mechanisms handle transient failures:

1. **Direct Database Connections** (`postgres::connection::connect_with_retry`):
   - Retries: 3 attempts
   - Backoff: 1s, 2s, 4s (exponential)
   - Applied to: All tokio-postgres connections
   - Scope Management: Connections dropped before long subprocess operations to prevent idle timeouts

2. **External PostgreSQL Commands** (`utils::retry_subprocess_with_backoff`):
   - Retries: 3 attempts
   - Backoff: 2s, 4s, 8s (exponential)
   - Applied to: pg_dump, pg_restore, psql, pg_dumpall subprocess calls
   - Detects: Connection refused, timeout, network unreachable errors

**Applied in 9+ critical locations:**

- Database discovery (`migration::schema::list_databases`)
- Database creation (init command)
- Filtered table copy operations (`migration::filtered::copy_filtered_tables`)
- Column queries and schema introspection
- wal_level configuration checks
- All external PostgreSQL tool invocations

**Prevents failures from:**

- Temporary network interruptions
- Database restarts during maintenance
- Load balancer connection drops
- Idle timeout disconnects (connections dropped before long operations)
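
Both helpers follow the usual doubling pattern; a generic sketch (not the actual signatures):

```rust
use std::time::Duration;

/// Illustrative: retry an async operation with exponential backoff.
/// `initial` is 1s for direct connections and 2s for subprocess calls.
async fn retry_with_backoff<T, E, F, Fut>(attempts: u32, initial: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut delay = initial;
    let mut last_err = None;
    for attempt in 0..attempts {
        match op().await {
            Ok(value) => return Ok(value),
            Err(err) => {
                last_err = Some(err);
                if attempt + 1 < attempts {
                    tokio::time::sleep(delay).await;
                    delay *= 2; // 1s, 2s, 4s or 2s, 4s, 8s
                }
            }
        }
    }
    Err(last_err.expect("at least one attempt"))
}
```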

### Filtering System

The filtering system provides selective replication - users can choose specific databases and tables to replicate instead of migrating everything. This is implemented through two complementary approaches: CLI flags and interactive mode.

#### ReplicationFilter (src/filters.rs)

The `ReplicationFilter` struct is the central filtering logic used by all commands:

```rust
pub struct ReplicationFilter {
    include_databases: Option<Vec<String>>,
    exclude_databases: Option<Vec<String>>,
    include_tables: Option<Vec<String>>, // Format: "db.table"
    exclude_tables: Option<Vec<String>>, // Format: "db.table"
}
```

**Constructor Validation:**

The `ReplicationFilter::new()` constructor enforces these rules:
- Database filters are mutually exclusive: cannot use both `--include-databases` and `--exclude-databases`
- Table filters are mutually exclusive: cannot use both `--include-tables` and `--exclude-tables`
- Table names must be in `"database.table"` format (validates with `.contains('.')`)
- Returns `anyhow::Result<Self>` with clear error messages for violations
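
Condensed, those checks look roughly like this (error wording and structure differ in the real constructor):

```rust
impl ReplicationFilter {
    pub fn new(
        include_databases: Option<Vec<String>>,
        exclude_databases: Option<Vec<String>>,
        include_tables: Option<Vec<String>>,
        exclude_tables: Option<Vec<String>>,
    ) -> anyhow::Result<Self> {
        if include_databases.is_some() && exclude_databases.is_some() {
            anyhow::bail!("--include-databases and --exclude-databases are mutually exclusive");
        }
        if include_tables.is_some() && exclude_tables.is_some() {
            anyhow::bail!("--include-tables and --exclude-tables are mutually exclusive");
        }
        for table in include_tables.iter().chain(exclude_tables.iter()).flatten() {
            if !table.contains('.') {
                anyhow::bail!("table filter '{table}' must use the database.table format");
            }
        }
        Ok(Self { include_databases, exclude_databases, include_tables, exclude_tables })
    }
}
```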

**Filtering Methods:**

- `should_replicate_database(db_name: &str) -> bool`
  - Returns true if database passes filters
  - Include list: database must be in the list
  - Exclude list: database must NOT be in the list
  - No filters: all databases pass

- `should_replicate_table(db_name: &str, table_name: &str) -> bool`
  - Returns true if table passes filters
  - Constructs full name as `"db_name.table_name"`
  - Include list: full name must be in the list
  - Exclude list: full name must NOT be in the list
  - No filters: all tables pass

- `get_databases_to_replicate(source_conn: &Client) -> Result<Vec<String>>`
  - Queries source for all databases via `migration::schema::list_databases()`
  - Filters using `should_replicate_database()`
  - Returns error if no databases match filters
  - Used by multi-database commands (verify, status, sync, init)

- `get_tables_to_replicate(source_conn: &Client, db_name: &str) -> Result<Vec<String>>`
  - Queries source for all tables in a database via `migration::schema::list_tables()`
  - Filters using `should_replicate_table()`
  - Returns empty vec if no tables match (not an error)
  - Used by commands that need table-level filtering
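
A condensed sketch of the two matching methods (the actual implementation may differ):

```rust
impl ReplicationFilter {
    pub fn should_replicate_database(&self, db_name: &str) -> bool {
        match (&self.include_databases, &self.exclude_databases) {
            (Some(include), _) => include.iter().any(|d| d.as_str() == db_name),
            (_, Some(exclude)) => !exclude.iter().any(|d| d.as_str() == db_name),
            _ => true, // no filters: everything passes
        }
    }

    pub fn should_replicate_table(&self, db_name: &str, table_name: &str) -> bool {
        let full_name = format!("{db_name}.{table_name}");
        match (&self.include_tables, &self.exclude_tables) {
            (Some(include), _) => include.iter().any(|t| t.as_str() == full_name.as_str()),
            (_, Some(exclude)) => !exclude.iter().any(|t| t.as_str() == full_name.as_str()),
            _ => true,
        }
    }
}
```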

#### Interactive Mode (src/interactive.rs)

Interactive mode provides a terminal UI for selecting databases and tables, built with the `dialoguer` crate:

**Function Signature:**
```rust
pub async fn select_databases_and_tables(source_url: &str) -> Result<ReplicationFilter>
```

**Workflow:**

1. **Connect to Source** - Connects to the source database URL

2. **Discover Databases** - Queries for all user databases (excludes templates)

3. **Select Databases** - Shows multi-select checklist:
   ```
   Select databases to replicate:
   (Use arrow keys to navigate, Space to select, Enter to confirm)

   > [x] myapp
     [x] analytics
     [ ] staging
     [ ] test
   ```

4. **Select Tables to Exclude** (per database):
   - For each selected database, connect to it and discover tables
   - Show multi-select checklist for tables to EXCLUDE
   - Pressing Enter without selections includes all tables
   - Tables are shown as simple names if in `public` schema, or `schema.table` otherwise
   - Internally stores exclusions as `"database.table"` format

5. **Show Summary and Confirm**:
   ```
   ========================================
   Replication Configuration Summary
   ========================================

   Databases to replicate: 2
     ✓ myapp
     ✓ analytics

   Tables to exclude: 2
     ✗ myapp.logs
     ✗ myapp.cache

   ========================================

   Proceed with this configuration? [Y/n]:
   ```

6. **Build ReplicationFilter** - Converts selections to `ReplicationFilter`:
   - Selected databases → `include_databases`
   - Excluded tables → `exclude_tables`
   - Returns the filter for use by commands

**URL Manipulation:**

The `replace_database_in_url()` helper function modifies a PostgreSQL connection URL to connect to a specific database:
```rust
fn replace_database_in_url(url: &str, new_db_name: &str) -> Result<String>
```
This is critical for multi-database operations - it preserves query parameters (like SSL settings) while changing only the database name.
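
A sketch of that helper using plain string handling (an illustration; the real implementation may parse the URL differently):

```rust
/// Illustrative: swap the database name in a PostgreSQL URL while keeping
/// host, credentials, and query parameters (e.g. sslmode) intact.
fn replace_database_in_url(url: &str, new_db_name: &str) -> anyhow::Result<String> {
    // Split off the query string first so it is preserved untouched.
    let (base, query) = match url.split_once('?') {
        Some((base, query)) => (base, Some(query)),
        None => (url, None),
    };
    // The database name is everything after the '/' that follows the host part.
    let scheme_end = base
        .find("://")
        .ok_or_else(|| anyhow::anyhow!("not a URL: {url}"))? + 3;
    let path_start = base[scheme_end..]
        .find('/')
        .map(|i| scheme_end + i)
        .ok_or_else(|| anyhow::anyhow!("URL has no database path: {url}"))?;
    let mut result = format!("{}/{}", &base[..path_start], new_db_name);
    if let Some(query) = query {
        result.push('?');
        result.push_str(query);
    }
    Ok(result)
}
```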

#### Command Integration

Commands integrate filtering in two ways:

**Commands with Interactive Mode** (validate, init, sync):

```rust
let filter = if interactive {
    // Interactive mode - prompt user to select databases and tables
    interactive::select_databases_and_tables(&source).await?
} else {
    // CLI mode - use provided filter arguments
    ReplicationFilter::new(
        include_databases,
        exclude_databases,
        include_tables,
        exclude_tables,
    )?
};
```

These commands accept both `--interactive` flag and CLI filter flags (`--include-databases`, `--exclude-tables`, etc.). Interactive mode and CLI filters are mutually exclusive in practice.

**Commands with CLI-Only Filtering** (status, verify):

```rust
let filter = ReplicationFilter::new(
    include_databases,
    exclude_databases,
    include_tables,
    exclude_tables,
)?;
```

These commands don't support `--interactive` because they operate on existing replication setups and don't perform discovery.

#### Multi-Database Replication Pattern

Commands that support multiple databases (init, sync, status, verify) follow this pattern:

1. **Discover Databases:**
   ```rust
   let databases = filter.get_databases_to_replicate(&source_conn).await?;
   ```

2. **Loop Through Each Database:**
   ```rust
   for db_name in databases {
       // Build database-specific connection URLs
       let source_db_url = replace_database_in_url(&source_url, &db_name)?;
       let target_db_url = replace_database_in_url(&target_url, &db_name)?;

       // Connect to specific database
       let source_db_conn = connect(&source_db_url).await?;
       let target_db_conn = connect(&target_db_url).await?;

       // Perform operation on this database
       // (validation, sync setup, status check, verification, etc.)
   }
   ```

3. **Report Overall Results:**
   Commands typically report per-database results and an overall summary.

**Table-Level Filtering in Operations:**

For operations that need table-level filtering (like verify and sync):

```rust
// Get tables to replicate for this database
let tables = filter.get_tables_to_replicate(&source_db_conn, &db_name).await?;

// Or check individual tables
if filter.should_replicate_table(&db_name, &table_name) {
    // Process this table
}
```

This architecture ensures consistent filtering behavior across all commands while allowing each command to implement its specific operation logic.

#### Configuration File System (src/config.rs)

For complex migrations with many rules, use TOML configuration files instead of CLI flags:

**File Location:** Pass via `--config replication-config.toml` to init/sync commands

**Structure:**

```toml
[databases.mydb]
schema_only = ["table1", "schema.table2"]

[[databases.mydb.table_filters]]
table = "events"
schema = "analytics"  # optional, defaults to public
where = "created_at > NOW() - INTERVAL '90 days'"

[[databases.mydb.time_filters]]
table = "metrics"
column = "timestamp"
last = "6 months"
```

**Implementation:**

- Parsed by `config::load_table_rules_from_file()` into `TableRules` struct
- Supports dot notation (`"schema.table"`) or explicit schema field
- CLI flags merge on top of config file (CLI takes precedence)
- See [docs/replication-config.md](docs/replication-config.md) for the full schema

**Integration:**

- Init command: filters pg_dump operations, filtered copy for predicates
- Sync command: creates publications with WHERE clauses (requires PG 15+)
- Fingerprint: hashed and included in checkpoint metadata

### Schema-Aware Filtering

PostgreSQL supports multiple schemas (namespaces) within a database, allowing tables with identical names to coexist in different schemas (e.g., `public.orders` and `analytics.orders`). Schema-aware filtering enables precise targeting of schema-qualified tables.

#### Core Data Structures (src/table_rules.rs)

**QualifiedTable:**

The `QualifiedTable` struct represents a fully-qualified table identifier:

```rust
pub struct QualifiedTable {
    database: Option<String>,
    schema: String,
    table: String,
}
```

Key methods:
- `parse(input: &str) -> Result<Self>` - Parses `"schema.table"` or `"table"` (defaults to `public` schema)
- `with_database(self, db: Option<String>) -> Self` - Adds database context
- `qualified_name() -> String` - Returns `"schema"."table"` format for PostgreSQL

**SchemaTableKey:**

Internal storage uses `(schema, table)` tuples to prevent collisions:

```rust
type SchemaTableKey = (String, String);
```

This ensures `public.orders` and `analytics.orders` are treated as distinct entities throughout the system.

**TableRules:**

The `TableRules` struct stores per-database, schema-aware filtering rules:

```rust
pub struct TableRules {
    schema_only_by_db: HashMap<String, BTreeSet<SchemaTableKey>>,
    table_filters_by_db: HashMap<String, HashMap<SchemaTableKey, String>>,
    time_filters_by_db: HashMap<String, HashMap<SchemaTableKey, TimeFilter>>,
}
```

#### Schema-Aware Configuration (src/config.rs)

The TOML config parser supports two notations:

**Dot notation (backward compatible):**

```toml
[databases.mydb]
schema_only = ["analytics.large_table", "public.temp"]
```

**Explicit schema field:**

```toml
[[databases.mydb.table_filters]]
table = "events"
schema = "analytics"
where = "created_at > NOW() - INTERVAL '90 days'"
```

Both are parsed into `QualifiedTable` instances with full schema information.

#### Schema Qualification in pg_dump (src/migration/dump.rs)

Table names passed to `pg_dump` commands are schema-qualified:

```rust
fn get_excluded_tables_for_db(filter: &ReplicationFilter, db_name: &str) -> Option<Vec<String>> {
    // Returns format: "schema"."table"
    tables.insert(format!("\"{}\".\"{}\"", schema, table));
}
```

This ensures `--exclude-table` and `--table` flags target the correct schema-qualified tables.

#### Filtered Copy with Schema Awareness (src/migration/filtered.rs)

The `copy_filtered_tables()` function handles schema-qualified tables for filtered snapshots:

```rust
pub async fn copy_filtered_tables(
    source_url: &str,
    target_url: &str,
    tables: &[(String, String)],  // (qualified_table_name, predicate)
) -> Result<()>
```

**Schema parsing:**

```rust
fn parse_schema_table(qualified: &str) -> Result<(String, String)> {
    // Parses "schema"."table" → (schema, table)
    let parts: Vec<&str> = qualified.split('.').map(|s| s.trim_matches('"')).collect();
    if parts.len() == 2 {
        Ok((parts[0].to_string(), parts[1].to_string()))
    } else {
        bail!("Expected schema-qualified table name");
    }
}
```

**FK CASCADE safety with schema awareness:**

The `get_cascade_targets()` function queries PostgreSQL for FK dependencies using schema-qualified lookups:

```rust
async fn get_cascade_targets(
    client: &Client,
    schema: &str,
    table: &str,
) -> Result<Vec<(String, String)>> {
    let query = r#"
        WITH RECURSIVE fk_tree AS (
            SELECT n.nspname as schema_name, c.relname as table_name, 0 as depth
            FROM pg_class c
            JOIN pg_namespace n ON c.relnamespace = n.oid
            WHERE n.nspname = $1 AND c.relname = $2
            ...
        )
    "#;
    client.query(query, &[&schema, &table]).await
}
```

This ensures `TRUNCATE CASCADE` operations correctly identify all FK-related tables across schemas.

#### Checkpoint Fingerprinting

Checkpoint fingerprints include schema information to detect scope changes:

```rust
pub fn fingerprint(&self) -> String {
    let mut hasher = Sha256::new();

    // Hash schema-qualified table rules
    for (db, tables) in &self.schema_only_by_db {
        for (schema, table) in tables {
            hasher.update(db.as_bytes());
            hasher.update(schema.as_bytes());
            hasher.update(table.as_bytes());
        }
    }
    // ... hash other rules

    format!("{:x}", hasher.finalize())
}
```

If a user changes from replicating `public.orders` to `analytics.orders`, the fingerprint changes and checkpoints are invalidated, preventing resumption with incorrect scope.

#### Backward Compatibility

**Default to public schema:**

When schema is not specified, all parsers default to `public`:

```rust
impl QualifiedTable {
    pub fn parse(input: &str) -> Result<Self> {
        if input.contains('.') {
            // "schema.table" format
        } else {
            // Default to public schema
            Ok(Self {
                database: None,
                schema: "public".to_string(),
                table: input.to_string(),
            })
        }
    }
}
```

This ensures existing configs and CLI invocations continue to work without modification.

**CLI flag parsing:**

```rust
// These are equivalent:
--schema-only-tables "users"
--schema-only-tables "public.users"
```

#### Testing Strategy

Schema-aware functionality is tested at multiple levels:

1. **Unit tests** - QualifiedTable parsing, SchemaTableKey storage, fingerprint changes
2. **Integration tests** - FK CASCADE detection across schemas, filtered copy with schema qualification
3. **Fingerprint tests** - Verify different schemas produce different fingerprints

Key test example:

```rust
#[test]
fn test_fingerprint_changes_with_schema() {
    let mut rules_a = TableRules::default();
    rules_a.apply_schema_only_cli(&["public.orders".to_string()]).unwrap();

    let mut rules_b = TableRules::default();
    rules_b.apply_schema_only_cli(&["analytics.orders".to_string()]).unwrap();

    assert_ne!(
        rules_a.fingerprint(),
        rules_b.fingerprint(),
        "Different schemas should produce different fingerprints"
    );
}
```

### Checkpoint System (src/checkpoint.rs)

The init command uses persistent checkpoints to enable resume support for long-running operations:

**Architecture:**

- Checkpoints are stored as JSON files in the `.seren-replicator/` directory
- Each checkpoint includes: source/target URL hashes, filter fingerprint, database list, completed databases
- Fingerprinting uses SHA256 to detect scope changes (different filters invalidate checkpoints)
- Version field enables format evolution

**Resume Logic:**

- On init start, checks for existing checkpoint matching current operation metadata
- If found and metadata matches, skips completed databases and continues with remaining ones
- If metadata differs (different filters, URLs, options), checkpoint is invalid and fresh run starts
- `--no-resume` flag forces fresh run with new checkpoint

**Checkpoint Metadata:**

- `source_hash`: SHA256 of source URL
- `target_hash`: SHA256 of target URL
- `filter_hash`: Fingerprint from TableRules (includes schema-qualified table rules)
- `drop_existing`: Whether --drop-existing was used
- `enable_sync`: Whether --enable-sync was used

This ensures checkpoints are only used when the operation scope hasn't changed, preventing data loss from resumed operations with incorrect scope.
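
A sketch of what a checkpoint record might contain, based on the metadata above (field names illustrative, assuming serde for JSON serialization; see `src/checkpoint.rs` for the real format):

```rust
use serde::{Deserialize, Serialize};

/// Illustrative checkpoint record persisted as JSON under .seren-replicator/.
#[derive(Serialize, Deserialize)]
struct Checkpoint {
    version: u32,                     // enables format evolution
    source_hash: String,              // SHA256 of the source URL
    target_hash: String,              // SHA256 of the target URL
    filter_hash: String,              // TableRules fingerprint (schema-aware)
    drop_existing: bool,              // whether --drop-existing was used
    enable_sync: bool,                // whether --enable-sync was used
    databases: Vec<String>,           // full database list for this run
    completed_databases: Vec<String>, // skipped on resume
}
```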