worker-service 0.2.0

Worker Service - A worker administration microservice that interoperates with the worker-matcher crate
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
# Database Schema Design

## Overview

This document describes the PostgreSQL database schema for the Worker Service (MPI) system. The schema is designed to support millions of worker records with high performance, HIPAA compliance, and full audit trail capabilities.

## Design Principles

1. **Normalization**: Properly normalized to 3NF to avoid data redundancy
2. **Audit Trail**: All tables include created_at, updated_at, created_by, updated_by
3. **Soft Delete**: Support for soft deletes with deleted_at, deleted_by columns
4. **UUIDs**: Use UUIDs for primary keys to support distributed systems
5. **Indexing**: Strategic indexes for common query patterns
6. **Referential Integrity**: Foreign keys with appropriate cascade rules
7. **HIPAA Compliance**: Audit logging and data integrity

## Core Tables

### workers

Primary worker record table.

```sql
CREATE TABLE workers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    active BOOLEAN NOT NULL DEFAULT true,
    gender VARCHAR(20) NOT NULL CHECK (gender IN ('male', 'female', 'other', 'unknown')),
    birth_date DATE,
    deceased BOOLEAN NOT NULL DEFAULT false,
    deceased_datetime TIMESTAMPTZ,
    marital_status VARCHAR(50),
    multiple_birth BOOLEAN,
    managing_organization_id UUID REFERENCES organizations(id),

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(255),
    updated_by VARCHAR(255),

    -- Soft delete
    deleted_at TIMESTAMPTZ,
    deleted_by VARCHAR(255),

    -- Indexes
    INDEX idx_workers_birth_date (birth_date),
    INDEX idx_workers_gender (gender),
    INDEX idx_workers_active (active),
    INDEX idx_workers_organization (managing_organization_id),
    INDEX idx_workers_deleted_at (deleted_at)
);
```

### worker_names

Stores multiple names per worker (legal name, maiden name, aliases, etc.).

```sql
CREATE TABLE worker_names (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    worker_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    use_type VARCHAR(20) CHECK (use_type IN ('usual', 'official', 'temp', 'nickname', 'anonymous', 'old', 'maiden')),
    family VARCHAR(255) NOT NULL,
    given TEXT[] NOT NULL DEFAULT '{}',  -- Array of given names
    prefix TEXT[] NOT NULL DEFAULT '{}', -- Array of prefixes (Dr., Mr., etc.)
    suffix TEXT[] NOT NULL DEFAULT '{}', -- Array of suffixes (Jr., III, etc.)
    is_primary BOOLEAN NOT NULL DEFAULT false,

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Indexes
    INDEX idx_worker_names_worker_id (worker_id),
    INDEX idx_worker_names_family (family),
    INDEX idx_worker_names_is_primary (is_primary)
);
```

### worker_identifiers

Stores worker identifiers (MRN, SSN, driver's license, etc.).

```sql
CREATE TABLE worker_identifiers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    worker_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    use_type VARCHAR(20) CHECK (use_type IN ('usual', 'official', 'temp', 'secondary', 'old')),
    identifier_type VARCHAR(10) NOT NULL CHECK (identifier_type IN ('MRN', 'SSN', 'DL', 'NPI', 'PPN', 'TAX', 'OTHER')),
    system VARCHAR(255) NOT NULL,  -- Namespace/system URI
    value VARCHAR(255) NOT NULL,   -- The actual identifier value
    assigner VARCHAR(255),         -- Organization that issued the identifier

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Unique constraint: one identifier per system
    UNIQUE(system, value),

    -- Indexes
    INDEX idx_worker_identifiers_worker_id (worker_id),
    INDEX idx_worker_identifiers_type (identifier_type),
    INDEX idx_worker_identifiers_value (value),
    INDEX idx_worker_identifiers_system_value (system, value)
);
```

### worker_addresses

Stores multiple addresses per worker.

```sql
CREATE TABLE worker_addresses (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    worker_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    use_type VARCHAR(20) CHECK (use_type IN ('home', 'work', 'temp', 'old', 'billing')),
    line1 VARCHAR(255),
    line2 VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50),
    postal_code VARCHAR(20),
    country VARCHAR(100) DEFAULT 'USA',
    is_primary BOOLEAN NOT NULL DEFAULT false,

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Indexes
    INDEX idx_worker_addresses_worker_id (worker_id),
    INDEX idx_worker_addresses_postal_code (postal_code),
    INDEX idx_worker_addresses_city_state (city, state),
    INDEX idx_worker_addresses_is_primary (is_primary)
);
```

### worker_contacts

Stores contact information (phone, email, etc.).

```sql
CREATE TABLE worker_contacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    worker_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    system VARCHAR(20) NOT NULL CHECK (system IN ('phone', 'fax', 'email', 'pager', 'url', 'sms', 'other')),
    value VARCHAR(255) NOT NULL,
    use_type VARCHAR(20) CHECK (use_type IN ('home', 'work', 'temp', 'old', 'mobile')),
    is_primary BOOLEAN NOT NULL DEFAULT false,

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Indexes
    INDEX idx_worker_contacts_worker_id (worker_id),
    INDEX idx_worker_contacts_system (system),
    INDEX idx_worker_contacts_value (value),
    INDEX idx_worker_contacts_is_primary (is_primary)
);
```

### worker_links

Links between worker records (duplicates, merges, references).

```sql
CREATE TABLE worker_links (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    worker_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    other_worker_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    link_type VARCHAR(20) NOT NULL CHECK (link_type IN ('replaced_by', 'replaces', 'refer', 'seealso')),

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(255),

    -- Prevent self-links
    CHECK (worker_id != other_worker_id),

    -- Prevent duplicate links
    UNIQUE(worker_id, other_worker_id, link_type),

    -- Indexes
    INDEX idx_worker_links_worker_id (worker_id),
    INDEX idx_worker_links_other_worker_id (other_worker_id),
    INDEX idx_worker_links_link_type (link_type)
);
```

### organizations

Healthcare organizations (hospitals, clinics, etc.).

```sql
CREATE TABLE organizations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    active BOOLEAN NOT NULL DEFAULT true,
    name VARCHAR(255) NOT NULL,
    alias TEXT[] NOT NULL DEFAULT '{}',  -- Array of alias names
    org_type TEXT[] NOT NULL DEFAULT '{}',  -- Array of organization types
    part_of UUID REFERENCES organizations(id),  -- Parent organization

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(255),
    updated_by VARCHAR(255),

    -- Soft delete
    deleted_at TIMESTAMPTZ,
    deleted_by VARCHAR(255),

    -- Indexes
    INDEX idx_organizations_name (name),
    INDEX idx_organizations_active (active),
    INDEX idx_organizations_part_of (part_of),
    INDEX idx_organizations_deleted_at (deleted_at)
);
```

### organization_identifiers

Organization identifiers (NPI, Tax ID, etc.).

```sql
CREATE TABLE organization_identifiers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    use_type VARCHAR(20) CHECK (use_type IN ('usual', 'official', 'temp', 'secondary', 'old')),
    identifier_type VARCHAR(10) NOT NULL,
    system VARCHAR(255) NOT NULL,
    value VARCHAR(255) NOT NULL,
    assigner VARCHAR(255),

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Unique constraint
    UNIQUE(system, value),

    -- Indexes
    INDEX idx_org_identifiers_organization_id (organization_id),
    INDEX idx_org_identifiers_type (identifier_type),
    INDEX idx_org_identifiers_value (value)
);
```

### organization_addresses

Organization addresses.

```sql
CREATE TABLE organization_addresses (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    use_type VARCHAR(20) CHECK (use_type IN ('home', 'work', 'temp', 'old', 'billing')),
    line1 VARCHAR(255),
    line2 VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50),
    postal_code VARCHAR(20),
    country VARCHAR(100) DEFAULT 'USA',
    is_primary BOOLEAN NOT NULL DEFAULT false,

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Indexes
    INDEX idx_org_addresses_organization_id (organization_id),
    INDEX idx_org_addresses_postal_code (postal_code)
);
```

### organization_contacts

Organization contact information.

```sql
CREATE TABLE organization_contacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    system VARCHAR(20) NOT NULL CHECK (system IN ('phone', 'fax', 'email', 'pager', 'url', 'sms', 'other')),
    value VARCHAR(255) NOT NULL,
    use_type VARCHAR(20) CHECK (use_type IN ('home', 'work', 'temp', 'old', 'mobile')),
    is_primary BOOLEAN NOT NULL DEFAULT false,

    -- Audit fields
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Indexes
    INDEX idx_org_contacts_organization_id (organization_id),
    INDEX idx_org_contacts_system (system)
);
```

## Audit Tables

### audit_log

Complete audit trail for HIPAA compliance.

```sql
CREATE TABLE audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    timestamp TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    user_id VARCHAR(255),
    action VARCHAR(50) NOT NULL,  -- CREATE, UPDATE, DELETE, MERGE, LINK, etc.
    entity_type VARCHAR(50) NOT NULL,  -- worker, organization, etc.
    entity_id UUID NOT NULL,
    old_values JSONB,
    new_values JSONB,
    ip_address INET,
    user_agent TEXT,

    -- Indexes
    INDEX idx_audit_log_timestamp (timestamp),
    INDEX idx_audit_log_entity (entity_type, entity_id),
    INDEX idx_audit_log_user_id (user_id),
    INDEX idx_audit_log_action (action)
);
```

## Matching Tables

### worker_match_scores

Stores calculated match scores between worker records.

```sql
CREATE TABLE worker_match_scores (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    worker_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    candidate_id UUID NOT NULL REFERENCES workers(id) ON DELETE CASCADE,
    total_score DECIMAL(5,4) NOT NULL,  -- 0.0000 to 1.0000
    name_score DECIMAL(5,4),
    birth_date_score DECIMAL(5,4),
    gender_score DECIMAL(5,4),
    address_score DECIMAL(5,4),
    identifier_score DECIMAL(5,4),
    calculated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Prevent self-matching
    CHECK (worker_id != candidate_id),

    -- Unique constraint
    UNIQUE(worker_id, candidate_id),

    -- Indexes
    INDEX idx_match_scores_worker_id (worker_id),
    INDEX idx_match_scores_total_score (total_score DESC),
    INDEX idx_match_scores_calculated_at (calculated_at)
);
```

## Functions and Triggers

### Update timestamp trigger

```sql
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ language 'plpgsql';
```

Apply to all tables with updated_at:

```sql
CREATE TRIGGER update_workers_updated_at BEFORE UPDATE ON workers
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

CREATE TRIGGER update_organizations_updated_at BEFORE UPDATE ON organizations
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

-- Repeat for all tables with updated_at
```

## Performance Considerations

### Indexes

All tables include indexes for:

- Primary keys (automatic)
- Foreign keys
- Common search fields (name, birth_date, postal_code)
- Soft delete fields (deleted_at)
- Primary flags (is_primary)

### Partitioning (Future)

For very large deployments (10M+ workers), consider:

- Partitioning `audit_log` by timestamp (monthly partitions)
- Partitioning `worker_match_scores` if storing all calculated scores

### Statistics

```sql
-- Update statistics for query planner
ANALYZE workers;
ANALYZE worker_names;
ANALYZE worker_identifiers;
```

## Security

### Row-Level Security (RLS)

Can be enabled for multi-tenant deployments:

```sql
ALTER TABLE workers ENABLE ROW LEVEL SECURITY;
CREATE POLICY worker_access ON workers
    FOR ALL
    USING (managing_organization_id = current_setting('app.organization_id')::uuid);
```

## Migration Strategy

1. Create tables in dependency order (organizations before workers)
2. Add indexes after initial data load for better performance
3. Enable triggers after bulk data import
4. Run ANALYZE after significant data changes

## Data Integrity Rules

1. **Cascading Deletes**: Child records (names, addresses) cascade when worker deleted
2. **Referential Integrity**: All foreign keys enforced
3. **Check Constraints**: Enum values enforced at database level
4. **Unique Constraints**: Prevent duplicate identifiers
5. **Soft Deletes**: Never hard delete workers (HIPAA requirement)

## Capacity Planning

Estimated storage for 10 million workers:

| Table              | Rows | Size per Row | Total Size |
| ------------------ | ---- | ------------ | ---------- |
| workers            | 10M  | 500 bytes    | 5 GB       |
| worker_names       | 15M  | 300 bytes    | 4.5 GB     |
| worker_identifiers | 30M  | 200 bytes    | 6 GB       |
| worker_addresses   | 20M  | 250 bytes    | 5 GB       |
| worker_contacts    | 30M  | 200 bytes    | 6 GB       |
| **Total**          |      |              | **~27 GB** |

Add 50% for indexes: **~40 GB total**