queue-runtime 0.2.0

Multi-provider queue runtime for Queue-Keeper
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
# MITM Protection Architecture Summary


**Version**: 1.0
**Date**: January 15, 2026
**Status**: Architecture Complete - Ready for Interface Design

---

## Executive Summary


This document summarizes the architectural design for man-in-the-middle (MITM) protection in the queue-runtime library. The solution provides end-to-end message encryption and authentication to prevent tampering, eavesdropping, and replay attacks on messages stored in cloud queues.

### Key Design Decisions


1. **Hybrid Approach**: Library enforces encryption/authentication, applications provide keys
2. **Authenticated Encryption**: AES-256-GCM provides both confidentiality and integrity
3. **Transparent API**: Encryption/decryption automatic, no API changes for applications
4. **Key Rotation**: Support multiple active keys for zero-downtime rotation
5. **Opt-In**: Encryption disabled by default for backward compatibility
6. **Replay Protection**: Timestamp-based freshness validation with optional nonce tracking

---

## Problem Statement


**Threat**: Malicious actor with access to queue infrastructure (Azure Service Bus, AWS SQS) can:

- Read message contents (eavesdropping)
- Modify message payloads (tampering)
- Inject fake messages (forgery)
- Replay old messages (replay attacks)
- Substitute entire messages (substitution)

**Impact**: Compromise of private repository data, webhook payload integrity, and application logic correctness.

**Existing Protection**: TLS protects network transit, but not storage at rest or provider-internal access.

**Solution**: End-to-end encryption with authenticated encryption (AEAD), protecting messages even if queue infrastructure is compromised.

---

## Architecture Overview


### Encryption Algorithm: Symmetric AES-256-GCM


**Choice**: Symmetric encryption (not asymmetric/public-private key)

**Rationale**:

- **Performance**: 100-1000x faster than RSA/ECC asymmetric encryption (~1-2μs vs 100+μs)
- **Message Size**: No size limits (RSA limited to ~190 bytes with 2048-bit key)
- **Standard Practice**: Industry standard for bulk data encryption (what TLS uses internally)
- **Hardware Acceleration**: AES-NI CPU instructions on modern processors

**Alternative Considered: Asymmetric (Public/Private Key)**:

- ❌ Much slower for message-sized payloads
- ❌ Size limitations requiring hybrid approach anyway
- ✅ Only beneficial if different services with different keys

### Cryptography Module


New module providing message-level security through:

1. **CryptoProvider Trait**: Abstraction for encryption/decryption operations
2. **KeyProvider Trait**: Abstraction for loading encryption keys from secret stores
3. **EncryptedMessage Type**: Container for ciphertext, nonce, auth tag, and metadata
4. **EncryptionKey Type**: Secure key material with automatic memory zeroing
5. **Encryption Detection**: Magic marker (\"QRE1\") enables receiver to detect encrypted vs plaintext messages

See [cryptography module specification](./modules/cryptography.md) for complete details.

### Algorithm: AES-256-GCM


**Algorithm**: AES-256-GCM (Advanced Encryption Standard, 256-bit key, Galois/Counter Mode)

**Type**: Symmetric encryption (same key for encryption and decryption)

**Properties**:

- **Authenticated Encryption**: Provides both confidentiality (encryption) and integrity (authentication)
- **Nonce-Based**: Requires unique 96-bit nonce per encryption operation
- **Associated Data**: Authenticates message ID, session ID, timestamp without encrypting
- **Performance**: Hardware-accelerated on modern CPUs (~1-2μs per message)
- **Compliance**: FIPS 140-2 approved, meets GDPR/PCI DSS/HIPAA requirements

**Security Guarantees**:

- 256-bit key security (industry standard)
- 128-bit authentication tag (prevents tampering)
- Ciphertext indistinguishability (prevents information leakage)

---

## Symmetric vs Asymmetric Encryption Analysis


### Why Symmetric (AES-256-GCM)?


**Use Case**: One sender service distributes to multiple receiver services within **trusted system boundary**

- **Sender**: Webhook router service receives GitHub webhooks, enqueues messages to different queues
- **Receivers**: Multiple bot services (Task Tactician, Merge Warden, Spec Sentinel, etc.) each consuming from their own queues
- All services are **trusted** (part of your application ecosystem)
- All services have access to shared secret store (Azure Key Vault / AWS Secrets Manager)
- Each queue can use a unique symmetric key shared between sender and its designated receiver(s)

**Performance Comparison**:

| Operation | AES-256-GCM (Symmetric) | RSA-2048 (Asymmetric) |
|-----------|--------------------------|------------------------|
| Encryption | ~1-2 microseconds | ~100-500 microseconds |
| Decryption | ~1-2 microseconds | ~1000-3000 microseconds |
| Throughput Impact | <1% | 10-50% |
| Hardware Acceleration | Yes (AES-NI) | Limited |

**Message Size**:

- **Symmetric**: No practical limit (can encrypt gigabytes)
- **Asymmetric**: RSA-2048 limited to ~190 bytes, RSA-4096 to ~450 bytes
  - Requires hybrid approach (RSA to encrypt symmetric key, AES for data)
  - What TLS/HTTPS does internally anyway

**Key Management**:

- **Symmetric**: Keys stored in shared secret manager (Azure Key Vault / AWS Secrets Manager)
  - Webhook router and bot services both retrieve keys for their respective queues
  - IAM roles/managed identities control access to specific keys
  - Simple key rotation via secret store updates
- **Asymmetric**: Public key distribution, private key protection, certificate management, PKI infrastructure
  - Only needed if services don't trust each other
  - Unnecessary complexity for trusted service ecosystem

**Industry Standard**: This is exactly what TLS/HTTPS does:

1. Use asymmetric (RSA/ECDH) for initial key exchange
2. Use symmetric (AES-GCM) for all actual data encryption
3. We skip step 1 since all services are trusted and share access to secret store

### When Would Asymmetric Be Needed?


**Scenario**: Different services with **different trust boundaries** (zero-trust architecture)

Example:

- **Service A** (third-party webhook receiver) sends to queue
- **Service B** (your event processor) receives from queue
- Service A is **untrusted** (different organization, security domain)
- Service B should **not** trust Service A to protect encryption keys
- Service A could be compromised without affecting B

**Solution**:

- Service A has Service B's **public key** (can encrypt, cannot decrypt)
- Service B has private key (can decrypt)
- Even if Service A is compromised, attacker cannot decrypt messages in queue

**For Your Use Case**: **Not needed** because:

- Webhook router and all bot services are **part of your trusted system**
- All services deployed and managed by your organization
- All services authenticate to same secret store (Azure Key Vault / AWS Secrets Manager)
- Symmetric encryption provides sufficient security within trusted boundary
- Much simpler key management and significantly better performance

### Hybrid Approach (If Needed in Future)


If you later need different senders/receivers:

```rust
// Sender: Encrypt with recipient's public key
let data_key = generate_random_symmetric_key();  // 256-bit AES key
let encrypted_data = aes_gcm_encrypt(data_key, message_body);
let encrypted_key = rsa_encrypt(recipient_public_key, data_key);

// Receiver: Decrypt with private key
let data_key = rsa_decrypt(recipient_private_key, encrypted_key);
let message_body = aes_gcm_decrypt(data_key, encrypted_data);
```

**Trade-offs**:

- ✅ Different keys for sender/receiver
-~100x slower than pure symmetric
- ❌ More complex key management
- ❌ Still uses symmetric for actual data (hybrid)

**Recommendation**: Stick with symmetric unless you have cross-service encryption requirements.

---

## Algorithm Specifications


### Default: AES-256-GCM


### 1. CryptoProvider Trait


```rust
#[async_trait]

pub trait CryptoProvider: Send + Sync {
    async fn encrypt(
        &self,
        key_id: &EncryptionKeyId,
        plaintext: &[u8],
        associated_data: &[u8],
    ) -> Result<EncryptedMessage, CryptoError>;

    async fn decrypt(
        &self,
        encrypted: &EncryptedMessage,
    ) -> Result<Vec<u8>, CryptoError>;
}
```

**Default Implementation**: `AesGcmCryptoProvider` using `aes-gcm` crate.

### 2. KeyProvider Trait


```rust
#[async_trait]

pub trait KeyProvider: Send + Sync {
    async fn get_key(&self, key_id: &EncryptionKeyId)
        -> Result<EncryptionKey, CryptoError>;
    async fn current_key_id(&self) -> Result<EncryptionKeyId, CryptoError>;
    async fn valid_key_ids(&self) -> Result<Vec<EncryptionKeyId>, CryptoError>;
}
```

**Applications Implement**: Integration with Azure Key Vault, AWS Secrets Manager, or custom key stores.

### 3. EncryptedMessage Type


```rust
pub struct EncryptedMessage {
    pub key_id: EncryptionKeyId,        // For key lookup
    pub ciphertext: Vec<u8>,            // Encrypted body
    pub nonce: Nonce,                   // 96-bit nonce
    pub auth_tag: AuthenticationTag,    // 128-bit tag
    pub encrypted_at: i64,              // Unix timestamp
    pub version: u8,                    // Format version
}
```

### 4. Message Flow


**Sending**:

1. Application creates `Message` with plaintext body
2. `QueueClient::send()` checks if crypto enabled in config
3. **If encryption enabled**:
   - Encrypts body using `CryptoProvider`
   - Prepends \"QRE1\" marker to encrypted bytes
   - Logs: `Message sent with encryption (encrypted=true)`
   - Emits metric: `queue_messages_sent{encrypted="true"}`
4. **If encryption disabled** (debug mode):
   - Sends plaintext body without marker
   - Logs WARNING: `Message sent WITHOUT encryption (encrypted=false)`
   - Emits metric: `queue_messages_sent{encrypted="false"}`
5. Sends message to queue (metadata remains cleartext)

**Receiving**:

1. `QueueClient::receive()` retrieves message from queue
2. Checks first 4 bytes for \"QRE1\" encryption marker
3. **If marker present (encrypted message)**:
   - Validates message freshness (timestamp check)
   - Decrypts body using `CryptoProvider`
   - Logs: `Message received with encryption (encrypted=true)`
   - Emits metric: `queue_messages_received{encrypted="true"}`
   - Returns plaintext to application
4. **If no marker (plaintext message)**:
   - Checks `plaintext_policy` configuration:
     - **Allow**: Logs WARNING, processes message
     - **Reject**: Returns error, rejects message
     - **AllowWithAlert**: Logs ERROR, processes message
   - Emits metric: `queue_messages_received{encrypted="false"}`
   - Returns plaintext body as-is (backward compatibility)

**Key Benefits**:

- Encryption/decryption transparent to application code
- Auto-detection enables mixed encrypted/plaintext environments
- Debug mode: Disable encryption on sender, receiver still works
- Metrics and logs enable monitoring of encryption adoption

---

## Security Properties


### Confidentiality


- Message body encrypted with AES-256 (256-bit key strength)
- Ciphertext indistinguishable from random data
- Only parties with correct key can decrypt

### Integrity


- 128-bit authentication tag prevents undetected modification
- Tag covers ciphertext + associated data (message ID, session ID, timestamp)
- Constant-time verification prevents timing attacks

### Authenticity


- Only parties with correct key can create valid encrypted messages
- Authentication tag proves message originated from legitimate sender

### Freshness


- Timestamp included in encrypted message
- Configurable maximum age (default: 5 minutes)
- Rejects messages older than threshold (replay protection)

### Replay Protection


**Timestamp-Based** (Default):

- Simple, stateless, no storage overhead
- Allows replays within freshness window (acceptable for most use cases)

**Nonce Tracking** (Opt-In):

- Tracks used nonces in cache/database
- Strongest replay protection (detects duplicate nonces)
- Opt-in for high-security scenarios

---

## Key Management


### Application Responsibilities


1. **Key Storage**: Store keys in Azure Key Vault, AWS Secrets Manager, or equivalent
2. **Key Rotation**: Rotate keys every 90 days (recommended)
3. **Access Control**: Restrict key access to authorized services
4. **Multi-Environment**: Separate keys for dev/staging/prod

### Library Responsibilities


1. **Key Protection**: Zero key material from memory on drop (using `zeroize` crate)
2. **Logging Safety**: Never log keys, redact in Debug implementations
3. **Multi-Key Support**: Support multiple active keys during rotation
4. **Async Loading**: Async key retrieval from secret stores

### Key Rotation Process


```rust
// 1. Add new key to key provider
key_provider.add_key(new_key);

// 2. Set as current (new messages use this key)
key_provider.set_current(new_key_id);

// 3. Wait for old messages to expire (queue TTL)
tokio::time::sleep(queue_ttl).await;

// 4. Remove old key
key_provider.remove_key(old_key_id);
```

**Zero Downtime**: New messages encrypt with new key, old messages still decrypt with old key.

---

## Configuration


### Crypto Configuration


```rust
pub struct CryptoConfig {
    pub enabled: bool,                   // Default: false (opt-in)
    pub plaintext_policy: PlaintextPolicy, // Default: Allow
    pub max_message_age: Duration,       // Default: 5 minutes
    pub validate_freshness: bool,        // Default: true
    pub track_nonces: bool,              // Default: false (opt-in)
    pub nonce_cache_ttl: Duration,       // Default: 10 minutes
}

pub enum PlaintextPolicy {
    Allow,            // Accept plaintext, log WARNING
    Reject,           // Reject plaintext, return error
    AllowWithAlert,   // Accept plaintext, log ERROR
}
```

### Queue Client Integration


**Production Configuration** (encryption enabled):

```rust
let client = QueueClientBuilder::new()
    .with_azure_provider(config)
    .with_crypto(CryptoConfig {
        enabled: true,
        plaintext_policy: PlaintextPolicy::AllowWithAlert, // Gradual rollout
        max_message_age: Duration::from_secs(300),
        validate_freshness: true,
        ..Default::default()
    })
    .with_key_provider(Arc::new(my_key_provider))
    .build()
    .await?;

// Encryption transparent to application
let msg = Message::new(b"sensitive data".to_vec());
client.send(msg).await?;  // Automatically encrypted with "QRE1" marker

let received = client.receive().await?;  // Automatically decrypted
println!("{}", String::from_utf8_lossy(received.body()));
```

**Debug Configuration** (encryption disabled for troubleshooting):

```rust
let client = QueueClientBuilder::new()
    .with_azure_provider(config)
    .with_crypto(CryptoConfig {
        enabled: false,  // Disable encryption for debugging
        plaintext_policy: PlaintextPolicy::Allow,
        ..Default::default()
    })
    .build()
    .await?;

// WARNING logged on every send: "Message sent WITHOUT encryption"
client.send(msg).await?;  // Sent as plaintext (no marker)

// Receiver still accepts message (plaintext policy: Allow)
let received = client.receive().await?;
// WARNING logged: "Message received WITHOUT encryption"
```

### Observability


**Metrics**:

```rust
// Counters (labeled by encryption status)
queue_messages_sent_total{queue="my-queue", encrypted="true"}
queue_messages_sent_total{queue="my-queue", encrypted="false"}

queue_messages_received_total{queue="my-queue", encrypted="true"}
queue_messages_received_total{queue="my-queue", encrypted="false"}

// Gauge (encryption configuration)
queue_crypto_enabled{queue="my-queue"} = 1.0  // Enabled
queue_crypto_enabled{queue="my-queue"} = 0.0  // Disabled

// Crypto errors
queue_crypto_errors_total{error_type="authentication_failed"}
queue_crypto_errors_total{error_type="key_not_found"}
```

**Alerting**:

```promql
# Alert if >1% of messages unencrypted in production

rate(queue_messages_received_total{encrypted="false"}[5m])
/ rate(queue_messages_received_total[5m]) > 0.01

# Alert if encryption disabled in production

queue_crypto_enabled{environment="production"} == 0
```

---

## Performance Impact


### Encryption Overhead


**AES-256-GCM Performance** (hardware-accelerated):

- Encryption: ~1-2 microseconds per message (typical webhook size)
- Throughput impact: <1% for most workloads
- Hardware acceleration: AES-NI instructions (modern CPUs)

### Optimization Strategies


1. **Batch Encryption**: Parallelize encryption of multiple messages
2. **Hardware Acceleration**: Use AES-NI CPU instructions (automatic in `aes-gcm` crate)
3. **Connection Pooling**: Reuse `CryptoProvider` instances (thread-safe)
4. **Key Caching**: Cache keys in memory to avoid repeated secret store lookups

**Recommendation**: Performance overhead negligible compared to network and queue latency.

---

## Behavioral Assertions


Key behavioral specifications (see [assertions.md](./assertions.md) for complete list):

- **Assertion 24**: Encryption round-trip preserves plaintext
- **Assertion 25**: Tampered messages detected and rejected
- **Assertion 26**: Freshness validation rejects old messages
- **Assertion 27**: Key rotation supports old and new keys simultaneously
- **Assertion 28**: Missing keys produce clear errors
- **Assertion 29**: Nonce tracking prevents replay attacks
- **Assertion 30**: Metadata remains cleartext (session ID, correlation ID)
- **Assertion 31**: Key material zeroed from memory
- **Assertion 32**: Encryption disabled by default (opt-in)
- **Assertion 33**: Algorithm versioning for future upgrades
- **Assertion 34**: Constant-time verification prevents timing attacks

---

## Migration Path


### Phase 1: Opt-In (Current Architecture)


- Crypto disabled by default (`enabled: false`)
- Applications explicitly enable with configuration
- No impact on existing deployments
- Fully backward compatible

### Phase 2: Deprecation Warning (Future)


- Log warnings when crypto disabled in production
- Update documentation recommending crypto for all deployments
- Provide migration guides

### Phase 3: Default Enable (Future)


- Crypto enabled by default (`enabled: true`)
- Applications must explicitly disable (not recommended)
- Requires key provider configuration

### Phase 4: Mandatory (Future, Breaking Change)


- Remove ability to disable crypto
- All messages encrypted (major version bump: 2.0)
- Strongest security posture

---

## Integration Examples


### Multi-Service Architecture Patterns


#### Pattern 1: One Sender, Multiple Receivers with Different Keys


**Scenario**: Webhook router sends to different queues, each receiver has its own encryption key.

```rust
// ===== SENDER SERVICE (Webhook Router) =====

// Create separate clients for each destination queue
async fn setup_sender() -> Result<Vec<QueueClient>> {
    // Client for Service A (uses key-a)
    let client_a = QueueClientBuilder::new()
        .with_azure_provider(azure_config("queue-service-a"))
        .with_crypto(CryptoConfig { enabled: true, .. })
        .with_key_provider(Arc::new(create_key_provider("key-a")))
        .build().await?;

    // Client for Service B (uses key-b)
    let client_b = QueueClientBuilder::new()
        .with_azure_provider(azure_config("queue-service-b"))
        .with_crypto(CryptoConfig { enabled: true, .. })
        .with_key_provider(Arc::new(create_key_provider("key-b")))
        .build().await?;

    // Client for Service C (uses key-c)
    let client_c = QueueClientBuilder::new()
        .with_azure_provider(azure_config("queue-service-c"))
        .with_crypto(CryptoConfig { enabled: true, .. })
        .with_key_provider(Arc::new(create_key_provider("key-c")))
        .build().await?;

    Ok(vec![client_a, client_b, client_c])
}

// Route webhook to appropriate queue
async fn route_webhook(event: WebhookEvent, clients: &[QueueClient]) {
    match event.event_type {
        "pull_request" => clients[0].send(event.into()).await?, // → Service A
        "issue" => clients[1].send(event.into()).await?,        // → Service B
        "push" => clients[2].send(event.into()).await?,         // → Service C
        _ => {}
    }
}

// ===== RECEIVER SERVICES =====

// Service A: Has key-a
async fn service_a_receiver() -> Result<QueueClient> {
    QueueClientBuilder::new()
        .with_azure_provider(azure_config("queue-service-a"))
        .with_key_provider(Arc::new(create_key_provider("key-a")))
        .build().await
}

// Service B: Has key-b
async fn service_b_receiver() -> Result<QueueClient> {
    QueueClientBuilder::new()
        .with_azure_provider(azure_config("queue-service-b"))
        .with_key_provider(Arc::new(create_key_provider("key-b")))
        .build().await
}

// Service C: Has key-c
async fn service_c_receiver() -> Result<QueueClient> {
    QueueClientBuilder::new()
        .with_azure_provider(azure_config("queue-service-c"))
        .with_key_provider(Arc::new(create_key_provider("key-c")))
        .build().await
}

// Each receiver auto-detects and decrypts with its own key
async fn process_messages(client: &QueueClient) -> Result<()> {
    loop {
        let msg = client.receive().await?;
        // Auto-decrypted if "QRE1" marker present
        // Uses key from this client's KeyProvider
        process(msg.body()).await?;
        client.complete(msg.receipt()).await?;
    }
}
```

**Key Isolation**: Each service has its own encryption key. Compromise of one key doesn't affect other services.

---

#### Pattern 2: Mixed Encryption - Some Queues Encrypted, Others Plaintext


**Scenario**: Production queues encrypted, debug/test queues plaintext.

```rust
// ===== SENDER SERVICE =====

struct QueueRouter {
    prod_client: QueueClient,      // Encrypted
    staging_client: QueueClient,   // Encrypted
    debug_client: QueueClient,     // Plaintext
}

async fn setup_router() -> Result<QueueRouter> {
    // Production: Encryption enforced
    let prod_client = QueueClientBuilder::new()
        .with_azure_provider(azure_config("prod-queue"))
        .with_crypto(CryptoConfig {
            enabled: true,
            plaintext_policy: PlaintextPolicy::Reject,  // Strict
            ..Default::default()
        })
        .with_key_provider(Arc::new(prod_key_provider))
        .build().await?;

    // Staging: Encryption enabled
    let staging_client = QueueClientBuilder::new()
        .with_azure_provider(azure_config("staging-queue"))
        .with_crypto(CryptoConfig {
            enabled: true,
            plaintext_policy: PlaintextPolicy::AllowWithAlert,
            ..Default::default()
        })
        .with_key_provider(Arc::new(staging_key_provider))
        .build().await?;

    // Debug: Encryption disabled (for troubleshooting)
    let debug_client = QueueClientBuilder::new()
        .with_azure_provider(azure_config("debug-queue"))
        .with_crypto(CryptoConfig {
            enabled: false,  // Plaintext messages
            ..Default::default()
        })
        .build().await?;

    Ok(QueueRouter { prod_client, staging_client, debug_client })
}

async fn send_event(router: &QueueRouter, env: Environment, event: Event) {
    match env {
        Environment::Production => {
            // Sends encrypted with "QRE1" marker
            router.prod_client.send(event).await?;
        }
        Environment::Staging => {
            // Sends encrypted with "QRE1" marker
            router.staging_client.send(event).await?;
        }
        Environment::Debug => {
            // Sends plaintext (no marker)
            // Logs WARNING: "Message sent WITHOUT encryption"
            router.debug_client.send(event).await?;
        }
    }
}

// ===== RECEIVER =====

// Receiver auto-detects both encrypted and plaintext
async fn receiver() -> Result<QueueClient> {
    QueueClientBuilder::new()
        .with_azure_provider(config)
        .with_key_provider(Arc::new(key_provider))
        .build().await
}

async fn process_loop(client: &QueueClient) {
    loop {
        let msg = client.receive().await?;

        // Auto-detection:
        // - If "QRE1" present → decrypts
        // - If no marker → plaintext (logs WARNING)

        println!("Received: {}", String::from_utf8_lossy(msg.body()));
        client.complete(msg.receipt()).await?;
    }
}
```

**Flexibility**: Production enforces encryption, debug allows plaintext for troubleshooting.

---

#### Pattern 3: Zero-Configuration Receiver (Auto-Detection Only)


**Scenario**: Receiver doesn't know sender's encryption status, handles both automatically.

```rust
// ===== RECEIVER SERVICE =====

// Receiver handles both encrypted and plaintext automatically
async fn universal_receiver() -> Result<QueueClient> {
    QueueClientBuilder::new()
        .with_azure_provider(config)
        .with_crypto(CryptoConfig {
            // Note: enabled not specified here, just provide key provider
            plaintext_policy: PlaintextPolicy::Allow,  // Accept both
            ..Default::default()
        })
        .with_key_provider(Arc::new(key_provider))
        .build().await
}

async fn process_messages(client: &QueueClient) {
    loop {
        let msg = client.receive().await?;

        // Library checks first 4 bytes automatically:
        // [Q][R][E][1] → encrypted, decrypt with key_provider
        // [anything else] → plaintext, log warning

        // Application just uses plaintext body
        let body = msg.body();
        process_event(body).await?;

        client.complete(msg.receipt()).await?;
    }
}

// ===== MULTIPLE SENDERS (Mixed) =====

// Sender 1: Encrypted
let sender1 = QueueClientBuilder::new()
    .with_azure_provider(config)
    .with_crypto(CryptoConfig { enabled: true, .. })
    .with_key_provider(key_provider)
    .build().await?;

sender1.send(msg).await?;  // Sends: [Q][R][E][1][encrypted_data]

// Sender 2: Plaintext (debug)
let sender2 = QueueClientBuilder::new()
    .with_azure_provider(config)
    .with_crypto(CryptoConfig { enabled: false, .. })
    .build().await?;

sender2.send(msg).await?;  // Sends: [raw_data]

// Receiver handles both correctly without configuration changes
```

**Key Benefit**: Receiver doesn't need coordination with sender. Detection is message-based, not config-based.

---

### Key Management Per Service


Each service accesses its own key from secret store:

```rust
// Service A: Key from Azure Key Vault
pub struct ServiceAKeyProvider {
    vault_client: KeyvaultClient,
}

impl KeyProvider for ServiceAKeyProvider {
    async fn get_key(&self, key_id: &EncryptionKeyId) -> Result<EncryptionKey> {
        // Loads "service-a-encryption-key" from vault
        let secret = self.vault_client
            .get_secret("service-a-encryption-key")
            .await?;
        let key_bytes = base64::decode(secret.value())?;
        Ok(EncryptionKey::from_bytes("service-a-key", &key_bytes))
    }

    async fn current_key_id(&self) -> Result<EncryptionKeyId> {
        Ok(EncryptionKeyId::new("service-a-key"))
    }
}

// Service B: Key from AWS Secrets Manager
pub struct ServiceBKeyProvider {
    secrets_client: SecretsManagerClient,
}

impl KeyProvider for ServiceBKeyProvider {
    async fn get_key(&self, key_id: &EncryptionKeyId) -> Result<EncryptionKey> {
        // Loads "service-b-encryption-key" from secrets manager
        let response = self.secrets_client
            .get_secret_value()
            .secret_id("service-b-encryption-key")
            .send()
            .await?;
        let key_bytes = response.secret_binary().unwrap();
        Ok(EncryptionKey::from_bytes("service-b-key", key_bytes.as_ref()))
    }

    async fn current_key_id(&self) -> Result<EncryptionKeyId> {
        Ok(EncryptionKeyId::new("service-b-key"))
    }
}
```

**Isolation**: Each service's IAM role/managed identity only has access to its own key in the secret store.

---

### Multi-Service Security Model


```
                    ┌──────────────────────────────────┐
                    │   Azure Key Vault / AWS Secrets  │
                    │   key-a │ key-b │ key-c          │
                    └────┬─────────┬─────────┬─────────┘
                         │         │         │
     ┌───────────────────┼─────────┼─────────┼────────────────┐
     │ Webhook Router (Sender Service)                         │
     │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
     │  │ QueueClient  │  │ QueueClient  │  │ QueueClient  │ │
     │  │ + key-a      │  │ + key-b      │  │ + key-c      │ │
     │  │ (encrypts)   │  │ (encrypts)   │  │ (encrypts)   │ │
     │  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘ │
     └─────────┼──────────────────┼──────────────────┼────────┘
               │                  │                  │
               │ [QRE1+data]      │ [QRE1+data]      │ [QRE1+data]
               ↓                  ↓                  ↓
          ┌─────────┐        ┌─────────┐        ┌─────────┐
          │ Queue A │        │ Queue B │        │ Queue C │
          │(PR events)       │(issues)  │        │(pushes) │
          └────┬────┘        └────┬────┘        └────┬────┘
               │                  │                  │
               ↓                  ↓                  ↓
     ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
     │ Task Tactician   │  │ Merge Warden     │  │ Spec Sentinel    │
     │ (Receiver)       │  │ (Receiver)       │  │ (Receiver)       │
     │ + key-a          │  │ + key-b          │  │ + key-c          │
     │ (decrypts)       │  │ (decrypts)       │  │ (decrypts)       │
     └──────────────────┘  └──────────────────┘  └──────────────────┘
               ↑                  ↑                  ↑
               └──────────────────┼──────────────────┘
                    ┌─────────────┴──────────────┐
                    │ Shared Secret Store Access │
                    │ (via IAM roles/identities) │
                    └────────────────────────────┘
```

**Security Properties**:

- **Sender** (webhook router): Receives GitHub webhooks, encrypts messages for each queue with appropriate key
- **Receivers** (bot services): Each retrieves its queue's key from shared secret store
- **Key Isolation**: Each queue has unique encryption key
- **Compromise Isolation**: Compromise of one key doesn't affect other queues
- **Trusted Boundary**: All services trust each other (same organization/deployment)
- **IAM Controls**: Access to keys controlled by Azure AD/AWS IAM roles
- **Auto-Detection**: Receivers auto-detect encrypted messages (no coordination needed)

**Why Symmetric Works Here**:

- Webhook router and bot services are all **trusted** (your services)
- All authenticate to same secret store using managed identities/IAM roles
- Key sharing is **within trusted boundary**, not across organizations
- Much simpler than asymmetric PKI infrastructure
- Better performance for high-throughput webhook processing

---

## Summary: Multi-Service Capabilities


### ✅ Supported Scenarios


1. **Multiple Keys Per Sender**:
   - ✅ Create separate `QueueClient` instances with different `KeyProvider`s
   - ✅ Each destination queue can have its own encryption key
   - ✅ Sender manages multiple clients, routes messages appropriately

2. **Mixed Encryption**:
   - ✅ Some `QueueClient` instances with `enabled: true` (encrypted)
   - ✅ Some `QueueClient` instances with `enabled: false` (plaintext)
   - ✅ Same sender can use both patterns for different queues

3. **Receiver Auto-Detection**:
   - ✅ Receiver checks first 4 bytes for "QRE1" marker
   - ✅ No configuration needed on receiver about sender's encryption status
   - ✅ Handles both encrypted and plaintext messages automatically
   - ✅ Logs and metrics distinguish encrypted vs plaintext

### 🔑 Key Design Benefits


- **Message-Based Detection**: Encryption status embedded in message (marker), not configuration
- **No Coordination Needed**: Receiver doesn't need to know sender's config
- **Per-Queue Encryption**: Each queue can have different encryption settings
- **Key Isolation**: Each service can have its own encryption key
- **Gradual Rollout**: Can enable encryption queue-by-queue, service-by-service
- **Debug Friendly**: Can disable encryption temporarily without breaking receivers

---

## Integration Examples (continued)


### With Azure Key Vault


```rust
pub struct AzureKeyVaultProvider {
    client: KeyvaultClient,
    vault_name: String,
}

#[async_trait]

impl KeyProvider for AzureKeyVaultProvider {
    async fn get_key(&self, key_id: &EncryptionKeyId)
        -> Result<EncryptionKey, CryptoError> {
        let secret = self.client.get_secret(key_id.as_str()).await?;
        let key_bytes = base64::decode(secret.value())?;
        Ok(EncryptionKey::from_bytes(key_id.as_str(), &key_bytes))
    }

    async fn current_key_id(&self) -> Result<EncryptionKeyId, CryptoError> {
        let current = self.client
            .get_secret_metadata("current-encryption-key")
            .await?;
        Ok(EncryptionKeyId::new(current))
    }
}
```

### With AWS Secrets Manager


```rust
pub struct AwsSecretsManagerProvider {
    client: SecretsManagerClient,
    secret_prefix: String,
}

#[async_trait]

impl KeyProvider for AwsSecretsManagerProvider {
    async fn get_key(&self, key_id: &EncryptionKeyId)
        -> Result<EncryptionKey, CryptoError> {
        let secret_name = format!("{}/{}", self.secret_prefix, key_id.as_str());
        let response = self.client
            .get_secret_value()
            .secret_id(secret_name)
            .send()
            .await?;
        let key_bytes = response.secret_binary().unwrap();
        Ok(EncryptionKey::from_bytes(key_id.as_str(), key_bytes.as_ref()))
    }
}
```

---

## Dependencies


### New Crate Dependencies


- **`aes-gcm`**: AES-GCM authenticated encryption (hardware-accelerated)
- **`zeroize`**: Secure memory zeroing for key material
- **`rand`**: Cryptographically secure random number generation (nonces)
- **`subtle`**: Constant-time comparison functions (timing attack prevention)

### Optional Dependencies


- **`azure-security-keyvault`**: Azure Key Vault integration (example implementation)
- **`aws-sdk-secretsmanager`**: AWS Secrets Manager integration (example implementation)

---

## Testing Strategy


### Unit Tests


- Encryption/decryption round-trip
- Tampering detection (ciphertext, auth tag, associated data)
- Freshness validation
- Key rotation scenarios
- Error handling (missing keys, expired messages, unsupported versions)

### Integration Tests


- Encrypted messages through queue (send → receive)
- Key provider integration (Azure Key Vault, AWS Secrets Manager)
- Performance benchmarks (throughput with encryption enabled)

### Contract Tests


- Crypto behavior consistent across providers (Azure, AWS, in-memory)
- Encrypted messages portable between environments

---

## Constraints Summary


From [constraints.md](./constraints.md):

- Keys MUST be zeroed from memory on drop
- Keys MUST NEVER be logged (even in debug/trace)
- Debug implementations MUST redact key material
- Use constant-time comparison for cryptographic verification
- Default to AES-256-GCM (FIPS 140-2 approved)
- Nonce generation MUST use cryptographically secure RNG
- Authentication tag verification before returning plaintext
- Support key rotation without service interruption
- Timestamp-based freshness validation configurable
- Encrypted message format includes version field

---

## Next Steps for Interface Designer


The architecture is complete. Interface designer should:

1. **Define Concrete Types**:
   - `EncryptionKeyId`, `Nonce`, `AuthenticationTag` types
   - `EncryptedMessage` struct with serialization
   - `EncryptionKey` with `zeroize` integration

2. **Create Trait Definitions**:
   - `CryptoProvider` trait with encrypt/decrypt operations
   - `KeyProvider` trait with key retrieval operations

3. **Define Error Types**:
   - `CryptoError` enum (EncryptionFailed, AuthenticationFailed, KeyNotFound, MessageExpired, UnsupportedVersion)
   - Error context for debugging

4. **Integration Points**:
   - Update `QueueClient::send()` to encrypt messages when crypto enabled
   - Update `QueueClient::receive()` to decrypt messages when crypto enabled
   - Add `CryptoConfig` to `QueueClientConfig`
   - Add `key_provider` field to `QueueClientBuilder`

5. **Generate Stubs**:
   - `src/crypto.rs`: Module with types and traits
   - `src/crypto_tests.rs`: Test file structure
   - `src/providers/aes_gcm.rs`: Default crypto provider implementation
   - Update `src/client.rs` with crypto integration points

6. **Documentation**:
   - Rustdoc for all public crypto types and traits
   - Examples showing encryption setup
   - Security considerations in module docs

---

## Summary


This architecture provides production-grade MITM protection for queue-runtime:

✅ **Confidentiality**: AES-256 encryption protects message content
✅ **Integrity**: Authentication tags prevent tampering
✅ **Authenticity**: Only parties with keys can create valid messages
✅ **Freshness**: Timestamp validation prevents replay attacks
✅ **Transparency**: Automatic encryption/decryption, no API changes
✅ **Flexibility**: Application-provided keys via KeyProvider abstraction
✅ **Performance**: Minimal overhead (<1% throughput impact)
✅ **Migration**: Opt-in, backward compatible, clear upgrade path
✅ **Compliance**: FIPS 140-2, GDPR, PCI DSS, HIPAA ready

The design follows clean architecture principles:

- Business logic (cryptography) separated from infrastructure (key storage)
- Dependency inversion (library depends on KeyProvider abstraction)
- Type safety (branded types for key IDs, nonces, tags)
- Production-ready (comprehensive error handling, testing, observability)

Architecture complete and ready for interface design phase.