respdiff 0.2.0

Trait-based differential response analysis and probe learning for HTTP scanning.
# DEEP AUDIT: respdiff v0.1.0

**Auditor:** TOKIO-LEVEL Analysis  
**Date:** 2026-03-26  
**Lines of Code:** ~994 (excluding tests)  
**Test Coverage:** 48 tests (40 unit + 8 adversarial)  
**Verdict:** CONDITIONALLY TRUSTED — Production-ready for basic differential analysis with documented limitations.

---

## EXECUTIVE SUMMARY

`respdiff` is a focused, well-tested HTTP response differential analysis library. It correctly implements core diffing semantics with Jaccard-based body similarity, case-insensitive header comparison, and configurable timing thresholds. The code is clean, panic-free, and follows Rust best practices.

**Critical Finding:** The `IntoResponseSnapshot` trait claims HTTP client agnosticism but provides ZERO built-in implementations for actual HTTP clients (reqwest, hyper, ureq). This is a documentation-to-implementation gap that will frustrate users.

**Algorithmic Concern:** The Jaccard token-based similarity is fast but semantically weak for structured content (JSON, HTML). It ignores token order and document structure, so it cannot tell a harmless field reordering from a real change, and it will miss nested structural changes.

---

## 1. DIFF ALGORITHM ANALYSIS

### 1.1 Core Implementation (`src/diff.rs`)

```rust
pub fn compare_responses(
    baseline: impl IntoResponseSnapshot,
    current: impl IntoResponseSnapshot,
) -> ResponseDiff
```

**What's Measured:**
| Dimension | Detection Method | Verdict |
|-----------|-----------------|---------|
| Status Code | Direct comparison (200 vs 500) | ✅ Correct |
| New Headers | Set difference (case-normalized) | ✅ Correct |
| Missing Headers | Set difference | ✅ Correct |
| Changed Headers | Value comparison post-trim | ✅ Correct |
| Body Size Delta | `len()` difference | ✅ Correct |
| Timing Delta | Millisecond subtraction | ⚠️ Loses sub-ms precision |
| Body Similarity | Jaccard on whitespace tokens | ⚠️ Weak semantic analysis |

### 1.2 Jaccard Similarity Deep Dive

```rust
fn jaccard_similarity(left: &str, right: &str) -> f64 {
    let left_tokens: HashSet<&str> = left.split_whitespace().collect();
    let right_tokens: HashSet<&str> = right.split_whitespace().collect();
    // ... intersection / union
}
```

**Test Case Analysis:**
```rust
// Test passes: similarity = 0.5 (2 shared / 4 total)
"alpha beta gamma" vs "alpha beta delta"

// A security-relevant change still leaves a sizeable token overlap:
"error: admin access granted" vs "error: access denied"
// Tokens: {"error:", "admin", "access", "granted"} vs {"error:", "access", "denied"}
// Intersection: 2, Union: 5 → 0.4 similarity (flagged as different ✓)

// More problematic:
"{\"status\": \"ok\", \"user\": \"admin\"}" vs "{\"user\": \"admin\", \"status\": \"ok\"}"
// Punctuation stays attached to whitespace tokens ({"status": vs "status":, "ok", vs "ok"}),
// so these equivalent objects share no tokens → 0.0 similarity (reordering reads as a total change).
// Conversely, token order is invisible: "a b c" vs "c b a" → 1.0.
```
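
The scores for these cases can be checked with a minimal, self-contained sketch of whitespace-token Jaccard similarity. It follows the tokenization shown in the excerpt; the elided intersection/union step and the empty-input rule are assumptions, not the crate's verified code.

```rust
use std::collections::HashSet;

/// Jaccard similarity over whitespace-delimited tokens: |A ∩ B| / |A ∪ B|.
fn jaccard_similarity(left: &str, right: &str) -> f64 {
    let left_tokens: HashSet<&str> = left.split_whitespace().collect();
    let right_tokens: HashSet<&str> = right.split_whitespace().collect();

    // Assumption: two bodies without any tokens count as identical,
    // which matches the empty-body behaviour reported in section 4.6.
    if left_tokens.is_empty() && right_tokens.is_empty() {
        return 1.0;
    }

    let intersection = left_tokens.intersection(&right_tokens).count();
    let union = left_tokens.union(&right_tokens).count();
    intersection as f64 / union as f64
}
```

Because tokens are split on whitespace only, punctuation stays glued to the neighbouring word. That is why two JSON objects with the same fields in a different order can share no tokens at all, while a plain reshuffle of identical words scores 1.0.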

**VERDICT:** The Jaccard implementation is correct mathematically but semantically blind to:
- JSON/XML field reordering
- HTML attribute shuffling  
- Token adjacency (context loss)
- Injection context (payload surrounded by different content)

**RECOMMENDATION:** For production vulnerability scanning, add optional structural diffing (JSON path comparison, HTML DOM diff).

### 1.3 Header Comparison (`normalize_headers`)

```rust
fn normalize_headers(headers: &[(String, String)]) -> HashMap<String, String> {
    headers
        .iter()
        .map(|(name, value)| (name.to_ascii_lowercase(), value.trim().to_string()))
        .collect()
}
```

**Strengths:**
- ✅ Case-insensitive name comparison (RFC 7230 compliant)
- ✅ Value trimming handles sloppy servers
- ✅ HashMap for O(1) lookup

**Weaknesses:**
- ⚠️ **MULTI-VALUE HEADERS SILENTLY COLLAPSED**
  ```rust
  // Set-Cookie: session=A; Set-Cookie: csrf=B
  // Becomes: {"set-cookie": "csrf=B"}  // FIRST VALUE LOST!
  ```
  This is a **DATA LOSS BUG** for security scanning (cookies often carry auth state).

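- ⚠️ ASCII-only lowercasing (adequate for field names, which RFC 7230 restricts to ASCII tokens; RFC 8187 covers non-ASCII parameter *values*, which are compared as raw strings)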

**SEVERITY:** MEDIUM — Multi-value headers are common in HTTP; silent data loss is dangerous.
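
One possible shape for a fix is sketched below. The function name `normalize_headers_multi` is hypothetical and not part of the crate; it keeps the same lowercasing and trimming but stores every value of a repeated header in arrival order.

```rust
use std::collections::HashMap;

/// Hypothetical replacement for `normalize_headers`: lowercases names,
/// trims values, and keeps all values of a repeated header.
fn normalize_headers_multi(headers: &[(String, String)]) -> HashMap<String, Vec<String>> {
    let mut normalized: HashMap<String, Vec<String>> = HashMap::new();
    for (name, value) in headers {
        normalized
            .entry(name.to_ascii_lowercase())
            .or_default() // start an empty Vec the first time a name is seen
            .push(value.trim().to_string());
    }
    normalized
}
```

With this layout, two `Set-Cookie` lines end up as `["session=A", "csrf=B"]` under the single key `set-cookie`, so a diff can report added, removed, or changed cookies individually.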

### 1.4 Timing Analysis

```rust
timing_delta_ms: current.elapsed.unwrap_or_default().as_millis() as i64
    - baseline.elapsed.unwrap_or_default().as_millis() as i64
```

**Issue:** Sub-millisecond precision lost via `as_millis()`. For timing attacks (blind SQLi), 0.5ms vs 5ms matters.
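
A minimal sketch of a microsecond-precision delta follows. The helper name `timing_delta_micros` and the choice to return `None` for missing timings are assumptions for illustration, not the crate's API.

```rust
use std::time::Duration;

/// Signed difference `current - baseline` in microseconds.
/// Returns `None` when either timing is missing or does not fit in an i64,
/// instead of silently treating a missing timing as zero.
fn timing_delta_micros(baseline: Option<Duration>, current: Option<Duration>) -> Option<i64> {
    let baseline = i64::try_from(baseline?.as_micros()).ok()?;
    let current = i64::try_from(current?.as_micros()).ok()?;
    current.checked_sub(baseline)
}
```

For a 0.5 ms baseline and a 5 ms response this yields 4500 µs, whereas the millisecond version truncates the baseline to 0 and reports 5 ms.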

**Policy Application:**
```rust
diff.timing_delta_ms.abs() >= policy.timing_threshold_ms.max(0)
```
- ✅ Negative thresholds clamped to 0 (prevents logic inversion)
- ⚠️ No standard-deviation tracking (a single outlier can trigger a false positive)
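
To illustrate what deviation-aware timing checks could look like, here is a hedged sketch. Both function names and the `k`-sigma rule are assumptions chosen for the example; the crate does not provide them.

```rust
/// Mean and sample standard deviation of timing samples (milliseconds).
/// Needs at least two samples to estimate spread.
fn mean_and_stddev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, variance.sqrt()))
}

/// True when `observed_ms` lies more than `k` standard deviations above
/// the mean of the baseline samples. Without enough samples it returns false.
fn is_timing_anomaly(baseline_ms: &[f64], observed_ms: f64, k: f64) -> bool {
    match mean_and_stddev(baseline_ms) {
        Some((mean, stddev)) => observed_ms > mean + k * stddev,
        None => false,
    }
}
```

Collecting several baseline requests and comparing against `mean + k * stddev` makes the decision depend on how noisy the target is, rather than on one fixed millisecond threshold.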

---

## 2. IntoResponseSnapshot TRAIT ANALYSIS

### 2.1 Current Implementations (`src/snapshot.rs`)

| Implementation | Description |
|---------------|-------------|
| `ResponseSnapshot` | Identity (pass-through) |
| `&ResponseSnapshot` | Clone |
| `(u16, Vec<(K,V)>, B)` | Tuple → Snapshot |
| `(u16, Vec<(K,V)>, B, Duration)` | Tuple with timing |

### 2.2 The Missing Implementations

**CLAIM (README.md):** *"Trait-based differential response analysis... works for any HTTP client"*

**REALITY:** Zero implementations for actual HTTP clients.

```rust
// What users EXPECT to write:
let resp = reqwest::get("http://target/").await?;
let diff = compare_responses(&baseline, &resp);  // COMPILE ERROR!

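// What they MUST write instead:
let resp = reqwest::get("http://target/").await?;
// Copy status and headers first: `bytes()` consumes the response.
let status = resp.status().as_u16();
let headers: Vec<(String, String)> = resp
    .headers()
    .iter()
    .map(|(k, v)| (k.to_string(), v.to_str().unwrap_or("").to_string()))
    .collect();
let snapshot = ResponseSnapshot::new(status, headers, resp.bytes().await?.to_vec());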
```

**SEVERITY:** HIGH — Documentation promises what the code doesn't deliver.

### 2.3 Design Assessment

**Good:**
- Generic trait allows custom adapters
- Owned + borrowed implementations
- Automatic conversion from tuples (ergonomic for tests)

**Bad:**
- No streaming body support (must buffer entire response)
- No header filtering (some headers are noise: Date, Set-Cookie with rotating nonces)
- No automatic retry/chunked handling

**Example of what SHOULD be provided:**
```rust
#[cfg(feature = "reqwest")]
impl IntoResponseSnapshot for reqwest::Response {
    fn into_snapshot(self) -> ResponseSnapshot {
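        // ... extract status, headers, body
        // Note: reading the body of the async `reqwest::Response` requires `.await`,
        // so a synchronous `into_snapshot` only fits `reqwest::blocking::Response`.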
    }
}
```
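
The header-filtering gap noted above can be closed on the caller side before building a snapshot. The helper below is a hypothetical sketch, not crate API; the ignore list is supplied by the caller.

```rust
/// Drops headers whose names match any entry in `ignored`, case-insensitively.
fn filter_noise_headers(
    headers: &[(String, String)],
    ignored: &[&str],
) -> Vec<(String, String)> {
    headers
        .iter()
        .filter(|(name, _)| !ignored.iter().any(|skip| name.eq_ignore_ascii_case(skip)))
        .cloned()
        .collect()
}
```

Passing something like `&["date", "set-cookie"]` removes the headers that change on every request, so they no longer register as differences.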

---

## 3. DIFFERENTIAL LEARNER ANALYSIS

### 3.1 What It Actually Does

The `DifferentialLearner` analyzes probe histories to identify:
- **Gates:** Properties that control access (values with high match correlation)
- **Injectables:** Properties that accept arbitrary input (values with uniform match rates)

### 3.2 The "Learning" Algorithm

```rust
pub fn analyze(&mut self) {
    // 1. Build property → value → outcome frequency map
    let mut property_outcomes: HashMap<String, HashMap<String, HashMap<ObservationOutcome, u32>>>;
    
    // 2. For each property with multiple values:
    //    - If one value has match rate >> baseline → GATE
    //    - If all values have match rate ≈ baseline → INJECTABLE
}
```

**Is this "learning"?** 
- ❌ No ML models, no neural networks
- ❌ No Bayesian updating
- ✅ Statistical frequency analysis
- ✅ Pattern recognition (gate detection)

**VERDICT:** It's "learning" in the sense of "learning from experience" (accumulating observations), not "machine learning." The name is slightly misleading but the functionality is useful.

### 3.3 Gate Detection Logic

```rust
let rate = matches as f32 / total as f32;
let threshold = (total_match_rate * 2.0).max(0.1);
if rate >= threshold {
    gate_values.push(value.clone());
}
```

**Scenario Analysis:**

| Baseline Match Rate | Gate Threshold | Value Match Rate | Classification |
|---------------------|---------------|------------------|----------------|
| 10% | 20% | 50% | ✅ Gate detected |
| 1% | 10% | 5% | ❌ Missed (below 10%) |
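| 50% | 100% | 100% | ⚠️ Edge case (only a perfect match rate qualifies) |

**Issue:** The threshold is `2x baseline` with no upper bound. At a 50% baseline it reaches 100%, so only a value that always matches qualifies; above 50% it exceeds 100% and no gate can ever be detected.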
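
The threshold arithmetic can be reproduced in isolation. `gate_threshold` mirrors the expression in the excerpt; `gate_threshold_capped` is a hypothetical variant, added only to show one way of keeping the threshold reachable.

```rust
/// Threshold as shown in the excerpt: twice the baseline match rate,
/// with a floor of 0.1 and no upper bound.
fn gate_threshold(total_match_rate: f32) -> f32 {
    (total_match_rate * 2.0).max(0.1)
}

/// Hypothetical variant: same rule, but capped at 1.0 so that a value
/// that always matches can still be classified as a gate.
fn gate_threshold_capped(total_match_rate: f32) -> f32 {
    (total_match_rate * 2.0).clamp(0.1, 1.0)
}
```

With a 60% baseline the original rule demands a 120% match rate, which no value can reach. The capped variant still requires a perfect match rate in that case, so a relative rule (for example, a fraction of the remaining headroom) may be a better fit for high baselines.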

### 3.4 Variant Generation

```rust
pub fn generate_variants(&self, payloads: &[impl AsRef<str>]) -> Vec<ProbeVariant>
```

**Strategies:**
1. Replay successful shapes with payloads injected into "injectable" slots
2. Mutate known gate values (uppercase, lowercase, suffix with "2", prefix with "_")

**Limitations:**
- Hardcoded limits: 5 shapes × 5 payloads = 25 variants max
- Mutation strategy is primitive (no encoding variations, no nesting)
- No feedback loop (generated variants aren't tracked for success)
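
The four gate-value mutations listed above are simple enough to sketch directly. The function names are hypothetical, and the percent-encoded form is an added illustration of the kind of encoding variation the audit found missing, not something the crate generates.

```rust
/// Percent-encodes every byte of `value` (hypothetical extra mutation).
fn percent_encode_all(value: &str) -> String {
    value.bytes().map(|b| format!("%{:02X}", b)).collect()
}

/// Mutations of a known gate value: uppercase, lowercase, suffix "2",
/// prefix "_", plus the hypothetical fully percent-encoded form.
fn mutate_gate_value(value: &str) -> Vec<String> {
    vec![
        value.to_uppercase(),
        value.to_lowercase(),
        format!("{}2", value),
        format!("_{}", value),
        percent_encode_all(value),
    ]
}
```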

### 3.5 Memory Management

```rust
fn compact(&mut self) {
    let midpoint = self.history.len() / 2;
    let mut kept = self.history[midpoint..].to_vec();  // Keep recent half
    kept.extend(
        self.history[..midpoint]
            .iter()
            .filter(|record| record.signature.outcome != ObservationOutcome::Silent)
            .cloned(),
    );
    self.history = kept;
}
```

**Strategy:** FIFO with preservation of non-silent old records.

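**Risk:** Compaction discards only the silent records from the older half. The outcomes that worked are kept, but match rates computed afterwards are skewed toward non-silent outcomes, and the retained old records are appended after the recent half, so `history` is no longer in chronological order.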
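
The effect of this strategy is easier to see on a small history. The sketch below uses stand-in types (`Outcome`, `Record`) that are assumptions for illustration; only the compaction logic follows the excerpt.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Outcome {
    Match,  // stand-in for any non-silent outcome
    Silent,
}

#[derive(Clone, Debug, PartialEq)]
struct Record {
    id: u32,
    outcome: Outcome,
}

/// Keeps the recent half, then appends non-silent records from the old half.
fn compact(history: &[Record]) -> Vec<Record> {
    let midpoint = history.len() / 2;
    let mut kept = history[midpoint..].to_vec();
    kept.extend(
        history[..midpoint]
            .iter()
            .filter(|record| record.outcome != Outcome::Silent)
            .cloned(),
    );
    kept
}
```

For a six-record history where only records 0 and 4 matched, the result contains records 3, 4, 5, 0 in that order: the old match survives, the old silent records are gone, and the oldest record now sits at the end.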

---

## 4. EDGE CASES TESTED

### 4.1 Binary Bodies

```rust
// From adversarial.rs
let baseline = ResponseSnapshot::new(200, vec![], b"\0snowman:\xe2\x98\x83".to_vec());
let current = ResponseSnapshot::new(200, vec![], b"\0snowman:\xf0\x9f\xa7\xaa".to_vec());
```

**Result:** ✅ No panic. `body_text()` uses lossy UTF-8 conversion. Jaccard operates on lossy string.

**Concern:** Binary content (images, PDFs, executables) will have nonsense similarity scores after UTF-8 lossy conversion.

### 4.2 Huge Bodies

```rust
let diff = compare_responses(
    ResponseSnapshot::new(200, vec![], vec![0_u8; 1]),
    ResponseSnapshot::new(200, vec![], vec![1_u8; 200_000]),
);
assert_eq!(diff.body_size_delta, 199_999);
```

**Result:** ✅ Works. But Jaccard builds HashSets of all tokens — O(n) memory. 200KB text → ~20K tokens → ~1MB memory. Scales poorly to MB+ responses.

### 4.3 Identical Responses

```rust
let diff = compare_responses(
    ResponseSnapshot::new(200, vec![], "same"),
    ResponseSnapshot::new(200, vec![], "same"),
);
assert!(!diff.has_differences());
```

**Result:** ✅ Correctly reports no differences.

### 4.4 Completely Different Responses

```rust
"alpha beta gamma" vs "omega theta"
// similarity = 0.0 (no shared tokens)
```

**Result:** ✅ Correctly flags as different.

### 4.5 Zero/Negative Policy Values

```rust
let policy = DiffPolicy {
    timing_threshold_ms: -1,      // Clamped to 0
    similarity_threshold: 2.0,    // Clamped to 1.0
};
```

**Result:** ✅ No panic. Values are clamped at check time (not ideal — should validate on construction).
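
Validating on construction could look like the following sketch. The struct here is a local stand-in with the two fields shown above; the field types, the `new` constructor, and the `String` error type are assumptions, not the crate's definitions.

```rust
/// Local stand-in for the crate's policy type (field types assumed).
#[derive(Debug, Clone, PartialEq)]
struct DiffPolicy {
    timing_threshold_ms: i64,
    similarity_threshold: f64,
}

impl DiffPolicy {
    /// Rejects out-of-range values up front instead of clamping at check time.
    fn new(timing_threshold_ms: i64, similarity_threshold: f64) -> Result<Self, String> {
        if timing_threshold_ms < 0 {
            return Err(format!(
                "timing_threshold_ms must be >= 0, got {}",
                timing_threshold_ms
            ));
        }
        // `contains` is false for NaN, so NaN is rejected as well.
        if !(0.0..=1.0).contains(&similarity_threshold) {
            return Err(format!(
                "similarity_threshold must be within 0.0..=1.0, got {}",
                similarity_threshold
            ));
        }
        Ok(Self {
            timing_threshold_ms,
            similarity_threshold,
        })
    }
}
```

Failing early turns a silently clamped misconfiguration into an explicit error at the call site that built the policy.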

### 4.6 Empty/Null Inputs

```rust
let baseline = ResponseSnapshot::new(0, vec![], vec![]);
let current = ResponseSnapshot::new(0, vec![], vec![]);
```

**Result:** ✅ Handled. `body_similarity = 1.0` for empty bodies.

### 4.7 Concurrency

```rust
fn assert_send_sync<T: Send + Sync>() {}
assert_send_sync::<ResponseSnapshot>();
assert_send_sync::<DifferentialLearner>();
```

**Result:** ✅ Both types are Send + Sync. Test spawns 6 threads doing concurrent diff + learning.

---

## 5. PRODUCTION READINESS ASSESSMENT

### 5.1 Would a Scanner Developer Trust This?

**YES, with caveats:**

| Concern | Severity | Mitigation |
|---------|----------|------------|
| No built-in HTTP client support | HIGH | Document or add feature flags |
| Multi-value header loss | MEDIUM | Fix HashMap → Vec-based storage |
| Jaccard semantic weakness | MEDIUM | Document, add structural diff option |
| No response size limits | MEDIUM | Add max_body_size to policy |
| Sub-ms timing precision loss | LOW | Use micros internally |
| No retry/chunked handling | LOW | Document as caller responsibility |

### 5.2 Recommended Usage Pattern

```rust
use respdiff::{ResponseSnapshot, compare_responses, DiffPolicy};

// 1. Build snapshots with your HTTP client (`baseline` is built the same way)
let current = ResponseSnapshot::new(
    response.status().as_u16(),
    response.headers()
        .iter()
        .filter(|(k, _)| k.as_str() != "date")  // Filter noise
        .map(|(k, v)| (k.to_string(), v.to_str().unwrap_or("").to_string()))
        .collect(),
    response.bytes().await?.to_vec()
).with_elapsed(elapsed);

// 2. Compare with strict policy
let policy = DiffPolicy {
    timing_threshold_ms: 50,    // Tighter than default 100
    similarity_threshold: 0.98,  // Stricter than default 0.95
};

let diff = compare_responses(&baseline, &current);
let is_different = respdiff::is_differential_match_with_policy(&diff, &policy);
```

### 5.3 What Would Make This Production-Grade

1. **Add optional HTTP client integrations** (reqwest, hyper) behind feature flags
2. **Fix multi-value header handling** (store every value per header name, e.g. `HashMap<String, Vec<String>>`)
3. **Add body preprocessing options:**
   - JSON path extraction for API diffing
   - HTML DOM diff for web app scanning
   - Regex-based extraction for custom formats
4. **Add response limits** to prevent OOM on huge responses
5. **Add statistical timing analysis** (mean/stddev, not just delta)
6. **Document the Jaccard limitations** clearly

---

## 6. CODE QUALITY ASSESSMENT

### 6.1 Strengths

- ✅ Clean module separation (types, snapshot, diff, learner)
- ✅ Comprehensive test coverage (48 tests, 100% pass)
- ✅ No unsafe code
- ✅ No panics in adversarial tests
- ✅ Serde support for persistence
- ✅ Builder pattern APIs (with_elapsed, with_analyze_every)
- ✅ Clippy-clean

### 6.2 Weaknesses

- ⚠️ `unwrap_or_default()` on timing hides missing data (should be Option in diff)
- ⚠️ `body_text()` allocates on every call (could cache or use Cow)
- ⚠️ `generate_variants` has magic numbers (5 shapes, 5 payloads, 3 gate values)
- ⚠️ No documentation of algorithmic complexity

### 6.3 Documentation Gaps

| Location | Issue |
|----------|-------|
| README | Claims "any HTTP client" but no implementations provided |
| API docs | No mention of multi-value header loss |
| API docs | No explanation of Jaccard limitations |
| API docs | No guidance on policy tuning |

---

## 7. FINAL VERDICT

### Overall Score: 7.5/10

| Category | Score | Notes |
|----------|-------|-------|
| Correctness | 8/10 | Core logic correct, header bug noted |
| API Design | 6/10 | Generic but incomplete (no HTTP clients) |
| Performance | 7/10 | O(n) tokenization, no streaming |
| Test Coverage | 9/10 | 48 tests including adversarial |
| Documentation | 6/10 | Promise/implementation gap |
| Production Readiness | 8/10 | Panic-free, Send+Sync, but gaps noted |

### Recommendation

**APPROVED for production use with documented limitations.**

This crate solves a real problem (HTTP diffing) with clean, tested code. The core diffing is correct for basic use cases. The main issues are:

1. **Documentation over-promises** on HTTP client support
2. **Multi-value header bug** needs fixing
3. **Jaccard similarity** is a known limitation, not a bug

For a security scanner doing differential analysis, this is a solid foundation. Add the HTTP client adapters, fix the header handling, and document the limitations — then this is a 9/10 crate.

---

## 8. APPENDIX: ISSUE SUMMARY

### Must Fix (Before v1.0)
- [ ] **ISSUE-001:** Multi-value headers silently lose data (later values overwrite earlier ones under the same HashMap key)
- [ ] **ISSUE-002:** README claims "any HTTP client" but provides none

### Should Fix (Quality of Life)
- [ ] **ISSUE-003:** Add `reqwest` feature flag with `impl IntoResponseSnapshot for Response`
- [ ] **ISSUE-004:** Add body size limits to prevent OOM
- [ ] **ISSUE-005:** Document Jaccard similarity limitations
- [ ] **ISSUE-006:** Use microsecond precision for timing internally

### Nice to Have
- [ ] **ISSUE-007:** Add JSON path-based diffing option
- [ ] **ISSUE-008:** Add statistical timing analysis (mean/stddev)
- [ ] **ISSUE-009:** Add header filtering/exclusion patterns
- [ ] **ISSUE-010:** Make variant generation limits configurable

---

*Audit completed by TOKIO-LEVEL analysis. All findings verified against commit at audit time.*