# HTTP Performance Benchmarks

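Benchmark results comparing performance before and after memory allocation optimizations.

**Status**: Optimizations for header and message serialization are implemented and measured. The remaining benchmarks have baseline measurements only; their "After" timings are still to be measured.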

## Summary

| Benchmark Category | Improvement |
|-------------------|-------------|
| `field_name_lowercase` (known headers) | **~97% faster** |
| `header_to_string` | **28-66% faster** |
| `header_list_to_string` | **49-72% faster** |
| `message_as_bytes` | **74-86% faster** |
| `latin1_to_string` (1KB) | **14% faster** |
| `header_list_add` | Baseline measured, optimizations pending |
| `request_builder_new` | Baseline measured, optimizations pending |
| `request_builder_build` | Baseline measured, optimizations pending |
| `body_part_as_bytes` | Baseline measured, optimizations pending |
| `multipart_as_bytes` | Baseline measured, optimizations pending |
| `header_list_parsing` | Baseline measured, optimizations pending |

## Detailed Results

### field_name_lowercase

Optimized to return `Cow<'static, str>` instead of `String`, avoiding allocations for known header types.

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| content_type | 52.44 ns | 2.13 ns | **-96.77%** |
| content_length | 51.51 ns | 1.74 ns | **-96.63%** |
| generic | 63.16 ns | 77.58 ns | +16.61% (still allocates for unknown headers) |
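
The pattern behind these numbers can be sketched as follows. This is a minimal illustration, not the crate's actual code: the `Header` enum and its variants are hypothetical stand-ins, and `to_ascii_lowercase` is assumed for the generic case.

```rust
use std::borrow::Cow;

// Hypothetical, simplified stand-in for the crate's header type.
pub enum Header {
    ContentType(String),
    ContentLength(u64),
    Generic { name: String, value: String },
}

impl Header {
    /// Known variants borrow a static string; only generic headers allocate.
    pub fn field_name_lowercase(&self) -> Cow<'static, str> {
        match self {
            Header::ContentType(_) => Cow::Borrowed("content-type"),
            Header::ContentLength(_) => Cow::Borrowed("content-length"),
            // Unknown names still need an owned, lowercased copy.
            Header::Generic { name, .. } => Cow::Owned(name.to_ascii_lowercase()),
        }
    }
}
```

Callers that only compare or hash the name can use the `Cow` as a `&str` and never pay for an allocation on the known-header path.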

### header_to_string

Optimized the `Display` impl to write directly to the formatter instead of collecting values into a `Vec<String>` and joining them.

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| content_type | 966.19 ns | 518.82 ns | **-40.76%** |
| content_encoding | 760.08 ns | 260.34 ns | **-66.20%** |
| generic | 285.35 ns | 204.65 ns | **-28.29%** |
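
As a rough sketch of the direct-write approach, assuming a hypothetical `MultiValueHeader` type with a static name and a list of values (the crate's real header types are not shown in this document):

```rust
use std::fmt;

// Hypothetical header with a name and a list of values.
pub struct MultiValueHeader {
    pub name: &'static str,
    pub values: Vec<String>,
}

impl fmt::Display for MultiValueHeader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The collect-join style would build a Vec<String> and a joined
        // String before writing. Writing each piece straight to the
        // formatter avoids both temporaries.
        write!(f, "{}: ", self.name)?;
        for (i, value) in self.values.iter().enumerate() {
            if i > 0 {
                f.write_str(", ")?;
            }
            f.write_str(value)?;
        }
        Ok(())
    }
}
```

The same loop shape applies to a header list: write each header followed by its line terminator instead of collecting the rendered lines first.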

### header_list_to_string

Optimized `Display` impl to avoid intermediate `Vec<String>` allocation.

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| small (3 headers) | 1.84 µs | 904.21 ns | **-49.30%** |
| medium (10 headers) | 3.96 µs | 1.08 µs | **-71.66%** |

### message_as_bytes

Avoided cloning the entire header list when adding the Content-Length header.

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| small_message | 1.84 µs | 477.59 ns | **-74.00%** |
| medium_message | 3.86 µs | 556.16 ns | **-85.93%** |
| large_10kb_message | 2.78 µs | 573.28 ns | **-79.05%** |
| medium_no_content_length | 2.27 µs | 535.28 ns | **-75.33%** |
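
One way to get this effect is to serialize the existing headers by reference and append the Content-Length line inline when it is missing. The sketch below assumes a hypothetical `Message` struct with `(name, value)` header pairs and returns a `Vec<u8>` for simplicity; the real `HttpMessage` layout and return type may differ.

```rust
use std::io::Write;

// Hypothetical, simplified message type; headers are (name, value) pairs.
pub struct Message {
    pub start_line: String,
    pub headers: Vec<(String, String)>,
    pub body: Vec<u8>,
}

impl Message {
    pub fn as_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.start_line.len() + self.body.len() + 64);
        out.extend_from_slice(self.start_line.as_bytes());
        out.extend_from_slice(b"\r\n");

        let mut has_content_length = false;
        // Borrow each header in place; the list itself is never cloned.
        for (name, value) in &self.headers {
            if name.eq_ignore_ascii_case("content-length") {
                has_content_length = true;
            }
            out.extend_from_slice(name.as_bytes());
            out.extend_from_slice(b": ");
            out.extend_from_slice(value.as_bytes());
            out.extend_from_slice(b"\r\n");
        }
        // Append Content-Length inline instead of adding it to a cloned list.
        if !has_content_length && !self.body.is_empty() {
            // Writing into a Vec<u8> cannot fail.
            write!(out, "content-length: {}\r\n", self.body.len()).unwrap();
        }
        out.extend_from_slice(b"\r\n");
        out.extend_from_slice(&self.body);
        out
    }
}
```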

### latin1_to_string

Added `with_capacity` pre-allocation.

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| small (13 bytes) | 78.52 ns | 81.55 ns | +14.37% |
| medium (1KB) | 1.71 µs | 1.45 µs | **-14.20%** |
| large (10KB) | 14.80 µs | 14.68 µs | -1.40% (no significant change) |
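
A minimal sketch of the pre-allocation, assuming the function takes a byte slice and returns a `String` (the crate's actual signature is not shown here). Every Latin-1 byte maps to a code point at or below U+00FF, which needs at most two bytes in UTF-8, so `2 * len` is a safe upper bound.

```rust
/// Decode ISO-8859-1 (Latin-1) bytes into a `String`.
pub fn latin1_to_string(bytes: &[u8]) -> String {
    // Worst case: every byte is >= 0x80 and expands to 2 UTF-8 bytes.
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // `u8 as char` yields the code point U+0000..=U+00FF, which is
        // exactly the Latin-1 mapping.
        out.push(b as char);
    }
    out
}
```

For mostly-ASCII input this reserves up to twice the space actually needed, which is the price of never reallocating during the loop.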

### header_list_add

Measures the overhead of adding headers to `HttpHeaderList`, including lowercase conversion and HashMap insertion. This benchmark identifies allocation hotspots when building header lists.

**Performance Issues Identified:**
- Generic headers require `to_lowercase()` allocation on every add
- `field_name_lowercase()` returns `Cow::Owned` for generics, causing heap allocation
- HashMap key insertion copies the string

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| known_header | 177.80 ns | *To be measured* | *TBD* |
| generic_header | 282.25 ns | *To be measured* | *TBD* |
| multiple_headers | 1.7025 µs | *To be measured* | *TBD* |
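
One possible direction for the lowercase hotspot is a key type that compares and hashes case-insensitively, so the name never has to be lowercased into a new `String`. The `HeaderName` wrapper below is hypothetical and only illustrates the idea; it is not part of the crate.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical case-insensitive key wrapper (ASCII only).
#[derive(Debug, Clone)]
pub struct HeaderName(pub String);

impl PartialEq for HeaderName {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for HeaderName {}

impl Hash for HeaderName {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the ASCII-lowercased bytes one at a time so that names which
        // compare equal also hash equally, without building a lowercased String.
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
    }
}

/// Convenience alias for a header map keyed by the wrapper.
pub type HeaderMap = HashMap<HeaderName, String>;
```

In this sketch a lookup still constructs a `HeaderName` from an owned `String`; a borrowed key variant would be needed to remove that allocation as well.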

### request_builder_new

Measures `HttpRequestBuilder::new()` construction overhead, including encoding string building and header merging.

**Performance Issues Identified:**
- Encoding string is rebuilt on every builder creation
- Header merging involves cloning all default headers
- No caching of common request configurations

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| empty_headers | 1.7376 µs | *To be measured* | *TBD* |
| with_headers | 2.7404 µs | *To be measured* | *TBD* |
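
The encoding-string rebuild could be avoided by computing the value once and sharing it. The sketch below uses `std::sync::OnceLock`; the `SUPPORTED_ENCODINGS` list and the `accept_encoding` function are assumptions for illustration, not the crate's API.

```rust
use std::sync::OnceLock;

// Hypothetical list of supported encodings; the real crate's list may differ.
const SUPPORTED_ENCODINGS: &[&str] = &["gzip", "deflate", "br"];

/// Build the Accept-Encoding value on first use and reuse it afterwards.
pub fn accept_encoding() -> &'static str {
    static CACHED: OnceLock<String> = OnceLock::new();
    CACHED
        .get_or_init(|| SUPPORTED_ENCODINGS.join(", "))
        .as_str()
}
```

Each builder can then store the `&'static str` (or clone it only when a header value must be owned) instead of joining the list again.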

### request_builder_build

Measures `HttpRequestBuilder::build_request()` overhead, including header merging and cloning.

**Performance Issues Identified:**
- All default headers are cloned for every request
- Header merging creates new HashMap entries
- No reuse of pre-rendered request components

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| empty_headers_no_body | 1.9629 µs | *To be measured* | *TBD* |
| with_headers_no_body | 2.5499 µs | *To be measured* | *TBD* |
| with_headers_small_body | 2.5705 µs | *To be measured* | *TBD* |
| with_headers_medium_body | 2.4708 µs | *To be measured* | *TBD* |

### body_part_as_bytes

Measures `HttpBodyPart::as_bytes()` serialization overhead.

**Performance Issues Identified:**
- Headers are converted to String via `Display`, then to bytes
- Each part builds its own `Vec<u8>` buffer
- No buffer reuse between parts

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| no_headers_small | 60.188 ns | *To be measured* | *TBD* |
| with_headers_small | 648.29 ns | *To be measured* | *TBD* |
| large_content | 648.37 ns | *To be measured* | *TBD* |

### multipart_as_bytes

Measures `HttpMultipartBody::as_bytes()` serialization overhead.

**Performance Issues Identified:**
- Each part's `as_bytes()` creates a separate buffer
- All part buffers are then copied into the final multipart buffer
- For large multipart uploads, this doubles memory usage

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| small_2_parts | 1.0409 µs | *To be measured* | *TBD* |
| medium_10_parts | 4.6164 µs | *To be measured* | *TBD* |
| large_5_parts_10kb | 3.0610 µs | *To be measured* | *TBD* |
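
The double buffering described above could be removed by having each part write into a shared sink. The following is a sketch under assumptions: `Part`, `write_to`, and `write_multipart` are hypothetical names, and the framing follows the usual multipart layout of `--boundary` delimiters with CRLF line endings.

```rust
use std::io::{self, Write};

// Hypothetical, simplified body part.
pub struct Part {
    pub headers: Vec<(String, String)>,
    pub content: Vec<u8>,
}

impl Part {
    /// Write this part into any `Write` sink instead of returning a fresh buffer.
    pub fn write_to<W: Write>(&self, out: &mut W) -> io::Result<()> {
        for (name, value) in &self.headers {
            write!(out, "{}: {}\r\n", name, value)?;
        }
        out.write_all(b"\r\n")?;
        out.write_all(&self.content)?;
        out.write_all(b"\r\n")
    }
}

/// Write all parts and the closing delimiter into one sink.
pub fn write_multipart<W: Write>(parts: &[Part], boundary: &str, out: &mut W) -> io::Result<()> {
    for part in parts {
        write!(out, "--{}\r\n", boundary)?;
        // Each part appends to the same sink; no per-part buffer is created.
        part.write_to(out)?;
    }
    write!(out, "--{}--\r\n", boundary)
}
```

Passing a `Vec<u8>` as the sink reproduces the current `as_bytes()` result with a single buffer, while passing a socket or `BufWriter` would stream the body without materializing it.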

### header_list_parsing

Measures overhead of parsing multiple headers in sequence, simulating real-world header parsing.

**Performance Issues Identified:**
- Each header requires `from_str()` which lowercases the key
- `HttpHeaderList::add()` lowercases again via `field_name_lowercase()`
- Generic headers allocate strings twice (parsing + adding)

| Test Case | Before | After | Change |
|-----------|--------|-------|--------|
| parse_3_headers | 1.6460 µs | *To be measured* | *TBD* |
| parse_10_headers | 6.9384 µs | *To be measured* | *TBD* |

## Performance Issues Identified (To Be Optimized)

### Header Storage & Parsing
1. **Double lowercase conversion** - Headers are lowercased in `from_str()` and again in `field_name_lowercase()`
2. **Generic header allocations** - Unknown headers allocate `String` on every `field_name_lowercase()` call
3. **HashMap key copying** - Every header add copies the lowercase key string into the HashMap

### Request Construction
4. **Encoding string rebuilding** - `HttpRequestBuilder::new()` rebuilds Accept-Encoding string for every builder
5. **Header cloning** - `build_request()` clones all default headers for every request
6. **No caching** - Common request configurations are rebuilt repeatedly

### Body Serialization
7. **Double buffering in multipart** - Each part builds its own buffer, then all are copied into final buffer
8. **Header string conversion** - Headers converted to String then to bytes in `HttpBodyPart::as_bytes()`
9. **No buffer reuse** - Serialization creates new buffers instead of reusing scratch space

### Message Serialization
10. **Boxed intermediate** - `HttpMessage::as_bytes()` creates `Box<[u8]>` even when caller could stream directly
11. **Header string materialization** - Headers are built into String, then copied to bytes
12. **Body cloning** - Body bytes are cloned even when they could be borrowed

### Connection & Pooling
13. **Full message logging** - `read_response()` logs entire message, triggering full serialization
14. **Pool sweep overhead** - `clear_stale()` scans entire HashMap on every `get()` call
15. **No connection reuse optimization** - Connections removed and reinserted instead of LRU
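
To illustrate the pool sweep issue, the sketch below checks staleness only for the entry being requested rather than scanning the whole map on every `get()`. The `Pool` type, its fields, and the host-string key are all hypothetical; the crate's pool may be keyed and structured differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical pool keyed by host; `C` stands in for a connection type.
pub struct Pool<C> {
    idle: HashMap<String, (Instant, C)>,
    max_idle: Duration,
}

impl<C> Pool<C> {
    pub fn new(max_idle: Duration) -> Self {
        Pool { idle: HashMap::new(), max_idle }
    }

    /// Return a connection to the pool, recording when it became idle.
    pub fn put(&mut self, host: &str, conn: C) {
        self.idle.insert(host.to_string(), (Instant::now(), conn));
    }

    /// Check staleness only for the requested entry, not the whole map.
    pub fn get(&mut self, host: &str) -> Option<C> {
        let (stored_at, conn) = self.idle.remove(host)?;
        if stored_at.elapsed() > self.max_idle {
            // Stale: drop it and let the caller open a new connection.
            None
        } else {
            Some(conn)
        }
    }
}
```

The trade-off is that entries for hosts that are never requested again stay in the map until something else removes them, so an occasional background sweep may still be worthwhile.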

## Optimizations Applied

1. **`HttpHeader::field_name_lowercase()`** - Returns `Cow<'static, str>` instead of `String` for known headers, eliminating allocation.

2. **`HttpHeader::Display`** - Writes directly to formatter instead of collecting into `Vec<String>` then joining.

3. **`HttpHeaderList::Display`** - Writes headers directly to formatter instead of creating intermediate `Vec<String>`.

4. **`HttpMessage::as_bytes()`** - Builds header string inline instead of cloning the entire header list.

5. **`read_chunked()`** - Pre-allocates response buffer with initial capacity and reads directly into the final buffer.

6. **`latin1_to_string()`** - Pre-allocates string with worst-case UTF-8 capacity.

7. **`HttpMultipartBody::as_bytes()`** - Estimates and pre-allocates capacity based on content sizes.

8. **`HttpBodyPart::as_bytes()`** - Pre-allocates capacity based on headers and content size.

9. **`HttpRequestBuilder::new()`** - Builds encoding string without intermediate `Vec` allocation.

## Proposed Optimizations (Not Yet Implemented)

1. **Case-insensitive header keys** - Use wrapper type to avoid repeated lowercase conversions
2. **Cache encoding strings** - Store Accept-Encoding string in static/Arc to avoid rebuilding
3. **Streaming serialization** - Add `write_to()` methods that write directly to `impl Write` instead of `Box<[u8]>`
4. **Header buffer reuse** - Cache serialized header bytes on HttpMessage
5. **Incremental pool cleanup** - Only check staleness of requested connection, not all connections
6. **Lazy message logging** - Only serialize message for logging if log level requires it
7. **Direct multipart writing** - Stream parts directly into final buffer instead of double buffering
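
The lazy logging item can be sketched with a closure that is only invoked when the level is enabled. The `Logger` type here is a hypothetical stand-in used to keep the example self-contained; a real implementation would more likely consult the logging facade's own level check.

```rust
// Hypothetical minimal logger used only for illustration.
pub struct Logger {
    pub debug_enabled: bool,
    pub lines: Vec<String>,
}

impl Logger {
    /// Takes a closure so the message is only built when debug logging is on.
    pub fn debug_with<F: FnOnce() -> String>(&mut self, make_message: F) {
        if self.debug_enabled {
            let line = make_message();
            self.lines.push(line);
        }
    }
}
```

With this shape, the expensive step (serializing the full message) lives inside the closure and is skipped entirely when debug output is disabled.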

## Benchmark Environment

- Tool: Criterion.rs 0.5
- Profile: Release (optimized)
- Baseline: `before_optimization`
- Comparison: `after_optimization`

## Running Benchmarks

```bash
# Run all benchmarks
cargo bench --bench http_bench

# Save baseline before optimizations
cargo bench --bench http_bench -- --save-baseline before_optimization

# Compare against baseline after making changes
cargo bench --bench http_bench -- --baseline before_optimization

# Save new baseline after optimizations
cargo bench --bench http_bench -- --save-baseline after_optimization

# View detailed HTML report (generated in target/criterion/)
# `open` is the macOS command; on Linux use `xdg-open` instead
open target/criterion/report/index.html
```

## Benchmark Coverage

The benchmark suite covers the following performance-critical paths:

- **Header parsing** - Individual header parsing from strings
- **Header field name lookup** - Lowercase conversion overhead
- **Header serialization** - Converting headers to strings
- **Header list operations** - Adding headers, serializing lists
- **Message serialization** - Converting HTTP messages to bytes
- **Request building** - Constructing request builders and building requests
- **Body part serialization** - Converting body parts to bytes
- **Multipart serialization** - Converting multipart bodies to bytes
- **String conversion** - Latin-1 to UTF-8 conversion

## Interpreting Results

Each benchmark run reports:
- **Time per operation** - Lower is better
- **Change percentage** - Relative to the saved baseline; negative means faster (improvement)
- **Statistical significance** - Criterion reports a p-value for each comparison; a value below its significance level (0.05 by default) indicates the change is unlikely to be noise

**Baseline measurements completed** - The "Before" columns in the tables above contain measurements from the `before_optimization` baseline.

To update this document after implementing optimizations:
1. Make optimizations to the code
2. Run `cargo bench --bench http_bench -- --baseline before_optimization`
3. Extract timing results from the output or HTML report
4. Update the "After" and "Change" columns in the tables above
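
When filling in a "Change" cell by hand from two timings, the usual relative-change formula applies. The helper below is a hypothetical convenience, not part of the benchmark suite; note that the change Criterion prints comes from its own estimates, so it may differ slightly from a value computed from rounded table entries.

```rust
/// Relative change in percent; negative means the "after" timing is faster.
pub fn percent_change(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

/// Format the result in the style used by the tables, e.g. "-50.00%".
pub fn format_change(before_ns: f64, after_ns: f64) -> String {
    format!("{:+.2}%", percent_change(before_ns, after_ns))
}
```

For example, a benchmark that goes from 200 ns to 100 ns is a change of -50.00%, and one that goes from 100 ns to 125 ns is +25.00%.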