# HTTP Performance Benchmarks
Benchmark results comparing performance before and after memory allocation optimizations.
**Status**: Baseline measurements are complete for all eleven benchmarks. Before/after comparisons are recorded for five of them; the remaining six have baselines only.
## Summary
| Benchmark | Result |
|---|---|
| `field_name_lowercase` (known headers) | **~97% faster** |
| `header_to_string` | **28-66% faster** |
| `header_list_to_string` | **49-72% faster** |
| `message_as_bytes` | **74-86% faster** |
| `latin1_to_string` (1KB) | **14% faster** |
| `header_list_add` | Baseline measured, optimizations pending |
| `request_builder_new` | Baseline measured, optimizations pending |
| `request_builder_build` | Baseline measured, optimizations pending |
| `body_part_as_bytes` | Baseline measured, optimizations pending |
| `multipart_as_bytes` | Baseline measured, optimizations pending |
| `header_list_parsing` | Baseline measured, optimizations pending |
## Detailed Results
### field_name_lowercase
Optimized to return `Cow<'static, str>` instead of `String`, avoiding allocations for known header types.
| Case | Before | After | Change |
|---|---|---|---|
| content_type | 52.44 ns | 2.13 ns | **-96.77%** |
| content_length | 51.51 ns | 1.74 ns | **-96.63%** |
| generic | 63.16 ns | 77.58 ns | +16.61% (still allocates for unknown headers) |
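The borrowed-versus-owned split behind these numbers can be sketched as follows. This is a minimal illustration, not the crate's code: the `SketchHeader` enum and its variants are hypothetical stand-ins for the real `HttpHeader`.

```rust
use std::borrow::Cow;

// Hypothetical, simplified stand-in for the real `HttpHeader` type.
enum SketchHeader {
    ContentType,
    ContentLength,
    Generic(String),
}

impl SketchHeader {
    // Known headers hand back a static string slice (no heap allocation);
    // unknown headers still allocate a lowercased copy.
    fn field_name_lowercase(&self) -> Cow<'static, str> {
        match self {
            SketchHeader::ContentType => Cow::Borrowed("content-type"),
            SketchHeader::ContentLength => Cow::Borrowed("content-length"),
            SketchHeader::Generic(name) => Cow::Owned(name.to_lowercase()),
        }
    }
}
```

The `Cow::Owned` arm is the path exercised by the `generic` case, which is why that row shows no improvement.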
### header_to_string
Optimized `Display` impl to write directly to the formatter instead of using a collect-then-join pattern.
| Case | Before | After | Change |
|---|---|---|---|
| content_type | 966.19 ns | 518.82 ns | **-40.76%** |
| content_encoding | 760.08 ns | 260.34 ns | **-66.20%** |
| generic | 285.35 ns | 204.65 ns | **-28.29%** |
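As a rough sketch of the direct-write approach, assuming a hypothetical `EncodingHeader` type with a name and a list of values (the real header types may be shaped differently):

```rust
use std::fmt;

// Hypothetical header with a fixed name and several values.
struct EncodingHeader {
    name: &'static str,
    values: Vec<String>,
}

impl fmt::Display for EncodingHeader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // A collect-then-join version would first build a Vec<String> and a
        // joined String; here every piece goes straight to the formatter.
        write!(f, "{}: ", self.name)?;
        for (i, value) in self.values.iter().enumerate() {
            if i > 0 {
                f.write_str(", ")?;
            }
            f.write_str(value)?;
        }
        Ok(())
    }
}
```

Calling `to_string()` on such a value still allocates the final `String`, but the intermediate vector and joined string are gone.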
### header_list_to_string
Optimized `Display` impl to avoid intermediate `Vec<String>` allocation.
| Case | Before | After | Change |
|---|---|---|---|
| small (3 headers) | 1.84 µs | 904.21 ns | **-49.30%** |
| medium (10 headers) | 3.96 µs | 1.08 µs | **-71.66%** |
### message_as_bytes
Avoided cloning the entire header list when adding the Content-Length header.
| Case | Before | After | Change |
|---|---|---|---|
| small_message | 1.84 µs | 477.59 ns | **-74.00%** |
| medium_message | 3.86 µs | 556.16 ns | **-85.93%** |
| large_10kb_message | 2.78 µs | 573.28 ns | **-79.05%** |
| medium_no_content_length | 2.27 µs | 535.28 ns | **-75.33%** |
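One way to serialize without cloning the header list is to append a computed Content-Length while iterating over the borrowed headers. The sketch below assumes a hypothetical `SketchMessage` type storing headers as name/value pairs, and assumes Content-Length is added only when the caller has not set it; neither detail is confirmed by the crate.

```rust
use std::io::Write;

// Hypothetical message type; the real `HttpMessage` is likely richer.
struct SketchMessage {
    start_line: String,
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

impl SketchMessage {
    fn as_bytes(&self) -> Box<[u8]> {
        let mut out = Vec::with_capacity(self.start_line.len() + self.body.len() + 64);
        out.extend_from_slice(self.start_line.as_bytes());
        out.extend_from_slice(b"\r\n");
        let mut has_length = false;
        // Iterate over borrowed headers instead of cloning the list
        // just to append one extra entry.
        for (name, value) in &self.headers {
            has_length |= name.eq_ignore_ascii_case("content-length");
            out.extend_from_slice(name.as_bytes());
            out.extend_from_slice(b": ");
            out.extend_from_slice(value.as_bytes());
            out.extend_from_slice(b"\r\n");
        }
        if !has_length {
            // Writing into a Vec<u8> cannot fail.
            write!(out, "Content-Length: {}\r\n", self.body.len()).expect("write to Vec");
        }
        out.extend_from_slice(b"\r\n");
        out.extend_from_slice(&self.body);
        out.into_boxed_slice()
    }
}
```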
### latin1_to_string
Added `with_capacity` pre-allocation.
| Case | Before | After | Change |
|---|---|---|---|
| small (13 bytes) | 78.52 ns | 81.55 ns | +14.37% |
| medium (1KB) | 1.71 µs | 1.45 µs | **-14.20%** |
| large (10KB) | 14.80 µs | 14.68 µs | -1.40% (no significant change) |
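A minimal sketch of the pre-allocation, assuming the function takes a byte slice and returns a `String` (the real signature may differ). Each Latin-1 byte maps to the Unicode code point of the same value, which needs at most two bytes in UTF-8, so twice the input length is a safe upper bound.

```rust
// Sketch only: the signature is an assumption, not the crate's API.
fn latin1_to_string(bytes: &[u8]) -> String {
    // Worst case: every byte is >= 0x80 and encodes as two UTF-8 bytes.
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // `u8 as char` yields U+0000..=U+00FF, matching Latin-1.
        out.push(b as char);
    }
    out
}
```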
### header_list_add
Measures the overhead of adding headers to `HttpHeaderList`, including lowercase conversion and HashMap insertion. This benchmark identifies allocation hotspots when building header lists.
**Performance Issues Identified:**
- Generic headers require `to_lowercase()` allocation on every add
- `field_name_lowercase()` returns `Cow::Owned` for generics, causing heap allocation
- HashMap key insertion copies the string
| Case | Before | After | Change |
|---|---|---|---|
| known_header | 177.80 ns | *To be measured* | *TBD* |
| generic_header | 282.25 ns | *To be measured* | *TBD* |
| multiple_headers | 1.7025 µs | *To be measured* | *TBD* |
### request_builder_new
Measures `HttpRequestBuilder::new()` construction overhead, including encoding string building and header merging.
**Performance Issues Identified:**
- Encoding string is rebuilt on every builder creation
- Header merging involves cloning all default headers
- No caching of common request configurations
| Case | Before | After | Change |
|---|---|---|---|
| empty_headers | 1.7376 µs | *To be measured* | *TBD* |
| with_headers | 2.7404 µs | *To be measured* | *TBD* |
### request_builder_build
Measures `HttpRequestBuilder::build_request()` overhead, including header merging and cloning.
**Performance Issues Identified:**
- All default headers are cloned for every request
- Header merging creates new HashMap entries
- No reuse of pre-rendered request components
| Case | Before | After | Change |
|---|---|---|---|
| empty_headers_no_body | 1.9629 µs | *To be measured* | *TBD* |
| with_headers_no_body | 2.5499 µs | *To be measured* | *TBD* |
| with_headers_small_body | 2.5705 µs | *To be measured* | *TBD* |
| with_headers_medium_body | 2.4708 µs | *To be measured* | *TBD* |
### body_part_as_bytes
Measures `HttpBodyPart::as_bytes()` serialization overhead.
**Performance Issues Identified:**
- Headers are converted to String via `Display`, then to bytes
- Each part builds its own `Vec<u8>` buffer
- No buffer reuse between parts
| Case | Before | After | Change |
|---|---|---|---|
| no_headers_small | 60.188 ns | *To be measured* | *TBD* |
| with_headers_small | 648.29 ns | *To be measured* | *TBD* |
| large_content | 648.37 ns | *To be measured* | *TBD* |
### multipart_as_bytes
Measures `HttpMultipartBody::as_bytes()` serialization overhead.
**Performance Issues Identified:**
- Each part's `as_bytes()` creates a separate buffer
- All part buffers are then copied into the final multipart buffer
- For large multipart uploads, this doubles memory usage
| Case | Before | After | Change |
|---|---|---|---|
| small_2_parts | 1.0409 µs | *To be measured* | *TBD* |
| medium_10_parts | 4.6164 µs | *To be measured* | *TBD* |
| large_5_parts_10kb | 3.0610 µs | *To be measured* | *TBD* |
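The double buffering described above could be avoided by having each part append itself to a shared output buffer. The following is a sketch under stated assumptions: `SketchPart`, `write_into`, and the free function `multipart_as_bytes` are hypothetical names, and the capacity estimate is a rough guess rather than the crate's formula.

```rust
// Hypothetical part type with name/value headers and raw content.
struct SketchPart {
    headers: Vec<(String, String)>,
    content: Vec<u8>,
}

impl SketchPart {
    // Appends this part to `out` instead of returning its own buffer.
    fn write_into(&self, out: &mut Vec<u8>) {
        for (name, value) in &self.headers {
            out.extend_from_slice(name.as_bytes());
            out.extend_from_slice(b": ");
            out.extend_from_slice(value.as_bytes());
            out.extend_from_slice(b"\r\n");
        }
        out.extend_from_slice(b"\r\n");
        out.extend_from_slice(&self.content);
    }
}

fn multipart_as_bytes(parts: &[SketchPart], boundary: &str) -> Vec<u8> {
    // Rough capacity estimate: content plus per-part framing overhead.
    let estimate: usize = parts
        .iter()
        .map(|p| p.content.len() + boundary.len() + 64)
        .sum();
    let mut out = Vec::with_capacity(estimate + boundary.len() + 8);
    for part in parts {
        out.extend_from_slice(b"--");
        out.extend_from_slice(boundary.as_bytes());
        out.extend_from_slice(b"\r\n");
        part.write_into(&mut out);
        out.extend_from_slice(b"\r\n");
    }
    // Closing delimiter.
    out.extend_from_slice(b"--");
    out.extend_from_slice(boundary.as_bytes());
    out.extend_from_slice(b"--\r\n");
    out
}
```

With this shape, part content is copied once into the final buffer rather than once into a per-part buffer and again into the multipart buffer.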
### header_list_parsing
Measures overhead of parsing multiple headers in sequence, simulating real-world header parsing.
**Performance Issues Identified:**
- Each header requires `from_str()` which lowercases the key
- `HttpHeaderList::add()` lowercases again via `field_name_lowercase()`
- Generic headers allocate strings twice (parsing + adding)
| Case | Before | After | Change |
|---|---|---|---|
| parse_3_headers | 1.6460 µs | *To be measured* | *TBD* |
| parse_10_headers | 6.9384 µs | *To be measured* | *TBD* |
## Performance Issues Identified (To Be Optimized)
### Header Storage & Parsing
1. **Double lowercase conversion** - Headers are lowercased in `from_str()` and again in `field_name_lowercase()`
2. **Generic header allocations** - Unknown headers allocate `String` on every `field_name_lowercase()` call
3. **HashMap key copying** - Every header add copies the lowercase key string into the HashMap
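One possible way to sidestep repeated lowercasing is a key wrapper that compares and hashes case-insensitively, so the original spelling can be stored as-is. The `CaseInsensitive` type below is a hypothetical sketch, limited to ASCII case folding, and is not part of the crate.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical wrapper: equality and hashing ignore ASCII case.
#[derive(Debug, Clone)]
struct CaseInsensitive(String);

impl PartialEq for CaseInsensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for CaseInsensitive {}

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the lowercased bytes one at a time, without building
        // a lowercased String first.
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
        // Terminator so that prefixes hash differently.
        state.write_u8(0xff);
    }
}

// Convenience alias for a header map keyed this way.
type SketchHeaderMap = HashMap<CaseInsensitive, String>;
```

In this simple form a lookup still needs an owned `CaseInsensitive` key; avoiding that would take a borrowed key type, which is left out here.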
### Request Construction
4. **Encoding string rebuilding** - `HttpRequestBuilder::new()` rebuilds Accept-Encoding string for every builder
5. **Header cloning** - `build_request()` clones all default headers for every request
6. **No caching** - Common request configurations are rebuilt repeatedly
### Body Serialization
7. **Double buffering in multipart** - Each part builds its own buffer, then all are copied into final buffer
8. **Header string conversion** - Headers converted to String then to bytes in `HttpBodyPart::as_bytes()`
9. **No buffer reuse** - Serialization creates new buffers instead of reusing scratch space
### Message Serialization
10. **Boxed intermediate** - `HttpMessage::as_bytes()` creates `Box<[u8]>` even when caller could stream directly
11. **Header string materialization** - Headers are built into String, then copied to bytes
12. **Body cloning** - Body bytes are cloned even when they could be borrowed
### Connection & Pooling
13. **Full message logging** - `read_response()` logs entire message, triggering full serialization
14. **Pool sweep overhead** - `clear_stale()` scans entire HashMap on every `get()` call
15. **No connection reuse optimization** - Connections are removed and reinserted on each use instead of being tracked with an LRU policy
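For the pool sweep in particular, a cheaper `get()` might check staleness only for the requested entry. The sketch below uses hypothetical `SketchPool` and `PooledConn` types and passes the current time in explicitly to keep it deterministic; the real pool's fields and staleness rule are not known from this document.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical pooled connection; only the timestamp matters here.
struct PooledConn {
    last_used: Instant,
}

// Hypothetical pool keyed by "host:port".
struct SketchPool {
    idle: HashMap<String, PooledConn>,
    max_idle: Duration,
}

impl SketchPool {
    // Checks only the requested entry instead of scanning the whole map.
    fn get(&mut self, key: &str, now: Instant) -> Option<PooledConn> {
        let conn = self.idle.remove(key)?;
        if now.duration_since(conn.last_used) > self.max_idle {
            // Stale: drop it and let the caller open a new connection.
            None
        } else {
            Some(conn)
        }
    }
}
```

Entries that are never requested again would linger under this scheme, so an occasional full sweep (for example on insert, or on a timer) would probably still be needed.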
## Optimizations Applied
1. **`HttpHeader::field_name_lowercase()`** - Returns `Cow<'static, str>` instead of `String` for known headers, eliminating allocation.
2. **`HttpHeader::Display`** - Writes directly to formatter instead of collecting into `Vec<String>` then joining.
3. **`HttpHeaderList::Display`** - Writes headers directly to formatter instead of creating intermediate `Vec<String>`.
4. **`HttpMessage::as_bytes()`** - Builds header string inline instead of cloning the entire header list.
5. **`read_chunked()`** - Pre-allocates response buffer with initial capacity and reads directly into the final buffer.
6. **`latin1_to_string()`** - Pre-allocates string with worst-case UTF-8 capacity.
7. **`HttpMultipartBody::as_bytes()`** - Estimates and pre-allocates capacity based on content sizes.
8. **`HttpBodyPart::as_bytes()`** - Pre-allocates capacity based on headers and content size.
9. **`HttpRequestBuilder::new()`** - Builds encoding string without intermediate `Vec` allocation.
## Proposed Optimizations (Not Yet Implemented)
1. **Case-insensitive header keys** - Use wrapper type to avoid repeated lowercase conversions
2. **Cache encoding strings** - Store Accept-Encoding string in static/Arc to avoid rebuilding
3. **Streaming serialization** - Add `write_to()` methods that write directly to `impl Write` instead of `Box<[u8]>`
4. **Header buffer reuse** - Cache serialized header bytes on HttpMessage
5. **Incremental pool cleanup** - Only check staleness of requested connection, not all connections
6. **Lazy message logging** - Only serialize message for logging if log level requires it
7. **Direct multipart writing** - Stream parts directly into final buffer instead of double buffering
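The streaming proposal (item 3) could look roughly like the following. `StreamMessage` is a hypothetical type, and the `write_to` signature is only one plausible shape for the proposed method.

```rust
use std::io::{self, Write};

// Hypothetical message type used only for this sketch.
struct StreamMessage {
    start_line: String,
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

impl StreamMessage {
    // Writes the message directly to any writer (socket, buffer, file)
    // without materializing an intermediate Box<[u8]>.
    fn write_to<W: Write>(&self, mut w: W) -> io::Result<()> {
        w.write_all(self.start_line.as_bytes())?;
        w.write_all(b"\r\n")?;
        for (name, value) in &self.headers {
            write!(w, "{}: {}\r\n", name, value)?;
        }
        w.write_all(b"\r\n")?;
        w.write_all(&self.body)
    }
}
```

When the writer is an unbuffered socket, wrapping it in `std::io::BufWriter` would likely be needed to avoid many small writes.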
## Benchmark Environment
- Tool: Criterion.rs 0.5
- Profile: Release (optimized)
- Baseline: `before_optimization`
- Comparison: `after_optimization`
## Running Benchmarks
```bash
# Run all benchmarks
cargo bench --bench http_bench
# Save baseline before optimizations
cargo bench --bench http_bench -- --save-baseline before_optimization
# Compare against baseline after making changes
cargo bench --bench http_bench -- --baseline before_optimization
# Save new baseline after optimizations
cargo bench --bench http_bench -- --save-baseline after_optimization
# View detailed HTML report (generated in target/criterion/)
# `open` is the macOS command; on Linux, `xdg-open` is the usual equivalent
open target/criterion/report/index.html
```
## Benchmark Coverage
The benchmark suite covers the following performance-critical paths:
- **Header parsing** - Individual header parsing from strings
- **Header field name lookup** - Lowercase conversion overhead
- **Header serialization** - Converting headers to strings
- **Header list operations** - Adding headers, serializing lists
- **Message serialization** - Converting HTTP messages to bytes
- **Request building** - Constructing request builders and building requests
- **Body part serialization** - Converting body parts to bytes
- **Multipart serialization** - Converting multipart bodies to bytes
- **String conversion** - Latin-1 to UTF-8 conversion
## Interpreting Results
For each benchmark, Criterion reports:
- **Time per operation** - Lower is better
- **Change percentage** - Relative to the selected baseline; negative means faster (improvement)
- **Statistical significance** - A p-value below the significance level (0.05 by default) indicates the change is unlikely to be noise
**Baseline measurements completed** - The "Before" columns in the tables above contain measurements from the `before_optimization` baseline.
To update this document after implementing optimizations:
1. Implement the optimizations in the code
2. Run `cargo bench --bench http_bench -- --baseline before_optimization`
3. Extract timing results from the output or HTML report
4. Update the "After" and "Change" columns in the tables above