grpc_graphql_gateway 1.2.4

# Request Collapsing

Request collapsing (also known as request deduplication) is a powerful optimization that reduces the number of gRPC backend calls by identifying and coalescing identical concurrent requests.

## How It Works

When a GraphQL query contains multiple fields that call the same gRPC method with identical arguments, request collapsing ensures only one gRPC call is made:

```graphql
query {
  user1: getUser(id: "1") { name }
  user2: getUser(id: "2") { name }
  user3: getUser(id: "1") { name }  # Duplicate of user1!
}
```

**Without Request Collapsing:** 3 gRPC calls are made.

**With Request Collapsing:** Only 2 gRPC calls are made (user1 and user3 share the same response).

### The Leader-Follower Pattern

1. **Leader**: The first request with a unique key executes the gRPC call
2. **Followers**: Subsequent identical requests wait for the leader's result
3. **Broadcast**: When the leader completes, it broadcasts the result to all followers
4. **Cleanup**: The in-flight entry is removed after broadcasting

## Configuration

```rust
use grpc_graphql_gateway::{Gateway, RequestCollapsingConfig};
use std::time::Duration;

let gateway = Gateway::builder()
    .with_descriptor_set_bytes(DESCRIPTORS)
    .with_request_collapsing(RequestCollapsingConfig::default())
    .add_grpc_client("service", client)
    .build()?;
```

### Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| `coalesce_window` | 50ms | Maximum time to wait for in-flight requests |
| `max_waiters` | 100 | Maximum followers waiting for a single leader |
| `enabled` | true | Enable/disable collapsing |
| `max_cache_size` | 10000 | Maximum in-flight requests to track |

### Builder Pattern

```rust
let config = RequestCollapsingConfig::new()
    .coalesce_window(Duration::from_millis(100))  // Longer window
    .max_waiters(200)                              // More waiters allowed
    .max_cache_size(20000)                         // Larger cache
    .enabled(true);
```

## Presets

Request collapsing comes with several presets for common scenarios:

### Default (Balanced)

```rust
let config = RequestCollapsingConfig::default();
// coalesce_window: 50ms
// max_waiters: 100
// max_cache_size: 10000
```

Best for most workloads with a balance between latency and deduplication.

### High Throughput

```rust
let config = RequestCollapsingConfig::high_throughput();
// coalesce_window: 100ms
// max_waiters: 500
// max_cache_size: 50000
```

Best for high-traffic scenarios where maximizing deduplication is more important than latency.

### Low Latency

```rust
let config = RequestCollapsingConfig::low_latency();
// coalesce_window: 10ms
// max_waiters: 50
// max_cache_size: 5000
```

Best for latency-sensitive applications where quick responses are critical.

### Disabled

```rust
let config = RequestCollapsingConfig::disabled();
```

Completely disables request collapsing.

## Monitoring

You can monitor request collapsing effectiveness using the built-in statistics:

```rust
// Get the registry from ServeMux
if let Some(registry) = mux.request_collapsing() {
    let stats = registry.stats();
    println!("In-flight requests: {}", stats.in_flight_count);
    println!("Max cache size: {}", stats.max_cache_size);
    println!("Enabled: {}", stats.enabled);
}
```

## Request Key Generation

Each request is identified by a SHA-256 hash of:

1. **Service name** - The gRPC service identifier
2. **gRPC path** - The method path (e.g., `/greeter.Greeter/SayHello`)
3. **Request bytes** - The serialized protobuf message

This ensures that only truly identical requests are collapsed.

## Relationship with Other Features

### Response Caching

Request collapsing and response caching work together:

- **Request Collapsing**: Deduplicates *concurrent* identical requests
- **Response Caching**: Caches *completed* responses for future requests

The typical flow is:
1. Check response cache → cache hit? Return cached response
2. Check in-flight requests → follower? Wait for leader
3. Execute gRPC call as leader
4. Broadcast result to followers
5. Cache response for future requests

### Circuit Breaker

Request collapsing works seamlessly with the circuit breaker:

- If the circuit is open, all collapsed requests fail fast together
- The leader request respects circuit breaker state
- Followers receive the same error as the leader

## Best Practices

1. **Start with defaults**: The default configuration works well for most use cases

2. **Monitor collapse ratio**: Track how many requests are being deduplicated
   - Low ratio? Requests may be too unique, consider if collapsing adds value
   - High ratio? Great! You're saving significant backend load

3. **Tune for your workload**:
   - High read traffic? Use `high_throughput()` preset
   - Real-time requirements? Use `low_latency()` preset

4. **Consider request patterns**:
   - GraphQL queries with aliases benefit most
   - Unique requests per field won't see much benefit

## Example: Full Configuration

```rust
use grpc_graphql_gateway::{Gateway, RequestCollapsingConfig, CacheConfig};
use std::time::Duration;

let gateway = Gateway::builder()
    .with_descriptor_set_bytes(DESCRIPTORS)
    // Enable response caching
    .with_response_cache(CacheConfig {
        max_size: 10_000,
        default_ttl: Duration::from_secs(60),
        stale_while_revalidate: Some(Duration::from_secs(30)),
        invalidate_on_mutation: true,
    })
    // Enable request collapsing
    .with_request_collapsing(
        RequestCollapsingConfig::new()
            .coalesce_window(Duration::from_millis(75))
            .max_waiters(150)
    )
    .add_grpc_client("service", client)
    .build()?;
```