# `wireframe` 1.0: A Guide to Observability, Operability, and Maturity
## 1. Introduction: Beyond Functional Correctness
A library is functionally correct when it performs its specified tasks
according to its API. A library is *mature* when it anticipates the realities
of its operational environment. For a low-level networking framework like
`wireframe`, this means acknowledging that production systems are complex,
failures are inevitable, and visibility into runtime behaviour is
non-negotiable.
This guide details the cross-cutting strategies that will elevate `wireframe`
from a simple toolkit to a production-grade, operationally mature framework. It
is a practical blueprint for implementers, covering three essential domains:
1. **Pervasive Observability:** Making the library's internal state transparent
and debuggable by default.
2. **Operability by Design:** Building features that make the library easy to
run, manage, and recover gracefully.
3. **API and Design Maturity:** Adopting patterns that signal a commitment to
developer ergonomics, extensibility, and long-term maintainability.
By embedding these principles into the core of the library, we provide users
with the tools they need to build, deploy, and maintain robust network services
with confidence.
## 2. Pervasive Observability with `tracing`
In a complex asynchronous application, `println!` is not debugging; it is
whispering into a hurricane. To understand runtime behaviour, we must have
structured, context-aware diagnostics. The `tracing` crate is the de facto
standard for this in the async Rust ecosystem, and its integration is a
first-class feature of `wireframe` 1.0.
Before any work begins, install a global `tracing` subscriber:
```rust
tracing_subscriber::fmt::init();
```
Without a subscriber, `tracing` events from `wireframe` will be dropped.
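Verbosity can then be tuned at runtime via the standard `RUST_LOG` variable; a minimal sketch, assuming the `env-filter` feature of `tracing-subscriber` is enabled:
```rust
use tracing_subscriber::EnvFilter;

// Honour RUST_LOG (e.g. `RUST_LOG=wireframe=debug`), defaulting to `info`.
tracing_subscriber::fmt()
    .with_env_filter(EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")))
    .init();
```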
### 2.1 The `tracing` Span Hierarchy
`tracing`'s key innovation is the `span`, which represents a unit of work with
a distinct beginning and end. Nesting spans creates a causal chain of events,
enabling an understanding of how work flows through the system, even across
asynchronous boundaries and threads. `wireframe` will adopt a standard span
hierarchy.
- **`connection`:** A root span for each TCP connection, created on `accept()`.
It provides the top-level context for all activity on that connection.
- **`request`:** A child of `connection`, created when an inbound frame is
decoded and routed to a handler.
- **`response_stream`:** A child of `request`, created when a handler returns a
multi-frame response.
**Implementation Example:** Instrumenting the connection actor.
```rust
use tracing::{info, info_span, Instrument};
use std::net::SocketAddr;
async fn run_connection_actor(
    connection_id: u64,
    peer_addr: SocketAddr,
    // ... other arguments
) {
    let conn_span = info_span!(
        "connection",
        id = connection_id,
        peer = %peer_addr,
    );

    // The .instrument() call attaches the span to the future.
    // All events inside this block will be associated with the connection.
    async move {
        info!("New connection established.");
        // ... actor logic ...
        info!("Connection closed.");
    }
    .instrument(conn_span)
    .await;
}
```
**Measurable Objective:** Every spawned task and connection actor in the
library must be instrumented with a `tracing` span containing relevant,
queryable identifiers (e.g., `connection_id`, `peer_addr`).
### 2.2 Structured Lifecycle Events
Within each span, we will emit structured events at critical points in the
connection lifecycle. Using structured key-value pairs (`field = value`)
instead of simple strings allows logs to be automatically parsed, queried, and
aggregated by modern logging platforms.
**Implementation Example:** Logging key events.
```rust
// Inside the connection actor's read loop
let frame = read_frame().await?;
tracing::debug!(
    frame_type = %frame.type_name(),
    frame_len = frame.len(),
    "Frame received from client"
);

// Inside the PushHandle's drop-if-full logic
tracing::warn!(
    frame_type = %frame.type_name(),
    "Push queue full, dropping frame."
);

// When a response stream is fully processed
tracing::debug!(
    reason = "clean_completion",
    "Response stream closed."
);
```
### 2.3 From Logs to Metrics
The true power of structured `tracing` data is that it can be consumed by a
`Subscriber` that aggregates events into quantitative metrics. This enables the
creation of vital operational dashboards for monitoring the health and
performance of `wireframe`-based services.
The library will document and support the generation of the following key
metrics, suitable for export to systems like Prometheus or OpenTelemetry.
| Metric | Type | Description |
| --- | --- | --- |
| `wireframe_connections_active` | Gauge | The current number of active connections. |
| `wireframe_frames_processed_total` | Counter | Total frames processed, with labels for `direction` (inbound/outbound) and `protocol_type`. |
| `wireframe_push_queue_depth` | Histogram | A distribution of the push queue depth, sampled periodically. Essential for diagnosing back-pressure. |
| `wireframe_pushes_dropped_total` | Counter | The number of frames dropped because the push queue was full. This should be zero in a healthy system. |
| `wireframe_request_latency_seconds` | Histogram | A distribution of handler processing times, from frame receipt to response completion. |
| `wireframe_reassembly_errors_total` | Counter | The number of times message reassembly failed due to timeouts, oversized messages, or sequence errors. |
**Measurable Objective:** A `wireframe` application, when configured with a
suitable `tracing-subscriber`, must emit all the metrics listed above, allowing
for the construction of a comprehensive operational health dashboard.
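To make this concrete, the sketch below derives the `wireframe_connections_active` gauge from the lifecycle of `connection` spans using a custom `tracing-subscriber` layer. It assumes the `Layer` API of `tracing-subscriber` 0.3; the `ConnectionGauge` type and the export step are illustrative rather than part of `wireframe` itself.
```rust
use std::sync::atomic::{AtomicI64, Ordering};

use tracing::span::{Attributes, Id};
use tracing_subscriber::layer::Context;
use tracing_subscriber::prelude::*;
use tracing_subscriber::registry::LookupSpan;
use tracing_subscriber::Layer;

/// Illustrative layer: tracks how many `connection` spans are currently open.
#[derive(Default)]
struct ConnectionGauge {
    active: AtomicI64,
}

impl<S> Layer<S> for ConnectionGauge
where
    S: tracing::Subscriber + for<'a> LookupSpan<'a>,
{
    fn on_new_span(&self, attrs: &Attributes<'_>, _id: &Id, _ctx: Context<'_, S>) {
        if attrs.metadata().name() == "connection" {
            let active = self.active.fetch_add(1, Ordering::Relaxed) + 1;
            // Export `active` as `wireframe_connections_active` to Prometheus/OpenTelemetry here.
            let _ = active;
        }
    }

    fn on_close(&self, id: Id, ctx: Context<'_, S>) {
        if ctx
            .span(&id)
            .is_some_and(|span| span.metadata().name() == "connection")
        {
            self.active.fetch_sub(1, Ordering::Relaxed);
        }
    }
}

fn init_telemetry() {
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(ConnectionGauge::default())
        .init();
}
```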
## 3. Operability by Design
Operability is about building systems that are predictable and controllable.
`wireframe` will incorporate several key features designed to make it a
well-behaved citizen in a production environment.
### 3.1 Coordinated, Graceful Shutdown
A network service that cannot be stopped cleanly is a liability. `wireframe`
will implement a canonical, proactive shutdown pattern to ensure orderly
termination.
**Implementation:** The core mechanism relies on
`tokio_util::sync::CancellationToken` for signalling and
`tokio_util::task::TaskTracker` for synchronisation.
```rust
use tokio_util::sync::CancellationToken;
use tokio_util::task::TaskTracker;

// Inside the main server accept loop
let tracker = TaskTracker::new();
let shutdown_token = CancellationToken::new();

// When spawning a connection
let token_clone = shutdown_token.clone();
tracker.spawn(run_connection_actor(..., token_clone));

// Inside the connection actor's select! loop
tokio::select! {
    biased;

    _ = shutdown_token.cancelled() => {
        tracing::info!("Shutdown signal received, closing connection gracefully.");
        // Perform cleanup (e.g., send final protocol message)
        break;
    },
    // ... other branches ...
}

// In the signal handler for SIGINT/SIGTERM
shutdown_token.cancel();
tracker.close();
tracker.wait().await; // Wait for all tasks to finish
```
**Measurable Objective:** When a server shutdown is initiated via the
cancellation token, 100% of active connection tasks must terminate cleanly
within a configurable timeout (e.g., 5 seconds), and the main server process
must exit with a success code.
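The wait in the signal handler above can be bounded to enforce that deadline. A minimal sketch, reusing `tracker` and `shutdown_token` from the previous listing; the 5-second value is an illustrative default:
```rust
use std::time::Duration;
use tokio::time::timeout;

// Replace the unbounded `tracker.wait().await` with a deadline-bounded wait.
shutdown_token.cancel();
tracker.close();
if timeout(Duration::from_secs(5), tracker.wait()).await.is_err() {
    tracing::error!("Graceful shutdown deadline exceeded; some connection tasks are still running.");
}
```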
### 3.2 Intelligent Error Handling with Typed Errors
Conflating unrecoverable transport errors with recoverable protocol errors
makes robust error handling impossible. `wireframe` will provide a generic
error enum to give developers the necessary information to react intelligently.
**Implementation:**
```rust
pub enum WireframeError<E> {
    /// A fatal I/O error occurred (e.g., socket closed).
    /// The connection should be terminated.
    Io(std::io::Error),
    /// A protocol-specific logical error (e.g., invalid query).
    /// The framework may be able to format a reply before
    /// closing the logical stream.
    Protocol(E),
}
```
When a handler's response stream yields a `WireframeError::Protocol(e)`, the
connection actor can pass this typed error to a protocol-specific callback.
This allows the implementation to serialize a proper error frame (e.g., an SQL
error code) and send it to the client before terminating the current operation,
rather than just abruptly closing the connection.
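A sketch of that branching logic, where the `ErrorAction` type and `classify` helper are illustrative names rather than part of `wireframe`'s API:
```rust
/// Illustrative outcome of classifying a stream error.
enum ErrorAction<E> {
    /// Send a protocol-level error frame, then close only the logical stream.
    ReplyThenEndStream(E),
    /// The transport is unusable; terminate the whole connection.
    CloseConnection(std::io::Error),
}

/// Map a typed error onto the action the connection actor should take.
fn classify<E>(err: WireframeError<E>) -> ErrorAction<E> {
    match err {
        WireframeError::Protocol(e) => ErrorAction::ReplyThenEndStream(e),
        WireframeError::Io(e) => ErrorAction::CloseConnection(e),
    }
}
```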
### 3.3 Resilient Messaging with Dead Letter Queues (DLQ)
For applications where dropping a pushed message is not an option (e.g.,
critical audit logs), `wireframe` will support an optional Dead Letter Queue.
**Implementation:** The `WireframeApp` builder will accept an `mpsc::Sender`
for a DLQ. If a push fails because the primary queue is full, the frame is
routed to this sender instead of being dropped.
```rust
// Inside PushHandle, when try_send fails with a full queue
match self.tx.try_send(frame) {
    Ok(_) => Ok(()),
    Err(mpsc::error::TrySendError::Full(failed_frame)) => {
        if let Some(dlq_tx) = &self.dlq_tx {
            if dlq_tx.try_send(failed_frame).is_err() {
                tracing::error!("Primary push queue and DLQ are both full. Frame lost.");
            }
        } else {
            tracing::warn!("Push queue full, dropping frame.");
        }
        Ok(())
    },
    // ... handle closed channel error ...
}
```
A separate consumer task, managed by the application developer, is then
responsible for processing messages from the DLQ's receiver.
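Such a consumer can be as small as a single spawned task draining the receiver; a sketch, where the generic frame type and the logging-only body are illustrative:
```rust
use tokio::sync::mpsc;

/// Spawn an application-owned task that drains the DLQ receiver.
/// What to do with each frame (persist, retry, alert) is up to the application.
fn spawn_dlq_consumer<F>(mut dlq_rx: mpsc::Receiver<F>) -> tokio::task::JoinHandle<()>
where
    F: std::fmt::Debug + Send + 'static,
{
    tokio::spawn(async move {
        while let Some(frame) = dlq_rx.recv().await {
            // Here we only log; a real consumer would persist or re-deliver.
            tracing::info!(?frame, "Recovered frame from dead letter queue");
        }
    })
}
```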
**Measurable Objective:** When the DLQ is enabled, the rate of permanently lost
pushed frames must be zero, provided the DLQ consumer can keep up. The
`wireframe_pushes_dropped_total` metric must remain at zero.
## 4. Achieving Technical Maturity
Technical maturity is reflected in an API that is not just powerful, but also
ergonomic, extensible, and idiomatic.
### 4.1 An Ergonomic, Trait-Based API
To avoid a sprawling collection of configuration closures, protocol-specific
logic will be encapsulated within a single, cohesive `WireframeProtocol` trait.
This promotes better organisation and reusability, and makes the framework
easier to extend.
**Implementation:**
```rust
pub trait WireframeProtocol: Send + Sync + 'static {
    // Associated types for frames and errors
    type Frame: FrameLike;
    type ProtocolError;

    /// Called to handle a request frame.
    async fn handle_request(
        &self,
        req: Request<Self::Frame>,
        ctx: &ConnectionContext,
    ) -> Result<Response<Self::Frame, Self::ProtocolError>, WireframeError<Self::ProtocolError>>;

    /// Called just before a frame is written to the socket.
    fn before_send(&self, frame: &mut Self::Frame, ctx: &mut ConnectionContext);

    // ... other callbacks like on_connection_setup, on_command_end ...
}

// Configuration becomes clean and declarative.
WireframeApp::new().with_protocol(MySqlProtocolImpl);
```
This design is more maintainable and aligns with Rust's powerful trait-based
abstraction model.
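For illustration, a toy protocol might implement the trait as follows; `EchoFrame`, `EchoError`, `Response::single`, and `Request::into_frame` are assumptions made for this sketch rather than confirmed parts of the API:
```rust
// Hypothetical echo protocol built against the trait shown above.
struct EchoProtocol;

impl WireframeProtocol for EchoProtocol {
    type Frame = EchoFrame;       // assumed frame type
    type ProtocolError = EchoError; // assumed error type

    async fn handle_request(
        &self,
        req: Request<Self::Frame>,
        _ctx: &ConnectionContext,
    ) -> Result<Response<Self::Frame, Self::ProtocolError>, WireframeError<Self::ProtocolError>> {
        // Echo the inbound frame back as a single-frame response.
        Ok(Response::single(req.into_frame()))
    }

    fn before_send(&self, _frame: &mut Self::Frame, _ctx: &mut ConnectionContext) {
        // For a protocol like MySQL, this is where sequence IDs would be stamped.
    }
}

// Plugged into the builder exactly like the MySQL example above:
WireframeApp::new().with_protocol(EchoProtocol);
```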
### 4.2 A Commitment to Quality Assurance
A mature library demonstrates its commitment to quality through a rigorous and
multi-faceted testing strategy that goes beyond simple unit tests.
`wireframe`'s QA process will be a core part of its development.
- **Stateful Property Testing (`proptest`):** For verifying complex, stateful
protocol conversations, ensuring that thousands of random-but-valid sequences
of operations do not lead to an invalid state.
- **Concurrency Verification (`loom`):** For proving the absence of data races
and deadlocks in concurrency hotspots, `loom` will be used to perform
permutation testing, providing a level of assurance that traditional tests
cannot; a brief example follows this list.
- **Performance Benchmarking (`criterion`):** To quantify the performance
impact of new features and prevent regressions, a comprehensive suite of
micro and macro benchmarks will be maintained.
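As a flavour of the `loom` approach, the following self-contained model checks a shared atomic counter under every legal interleaving; the counter stands in for `wireframe`'s real concurrency hotspots:
```rust
#[cfg(test)]
mod loom_tests {
    #[test]
    fn concurrent_increments_are_never_lost() {
        // loom re-runs the closure for every interleaving of the two threads,
        // so the final assertion is checked exhaustively rather than by luck.
        loom::model(|| {
            use loom::sync::atomic::{AtomicUsize, Ordering};
            use loom::sync::Arc;
            use loom::thread;

            let counter = Arc::new(AtomicUsize::new(0));
            let other = Arc::clone(&counter);

            let handle = thread::spawn(move || {
                other.fetch_add(1, Ordering::SeqCst);
            });
            counter.fetch_add(1, Ordering::SeqCst);
            handle.join().unwrap();

            // The invariant must hold in every interleaving loom explores.
            assert_eq!(counter.load(Ordering::SeqCst), 2);
        });
    }
}
```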
For more detailed information and comprehensive, worked examples of these
testing methodologies, please refer to the companion document, *Wireframe: A
Multi-Layered Testing Strategy*.
By integrating these strategies, `wireframe` will not only be rich in features
but will stand as a benchmark for robust, observable, and operationally mature
systems engineering in the Rust ecosystem.