docspec-http 1.5.1

# `docspec-http`

HTTP API server that converts markdown or HTML to BlockNote JSON (default), HTML, or oxa.dev JSON. The output format is selected with the `Accept` header.

Send markdown (`Content-Type: text/markdown`) or HTML (`Content-Type: text/html`) and receive BlockNote JSON (default), HTML (`Accept: text/html`), or oxa.dev JSON (`Accept: application/vnd.oxa+json`). The underlying DocSpec pipeline is streaming, but this v1 HTTP wrapper **buffers the request body and the conversion output in memory** before responding. End-to-end streaming over HTTP is planned for a future version. Until then, memory use grows with request size, so the largest document the server can convert is bounded by available memory.

> **HTML is paragraph-only.** The HTML reader currently parses `<p>` elements only, and the HTML writer currently emits only paragraph events. Other HTML input elements and non-paragraph output events (headings, lists, tables, formatting, etc.) are silently dropped. See [docspec-html-reader](../docspec-html-reader/README.md) and [docspec-html-writer](../docspec-html-writer/README.md).

## Quick Start

```bash
cargo build -p docspec-http --bin docspec-http --release
./target/release/docspec-http --port 3000
```

Default host is `127.0.0.1`. Default port is `3000`.

```bash
./target/release/docspec-http --host 0.0.0.0 --port 8080
```

## Endpoints

| Method  | Path        | Description                                                      |
| ------- | ----------- | ---------------------------------------------------------------- |
| POST    | /conversion | Convert markdown or HTML to BlockNote (default), HTML, or oxa.dev JSON |
| OPTIONS | /conversion | Preflight / allowed methods                                      |
| GET     | /health     | Liveness check                                                   |
| HEAD    | /health     | Liveness check (no body)                                         |
| OPTIONS | /health     | Allowed methods                                                  |

## curl Examples

```bash
# Convert markdown to BlockNote JSON (default)
curl -X POST \
     -H 'Content-Type: text/markdown' \
     --data '# Hello World' \
     http://localhost:3000/conversion

# Convert HTML to BlockNote JSON
curl -X POST \
     -H 'Content-Type: text/html' \
     --data '<p>Hello World</p>' \
     http://localhost:3000/conversion

# Convert markdown to HTML
curl -X POST \
     -H 'Content-Type: text/markdown' \
     -H 'Accept: text/html' \
     --data 'Hello World' \
     http://localhost:3000/conversion

# Convert markdown to oxa.dev JSON (opt-in via Accept)
curl -X POST \
     -H 'Content-Type: text/markdown' \
     -H 'Accept: application/vnd.oxa+json' \
     --data 'Hello World' \
     http://localhost:3000/conversion

# Convert HTML to oxa.dev JSON
curl -X POST \
     -H 'Content-Type: text/html' \
     -H 'Accept: application/vnd.oxa+json' \
     --data '<p>Hello World</p>' \
     http://localhost:3000/conversion

# Check server health
curl http://localhost:3000/health

# HEAD health check (no body in response)
curl -I http://localhost:3000/health

# OPTIONS — see allowed methods
curl -X OPTIONS -i http://localhost:3000/conversion
```
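The same requests can be issued from a script. The following is a minimal sketch using only Python's standard library, not an official client: `build_conversion_request` and `convert` are hypothetical helper names, and `BASE_URL` assumes the Quick Start defaults.

```python
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: Quick Start defaults


def build_conversion_request(body, content_type="text/markdown", accept=None):
    """Build (but do not send) a POST /conversion request."""
    headers = {"Content-Type": content_type}
    if accept is not None:
        # Leaving Accept out selects the default BlockNote JSON output.
        headers["Accept"] = accept
    return urllib.request.Request(
        BASE_URL + "/conversion",
        data=body.encode("utf-8"),  # the server rejects non-UTF-8 bodies with 400
        headers=headers,
        method="POST",
    )


def convert(body, content_type="text/markdown", accept=None):
    """Send the request and return the response body as text."""
    request = build_conversion_request(body, content_type, accept)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running locally, `convert("# Hello World")` corresponds to the first curl example, and `convert("<p>Hello World</p>", "text/html", "application/vnd.oxa+json")` to the HTML-to-oxa.dev one.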

## Request / Response Headers

**`X-Request-ID`**: Generated (UUID v4) if the request omits it. Echoed back unchanged if present.

**`X-Trace-ID`**: Echoed back if present. Never generated by the server.

**`Cache-Control`**: `max-age=0, private, must-revalidate` on every response, including errors.
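The header rules above can be modelled in a few lines. This is an illustrative sketch of the documented behaviour, not the server's actual code; `correlation_headers` is a hypothetical name.

```python
import uuid


def correlation_headers(request_headers):
    """Return the response headers the rules above imply for a request."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    incoming = {name.lower(): value for name, value in request_headers.items()}

    # Cache-Control is the same on every response, including errors.
    response = {"Cache-Control": "max-age=0, private, must-revalidate"}

    # X-Request-ID: echo the caller's value unchanged, otherwise mint a UUID v4.
    if "x-request-id" in incoming:
        response["X-Request-ID"] = incoming["x-request-id"]
    else:
        response["X-Request-ID"] = str(uuid.uuid4())

    # X-Trace-ID: echoed only when the caller sent one, never generated.
    if "x-trace-id" in incoming:
        response["X-Trace-ID"] = incoming["x-trace-id"]
    return response
```

A client that wants to correlate its own logs with server logs can therefore send its own `X-Request-ID` and expect the same value back.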

## Error Responses

All errors use RFC 7807 Problem Details JSON (`application/problem+json; charset=utf-8`).
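Because the error format is standardised, clients can decode it generically. The sketch below reads the members defined by RFC 7807 (`type`, `title`, `status`, `detail`); which of them this server populates is not stated here, so every member is treated as optional. `Problem` and `parse_problem` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    status: Optional[int]
    title: Optional[str]
    detail: Optional[str]
    type: str = "about:blank"  # RFC 7807 default when "type" is absent


def parse_problem(content_type: str, body: str) -> Optional[Problem]:
    """Return a Problem for an application/problem+json body, else None."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/problem+json":
        return None
    doc = json.loads(body)
    return Problem(
        status=doc.get("status"),
        title=doc.get("title"),
        detail=doc.get("detail"),
        type=doc.get("type", "about:blank"),
    )
```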

| Code | Meaning                                                  |
| ---- | -------------------------------------------------------- |
| 400  | Empty body or invalid UTF-8                              |
| 404  | Unknown path                                             |
| 405  | Wrong method (response includes `Allow` header)          |
| 406  | `Accept` header excludes all supported output types      |
| 415  | `Content-Type` must be `text/markdown` or `text/html`    |
| 422  | Input parse error (malformed markdown or HTML)           |
| 500  | Internal conversion error                                |

Accepted `Accept` values for `/conversion`: `text/html` (HTML), `application/vnd.oxa+json` (oxa.dev), `application/vnd.docspec.blocknote+json` (BlockNote), `application/vnd.blocknote+json` (BlockNote alias), `application/*`, or `*/*`. Wildcards and a missing `Accept` header default to BlockNote for backward compatibility. Any other value returns 406.
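The mapping from `Accept` to output format can be sketched as a lookup table. Two points are assumptions rather than documented behaviour: the first recognised entry in a comma-separated list wins, and `q` weights are ignored. An empty header is also treated like a missing one. `pick_output_type` is a hypothetical name.

```python
from typing import Optional

BLOCKNOTE = "application/vnd.docspec.blocknote+json"
OXA = "application/vnd.oxa+json"
HTML = "text/html"

ACCEPT_TO_OUTPUT = {
    HTML: HTML,
    OXA: OXA,
    BLOCKNOTE: BLOCKNOTE,
    "application/vnd.blocknote+json": BLOCKNOTE,  # BlockNote alias
    "application/*": BLOCKNOTE,  # wildcards default to BlockNote
    "*/*": BLOCKNOTE,
}


def pick_output_type(accept: Optional[str]) -> Optional[str]:
    """Return the output media type, or None where the server would answer 406."""
    if accept is None or not accept.strip():
        return BLOCKNOTE  # missing Accept defaults to BlockNote
    for entry in accept.split(","):
        # Assumption: parameters such as ";q=0.8" are dropped, not weighed.
        media_type = entry.split(";", 1)[0].strip().lower()
        if media_type in ACCEPT_TO_OUTPUT:
            return ACCEPT_TO_OUTPUT[media_type]
    return None
```

Note that `text/*` is not in the accepted list, so under this reading it yields 406 even though `text/html` is a supported output.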

## Deployment Notes

**TLS**: Use a reverse proxy (nginx, Caddy). The server speaks plain HTTP.

**CORS**: Use a reverse proxy. No CORS headers are added.

**Auth**: Use a reverse proxy or upstream gateway.

**Body size**: No limit is enforced, so arbitrarily large documents are accepted. The resulting DoS risk is accepted. Both the request body and the conversion output are held in memory for the duration of the request.

**Request timeout**: No timeout. Slow clients can hang a connection indefinitely.

## Logging

Logs go to stderr at INFO level in pretty format. There are no flags to change the log level or format.

## Observability

`docspec-http` integrates with [Sentry](https://sentry.io/) for error reporting.
Activation is fully opt-in via environment variables — the binary has zero Sentry
overhead when no DSN is configured.

### Activation

Set ONE of the following to enable Sentry:

- `DOCSPEC_SENTRY_DSN` — docspec-specific override (preferred)
- `SENTRY_DSN` — Sentry's standard convention (fallback)

If both are set, `DOCSPEC_SENTRY_DSN` wins. An empty string or malformed DSN is
treated as "not set" — the server starts normally and logs a warning to stderr.
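The precedence rule can be sketched as follows. Two details are assumptions: the shape check in `looks_like_dsn` only approximates what counts as "malformed", and an unusable `DOCSPEC_SENTRY_DSN` is assumed to fall through to `SENTRY_DSN` rather than disabling Sentry outright. Both function names are hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def looks_like_dsn(value: str) -> bool:
    """Rough shape check (assumption): scheme://public_key@host/project_id."""
    try:
        parts = urlsplit(value)
        return (
            parts.scheme in ("http", "https")
            and bool(parts.username)
            and bool(parts.hostname)
            and parts.path.strip("/") != ""
        )
    except ValueError:
        return False


def resolve_sentry_dsn(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the DSN to use, or None when Sentry should stay disabled."""
    # DOCSPEC_SENTRY_DSN is checked first, so it wins when both are set.
    for name in ("DOCSPEC_SENTRY_DSN", "SENTRY_DSN"):
        value = env.get(name, "").strip()
        if value and looks_like_dsn(value):
            return value
    return None  # empty or malformed values count as "not set"
```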

### Configuration (all optional)

These follow Sentry's standard conventions:

- `SENTRY_ENVIRONMENT` — environment name (default: `production`)
- `SENTRY_RELEASE` — release identifier (default: auto, `docspec-http@<version>`)
- `SENTRY_SAMPLE_RATE` — error sample rate `[0.0, 1.0]` (default: `1.0`)
- `SENTRY_TRACES_SAMPLE_RATE` — performance trace sample rate `[0.0, 1.0]` (default: `0.0`, traces disabled)
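The defaults listed above can be summarised in code. This sketch assumes that an unparsable or out-of-range rate falls back to the default, which the list does not state; `sentry_options` and its `version` parameter are hypothetical.

```python
import os
from typing import Mapping


def _rate(env: Mapping[str, str], name: str, default: float) -> float:
    """Read a sample rate in [0.0, 1.0], falling back to the default."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # assumption: invalid values are ignored
    return value if 0.0 <= value <= 1.0 else default


def sentry_options(version: str, env: Mapping[str, str] = os.environ) -> dict:
    """Collect the optional Sentry settings with their documented defaults."""
    return {
        "environment": env.get("SENTRY_ENVIRONMENT", "production"),
        "release": env.get("SENTRY_RELEASE", f"docspec-http@{version}"),
        "sample_rate": _rate(env, "SENTRY_SAMPLE_RATE", 1.0),
        # 0.0 means performance traces are disabled.
        "traces_sample_rate": _rate(env, "SENTRY_TRACES_SAMPLE_RATE", 0.0),
    }
```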

### What is captured

| Signal                                                  | Captured?                               |
| ------------------------------------------------------- | --------------------------------------- |
| `500 Internal Server Error` (`HttpError::Internal`)     | yes (event)                             |
| `422 Unprocessable Entity` (`HttpError::Unprocessable`) | yes (event)                             |
| Other 4xx responses                                     | no                                      |
| Panics                                                  | yes (event)                             |
| `tracing::error!` calls                                 | yes (event)                             |
| `tracing::warn!` calls                                  | yes (breadcrumb)                        |
| `tracing::info!`/`debug!` calls                         | yes (breadcrumb)                        |
| Performance transactions                                | only if `SENTRY_TRACES_SAMPLE_RATE > 0` |

### Privacy

`docspec-http` does NOT send the following to Sentry:

- Request bodies (markdown or HTML documents)
- Response bodies (BlockNote JSON, HTML, or oxa.dev JSON)
- PII (Sentry default: `send_default_pii = false`)
- DSN values (never logged or echoed)

Sentry's default header redaction (Authorization, Cookie, etc.) is preserved.

Each captured event is tagged with `request_id` (UUID v4) and `trace_id`
(`X-Trace-ID` header value, if present) for correlation with logs.

## Wire Contract

Mirrors `github.com/docspecio/api` v3.0.2 where feasible: same endpoint path, RFC 7807 errors, `X-Request-ID`/`X-Trace-ID` header handling. Diverges in supported conversions.

## Graceful Shutdown

The server handles SIGINT and SIGTERM. In-flight requests complete before the process exits.

## Docker

### Build

```bash
DOCKER_BUILDKIT=1 docker build \
  --build-arg IMAGE_VERSION=0.1.0 \
  --build-arg IMAGE_REVISION="$(git rev-parse HEAD)" \
  -t docspec-http:local .
```

Supply `IMAGE_VERSION` and `IMAGE_REVISION` at build time to populate the OCI labels. If omitted, they default to `0.1.0` and `unknown`, respectively.

### Run

```bash
docker run --rm -p 3000:3000 ghcr.io/docspec/api:0.1.0
```

The default `CMD` passes `--host 0.0.0.0 --port 3000`. Override it entirely to change the bind address or port:

```bash
docker run --rm -p 8080:8080 ghcr.io/docspec/api:0.1.0 --host 0.0.0.0 --port 8080
```

### Healthcheck

The image ships a built-in `HEALTHCHECK` that probes `GET http://127.0.0.1:3000/health` every 30 seconds using busybox `wget --spider`. Docker reports the container status in `docker ps` and Compose surfaces it via `healthcheck:`.

**The probe port is hardcoded to `3000` inside the image.** If you override `CMD` to bind a different `--port`, the built-in healthcheck will keep probing 3000 and report the container as `unhealthy` even though the server is fine. To run on a non-default port, either:

- Keep the in-container port at `3000` and only remap the host port (`-p 8080:3000`), **or**
- Override the healthcheck at runtime, e.g. `docker run --health-cmd='wget --no-verbose --tries=1 --spider http://127.0.0.1:8080/health || exit 1' …`, **or**
- Disable it with `docker run --no-healthcheck …` and rely on an external probe.
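For the second option, the bind port and the probe port have to change together. A small sketch that derives both from one value makes the coupling explicit; it only assembles a command line from the flags shown above and does not run Docker. `docker_run_command` is a hypothetical helper.

```python
import shlex

IMAGE = "ghcr.io/docspec/api:0.1.0"  # image reference from the Run section


def docker_run_command(port: int, image: str = IMAGE) -> str:
    """Build a `docker run` line whose healthcheck probes the chosen port."""
    probe = (
        "wget --no-verbose --tries=1 --spider "
        f"http://127.0.0.1:{port}/health || exit 1"
    )
    args = [
        "docker", "run", "--rm",
        "-p", f"{port}:{port}",
        f"--health-cmd={probe}",  # overrides the built-in probe on port 3000
        image,
        "--host", "0.0.0.0", "--port", str(port),
    ]
    # shlex.join quotes the probe so the shell passes it as one argument.
    return shlex.join(args)
```

`print(docker_run_command(8080))` yields a command that can be pasted into a shell.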

Kubernetes users should configure a Pod-level `httpGet` liveness probe on `/health` port 3000 instead of relying on the Docker `HEALTHCHECK`.

### Image tags

Images are published to `ghcr.io/docspec/api` by the release workflow (managed by release-please). The following tags are maintained:

| Tag      | Meaning                      |
| -------- | ---------------------------- |
| `0.1.0`  | Exact version                |
| `0.1`    | Latest patch of 0.1          |
| `0`      | Latest minor of 0            |
| `latest` | Most recent released version |

`latest` follows the most recent GitHub release, not the `main` branch. The publish workflow is a documented contract; it is not implemented in this repository.
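The tag scheme in the table can be derived mechanically from a version string. This sketch assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release suffix and does not model how the release workflow decides whether a release is the most recent one; `image_tags` is a hypothetical name.

```python
def image_tags(version: str) -> list:
    """Return the tags the table above lists for one released version."""
    major, minor, _patch = version.split(".")
    return [
        version,             # exact version
        f"{major}.{minor}",  # latest patch of this minor line
        major,               # latest minor of this major line
        "latest",            # most recent released version
    ]
```

For example, `image_tags("0.1.0")` returns `["0.1.0", "0.1", "0", "latest"]`, matching the table.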

### Architecture

The image is built for `linux/amd64` only. No multi-platform manifest is published.

### User

The container runs as non-root UID/GID `10001` (user `docspec`). No capabilities are required.

### Reverse proxy

TLS termination, CORS headers, authentication, and rate limiting are intentionally absent from the binary. Place a reverse proxy (nginx, Caddy, etc.) in front of the container for these concerns. See [Deployment Notes](#deployment-notes) for details.

## Metrics

`docspec-http` exposes a Prometheus metrics endpoint on the same port as the main API.

**Endpoint**: `GET /metrics`

**Format**: Prometheus exposition format 0.0.4 (`text/plain; version=0.0.4; charset=utf-8`)

**Auth**: None. The endpoint is intended for internal scraping only. See [Security](#security) below.

### Metric Catalog

| Name                                        | Type      | Labels                                                         | Description                                                                            | Buckets                                                                   |
| ------------------------------------------- | --------- | -------------------------------------------------------------- | -------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| `docspec_http_requests_total`               | counter   | `method`, `path`, `status`                                     | Total HTTP requests received                                                           | —                                                                         |
| `docspec_http_request_duration_seconds`     | histogram | `method`, `path`, `status`                                     | HTTP request latency in seconds                                                        | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0             |
| `docspec_http_request_body_bytes`           | histogram | `input_mime_type`                                              | HTTP request body size in bytes, labeled by input MIME type                            | 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200, 102400, 204800 |
| `docspec_conversions_total`                 | counter   | `result`, `error_class`, `input_mime_type`, `output_mime_type` | Total document conversions, labeled by result, error class, and input/output MIME type | —                                                                         |
| `docspec_conversion_duration_seconds`       | histogram | `result`, `input_mime_type`, `output_mime_type`                | Document conversion duration in seconds, labeled by result and input/output MIME type  | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0             |
| **`docspec_conversion_output_bytes`** (NEW) | histogram | `input_mime_type`, `output_mime_type`                          | Document conversion output size in bytes (success only)                                | 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200, 102400, 204800 |

### Label Values

**`result`**: `success`, `client_error`, `server_error`

**`error_class`**: `body_not_utf8`, `empty_body`, `internal`, `method_not_allowed`, `not_acceptable`, `not_found`, `unprocessable`, `unsupported_media_type`, `none` (only when `result=success`)

**`input_mime_type`**: `text/markdown` (the request's Content-Type matched the markdown reader), `text/html` (the request's Content-Type matched the HTML reader), `unsupported` (Content-Type header present but not a supported input format), `none` (Content-Type header absent).

**`output_mime_type`**: `application/vnd.docspec.blocknote+json` (conversion succeeded; output produced by the BlockNote writer), `text/html` (conversion succeeded; output produced by the HTML writer), `application/vnd.oxa+json` (conversion succeeded; output produced by the oxa.dev writer), `none` (no output produced — any error path).

**`path`**: matched route template (`/conversion`, `/health`) or `unknown` for fallback handlers

**`status`**: numeric HTTP status code as a string (e.g., `"200"`, `"422"`)

**`method`**: HTTP method as a string (e.g., `"GET"`, `"POST"`)
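The relationship between `error_class`, HTTP status, and `result` can be sketched as a table. The pairing of each error class with a status code is inferred from the names and the Error Responses table, and the rule that 4xx maps to `client_error` and 5xx to `server_error` is likewise an assumption; neither is confirmed by the source code. All names below are hypothetical.

```python
# Inferred pairing of error_class label values with HTTP status codes.
ERROR_CLASS_STATUS = {
    "empty_body": 400,
    "body_not_utf8": 400,
    "not_found": 404,
    "method_not_allowed": 405,
    "not_acceptable": 406,
    "unsupported_media_type": 415,
    "unprocessable": 422,
    "internal": 500,
}


def result_label(status: int) -> str:
    """Classify a status code into one of the three `result` values."""
    if 500 <= status <= 599:
        return "server_error"
    if 400 <= status <= 499:
        return "client_error"
    return "success"


def conversion_labels(error_class: str = "none") -> dict:
    """Return the result/error_class label pair for one conversion outcome."""
    if error_class == "none":
        # `none` only appears together with result=success.
        return {"result": "success", "error_class": "none"}
    status = ERROR_CLASS_STATUS[error_class]
    return {"result": result_label(status), "error_class": error_class}
```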

### Cardinality Guarantees

`path` is bounded to `{"/conversion", "/health", "unknown"}`. `error_class` is bounded to 9 values. `result` is bounded to 3 values. `input_mime_type` is bounded to 4 values (`text/markdown`, `text/html`, `unsupported`, `none`). `output_mime_type` is bounded to 4 values (`application/vnd.docspec.blocknote+json`, `text/html`, `application/vnd.oxa+json`, `none`). Both MIME-type labels come from a fixed set of `&'static str` constants in the source, never from raw header values. Per-request identifiers (`X-Request-ID`, `X-Trace-ID`) are never used as labels.
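The bounded-label idea can be illustrated for `input_mime_type`: whatever the client sends, the label is always one of four constants. This is a sketch of the technique, not the server's code. It assumes that parameters such as `charset` are ignored when matching, and `input_mime_label` is a hypothetical name.

```python
from typing import Optional

INPUT_LABELS = ("text/markdown", "text/html", "unsupported", "none")


def input_mime_label(content_type: Optional[str]) -> str:
    """Map a raw Content-Type header to one of the four fixed label values."""
    if content_type is None:
        return "none"  # header absent
    # Assumption: parameters such as "; charset=utf-8" do not affect matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/markdown", "text/html"):
        return media_type
    # The raw value is never used as a label, which keeps cardinality bounded.
    return "unsupported"
```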

### Scrape Model

Each pod maintains its own in-memory metrics. Prometheus scrapes each pod independently. No inter-pod communication is required. Aggregate across pods using PromQL.

Upkeep runs every 5 seconds, keeping histogram internal state bounded.

The `/metrics` route is mounted outside the API middleware stack, so it does not include the global `Cache-Control` header used by API responses.

The body-size histogram (`docspec_http_request_body_bytes`) only records bodies that passed Content-Type and Accept validation. Rejected requests are not counted.

The output-bytes histogram (`docspec_conversion_output_bytes`) only records observations for successful conversions. Failed conversions do not produce output, so no observation is recorded.
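Both byte histograms use the same twelve buckets, which double from 100 up to 204800. The sketch below regenerates that list and shows which upper bound (`le`) an observation first falls under; `bucket_for` is a hypothetical helper for illustration only.

```python
import bisect

# 100, 200, 400, ... 204800: twelve buckets, each double the previous one.
BYTE_BUCKETS = [100 * 2**i for i in range(12)]


def bucket_for(size_bytes: int) -> str:
    """Return the smallest `le` bound that contains size_bytes, or "+Inf"."""
    # Prometheus buckets are inclusive upper bounds, hence bisect_left.
    index = bisect.bisect_left(BYTE_BUCKETS, size_bytes)
    if index == len(BYTE_BUCKETS):
        return "+Inf"  # larger than the biggest explicit bucket
    return str(BYTE_BUCKETS[index])
```

Anything above 204800 bytes is only counted in the `+Inf` bucket, so `histogram_quantile` cannot report a size estimate above 204800 for these metrics.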

### Example PromQL Queries

Per-pod request rate:

```promql
rate(docspec_http_requests_total[5m])
```

Aggregate p99 latency across all pods:

```promql
histogram_quantile(0.99, sum by (le) (rate(docspec_http_request_duration_seconds_bucket[5m])))
```

Error rate broken down by error class:

```promql
sum by (error_class) (rate(docspec_conversions_total{result!="success"}[5m]))
```

Body-size p95:

```promql
histogram_quantile(0.95, sum by (le) (rate(docspec_http_request_body_bytes_bucket[5m])))
```

Body-size p95 by input format:

```promql
histogram_quantile(0.95, sum by (le, input_mime_type) (rate(docspec_http_request_body_bytes_bucket[5m])))
```

Conversion success rate by input format:

```promql
sum by (input_mime_type) (rate(docspec_conversions_total{result="success"}[5m]))
  / sum by (input_mime_type) (rate(docspec_conversions_total[5m]))
```

### Security

`/metrics` has no authentication. It's intended for internal scraping only. Deploy behind a private overlay network or a Kubernetes `NetworkPolicy` that restricts access to your Prometheus pods.