rust-job-queue-api-worker-system 0.1.0

A production-shaped Rust job queue: Axum API + async workers + Postgres SKIP LOCKED dequeue, retries with decorrelated jitter, idempotency, cooperative cancellation, OpenAPI, Prometheus metrics.
# rust-job-queue-api-worker-system

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/infinityabundance/Rust-Job-Queue-API-Worker-System/blob/main/notebooks/rust_job_queue_api_worker_system_colab.ipynb)

Copyright 2026 infinityabundance

A compact, production-shaped Rust job queue: Axum HTTP API, Tokio workers, Postgres `SELECT ... FOR UPDATE SKIP LOCKED` dequeue, retries with decorrelated jitter, idempotency, cooperative cancellation, OpenAPI, Prometheus metrics, Docker Compose, and reproducible tests/benchmarks.

Single Cargo package, two feature flags, two service binaries, and a one-shot migration runner.

## What this demonstrates

| Area | Evidence in this repo |
|---|---|
| Concurrent worker correctness | A 200-job / 8-worker integration test checks that no job is processed twice in this bounded SKIP LOCKED scenario, using a separate transactional `processed_log` table. |
| API design | Axum routes, typed DTOs, idempotency-key handling, status filters, OpenAPI JSON, and Swagger UI. |
| Reliability semantics | Retry scheduling, stale-lock recovery, cooperative cancellation, DB constraints, and documented trade-offs. |
| Operations | Docker Compose, migration binary, structured tracing, API metrics, worker metrics, health check, and runbook. |
| Release hygiene | Feature-gating probes, MSRV check, docs build with warnings denied, RustSec audit, and `cargo publish --dry-run` in CI. |

## Evidence

- `cargo test --all-features --locked`: 46 tests in the current suite (12 lib unit tests, 33 integration tests, 1 doctest). Integration tests require Docker because they use testcontainers Postgres.
- CI workflow: [`ci.yml`](https://github.com/infinityabundance/Rust-Job-Queue-API-Worker-System/blob/main/.github/workflows/ci.yml).
- Architecture notes: [docs/architecture.md](docs/architecture.md).
- API reference: [docs/api.md](docs/api.md).
- Operational runbook: [docs/runbook.md](docs/runbook.md).
- Trade-offs and omissions: [docs/tradeoffs.md](docs/tradeoffs.md).
- Colab smoke/repro notebook: [notebooks/rust_job_queue_api_worker_system_colab.ipynb](notebooks/rust_job_queue_api_worker_system_colab.ipynb).

## Quickstart

```bash
docker compose up                       # boots postgres + migrations + api + worker
open http://localhost:8080/docs         # Swagger UI
curl http://localhost:8080/metrics      # API Prometheus metrics
curl http://localhost:9091/metrics      # worker Prometheus metrics
cargo test --all-features --locked      # full test suite; needs Docker
```

Environment variables are documented in [`.env.example`](https://github.com/infinityabundance/Rust-Job-Queue-API-Worker-System/blob/main/.env.example).

## Endpoints

| Method | Path | Port | Notes |
|---|---|---|---|
| POST | `/jobs` | 8080 | Enqueue. Supports `Idempotency-Key` header. |
| GET | `/jobs/{id}` | 8080 | Fetch one. |
| GET | `/jobs` | 8080 | Paginated list, filterable by `status` and `kind`. |
| POST | `/jobs/{id}/cancel` | 8080 | 200 cancelled (job was queued or retrying), 202 cancellation pending (job is running), 409 already terminal. |
| GET | `/health` | 8080 | DB-level liveness. |
| GET | `/metrics` | 8080 | API Prometheus metrics. |
| GET | `/metrics` | 9091 | Worker Prometheus metrics on the worker process. |
| GET | `/docs` | 8080 | Swagger UI. |
| GET | `/api-docs/openapi.json` | 8080 | OpenAPI 3 spec. |

Full endpoint reference: [docs/api.md](docs/api.md).
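
The status codes on the cancel endpoint follow from the job's state at the time of the request. A minimal sketch of that mapping, assuming a hypothetical `JobStatus` enum (the crate's real type and variant names may differ) and assuming that an already-cancelled job counts as terminal:

```rust
/// Hypothetical job states for illustration; not the crate's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Retrying,
    Running,
    Succeeded,
    Failed,
    Cancelled,
}

/// HTTP status code a cancel request would return for a job in `status`.
pub fn cancel_response_code(status: JobStatus) -> u16 {
    match status {
        // Not yet claimed by a worker: the row can be cancelled atomically.
        JobStatus::Queued | JobStatus::Retrying => 200,
        // A worker holds the job: the request is recorded and the worker
        // stops at its next safe point, so the cancel is still pending.
        JobStatus::Running => 202,
        // Already finished (assumption: this includes an earlier cancel).
        JobStatus::Succeeded | JobStatus::Failed | JobStatus::Cancelled => 409,
    }
}
```

A client that receives 202 would typically poll `GET /jobs/{id}` until the job reaches a terminal state.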

## Architecture in one paragraph

The core modules (`domain`, `payload`, `db`, `queue`, `retry`, `ids`, `error`) are always compiled; `api` and `worker` are gated behind features of the same names. The API and worker binaries use `required-features`, so they cannot be built without their dependency sets. A third binary, `job-queue-migrate`, runs embedded migrations and exits; Docker Compose uses it before starting the API and worker.

The queue lives in one Postgres table (`jobs`) with a partial index on `(run_at)` filtered by `status IN ('queued','retrying')`. Retries are scheduled by setting `run_at` into the future; the same dequeue query picks them up when `run_at <= now()`. Cancellation is atomic for queued/retrying jobs and cooperative for running jobs.
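
To make the dequeue rule concrete, here is a small in-memory model of it, plus one plausible shape for the SQL. Both are illustrative assumptions: only the `jobs` table, the `status` and `run_at` columns, and the `'queued'`/`'retrying'` values come from the description above; the `id` column, the `'running'` value, and all Rust names are hypothetical.

```rust
/// Assumed shape of the dequeue statement; the repo's actual query may differ.
pub const DEQUEUE_SQL: &str = "\
UPDATE jobs SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status IN ('queued', 'retrying') AND run_at <= now()
    ORDER BY run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Queued,
    Retrying,
    Running,
}

#[derive(Debug, Clone)]
pub struct Job {
    pub id: u32,
    pub status: Status,
    /// Earliest time (arbitrary ticks) at which the job may run.
    pub run_at: u64,
}

/// In-memory analogue of the query: claim the due job with the earliest `run_at`.
pub fn dequeue(jobs: &mut [Job], now: u64) -> Option<u32> {
    let job = jobs
        .iter_mut()
        .filter(|j| matches!(j.status, Status::Queued | Status::Retrying) && j.run_at <= now)
        .min_by_key(|j| j.run_at)?;
    job.status = Status::Running;
    Some(job.id)
}

/// Schedule a retry by moving `run_at` into the future; no separate retry queue is needed.
pub fn schedule_retry(job: &mut Job, now: u64, delay: u64) {
    job.status = Status::Retrying;
    job.run_at = now + delay;
}
```

The model omits row locking. In Postgres, `SKIP LOCKED` is what lets concurrent workers run the same statement without blocking on, or double-claiming, a row another transaction has already locked.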

Full design, including state and sequence diagrams: [docs/architecture.md](docs/architecture.md).
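
The retry delay itself uses decorrelated jitter. The sketch below assumes the commonly cited formulation, `delay = min(cap, random(base, prev * 3))`; the crate's `retry` module may use different constants or a different random source. The `XorShift64` generator and `next_delay` function are hypothetical names, written with the standard library only.

```rust
use std::time::Duration;

/// Tiny xorshift64 generator so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift state must be non-zero.
        Self(seed.max(1))
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `lo..=hi`; modulo bias is ignored for this illustration.
    fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo).saturating_add(1)
    }
}

/// Decorrelated jitter: draw from `[base, prev * 3]`, then clamp to `cap`.
pub fn next_delay(
    rng: &mut XorShift64,
    base: Duration,
    cap: Duration,
    prev: Duration,
) -> Duration {
    let base_ms = base.as_millis() as u64;
    let cap_ms = cap.as_millis() as u64;
    let prev_ms = prev.as_millis() as u64;
    // Upper bound grows with the previous delay but never leaves [base, cap].
    let upper = prev_ms.saturating_mul(3).clamp(base_ms, cap_ms.max(base_ms));
    Duration::from_millis(rng.range(base_ms, upper).min(cap_ms))
}
```

Feeding each result back in as `prev` yields delays that tend to grow but stay randomized, which spreads out retries from jobs that failed at the same moment. The result would then be added to the current time to produce the new `run_at`.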

## Feature flags

| Feature set | Default | Pulls in |
|---|---|---|
| core | always | Domain types, payload validation, queue SQL, retry math, migrations, `utoipa` schema derives. |
| `api` | yes | `axum`, `tower`, `tower-http`, `utoipa-swagger-ui`, `metrics-exporter-prometheus`. |
| `worker` | yes | `tokio-util` and `metrics-exporter-prometheus`. |

Consumers who only want the queue types and SQL helpers can use:

```bash
cargo add rust-job-queue-api-worker-system --no-default-features
```

CI checks core-only, API-only, and worker-only builds.
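
As a rough picture of what the gating looks like at the module level, the following sketch uses empty placeholder modules; the real `src/lib.rs` layout is an assumption here, and `enabled_features` is a hypothetical helper added only to make the effect observable.

```rust
// Core modules are always compiled (placeholders for illustration).
pub mod domain {}
pub mod queue {}

// Optional layers exist only when their Cargo feature is enabled.
#[cfg(feature = "api")]
pub mod api {}

#[cfg(feature = "worker")]
pub mod worker {}

/// Names of the optional features compiled into this build.
pub fn enabled_features() -> Vec<&'static str> {
    let mut out = Vec::new();
    if cfg!(feature = "api") {
        out.push("api");
    }
    if cfg!(feature = "worker") {
        out.push("worker");
    }
    out
}
```

With `--no-default-features`, neither gated module is compiled, so the HTTP and worker dependencies drop out of the build.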

## Tests

```bash
cargo test --all-features --locked
```

The suite currently has 46 tests across lib unit tests, one doctest, and seven integration binaries:

- `queue_ops` (19): SKIP LOCKED dequeue, retry math, cancel paths, recovery sweep, DB-layer idempotency.
- `api_jobs` (5): create/get/list, 404, 422, `/health`, OpenAPI rendering.
- `idempotency` (4): two-concurrent same-key, 10-concurrent same-key race, different keys, header precedence over body.
- `retry_and_cancel` (2): fail-once-then-succeed, always-fail-to-permanent.
- `worker_dequeue` (1): enqueue -> worker processes -> succeeded.
- `cancel_running` (1): cancellation observed at a safe point.
- `concurrency_two_workers` (1): 200 jobs across 8 workers, no duplicate processing observed by the test harness.
- lib unit tests (12): payload validation, retry jitter bounds, deterministic simulator hash.
- doctest (1): crate-level example in [src/lib.rs](src/lib.rs).

All integration tests use testcontainers; each test gets an isolated ephemeral database.
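
The behaviour the `idempotency` tests check can be modelled in a few lines. This is a sketch under assumptions, not the crate's code: `JobStore` is a hypothetical in-memory stand-in for the database, where a uniqueness constraint on the key would presumably play the role of the `HashMap`.

```rust
use std::collections::HashMap;

/// Pick the effective key: the `Idempotency-Key` header wins over a body field.
pub fn effective_key<'a>(header: Option<&'a str>, body: Option<&'a str>) -> Option<&'a str> {
    header.or(body)
}

/// In-memory stand-in for the jobs table with a unique idempotency key.
#[derive(Default)]
pub struct JobStore {
    next_id: u64,
    by_key: HashMap<String, u64>,
}

impl JobStore {
    /// Returns `(job_id, created)`; a repeated key maps back to the first job.
    pub fn enqueue(&mut self, key: Option<&str>) -> (u64, bool) {
        if let Some(k) = key {
            if let Some(&id) = self.by_key.get(k) {
                return (id, false);
            }
        }
        self.next_id += 1;
        let id = self.next_id;
        if let Some(k) = key {
            self.by_key.insert(k.to_owned(), id);
        }
        (id, true)
    }
}
```

Wrapping the store in a `Mutex` and calling `enqueue` with the same key from ten threads should yield one `created == true` result and ten identical job IDs, which mirrors the 10-concurrent same-key race case.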

## Benchmarks

Two release-supported benches are reproducible against a fresh Postgres container in Docker.

```bash
# Single-worker queue-overhead microbench (Criterion).
cargo bench --bench dequeue --features worker

# Concurrent throughput across {1,2,4,8,16} workers and 3 Postgres configs.
./bench/run.sh
```

Measured results are in [bench/RESULTS.md](bench/RESULTS.md). The numbers are scoped to the recorded hardware and methodology. They are not presented as production throughput claims.

The exploratory head-to-head comparison benchmark is intentionally excluded from the v0.1 crates.io package until its methodology and teardown behavior are corrected.

## Example: embed the worker in your own application

```bash
DATABASE_URL=postgres://localhost/jobs \
  cargo run --example embed_worker --features worker
```

See [examples/embed_worker.rs](examples/embed_worker.rs) for a small example of plugging a custom `Executor` into `WorkerRuntime`.
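
A custom executor has to cooperate with the cancellation model: a running job is only stopped when its own code notices the request at a safe point. The sketch below shows that pattern with a plain `AtomicBool` and the standard library. It is not the crate's `Executor` trait or `WorkerRuntime` API; the function and type names are hypothetical, and the real worker may signal cancellation differently (for example through a token type from `tokio-util`, which is an assumption).

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Outcome of one job run in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Completed { steps_done: usize },
    Cancelled { steps_done: usize },
}

/// Run `steps` units of work, checking the shared flag before each one.
pub fn run_with_cancellation(
    steps: usize,
    cancel: &AtomicBool,
    mut work: impl FnMut(usize),
) -> Outcome {
    for i in 0..steps {
        // Safe point: no unit of work is in flight, so stopping here
        // leaves nothing half-done.
        if cancel.load(Ordering::Acquire) {
            return Outcome::Cancelled { steps_done: i };
        }
        work(i);
    }
    Outcome::Completed { steps_done: steps }
}
```

The trade-off is latency: a cancel request takes effect only at the next check, so long-running units of work should be split into smaller steps if prompt cancellation matters.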

## CI

[`ci.yml`](https://github.com/infinityabundance/Rust-Job-Queue-API-Worker-System/blob/main/.github/workflows/ci.yml) runs:

1. `cargo fmt --check`, clippy with warnings denied, feature-decoupling checks, full tests, release build, docs with warnings denied, `cargo package --list --locked`, and `cargo publish --dry-run --locked`.
2. `cargo +1.88.0 check --all-features --locked` for the declared MSRV.
3. RustSec audit.

## Trade-offs and intentional omissions

This artifact is deliberately small. The decisions and "what would change in production" notes live in [docs/tradeoffs.md](docs/tradeoffs.md). Start there if you are evaluating engineering judgment.

## Tech stack

Rust (MSRV 1.88), Tokio 1, Axum 0.8, sqlx 0.8, Postgres 16, tower-http 0.6, tracing 0.1, utoipa 5 + utoipa-swagger-ui 9, metrics + metrics-exporter-prometheus 0.18, thiserror 2, anyhow 1, testcontainers 0.27, Criterion 0.5.

## License

Apache-2.0