//! Request handlers for the Infernum API.
//!
//! This module provides handler utilities and documentation for the API endpoints.
//! The handler implementations themselves live in [`crate::server`], alongside the router.
//!
//! # Endpoint Overview
//!
//! ## Health & Status
//!
//! | Endpoint | Method | Description |
//! |----------|--------|-------------|
//! | `/health` | GET | Server health check (always returns ok) |
//! | `/ready` | GET | Readiness check (returns model status) |
//! | `/metrics` | GET | Prometheus metrics |
//!
//! ## Inference Endpoints
//!
//! | Endpoint | Method | Description |
//! |----------|--------|-------------|
//! | `/v1/chat/completions` | POST | Chat completion (supports streaming) |
//! | `/v1/completions` | POST | Text completion |
//! | `/v1/embeddings` | POST | Generate embeddings |
//! | `/v1/models` | GET | List available models |
//!
//! ## Model Management
//!
//! | Endpoint | Method | Description |
//! |----------|--------|-------------|
//! | `/api/models/load` | POST | Load a model |
//! | `/api/models/unload` | POST | Unload the current model |
//!
//! # Authentication
//!
//! When authentication is enabled, every endpoint except `/health`, `/ready`, and
//! `/metrics` requires an API key. Send the key as a bearer token in the
//! `Authorization` header:
//!
//! ```text
//! Authorization: Bearer sk-inf-your-api-key
//! ```
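//!
//! The rule above can be sketched as two small functions. This is a minimal,
//! std-only illustration, not this crate's implementation: `requires_auth`,
//! `bearer_token`, and `PUBLIC_PATHS` are hypothetical names, the path check
//! assumes exact matches without query strings, and the scheme name is matched
//! case-insensitively, as RFC 7235 specifies for HTTP authentication schemes.
//!
//! ```rust
//! // Hypothetical: routes exempt from authentication, per the list above.
//! const PUBLIC_PATHS: [&str; 3] = ["/health", "/ready", "/metrics"];
//!
//! // Hypothetical helper: `true` when a request to `path` must carry an API
//! // key, assuming auth is enabled. Compares exact paths only.
//! fn requires_auth(path: &str) -> bool {
//!     !PUBLIC_PATHS.contains(&path)
//! }
//!
//! // Hypothetical helper: extracts the key from an `Authorization` header
//! // value such as "Bearer sk-inf-your-api-key". Returns `None` for any other
//! // scheme or for an empty token.
//! fn bearer_token(header_value: &str) -> Option<&str> {
//!     let (scheme, token) = header_value.trim().split_once(' ')?;
//!     if !scheme.eq_ignore_ascii_case("Bearer") {
//!         return None;
//!     }
//!     let token = token.trim();
//!     if token.is_empty() {
//!         None
//!     } else {
//!         Some(token)
//!     }
//! }
//! ```
//!
//! For example, `bearer_token("Bearer sk-inf-your-api-key")` yields
//! `Some("sk-inf-your-api-key")`, while a `Basic` credential yields `None`.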
//!
//! # Error Responses
//!
//! All errors follow a structured JSON format:
//!
//! ```json
//! {
//!   "error": {
//!     "message": "Description of what went wrong",
//!     "type": "invalid_request_error",
//!     "code": "invalid_messages",
//!     "param": "messages"
//!   }
//! }
//! ```
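//!
//! A std-only sketch of producing this envelope follows. `ApiError`,
//! `json_string`, `json_opt`, and `to_json` are hypothetical names rather than
//! this crate's types, and writing `null` for an absent `code` or `param` is an
//! assumption: the example above does not say whether such keys are omitted or
//! sent as `null`. A real server would more likely derive the JSON with a
//! serialization library.
//!
//! ```rust
//! // Hypothetical mirror of the error envelope shown above.
//! struct ApiError<'a> {
//!     message: &'a str,
//!     error_type: &'a str, // serialized under the JSON key "type"
//!     code: Option<&'a str>,
//!     param: Option<&'a str>,
//! }
//!
//! // Encodes `s` as a JSON string literal, surrounding quotes included.
//! fn json_string(s: &str) -> String {
//!     let mut out = String::with_capacity(s.len() + 2);
//!     out.push('"');
//!     for c in s.chars() {
//!         match c {
//!             '"' => out.push_str("\\\""),
//!             '\\' => out.push_str("\\\\"),
//!             '\n' => out.push_str("\\n"),
//!             '\r' => out.push_str("\\r"),
//!             '\t' => out.push_str("\\t"),
//!             // Remaining control characters must use \u escapes in JSON.
//!             c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
//!             c => out.push(c),
//!         }
//!     }
//!     out.push('"');
//!     out
//! }
//!
//! // Optional fields become `null` when absent (an assumption, see above).
//! fn json_opt(v: Option<&str>) -> String {
//!     v.map(json_string).unwrap_or_else(|| "null".to_string())
//! }
//!
//! impl ApiError<'_> {
//!     // Renders the `{"error": {...}}` envelope as compact JSON.
//!     fn to_json(&self) -> String {
//!         format!(
//!             r#"{{"error":{{"message":{},"type":{},"code":{},"param":{}}}}}"#,
//!             json_string(self.message),
//!             json_string(self.error_type),
//!             json_opt(self.code),
//!             json_opt(self.param),
//!         )
//!     }
//! }
//! ```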
//!
//! # Streaming
//!
//! The chat and text completion endpoints stream their output when the request
//! body sets `"stream": true`. Streamed responses use the Server-Sent Events (SSE) format.
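//!
//! On the client side, an SSE body is a sequence of events separated by blank
//! lines, where each `data:` line contributes one line of the event's payload.
//! The sketch below extracts those payloads from a complete body. It is a
//! simplified, hypothetical helper (`sse_data_payloads` is not part of this
//! crate): it ignores the `event:`, `id:`, and `retry:` fields and does not
//! handle bare `\r` line endings, both of which the SSE specification allows.
//!
//! ```rust
//! // Returns the `data` payload of each complete event in `body`.
//! fn sse_data_payloads(body: &str) -> Vec<String> {
//!     let mut events = Vec::new();
//!     let mut data_lines: Vec<&str> = Vec::new();
//!     // `lines()` splits on both `\n` and `\r\n`.
//!     for line in body.lines() {
//!         if line.is_empty() {
//!             // A blank line dispatches the event accumulated so far.
//!             if !data_lines.is_empty() {
//!                 events.push(data_lines.join("\n"));
//!                 data_lines.clear();
//!             }
//!         } else if let Some(rest) = line.strip_prefix("data:") {
//!             // A single space after the colon is not part of the value.
//!             data_lines.push(rest.strip_prefix(' ').unwrap_or(rest));
//!         }
//!         // Comments (lines starting with ':') and other fields are skipped.
//!     }
//!     // Data after the last blank line is an incomplete event and is dropped.
//!     events
//! }
//! ```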
use Future;
use Pin;
use Response;
/// Type alias for async handler results.
pub type HandlerResult = ;
/// Handler configuration options.