1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
//! # oxillama-server
//!
//! OpenAI-compatible HTTP API server for OxiLLaMa.
//!
//! ## Endpoints
//!
//! | Method | Path | Description |
//! |--------|------|-------------|
//! | POST | `/v1/chat/completions` | Chat completion |
//! | POST | `/v1/completions` | Text completion |
//! | POST | `/v1/embeddings` | Text embeddings |
//! | GET | `/v1/models` | List loaded models |
//! | GET | `/health` | Health check |
//! | POST | `/v1/batches` | Create batch job (disk-spooled) |
//! | GET | `/v1/batches/:id` | Retrieve batch job |
//! | GET | `/v1/batches/:id/output` | Stream batch output JSONL |
//! | POST | `/v1/batches/:id/cancel` | Cancel batch job |
//! | GET | `/v1/batches` | List batch jobs |
//! | POST | `/v1/threads` | Create Assistants API thread |
//! | GET | `/v1/threads/:thread_id` | Retrieve thread |
//! | POST | `/v1/threads/:thread_id/messages` | Append message to thread |
//! | GET | `/v1/threads/:thread_id/messages` | List thread messages |
//! | POST | `/v1/threads/:thread_id/runs` | Create and enqueue a run |
//! | GET | `/v1/threads/:thread_id/runs/:run_id` | Get run status |
//! | POST | `/v1/threads/:thread_id/runs/:run_id/cancel` | Cancel a run |
//! | POST | `/admin/models/load` | Background-load model (admin) |
//! | POST | `/admin/models/unload` | Unload model (admin) |
//! | GET | `/admin/models` | List model pool (admin) |
//! | GET | `/admin/stats` | Server stats (admin) |
//! | GET | `/admin/health` | Extended health (admin) |
//! | POST | `/admin/loras` | Register a LoRA adapter (admin) |
//! | DELETE | `/admin/loras/{name}` | Unregister a LoRA adapter (admin) |
//! | GET | `/admin/loras` | List registered LoRA adapters (admin) |
pub
pub use build_app;
pub use ApiKeys;
pub use ServerConfig;
pub use ;
pub use Metrics;
pub use ;
pub use ;
pub use ResponseStore;
pub use ;
pub use ;
pub use AppState;
pub use ;
pub use spawn_inference_worker;