oxillama-server
OpenAI-compatible HTTP API server for OxiLLaMa — drop-in replacement for llama-server.
Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.
What It Provides
POST /v1/chat/completions— OpenAI chat completions (streaming via SSE + non-streaming)POST /v1/completions— Legacy text completionsPOST /v1/embeddings— Text embedding extractionGET /v1/models— List available loaded modelsGET /health— Liveness probe- Server-Sent Events (SSE) streaming with
deltachunked responses - JSON request/response fully compatible with OpenAI SDK clients
Usage
Start the server from the CLI:
# Via the oxillama binary
# Or with extra options
Query it with curl:
|
Or use the official OpenAI Python SDK:
=
=
License
Apache-2.0 — COOLJAPAN OU (Team Kitasan)