llama-crab-server-0.1.6 is not a library.
llama-crab-server
OpenAI-compatible HTTP server for local llama-crab inference.
Built on top of axum and exposes a worker thread
that owns the model and context.
Installation
For development against a workspace checkout:
Routes
| Route | Description |
|---|---|
GET /health |
Liveness probe. |
GET /v1/models |
List the loaded model. |
POST /v1/completions |
OpenAI legacy text completions. |
POST /v1/chat/completions |
OpenAI chat completions with streaming. |
POST /v1/embeddings |
Embeddings (float or base64). |
POST /v1/rerank, POST /v1/reranking |
Rerank. |
POST /extras/tokenize, /extras/tokenize/count, /extras/detokenize |
Tokenizer helpers. |
Multimodal chat is available when the binary is built with --features mtmd
and started with --mmproj <projector.gguf>.
Hugging Face support
To enable loading models directly from Hugging Face (e.g. --model TheBloke/...),
install the server with the hf-hub feature:
When the feature is enabled, the server accepts Hugging Face repository ids
via --model and disambiguates multi-.gguf repos via --hf-filename:
For the full request schema, sampling fields and structured-output options, see the server guide.
Resources
License
Licensed under the MIT License.