Vllora LLM crate (vllora_llm)
This crate powers the Vllora AI Gateway’s LLM layer. It provides:
- Unified chat-completions client over multiple providers (OpenAI-compatible, Anthropic, Gemini, Bedrock, …)
- Gateway-native types (ChatCompletionRequest, ChatCompletionMessage, routing & tools support)
- Streaming responses and telemetry hooks via a common ModelInstance trait
- Tracing integration: out-of-the-box tracing support, with a console example in llm/examples/tracing (spans/events to stdout) and an OTLP example in llm/examples/tracing_otlp (send spans to external collectors such as New Relic)
- Supported parameters: see the Supported parameters section below for a detailed table of which parameters are honored by each provider
Use it when you want to talk to the gateway’s LLM engine from Rust code, without worrying about provider-specific SDKs.
Installation
Run cargo add vllora_llm or add to your Cargo.toml:
[dependencies]
vllora_llm = "0.1"
Quick start
Here's a minimal example to get started, driving VlloraLLMClient from an async main and propagating errors with LLMResult.
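The sketch below is illustrative rather than exact: the import paths, the argument-free VlloraLLMClient::new() constructor, and the ChatCompletionRequest struct-literal fields are assumptions based on the types named in this README, so check the crate's API docs for the precise signatures. It also assumes tokio as the async runtime.

```rust
use vllora_llm::types::gateway::ChatCompletionRequest;
use vllora_llm::{LLMResult, VlloraLLMClient};

#[tokio::main]
async fn main() -> LLMResult<()> {
    // Picks up provider API keys from the environment (see the note below).
    let client = VlloraLLMClient::new();

    // Field names are assumptions; populate `messages` with your prompt.
    let request = ChatCompletionRequest {
        model: "gpt-4o-mini".to_string(),
        ..Default::default()
    };

    let response = client.completions().create(request).await?;
    println!("{response:?}");
    Ok(())
}
```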
Note: By default, VlloraLLMClient::new() fetches API keys from environment variables following the pattern VLLORA_{PROVIDER_NAME}_API_KEY. For example, for OpenAI, it will look for VLLORA_OPENAI_API_KEY.
Quick start with async-openai-compatible types
If you already build OpenAI-compatible requests (e.g. via async-openai-compat), you can send both non-streaming and streaming completions through VlloraLLMClient, consuming the streamed chunks with futures::StreamExt.
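A sketch under stated assumptions: the async-openai-compat type paths follow the upstream async-openai layout, and the request is handed to the same create / create_stream methods via an assumed conversion (.into()). The real bridge may differ, so treat the shape rather than the exact signatures as the takeaway.

```rust
use async_openai_compat::types::{
    ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs,
};
use futures::StreamExt;
use vllora_llm::VlloraLLMClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VlloraLLMClient::new();

    // Build an OpenAI-style request with the async-openai-compat builders.
    let request = CreateChatCompletionRequestArgs::default()
        .model("gpt-4o-mini")
        .messages([ChatCompletionRequestUserMessageArgs::default()
            .content("Say hello from Vllora!")
            .build()?
            .into()])
        .build()?;

    // Non-streaming completion (conversion into the gateway request type is assumed).
    let response = client.completions().create(request.clone().into()).await?;
    println!("{response:?}");

    // Streaming completion, consumed chunk by chunk.
    let mut stream = client.completions().create_stream(request.into()).await?;
    while let Some(chunk) = stream.next().await {
        println!("{:?}", chunk?);
    }
    Ok(())
}
```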
Basic usage: completions client (gateway-native)
The main entrypoint is VlloraLLMClient, which gives you a CompletionsClient for chat completions using the gateway-native request/response types; in the bundled examples the client is built around an Arc-wrapped ModelInstance.
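A sketch of that flow. The bundled examples construct the client from a DummyModelInstance; here the environment-based VlloraLLMClient::new() stands in, and the gateway module path for ChatCompletionMessageWithFinishReason is an assumption.

```rust
use vllora_llm::types::gateway::{ChatCompletionMessageWithFinishReason, ChatCompletionRequest};
use vllora_llm::{LLMResult, VlloraLLMClient};

#[tokio::main]
async fn main() -> LLMResult<()> {
    // In the gateway proper the client wraps a ModelInstance created by the core
    // executor; new() is used here as a stand-in.
    let client = VlloraLLMClient::new();

    let request = ChatCompletionRequest {
        model: "claude-3-5-sonnet".to_string(),
        // ...messages, tools, routing options...
        ..Default::default()
    };

    // `create` performs a one-shot completion and returns the assistant message
    // together with its finish reason.
    let message: ChatCompletionMessageWithFinishReason =
        client.completions().create(request).await?;
    println!("{message:?}");
    Ok(())
}
```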
Key pieces:
- VlloraLLMClient: wraps a ModelInstance and exposes .completions().
- CompletionsClient::create: sends a one-shot completion request and returns a ChatCompletionMessageWithFinishReason.
- Gateway types (ChatCompletionRequest, ChatCompletionMessage) abstract over provider-specific formats.
Streaming completions
CompletionsClient::create_stream returns a ResultStream that yields streaming chunks, which you can drive with futures::StreamExt.
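A sketch, assuming create_stream is async and fallible and that the resulting stream yields Result-wrapped ChatCompletionChunk values; the exact shape of ResultStream may differ.

```rust
use futures::StreamExt;
use vllora_llm::types::gateway::ChatCompletionRequest;
use vllora_llm::{LLMResult, VlloraLLMClient};

#[tokio::main]
async fn main() -> LLMResult<()> {
    let client = VlloraLLMClient::new();
    let request = ChatCompletionRequest {
        model: "gpt-4o-mini".to_string(),
        // ...messages for your prompt...
        ..Default::default()
    };

    // Each item is a gateway-native ChatCompletionChunk (or an error).
    let mut stream = client.completions().create_stream(request).await?;
    while let Some(chunk) = stream.next().await {
        println!("{:?}", chunk?);
    }
    Ok(())
}
```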
The stream API mirrors OpenAI-style streaming but uses gateway-native ChatCompletionChunk types.
Supported parameters
The table below lists which ChatCompletionRequest (and provider-specific) parameters are honored by each provider when using VlloraLLMClient:
| Parameter | OpenAI / Proxy | Anthropic | Gemini | Bedrock | Notes |
|---|---|---|---|---|---|
| model | yes | yes | yes | yes | Taken from ChatCompletionRequest.model or engine config. |
| max_tokens | yes | yes | yes | yes | Mapped to provider-specific max_tokens / max_output_tokens. |
| temperature | yes | yes | yes | yes | Sampling temperature. |
| top_p | yes | yes | yes | yes | Nucleus sampling. |
| n | no | no | yes | no | For Gemini, mapped to candidate_count; other providers always use n = 1. |
| stop / stop_sequences | yes | yes | yes | yes | Converted to each provider’s stop / stop-sequences field. |
| presence_penalty | yes | no | yes | no | OpenAI / Gemini only. |
| frequency_penalty | yes | no | yes | no | OpenAI / Gemini only. |
| logit_bias | yes | no | no | no | OpenAI-only token bias map. |
| user | yes | no | no | no | OpenAI “end-user id” field. |
| seed | yes | no | yes | no | Deterministic sampling where supported. |
| response_format (JSON schema, etc.) | yes | no | yes | no | Gemini additionally normalizes JSON schema for its API. |
| prompt_cache_key | yes | no | no | no | OpenAI-only prompt caching hint. |
| provider_specific.top_k | no | yes | no | no | Anthropic-only: maps to Claude top_k. |
| provider_specific.thinking | no | yes | no | no | Anthropic “thinking” options (e.g. budget tokens). |
| Bedrock additional_parameters map | no | no | no | yes | Free-form JSON, passed through to Bedrock model params. |
Additionally, for Anthropic, the first system message in the conversation is mapped into a SystemPrompt (either as a single text string or as multiple TextContentBlocks), and any cache_control options on those blocks are translated into Anthropic’s ephemeral cache-control settings.
All other fields on ChatCompletionRequest (such as stream, tools, tool_choice, functions, function_call) are handled at the gateway layer and/or per-provider tool integration, but are not mapped 1:1 into provider primitive parameters.
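As a sketch of how these map onto a request, the snippet below sets several of the parameters from the table. The field names and their Option-wrapping on ChatCompletionRequest are assumptions, and, per the table, a field like seed is simply ignored by providers that do not support it (Anthropic, Bedrock).

```rust
use vllora_llm::types::gateway::ChatCompletionRequest;

// Field names mirror the parameter table above; the exact names and types may differ.
fn tuned_request() -> ChatCompletionRequest {
    ChatCompletionRequest {
        model: "gemini-1.5-pro".to_string(),
        max_tokens: Some(512),
        temperature: Some(0.2),
        top_p: Some(0.9),
        stop: Some(vec!["\n\n".to_string()]),
        seed: Some(42),
        ..Default::default()
    }
}
```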
Provider-specific examples
There are runnable examples under llm/examples/ that mirror the patterns above:
- openai: Direct OpenAI chat completions using VlloraLLMClient (non-streaming + streaming).
- anthropic: Anthropic (Claude) chat completions via the unified client.
- gemini: Gemini chat completions via the unified client.
- bedrock: AWS Bedrock chat completions (Nova etc.) via the unified client.
- proxy: Using InferenceModelProvider::Proxy("proxy_name") to call an OpenAI completions-compatible endpoint.
- tracing: Same OpenAI-style flow as openai, but with tracing_subscriber::fmt() configured to emit spans and events to the console (stdout); a minimal setup sketch appears below.
- tracing_otlp: Shows how to wire vllora_telemetry::events::layer to an OTLP HTTP exporter (e.g. New Relic / any OTLP collector) and emit spans from VlloraLLMClient calls to a remote telemetry backend.
Each example is a standalone Cargo binary; you can cd into its directory and run it with cargo run, after setting the provider-specific environment variables noted in the example's main.rs.
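To get the same console tracing outside the bundled example, a minimal sketch (assuming tracing-subscriber with its default fmt feature is added as a dependency):

```rust
use vllora_llm::VlloraLLMClient;

#[tokio::main]
async fn main() {
    // Emit spans and events from VlloraLLMClient calls to stdout,
    // mirroring llm/examples/tracing.
    tracing_subscriber::fmt().init();

    let _client = VlloraLLMClient::new();
    // ...make completion calls here; their spans/events appear on the console...
}
```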
Notes
- Real usage: In the full LangDB / Vllora gateway, concrete ModelInstance implementations are created by the core executor based on your models.yaml and routing rules; the examples above use DummyModelInstance only to illustrate the public API of the CompletionsClient.
- Error handling: All client methods return LLMResult<T>, which wraps rich LLMError variants (network, mapping, provider errors, etc.); see the sketch below.
- More features: The same types in vllora_llm::types::gateway are used for tools, MCP, routing, embeddings, and image generation; see the main repository docs at https://vllora.dev/docs for higher-level gateway features.
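For the error-handling note above, a small sketch that treats the error opaquely; it assumes ChatCompletionRequest implements Default and LLMError implements Display, rather than matching on specific variants.

```rust
use vllora_llm::types::gateway::ChatCompletionRequest;
use vllora_llm::VlloraLLMClient;

// Log-and-continue handling of an LLMResult without depending on specific LLMError variants.
async fn try_completion(client: &VlloraLLMClient) {
    let request = ChatCompletionRequest::default();
    match client.completions().create(request).await {
        Ok(message) => println!("completion: {message:?}"),
        Err(err) => eprintln!("LLM call failed: {err}"),
    }
}
```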
Roadmap and issues
- GitHub issues / roadmap: See open LLM crate issues for planned and outstanding work.
- Planned enhancements:
  - Integrate the responses API
  - Support built-in MCP tool calls
  - Gemini prompt caching support
  - Full thinking messages support