agentkit-provider-vllm

vLLM model adapter for the agentkit agent loop.

This crate provides VllmAdapter and VllmConfig for connecting the agent loop to a vLLM server via its OpenAI-compatible chat completions endpoint. It handles request translation and response normalization for vLLM-backed sessions. Streaming is enabled by default; use .with_streaming(false) to force the buffered response path.

An API key is optional — vLLM servers can run with or without authentication (controlled by the --api-key flag when starting the server).

Configuration

Create a config with VllmConfig::new(model) and chain .with_*() builders for optional parameters. Alternatively, VllmConfig::from_env() reads from environment variables:

Variable	Required	Default
`VLLM_MODEL`	yes	--
`VLLM_BASE_URL`	no	`http://localhost:8000/v1/chat/completions`
`VLLM_API_KEY`	no	--

Examples

Minimal chat agent

use agentkit_loop::{Agent, SessionConfig};
use agentkit_provider_vllm::{VllmAdapter, VllmConfig};

# #[tokio::main]
# async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = VllmConfig::new("meta-llama/Llama-3.1-8B-Instruct");
let adapter = VllmAdapter::new(config)?;

let agent = Agent::builder()
    .model(adapter)
    .build()?;

let mut driver = agent
    .start(SessionConfig::new("demo"))
    .await?;

let step = driver.next().await?;
println!("{step:?}");
# Ok(())
# }

Authenticated vLLM server

use agentkit_provider_vllm::{VllmAdapter, VllmConfig};

# fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = VllmConfig::new("meta-llama/Llama-3.1-8B-Instruct")
    .with_api_key("my-secret-key")
    .with_temperature(0.0)
    .with_max_completion_tokens(4096);

let adapter = VllmAdapter::new(config)?;
# Ok(())
# }

Remote server with environment-based configuration