HTTP-backed LLM completions for the [llm_local] and
[llm_cloud] effects (#196).
Configuration is via environment variables, the simplest
shape that doesn't pull a config-file format into the
runtime. The power-user override is the existing
`lex_bytecode::vm::EffectHandler` trait: callers that want
something more elaborate (custom auth, batching, fallback
providers, or non-HTTP transports) wrap `DefaultHandler`
and intercept the `agent.local_complete` /
`agent.cloud_complete` dispatch, as sketched below.
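A minimal sketch of that wrapping pattern. The `handle(name, payload)` method shape, the `EffectError` type, and the import path below are illustrative assumptions; the real trait lives in `lex_bytecode::vm` and may differ:

```rust
use lex_bytecode::vm::EffectHandler; // assumed path, per the docs above

/// Hypothetical wrapper adding a fallback provider: if the cloud call
/// fails, retry the same payload against the local provider.
/// `DefaultHandler` is this crate's handler; the `handle(name, payload)`
/// shape and `EffectError` are assumed for illustration.
struct FallbackHandler {
    inner: DefaultHandler,
}

impl EffectHandler for FallbackHandler {
    fn handle(&mut self, name: &str, payload: &str) -> Result<String, EffectError> {
        match name {
            // Intercept cloud completions and fall back to local on error.
            "agent.cloud_complete" => self
                .inner
                .handle(name, payload)
                .or_else(|_| self.inner.handle("agent.local_complete", payload)),
            // Everything else passes through untouched.
            _ => self.inner.handle(name, payload),
        }
    }
}
```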
§[llm_local]
Defaults to Ollama at `http://localhost:11434`, model
`llama3`.

- `OLLAMA_HOST`: base URL of the Ollama server.
- `LEX_LLM_LOCAL_MODEL`: model name passed to `/api/generate`.
Any service that speaks Ollama’s /api/generate JSON also
works (llama.cpp’s compatible mode, vLLM with the right
adapter, etc.).
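For reference, a sketch of the wire call those settings produce. The request/response shape is Ollama's documented `/api/generate` protocol; the use of `reqwest`'s blocking client here is an assumption for illustration, not necessarily what `DefaultHandler` uses:

```rust
use serde_json::{json, Value};

fn ollama_complete(prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    // Documented defaults, overridable via the env vars above.
    let host = std::env::var("OLLAMA_HOST")
        .unwrap_or_else(|_| "http://localhost:11434".into());
    let model = std::env::var("LEX_LLM_LOCAL_MODEL")
        .unwrap_or_else(|_| "llama3".into());

    let resp: Value = reqwest::blocking::Client::new()
        .post(format!("{host}/api/generate"))
        .json(&json!({ "model": model, "prompt": prompt, "stream": false }))
        .send()?
        .error_for_status()?
        .json()?;

    // With `"stream": false`, Ollama returns the whole completion in `response`.
    Ok(resp["response"].as_str().unwrap_or_default().to_owned())
}
```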
§[llm_cloud]
Defaults to OpenAI’s `/v1/chat/completions`, model
`gpt-4o-mini`. The shape is the OpenAI Chat Completions
protocol, which most cloud LLM providers speak natively
today; the env vars below let you point at any of them:

- `LEX_LLM_CLOUD_API_KEY`: bearer token (preferred). Falls back to `OPENAI_API_KEY` if unset, so existing OpenAI-targeted setups keep working unchanged.
- `LEX_LLM_CLOUD_BASE_URL` / `OPENAI_BASE_URL`: endpoint prefix (`/chat/completions` is appended). Default is `https://api.openai.com/v1`.
- `LEX_LLM_CLOUD_MODEL`: model name.
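A sketch of that resolution order (hypothetical helper, not the crate's internals):

```rust
/// Resolve the cloud endpoint the way the docs above describe:
/// preferred LEX_* vars first, OpenAI-compatible fallbacks second.
fn cloud_config() -> (Option<String>, String, String) {
    let api_key = std::env::var("LEX_LLM_CLOUD_API_KEY")
        .or_else(|_| std::env::var("OPENAI_API_KEY"))
        .ok();
    let base_url = std::env::var("LEX_LLM_CLOUD_BASE_URL")
        .or_else(|_| std::env::var("OPENAI_BASE_URL"))
        .unwrap_or_else(|_| "https://api.openai.com/v1".into());
    let model = std::env::var("LEX_LLM_CLOUD_MODEL")
        .unwrap_or_else(|_| "gpt-4o-mini".into());
    // Requests then go to `{base_url}/chat/completions` with an
    // `Authorization: Bearer {api_key}` header.
    (api_key, base_url, model)
}
```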
Provider matrix (concrete env-var combinations):
| Provider | `LEX_LLM_CLOUD_BASE_URL` | `LEX_LLM_CLOUD_MODEL` |
|---|---|---|
| OpenAI | (default) | `gpt-4o-mini`, `gpt-4o`, `o1-mini`, … |
| Mistral | `https://api.mistral.ai/v1` | `mistral-large-latest`, `mistral-small-latest`, … |
| Together AI | `https://api.together.xyz/v1` | model id from their catalog |
| Groq | `https://api.groq.com/openai/v1` | `llama-3.1-70b-versatile`, … |
| DeepSeek | `https://api.deepseek.com/v1` | `deepseek-chat`, … |
| vLLM (self-hosted) | `http://your-vllm:8000/v1` | whatever model the server is serving |
| Anthropic | use a translating proxy (e.g. litellm) | claude model id |
Anthropic specifically doesn't ship a native chat-completions
endpoint today; pair it with a proxy like litellm or a custom
`EffectHandler` impl.
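Concretely: a litellm proxy exposes an OpenAI-compatible endpoint, so pointing `LEX_LLM_CLOUD_BASE_URL` at the proxy (e.g. `http://localhost:4000` for a default local litellm instance) and setting `LEX_LLM_CLOUD_MODEL` to the claude model id routes `[llm_cloud]` through it with no code changes.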
§Replay determinism
Not guaranteed today. Either provider may return different completions for the same prompt across runs. Wrap the handler if you need replay fidelity (pin a seed, snapshot the model hash, etc.) — soft-agent’s audit-replay pipeline (#187) is where that lives.
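A sketch of the record/replay wrapping pattern, under the same assumed `handle(name, payload)` trait shape as the fallback example above (the real `EffectHandler` signature and error type may differ):

```rust
use std::collections::HashMap;

/// Hypothetical recording wrapper: a live run captures each completion
/// keyed by (effect name, payload); a replay run returns the recorded
/// result verbatim and never touches the network.
struct ReplayHandler<H> {
    inner: H,
    tape: HashMap<(String, String), String>,
    replaying: bool,
}

impl<H: EffectHandler> EffectHandler for ReplayHandler<H> {
    fn handle(&mut self, name: &str, payload: &str) -> Result<String, EffectError> {
        let key = (name.to_owned(), payload.to_owned());
        if self.replaying {
            // `EffectError::ReplayMiss` is illustrative; surface a tape
            // miss however the real error type allows.
            return self.tape.get(&key).cloned().ok_or(EffectError::ReplayMiss);
        }
        let out = self.inner.handle(name, payload)?;
        self.tape.insert(key, out.clone());
        Ok(out)
    }
}
```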
Functions§
- `cloud_complete`: Run a completion against OpenAI's chat-completions API. Synchronous; respects `[llm_cloud]` policy.
- `local_complete`: Run a completion against Ollama. Synchronous; respects `[llm_local]` policy (the caller has already gated).