HTTP-backed LLM completions for the [llm_local] and
[llm_cloud] effects (#196).
Configuration is via environment variables, the simplest
shape that doesn't pull a config-file format into the
runtime. The power-user override is the existing
`lex_bytecode::vm::EffectHandler` trait: callers that want
something more elaborate (custom auth, batching, fallback
providers, or non-HTTP transports) wrap `DefaultHandler`
and intercept the `agent.local_complete` /
`agent.cloud_complete` dispatch, as sketched below.
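A minimal sketch of that wrapping pattern. The `handle(name, payload)` method shape, the `EffectError` type, and the import path below are illustrative assumptions; the real trait lives in `lex_bytecode::vm` and may differ:

```rust
use lex_bytecode::vm::EffectHandler; // assumed path, per the docs above

/// Hypothetical wrapper adding a fallback provider: if the cloud call
/// fails, retry the same payload against the local provider.
/// `DefaultHandler` is this crate's handler; the `handle(name, payload)`
/// shape and `EffectError` are assumed for illustration.
struct FallbackHandler {
    inner: DefaultHandler,
}

impl EffectHandler for FallbackHandler {
    fn handle(&mut self, name: &str, payload: &str) -> Result<String, EffectError> {
        match name {
            // Intercept cloud completions and fall back to local on error.
            "agent.cloud_complete" => self
                .inner
                .handle(name, payload)
                .or_else(|_| self.inner.handle("agent.local_complete", payload)),
            // Everything else passes through untouched.
            _ => self.inner.handle(name, payload),
        }
    }
}
```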
§[llm_local]
Defaults to Ollama at `http://localhost:11434`, model
`llama3`.

- `OLLAMA_HOST`: base URL of the Ollama server.
- `LEX_LLM_LOCAL_MODEL`: model name passed to `/api/generate`.
Any service that speaks Ollama’s /api/generate JSON also
works (llama.cpp’s compatible mode, vLLM with the right
adapter, etc.).
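For reference, a sketch of the wire call those settings produce. The request/response shape is Ollama's documented `/api/generate` protocol; the use of `reqwest`'s blocking client here is an assumption for illustration, not necessarily what `DefaultHandler` uses:

```rust
use serde_json::{json, Value};

fn ollama_complete(prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    // Documented defaults, overridable via the env vars above.
    let host = std::env::var("OLLAMA_HOST")
        .unwrap_or_else(|_| "http://localhost:11434".into());
    let model = std::env::var("LEX_LLM_LOCAL_MODEL")
        .unwrap_or_else(|_| "llama3".into());

    let resp: Value = reqwest::blocking::Client::new()
        .post(format!("{host}/api/generate"))
        .json(&json!({ "model": model, "prompt": prompt, "stream": false }))
        .send()?
        .error_for_status()?
        .json()?;

    // With `"stream": false`, Ollama returns the whole completion in `response`.
    Ok(resp["response"].as_str().unwrap_or_default().to_owned())
}
```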
§[llm_cloud]
Defaults to OpenAI’s `/v1/chat/completions`, model
`gpt-4o-mini`. The shape is the OpenAI Chat Completions
protocol, which most cloud LLM providers speak natively
today; the env vars below let you point at any of them:

- `LEX_LLM_CLOUD_API_KEY`: bearer token (preferred). Falls back to `OPENAI_API_KEY` if unset, so existing OpenAI-targeted setups keep working unchanged.
- `LEX_LLM_CLOUD_BASE_URL` / `OPENAI_BASE_URL`: endpoint prefix (`/chat/completions` is appended). Default is `https://api.openai.com/v1`.
- `LEX_LLM_CLOUD_MODEL`: model name.
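A sketch of that resolution order (hypothetical helper, not the crate's internals):

```rust
/// Resolve the cloud endpoint the way the docs above describe:
/// preferred LEX_* vars first, OpenAI-compatible fallbacks second.
fn cloud_config() -> (Option<String>, String, String) {
    let api_key = std::env::var("LEX_LLM_CLOUD_API_KEY")
        .or_else(|_| std::env::var("OPENAI_API_KEY"))
        .ok();
    let base_url = std::env::var("LEX_LLM_CLOUD_BASE_URL")
        .or_else(|_| std::env::var("OPENAI_BASE_URL"))
        .unwrap_or_else(|_| "https://api.openai.com/v1".into());
    let model = std::env::var("LEX_LLM_CLOUD_MODEL")
        .unwrap_or_else(|_| "gpt-4o-mini".into());
    // Requests then go to `{base_url}/chat/completions` with an
    // `Authorization: Bearer {api_key}` header.
    (api_key, base_url, model)
}
```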
Provider matrix (concrete env-var combinations):
| Provider | `LEX_LLM_CLOUD_BASE_URL` | `LEX_LLM_CLOUD_MODEL` |
|---|---|---|
| OpenAI | (default) | `gpt-4o-mini`, `gpt-4o`, `o1-mini`, … |
| Mistral | `https://api.mistral.ai/v1` | `mistral-large-latest`, `mistral-small-latest`, … |
| Together AI | `https://api.together.xyz/v1` | model id from their catalog |
| Groq | `https://api.groq.com/openai/v1` | `llama-3.1-70b-versatile`, … |
| DeepSeek | `https://api.deepseek.com/v1` | `deepseek-chat`, … |
| vLLM (self-hosted) | `http://your-vllm:8000/v1` | whatever model the server is serving |
| Anthropic | use a translating proxy (e.g. litellm) | claude model id |
Anthropic specifically doesn't ship a native chat-completions
endpoint today; pair it with a proxy like litellm or a custom
`EffectHandler` impl.
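Concretely: a litellm proxy exposes an OpenAI-compatible endpoint, so pointing `LEX_LLM_CLOUD_BASE_URL` at the proxy (e.g. `http://localhost:4000` for a default local litellm instance) and setting `LEX_LLM_CLOUD_MODEL` to the claude model id routes `[llm_cloud]` through it with no code changes.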
§Replay determinism
Not guaranteed today. Either provider may return different completions for the same prompt across runs. Wrap the handler if you need replay fidelity (pin a seed, snapshot the model hash, etc.) — soft-agent’s audit-replay pipeline (#187) is where that lives.
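A sketch of the record/replay wrapping pattern, under the same assumed `handle(name, payload)` trait shape as the fallback example above (the real `EffectHandler` signature and error type may differ):

```rust
use std::collections::HashMap;

/// Hypothetical recording wrapper: a live run captures each completion
/// keyed by (effect name, payload); a replay run returns the recorded
/// result verbatim and never touches the network.
struct ReplayHandler<H> {
    inner: H,
    tape: HashMap<(String, String), String>,
    replaying: bool,
}

impl<H: EffectHandler> EffectHandler for ReplayHandler<H> {
    fn handle(&mut self, name: &str, payload: &str) -> Result<String, EffectError> {
        let key = (name.to_owned(), payload.to_owned());
        if self.replaying {
            // `EffectError::ReplayMiss` is illustrative; surface a tape
            // miss however the real error type allows.
            return self.tape.get(&key).cloned().ok_or(EffectError::ReplayMiss);
        }
        let out = self.inner.handle(name, payload)?;
        self.tape.insert(key, out.clone());
        Ok(out)
    }
}
```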
Functions§
- `cloud_complete`: Run a completion against OpenAI's chat-completions API. Synchronous; respects `[llm_cloud]` policy.
- `local_complete`: Run a completion against Ollama. Synchronous; respects `[llm_local]` policy (the caller has already gated).