Module llm

HTTP-backed LLM completions for the [llm_local] and [llm_cloud] effects (#196).

Configuration is via environment variables, the simplest shape that doesn't pull a config-file format into the runtime. The power-user override is the existing lex_bytecode::vm::EffectHandler trait: callers that want something more elaborate (custom auth, batching, fallback providers, or non-HTTP transports) wrap DefaultHandler and intercept the agent.local_complete / agent.cloud_complete dispatch, as sketched below.
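A minimal sketch of that interception pattern. It assumes a single handle(effect, payload) entry point on the trait; the real EffectHandler surface may differ, so check lex_bytecode::vm before copying:

```rust
use lex_bytecode::vm::{DefaultHandler, EffectHandler};

/// Wraps DefaultHandler with a cloud-to-local fallback.
/// NOTE: handle(&mut self, effect, payload) is an assumed signature,
/// used only to illustrate the wrapping pattern.
struct FallbackHandler {
    inner: DefaultHandler,
}

impl EffectHandler for FallbackHandler {
    fn handle(&mut self, effect: &str, payload: &str) -> Result<String, String> {
        match effect {
            // Intercept cloud completions: try the configured cloud
            // provider first, fall back to the local model on error.
            "agent.cloud_complete" => self
                .inner
                .handle("agent.cloud_complete", payload)
                .or_else(|_| self.inner.handle("agent.local_complete", payload)),
            // Everything else passes through to the default handler.
            other => self.inner.handle(other, payload),
        }
    }
}
```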

§[llm_local]

Defaults to Ollama at http://localhost:11434, model llama3.

  • OLLAMA_HOST — base URL of the Ollama server.
  • LEX_LLM_LOCAL_MODEL — model name passed to /api/generate.

Any service that speaks Ollama’s /api/generate JSON also works (llama.cpp’s compatible mode, vLLM with the right adapter, etc.).
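For example, a minimal in-process setup pointing [llm_local] at a llama.cpp server on port 8080 with a different model; exporting the same two variables in your shell is equivalent (whether the values are read at startup or per call isn't specified, so set them before the VM runs):

```rust
use std::env;

fn main() {
    // Unset values fall back to http://localhost:11434 and llama3.
    env::set_var("OLLAMA_HOST", "http://localhost:8080");
    env::set_var("LEX_LLM_LOCAL_MODEL", "mistral");
    // ... start the VM; [llm_local] completions now hit the new endpoint.
}
```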

§[llm_cloud]

Defaults to OpenAI’s /v1/chat/completions, model gpt-4o-mini. The shape is the OpenAI Chat Completions protocol, which most cloud LLM providers speak natively today — the env vars below let you point at any of them:

  • LEX_LLM_CLOUD_API_KEY — bearer token (preferred). Falls back to OPENAI_API_KEY if unset, so existing OpenAI-targeted setups keep working unchanged.
  • LEX_LLM_CLOUD_BASE_URL / OPENAI_BASE_URL — endpoint prefix; /chat/completions is appended to it. Default is https://api.openai.com/v1.
  • LEX_LLM_CLOUD_MODEL — model name.

Provider matrix (concrete env-var combinations):

| Provider | LEX_LLM_CLOUD_BASE_URL | LEX_LLM_CLOUD_MODEL |
| --- | --- | --- |
| OpenAI | (default) | gpt-4o-mini, gpt-4o, o1-mini, … |
| Mistral | https://api.mistral.ai/v1 | mistral-large-latest, mistral-small-latest, … |
| Together AI | https://api.together.xyz/v1 | model id from their catalog |
| Groq | https://api.groq.com/openai/v1 | llama-3.1-70b-versatile, … |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, … |
| vLLM (self-hosted) | http://your-vllm:8000/v1 | the model the server is serving |
| Anthropic | use a translating proxy (e.g. litellm) | claude model id |

Anthropic specifically doesn't ship a native Chat Completions endpoint today; pair it with a translating proxy such as litellm, or with a custom EffectHandler impl.
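As a concrete row from the matrix, retargeting [llm_cloud] at Groq looks like this; GROQ_API_KEY is a placeholder for wherever the secret actually lives:

```rust
use std::env;

fn main() {
    // The handler appends /chat/completions to this prefix.
    env::set_var("LEX_LLM_CLOUD_BASE_URL", "https://api.groq.com/openai/v1");
    env::set_var("LEX_LLM_CLOUD_MODEL", "llama-3.1-70b-versatile");
    // Preferred key variable; OPENAI_API_KEY is honoured as a fallback.
    let key = env::var("GROQ_API_KEY").expect("set GROQ_API_KEY first");
    env::set_var("LEX_LLM_CLOUD_API_KEY", key);
}
```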

§Replay determinism

Not guaranteed today. Either provider may return different completions for the same prompt across runs. Wrap the handler if you need replay fidelity (pin a seed, snapshot the model hash, etc.) — soft-agent’s audit-replay pipeline (#187) is where that lives.
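A minimal record-and-replay sketch, under the same assumed handle(effect, payload) signature as the earlier wrapper (how soft-agent's #187 pipeline actually stores recordings is out of scope here):

```rust
use std::collections::HashMap;

use lex_bytecode::vm::{DefaultHandler, EffectHandler};

/// Records completions keyed by (effect, payload) on the first run and
/// serves the recording on replay, so replayed runs are byte-identical.
struct ReplayHandler {
    inner: DefaultHandler,
    cache: HashMap<(String, String), String>,
    replaying: bool,
}

impl EffectHandler for ReplayHandler {
    fn handle(&mut self, effect: &str, payload: &str) -> Result<String, String> {
        let key = (effect.to_string(), payload.to_string());
        if self.replaying {
            return match self.cache.get(&key) {
                Some(recorded) => Ok(recorded.clone()),
                None => Err(format!("replay miss for {effect}")),
            };
        }
        let out = self.inner.handle(effect, payload)?;
        self.cache.insert(key, out.clone());
        Ok(out)
    }
}
```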

Functions§

cloud_complete
Run a completion against OpenAI’s chat-completions API. Synchronous; respects [llm_cloud] policy.
local_complete
Run a completion against Ollama. Synchronous; respects [llm_local] policy (the caller has already gated).
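A hypothetical call shape, assuming the module lives at lex_bytecode::llm and that both functions take a prompt string and return Result<String, _> (the real signatures may carry more parameters and the crate's own error type):

```rust
use lex_bytecode::llm;

// Assumed signature: fn local_complete(prompt: &str) -> Result<String, String>.
fn main() -> Result<(), String> {
    let summary = llm::local_complete("Summarise the last commit in one line.")?;
    println!("{summary}");
    Ok(())
}
```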