Module llm_worker


LLM worker thread implementation

Handles LLM inference by proxying requests to the local llama-server process. This is the 1-hop architecture: shared memory state → HTTP to localhost llama-server.
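The 1-hop flow above can be sketched as a worker thread that drains requests from a channel and renders them as HTTP calls to the local llama-server. Everything below is a hypothetical illustration: the field names, the channel-based interface, and the `/completion` endpoint path are assumptions, not the crate's actual API.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical handle for the worker thread; the real LLMWorker's
/// fields and constructor may differ.
struct LLMWorker {
    tx: mpsc::Sender<String>,
    handle: thread::JoinHandle<Vec<String>>,
}

/// Render a completion request for the local llama-server.
/// The endpoint path and JSON shape are assumptions for this sketch.
fn build_request(prompt: &str) -> String {
    format!(
        "POST /completion HTTP/1.1\r\nHost: 127.0.0.1:8080\r\n\
         Content-Type: application/json\r\n\r\n{{\"prompt\":\"{}\"}}",
        prompt
    )
}

/// Spawn the worker loop. In the real implementation each rendered
/// request would be written to a TcpStream connected to localhost
/// llama-server; here the requests are just collected so the sketch
/// is self-contained.
fn spawn_worker() -> LLMWorker {
    let (tx, rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || {
        rx.into_iter().map(|p| build_request(&p)).collect()
    });
    LLMWorker { tx, handle }
}

fn main() {
    let worker = spawn_worker();
    worker.tx.send("hello".to_string()).unwrap();
    drop(worker.tx); // closing the channel lets the thread exit cleanly
    let requests = worker.handle.join().unwrap();
    assert_eq!(requests.len(), 1);
    assert!(requests[0].starts_with("POST /completion"));
    println!("{}", requests.len());
}
```

Dropping the sender to terminate the loop is one idiomatic shutdown pattern; the real worker may instead watch a shared-memory flag, per the architecture note above.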

Structs§

LLMWorker