LLM worker thread implementation
Handles LLM inference by proxying requests to the local llama-server process. This is the 1-hop architecture: the worker reads request state from shared memory and makes a single HTTP call to llama-server on localhost — one network hop per request.
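A minimal sketch of this worker loop, under stated assumptions: `queue.Queue` hand-off stands in for the shared-memory state, and requests are proxied to llama-server's `/completion` endpoint (the llama.cpp HTTP server's native completion route). The `llm_worker` function, the request/result dict shapes, and the stub server used for the demo are all illustrative inventions, not the module's actual API; the demo stub only imitates llama-server so the sketch runs without the real binary.

```python
import json
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def llm_worker(requests_q, results_q, url):
    """Drain requests from shared state and proxy each to llama-server (1 hop)."""
    while True:
        req = requests_q.get()
        if req is None:          # sentinel: shut the worker down
            break
        payload = json.dumps({"prompt": req["prompt"],
                              "n_predict": req.get("n_predict", 128)}).encode()
        http_req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(http_req, timeout=60) as resp:
                body = json.loads(resp.read())
            results_q.put({"id": req["id"], "text": body.get("content", "")})
        except OSError as exc:   # llama-server unreachable or timed out
            results_q.put({"id": req["id"], "error": str(exc)})

# --- demo: a local stub standing in for llama-server (not the real binary) ---
class _Stub(BaseHTTPRequestHandler):
    def do_POST(self):
        n = int(self.headers["Content-Length"])
        prompt = json.loads(self.rfile.read(n))["prompt"]
        out = json.dumps({"content": f"echo: {prompt}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)
    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _Stub)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

req_q, res_q = queue.Queue(), queue.Queue()
worker = threading.Thread(
    target=llm_worker,
    args=(req_q, res_q, f"http://127.0.0.1:{server.server_port}/completion"),
    daemon=True)
worker.start()

req_q.put({"id": 1, "prompt": "hello"})
result = res_q.get(timeout=10)
req_q.put(None)                  # sentinel: stop the worker
worker.join()
server.shutdown()
print(result)
```

In the real module the worker would instead poll whatever shared-memory structure the process uses and talk to an already-running llama-server (default port 8080); the queue-plus-stub arrangement here exists only to keep the sketch self-contained.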