A proxy in front of the actual LLM server that forwards only the most recent pending request.
If LLM generation is slower than the rate of incoming requests, processing the
oldest request makes no sense: the user cares most about the latest one. But
current servers (AFAIK) do not handle this properly.
Represents a generic daemon capable of performing background tasks, including spawning itself,
maintaining a heartbeat, and generating responses based on prompts.
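A minimal threading-based sketch of such a daemon; the names (`Daemon`, `spawn`, `respond`) and the heartbeat interval are assumptions for illustration:

```python
import threading


class Daemon:
    """Background worker: spawns itself, emits heartbeats, answers prompts."""

    HEARTBEAT_INTERVAL = 5.0  # seconds between heartbeats (assumed value)

    def __init__(self, generate):
        self.generate = generate  # assumed callable: prompt -> response
        self._stop = threading.Event()

    def spawn(self):
        # Run the heartbeat loop in a daemon thread so it dies with the process.
        threading.Thread(target=self._heartbeat_loop, daemon=True).start()

    def _heartbeat_loop(self):
        # Event.wait(timeout) returns True once stop() is called,
        # so this ticks every HEARTBEAT_INTERVAL until shutdown.
        while not self._stop.wait(self.HEARTBEAT_INTERVAL):
            print("heartbeat")  # placeholder: report liveness somewhere real

    def respond(self, prompt):
        # Generate a response for a prompt on the caller's thread.
        return self.generate(prompt)

    def stop(self):
        self._stop.set()
```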