Streaming chat endpoint — the core 1-hop architecture handler.
Flow: Client POST → SharedState (session + cache lookup) → LLM Worker (HTTP to llama-server) → SSE stream back to the client. All state access is in-process via Arc/shared memory; the only network hop is to the llama-server on localhost.
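The flow above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual implementation: the `SharedState` fields, `handle_chat`, `sse_event`, and the `call_llm` closure are hypothetical names chosen for this example, and the closure stands in for the HTTP request to llama-server. The only parts taken from the description are the ordering (session update, cache lookup, worker call, SSE frames) and the use of `Arc` for in-process state.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical in-process state; the real `SharedState` layout is not shown in the docs.
#[derive(Default)]
pub struct SharedState {
    sessions: RwLock<HashMap<String, Vec<String>>>,
    cache: RwLock<HashMap<String, String>>,
}

impl SharedState {
    /// Append a user message to the session history.
    pub fn record_message(&self, session_id: &str, message: &str) {
        let mut sessions = self.sessions.write().unwrap();
        sessions
            .entry(session_id.to_string())
            .or_default()
            .push(message.to_string());
    }

    /// Number of messages recorded for a session.
    pub fn history_len(&self, session_id: &str) -> usize {
        let sessions = self.sessions.read().unwrap();
        sessions.get(session_id).map_or(0, |h| h.len())
    }

    /// Look up a previously generated reply for this exact prompt.
    pub fn cached_reply(&self, prompt: &str) -> Option<String> {
        self.cache.read().unwrap().get(prompt).cloned()
    }

    /// Remember the full reply for a prompt.
    pub fn store_reply(&self, prompt: &str, reply: &str) {
        self.cache
            .write()
            .unwrap()
            .insert(prompt.to_string(), reply.to_string());
    }
}

/// Format one Server-Sent Events frame: a `data:` field per payload line,
/// terminated by a blank line.
pub fn sse_event(payload: &str) -> String {
    let mut out = String::new();
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n');
    out
}

/// One request: session update and cache lookup happen in-process;
/// `call_llm` is an assumed stand-in for the single network hop.
pub fn handle_chat(
    state: &Arc<SharedState>,
    session_id: &str,
    prompt: &str,
    call_llm: impl FnOnce(&str) -> Vec<String>,
) -> Vec<String> {
    state.record_message(session_id, prompt);
    if let Some(hit) = state.cached_reply(prompt) {
        // Cache hit: no network hop at all, reply as a single frame.
        return vec![sse_event(&hit)];
    }
    let tokens = call_llm(prompt);
    state.store_reply(prompt, &tokens.concat());
    tokens.iter().map(|t| sse_event(t)).collect()
}
```

Calling `handle_chat` twice with the same prompt would invoke the closure only on the first call; the second is served from the in-memory cache. A real handler would stream frames as tokens arrive rather than collecting them into a `Vec`.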
Structs
- ChatAttachment - File attachment reference sent with the chat request.
- StreamChatRequest - Request body matching what the frontend sends.
Functions
- generate_stream - POST /generate/stream: main streaming chat endpoint.