A concrete [LlmProvider] backed by the GCP AI platform REST API,
authenticated via Workload Identity Federation / Application Default
Credentials (handled by gcp_auth: in-cluster metadata, WIF, or a local
gcloud login, transparently).
Uses the SSE-streaming streamGenerateContent endpoint
(?alt=sse) and adapts each partial response into the streaming
[Chunk] vocabulary the trait expects (text → [Chunk::TextDelta] per
token batch, function calls → tool-call chunks, usage + finish reason
→ [Chunk::Usage]/[Chunk::Stop]). Chunks are yielded as bytes arrive
— the harness pushes them down to the client without buffering the
whole response, so user-visible latency starts at first-token time.
The trait boundary keeps it a
single-file change.