pub enum RealtimeServerEvent {
Error(RealtimeServerEventError),
SessionCreated(RealtimeServerEventSessionCreated),
SessionUpdated(RealtimeServerEventSessionUpdated),
ConversationItemAdded(RealtimeServerEventConversationItemAdded),
ConversationItemDone(RealtimeServerEventConversationItemDone),
ConversationItemRetrieved(RealtimeServerEventConversationItemRetrieved),
ConversationItemInputAudioTranscriptionCompleted(RealtimeServerEventConversationItemInputAudioTranscriptionCompleted),
ConversationItemInputAudioTranscriptionDelta(RealtimeServerEventConversationItemInputAudioTranscriptionDelta),
ConversationItemInputAudioTranscriptionSegment(RealtimeServerEventConversationItemInputAudioTranscriptionSegment),
ConversationItemInputAudioTranscriptionFailed(RealtimeServerEventConversationItemInputAudioTranscriptionFailed),
ConversationItemTruncated(RealtimeServerEventConversationItemTruncated),
ConversationItemDeleted(RealtimeServerEventConversationItemDeleted),
InputAudioBufferCommitted(RealtimeServerEventInputAudioBufferCommitted),
InputAudioBufferCleared(RealtimeServerEventInputAudioBufferCleared),
InputAudioBufferSpeechStarted(RealtimeServerEventInputAudioBufferSpeechStarted),
InputAudioBufferSpeechStopped(RealtimeServerEventInputAudioBufferSpeechStopped),
InputAudioBufferTimeoutTriggered(RealtimeServerEventInputAudioBufferTimeoutTriggered),
OutputAudioBufferStarted(RealtimeServerEventOutputAudioBufferStarted),
OutputAudioBufferStopped(RealtimeServerEventOutputAudioBufferStopped),
OutputAudioBufferCleared(RealtimeServerEventOutputAudioBufferCleared),
ResponseCreated(RealtimeServerEventResponseCreated),
ResponseDone(RealtimeServerEventResponseDone),
ResponseOutputItemAdded(RealtimeServerEventResponseOutputItemAdded),
ResponseOutputItemDone(RealtimeServerEventResponseOutputItemDone),
ResponseContentPartAdded(RealtimeServerEventResponseContentPartAdded),
ResponseContentPartDone(RealtimeServerEventResponseContentPartDone),
ResponseOutputTextDelta(RealtimeServerEventResponseTextDelta),
ResponseOutputTextDone(RealtimeServerEventResponseTextDone),
ResponseOutputAudioTranscriptDelta(RealtimeServerEventResponseAudioTranscriptDelta),
ResponseOutputAudioTranscriptDone(RealtimeServerEventResponseAudioTranscriptDone),
ResponseOutputAudioDelta(RealtimeServerEventResponseAudioDelta),
ResponseOutputAudioDone(RealtimeServerEventResponseAudioDone),
ResponseFunctionCallArgumentsDelta(RealtimeServerEventResponseFunctionCallArgumentsDelta),
ResponseFunctionCallArgumentsDone(RealtimeServerEventResponseFunctionCallArgumentsDone),
ResponseMCPCallArgumentsDelta(RealtimeServerEventResponseMCPCallArgumentsDelta),
ResponseMCPCallArgumentsDone(RealtimeServerEventResponseMCPCallArgumentsDone),
ResponseMCPCallInProgress(RealtimeServerEventResponseMCPCallInProgress),
ResponseMCPCallCompleted(RealtimeServerEventResponseMCPCallCompleted),
ResponseMCPCallFailed(RealtimeServerEventResponseMCPCallFailed),
MCPListToolsInProgress(RealtimeServerEventMCPListToolsInProgress),
MCPListToolsCompleted(RealtimeServerEventMCPListToolsCompleted),
MCPListToolsFailed(RealtimeServerEventMCPListToolsFailed),
RateLimitsUpdated(RealtimeServerEventRateLimitsUpdated),
}
Available on crate feature realtime only.
These are events emitted from the OpenAI Realtime WebSocket server to the client.
Variants§
Error(RealtimeServerEventError)
Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open; we recommend that implementors monitor and log error messages by default.
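The recommended default above can be sketched as a small dispatch step: log the error and keep the session open. The `ServerEvent` enum below is a simplified stand-in for illustration, not this crate's actual variants or payload types.

```rust
// Simplified stand-in for the server-event enum; the real variants
// carry full payload structs (RealtimeServerEventError, etc.).
#[derive(Debug)]
enum ServerEvent {
    Error { message: String },
    Other,
}

/// Handle one server event. Errors are logged but do not close the
/// session, since most are recoverable; returns whether to keep the
/// session open.
fn handle_event(event: &ServerEvent) -> bool {
    match event {
        ServerEvent::Error { message } => {
            eprintln!("realtime error (session stays open): {message}");
            true
        }
        ServerEvent::Other => true,
    }
}
```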
SessionCreated(RealtimeServerEventSessionCreated)
Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.
SessionUpdated(RealtimeServerEventSessionUpdated)
Returned when a session is updated with a session.update event, unless there is an error.
ConversationItemAdded(RealtimeServerEventConversationItemAdded)
Sent by the server when an Item is added to the default Conversation. This can happen in several cases:
- When the client sends a conversation.item.create event
- When the input audio buffer is committed. In this case the item will be a user message containing the audio from the buffer.
- When the model is generating a Response. In this case the
conversation.item.added event will be sent when the model starts generating a specific Item, and thus it will not yet have any content (and status will be in_progress).
The event will include the full content of the Item except for audio data (and except when the model is generating a Response, per the case above),
which can be retrieved separately with a conversation.item.retrieve event if necessary.
ConversationItemDone(RealtimeServerEventConversationItemDone)
Returned when a conversation item is finalized.
The event will include the full content of the Item except for audio data, which can be retrieved
separately with a conversation.item.retrieve event if needed.
ConversationItemRetrieved(RealtimeServerEventConversationItemRetrieved)
Returned when a conversation item is retrieved with conversation.item.retrieve.
This is provided as a way to fetch the server’s representation of an item, for example to get access
to the post-processed audio data after noise cancellation and VAD.
It includes the full content of the Item, including audio data.
ConversationItemInputAudioTranscriptionCompleted(RealtimeServerEventConversationItemInputAudioTranscriptionCompleted)
This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (when VAD is enabled). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events.
Realtime API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model. The transcript may diverge somewhat from the model’s interpretation, and should be treated as a rough guide.
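Because transcription runs asynchronously with Response creation, clients typically accumulate deltas per item and treat the completed event as authoritative. A minimal sketch, assuming delta and completed events each carry an item id and text (field names here are illustrative):

```rust
use std::collections::HashMap;

/// Accumulates input-audio transcription deltas per item id. The
/// `...transcription.completed` event carries the final transcript,
/// which supersedes anything accumulated from deltas.
#[derive(Default)]
struct TranscriptTracker {
    partial: HashMap<String, String>,
}

impl TranscriptTracker {
    /// On `...transcription.delta`, append the incremental text.
    fn on_delta(&mut self, item_id: &str, delta: &str) {
        self.partial
            .entry(item_id.to_string())
            .or_default()
            .push_str(delta);
    }

    /// On `...transcription.completed`, drop the partial buffer and
    /// return the authoritative transcript from the event.
    fn on_completed(&mut self, item_id: &str, transcript: String) -> String {
        self.partial.remove(item_id);
        transcript
    }
}
```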
ConversationItemInputAudioTranscriptionDelta(RealtimeServerEventConversationItemInputAudioTranscriptionDelta)
Returned when the text value of an input audio transcription content part is updated with incremental transcription results.
ConversationItemInputAudioTranscriptionSegment(RealtimeServerEventConversationItemInputAudioTranscriptionSegment)
Returned when an input audio transcription segment is identified for an item.
ConversationItemInputAudioTranscriptionFailed(RealtimeServerEventConversationItemInputAudioTranscriptionFailed)
Returned when input audio transcription is configured, and a transcription request for a user message failed.
These events are separate from other error events so that the client can identify the related Item.
ConversationItemTruncated(RealtimeServerEventConversationItemTruncated)
Returned when an earlier assistant audio message item is truncated by the client with a conversation.item.truncate event.
This event is used to synchronize the server’s understanding of the audio with the client’s playback.
This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn’t been heard by the user.
ConversationItemDeleted(RealtimeServerEventConversationItemDeleted)
Returned when an item in the conversation is deleted by the client with a conversation.item.delete event.
This event is used to synchronize the server’s understanding of the conversation history with the client’s view.
InputAudioBufferCommitted(RealtimeServerEventInputAudioBufferCommitted)
Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode.
The item_id property is the ID of the user message item that will be created,
thus a conversation.item.created event will also be sent to the client.
InputAudioBufferCleared(RealtimeServerEventInputAudioBufferCleared)
Returned when the input audio buffer is cleared by the client with an input_audio_buffer.clear event.
InputAudioBufferSpeechStarted(RealtimeServerEventInputAudioBufferSpeechStarted)
Sent by the server when in server_vad mode to indicate that speech has been detected in the audio buffer.
This can happen any time audio is added to the buffer (unless speech is already detected).
The client may want to use this event to interrupt audio playback or provide visual feedback to the user.
The client should expect to receive an input_audio_buffer.speech_stopped event when speech stops.
The item_id property is the ID of the user message item that will be created when speech stops and will
also be included in the input_audio_buffer.speech_stopped event (unless the client manually commits the
audio buffer during VAD activation).
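A common client-side pattern for these two VAD events is barge-in handling: interrupt playback on speech_started, then wait for the user item on speech_stopped. A sketch under the assumption that the client dispatches on the event's type string and item_id (the item id value below is hypothetical):

```rust
/// What the client should do in response to a server VAD event
/// (illustrative; real clients would act on audio playback directly).
#[derive(Debug, PartialEq)]
enum PlaybackAction {
    /// speech_started: stop local playback so the model isn't talking
    /// over the user.
    InterruptPlayback { pending_item_id: String },
    /// speech_stopped: a user message item with this id will follow.
    AwaitUserItem { item_id: String },
}

/// Map a VAD event type string to a client action; other event types
/// require no playback change.
fn on_vad_event(event_type: &str, item_id: &str) -> Option<PlaybackAction> {
    match event_type {
        "input_audio_buffer.speech_started" => Some(PlaybackAction::InterruptPlayback {
            pending_item_id: item_id.to_string(),
        }),
        "input_audio_buffer.speech_stopped" => Some(PlaybackAction::AwaitUserItem {
            item_id: item_id.to_string(),
        }),
        _ => None,
    }
}
```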
InputAudioBufferSpeechStopped(RealtimeServerEventInputAudioBufferSpeechStopped)
Returned in server_vad mode when the server detects the end of speech in the audio buffer.
The server will also send a conversation.item.created event with the user message item that is created from the audio buffer.
InputAudioBufferTimeoutTriggered(RealtimeServerEventInputAudioBufferTimeoutTriggered)
Returned when the Server VAD timeout is triggered for the input audio buffer. This is
configured with idle_timeout_ms in the turn_detection settings of the session, and
it indicates that there hasn’t been any speech detected for the configured duration.
The audio_start_ms and audio_end_ms fields indicate the segment of audio after the
last model response up to the triggering time, as an offset from the beginning of audio
written to the input audio buffer. This means it demarcates the segment of audio that
was silent and the difference between the start and end values will roughly match the configured timeout.
The empty audio will be committed to the conversation as an input_audio item (there
will be an input_audio_buffer.committed event) and a model response will be generated.
There may be speech that didn’t trigger VAD but is still detected by the model, so the model may respond
with something relevant to the conversation or a prompt to continue speaking.
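Since both fields are offsets from the start of buffered audio, the silent segment's length falls out of a simple subtraction and should roughly match the configured idle_timeout_ms. A sketch (struct shape is illustrative, not the crate's payload type):

```rust
/// Fields of interest from input_audio_buffer.timeout_triggered; both
/// are millisecond offsets from the beginning of audio written to the
/// input audio buffer.
struct TimeoutTriggered {
    audio_start_ms: u64,
    audio_end_ms: u64,
}

/// Duration of the silent segment; roughly equal to the configured
/// idle_timeout_ms in the session's turn_detection settings.
fn silence_duration_ms(ev: &TimeoutTriggered) -> u64 {
    ev.audio_end_ms.saturating_sub(ev.audio_start_ms)
}
```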
OutputAudioBufferStarted(RealtimeServerEventOutputAudioBufferStarted)
WebRTC Only: Emitted when the server begins streaming audio to the client. This
event is emitted after an audio content part has been added (response.content_part.added) to the response.
OutputAudioBufferStopped(RealtimeServerEventOutputAudioBufferStopped)
WebRTC Only: Emitted when the output audio buffer has been completely drained on
the server, and no more audio is forthcoming. This event is emitted after the full response data has been sent
to the client (response.done).
OutputAudioBufferCleared(RealtimeServerEventOutputAudioBufferCleared)
WebRTC Only: Emitted when the output audio buffer is cleared. This happens either in
VAD mode when the user has interrupted (input_audio_buffer.speech_started), or when the client has
emitted the output_audio_buffer.clear event to manually cut off the current audio response.
ResponseCreated(RealtimeServerEventResponseCreated)
Returned when a new Response is created. The first event of response creation,
where the response is in an initial state of in_progress.
ResponseDone(RealtimeServerEventResponseDone)
Returned when a Response is done streaming. Always emitted, no matter the final state.
The Response object included in the response.done event will include all output Items in the Response
but will omit the raw audio data.
Clients should check the status field of the Response to determine if it was successful
(completed) or if there was another outcome: cancelled, failed, or incomplete.
A response will contain all output items that were generated during the response, excluding any audio content.
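Because response.done fires for every outcome, success must be checked via the status field rather than inferred from receiving the event. A sketch with a simplified stand-in for the status type:

```rust
/// Terminal states of a Response, per the status field on the
/// response.done payload (simplified stand-in for the crate's type).
#[derive(Debug, PartialEq)]
enum ResponseStatus {
    Completed,
    Cancelled,
    Failed,
    Incomplete,
}

/// Only `completed` counts as success; the other three are distinct
/// non-success outcomes that still emit response.done.
fn response_succeeded(status: &ResponseStatus) -> bool {
    matches!(status, ResponseStatus::Completed)
}
```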
ResponseOutputItemAdded(RealtimeServerEventResponseOutputItemAdded)
Returned when a new Item is created during Response generation.
ResponseOutputItemDone(RealtimeServerEventResponseOutputItemDone)
Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
ResponseContentPartAdded(RealtimeServerEventResponseContentPartAdded)
Returned when a new content part is added to an assistant message item during response generation.
ResponseContentPartDone(RealtimeServerEventResponseContentPartDone)
Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.
ResponseOutputTextDelta(RealtimeServerEventResponseTextDelta)
Returned when the text value of an “output_text” content part is updated.
ResponseOutputTextDone(RealtimeServerEventResponseTextDone)
Returned when the text value of an “output_text” content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
ResponseOutputAudioTranscriptDelta(RealtimeServerEventResponseAudioTranscriptDelta)
Returned when the model-generated transcription of audio output is updated.
ResponseOutputAudioTranscriptDone(RealtimeServerEventResponseAudioTranscriptDone)
Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
ResponseOutputAudioDelta(RealtimeServerEventResponseAudioDelta)
Returned when the model-generated audio is updated.
ResponseOutputAudioDone(RealtimeServerEventResponseAudioDone)
Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.
ResponseFunctionCallArgumentsDelta(RealtimeServerEventResponseFunctionCallArgumentsDelta)
Returned when the model-generated function call arguments are updated.
ResponseFunctionCallArgumentsDone(RealtimeServerEventResponseFunctionCallArgumentsDone)
Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
ResponseMCPCallArgumentsDelta(RealtimeServerEventResponseMCPCallArgumentsDelta)
Returned when MCP tool call arguments are updated.
ResponseMCPCallArgumentsDone(RealtimeServerEventResponseMCPCallArgumentsDone)
Returned when MCP tool call arguments are finalized during response generation.
ResponseMCPCallInProgress(RealtimeServerEventResponseMCPCallInProgress)
Returned when an MCP tool call is in progress.
ResponseMCPCallCompleted(RealtimeServerEventResponseMCPCallCompleted)
Returned when an MCP tool call has completed successfully.
ResponseMCPCallFailed(RealtimeServerEventResponseMCPCallFailed)
Returned when an MCP tool call has failed.
MCPListToolsInProgress(RealtimeServerEventMCPListToolsInProgress)
Returned when listing MCP tools is in progress for an item.
MCPListToolsCompleted(RealtimeServerEventMCPListToolsCompleted)
Returned when listing MCP tools has completed for an item.
MCPListToolsFailed(RealtimeServerEventMCPListToolsFailed)
Returned when listing MCP tools has failed for an item.
RateLimitsUpdated(RealtimeServerEventRateLimitsUpdated)
Emitted at the beginning of a Response to indicate the updated rate limits. When a Response is created some tokens will be “reserved” for the output tokens, the rate limits shown here reflect that reservation, which is then adjusted accordingly once the Response is completed.
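A client can use these updates for admission control: before requesting another Response, check that the remaining token budget covers the expected reservation. A minimal sketch, assuming each entry names a limit ("tokens" here) with limit/remaining/reset fields (field names are illustrative):

```rust
/// One entry from rate_limits_updated (illustrative shape).
#[allow(dead_code)]
struct RateLimit {
    name: String,
    limit: u64,
    remaining: u64,
    reset_seconds: f64,
}

/// Whether enough token budget remains to start another response,
/// given the tokens that will be reserved for its output. If no
/// token limit is reported, optimistically allow the request.
fn can_start_response(limits: &[RateLimit], reserved_tokens: u64) -> bool {
    limits
        .iter()
        .find(|l| l.name == "tokens")
        .map(|l| l.remaining >= reserved_tokens)
        .unwrap_or(true)
}
```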
Trait Implementations§
impl Clone for RealtimeServerEvent
fn clone(&self) -> RealtimeServerEvent
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.