Quickstart
Realtime agents in the Python SDK are server-side, low-latency agents built on the OpenAI Realtime API over WebSocket transport.
Beta feature
Realtime agents are in beta. Expect some breaking changes as we improve the implementation.
Python SDK boundary
The Python SDK does not provide a browser WebRTC transport. This page only covers Python-managed realtime sessions over server-side WebSockets. Use this SDK for server-side orchestration, tools, approvals, and telephony integrations. See also Realtime transport.
Prerequisites
- Python 3.10 or higher
- OpenAI API key
- Basic familiarity with the OpenAI Agents SDK
Installation
If you haven't already, install the OpenAI Agents SDK:
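A typical install from PyPI looks like the following (use a virtual environment if you prefer):

```shell
pip install openai-agents
```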
Create a server-side realtime session
1. Import the realtime components
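Assuming the SDK's standard module layout, where `RealtimeAgent` and `RealtimeRunner` live in `agents.realtime`, the imports for this quickstart are:

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner
```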
2. Define the starting agent
```python
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep responses short and conversational.",
)
```
3. Configure the runner
Prefer the nested audio.input / audio.output session settings shape for new code.
```python
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime",
            "audio": {
                "input": {
                    "format": "pcm16",
                    "transcription": {"model": "gpt-4o-mini-transcribe"},
                    "turn_detection": {
                        "type": "semantic_vad",
                        "interrupt_response": True,
                    },
                },
                "output": {
                    "format": "pcm16",
                    "voice": "ash",
                },
            },
        }
    },
)
```
4. Start the session and send input
runner.run() returns a RealtimeSession. The connection is opened when you enter the session context.
```python
async def main() -> None:
    session = await runner.run()
    async with session:
        await session.send_message("Say hello in one short sentence.")
        async for event in session:
            if event.type == "audio":
                # Forward or play event.audio.data.
                pass
            elif event.type == "history_added":
                print(event.item)
            elif event.type == "agent_end":
                # One assistant turn finished.
                break
            elif event.type == "error":
                print(f"Error: {event.error}")

if __name__ == "__main__":
    asyncio.run(main())
```
session.send_message() accepts either a plain string or a structured realtime message. For raw audio chunks, use session.send_audio().
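As an illustration, here is a small chunking helper (ours, not part of the SDK) for feeding a mono PCM16 buffer to repeated session.send_audio() calls; the 24 kHz sample rate and 40 ms chunk size are assumptions that should match your capture pipeline:

```python
def pcm16_chunks(pcm: bytes, chunk_ms: int = 40, sample_rate: int = 24000) -> list[bytes]:
    """Split a mono PCM16 byte buffer into fixed-duration chunks."""
    # 2 bytes per 16-bit sample; step is the chunk size in bytes for chunk_ms of audio.
    step = sample_rate * 2 * chunk_ms // 1000
    return [pcm[i : i + step] for i in range(0, len(pcm), step)]

# Inside an open session context:
#     for chunk in pcm16_chunks(mic_buffer):
#         await session.send_audio(chunk)
```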
What this quickstart does not include
- Microphone capture and speaker playback code. See the realtime examples in examples/realtime.
- SIP / telephony attach flows. See Realtime transport and the SIP section.
Key settings
Once the basic session works, the settings most people reach for next are:
- model_name
- audio.input.format, audio.output.format
- audio.input.transcription
- audio.input.noise_reduction
- audio.input.turn_detection for automatic turn detection
- audio.output.voice
- tool_choice, prompt, tracing
- async_tool_calls, guardrails_settings.debounce_text_length, tool_error_formatter
The older flat aliases such as input_audio_format, output_audio_format, input_audio_transcription, and turn_detection still work, but nested audio settings are preferred for new code.
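For comparison, a model_settings fragment using the flat aliases named above (shown only to illustrate the mapping; prefer the nested form for new code):

```python
# Flat-alias form of the audio settings used earlier in this quickstart.
flat_model_settings = {
    "model_name": "gpt-realtime",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
    "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
}
```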
For manual turn control, use a raw session.update / input_audio_buffer.commit / response.create flow as described in the Realtime agents guide.
For the full schema, see RealtimeRunConfig and RealtimeSessionModelSettings.
Connection options
Set your API key in the environment:
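For example (the key value here is a placeholder):

```shell
export OPENAI_API_KEY="sk-your-key"
```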
Or pass it directly when starting the session:
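A sketch of the per-session form, following the SDK's model_config pattern (requires the runner defined earlier; the key value is a placeholder):

```python
# Supply the key at session start instead of via the environment.
session = await runner.run(model_config={"api_key": "sk-your-key"})
```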
model_config also supports:
- url: Custom WebSocket endpoint
- headers: Custom request headers
- call_id: Attach to an existing realtime call. In this repo, the documented attach flow is SIP.
- playback_tracker: Report how much audio the user has actually heard
If you pass headers explicitly, the SDK will not inject an Authorization header for you.
When connecting to Azure OpenAI, pass a GA Realtime endpoint URL in model_config["url"] and explicit headers. Avoid the legacy beta path (/openai/realtime?api-version=...) with realtime agents. See the Realtime agents guide for details.
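A sketch of what that model_config might look like; the resource name, API path, and header value below are placeholders, so confirm the exact GA endpoint shape against your Azure resource and the Realtime agents guide:

```python
azure_model_config = {
    # Placeholder GA-style endpoint; substitute your resource and model.
    "url": "wss://your-resource.openai.azure.com/openai/v1/realtime?model=gpt-realtime",
    # With explicit headers, the SDK will not inject Authorization for you.
    "headers": {"api-key": "your-azure-api-key"},
}
```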
Next steps
- Read Realtime transport to choose between server-side WebSocket and SIP.
- Read the Realtime agents guide for lifecycle, structured input, approvals, handoffs, guardrails, and low-level control.
- Browse the examples in examples/realtime.