Peter Bakkum (@pbbakkum) / X

Peter Bakkum

1,227 posts

@pbbakkum

I work on Multimodal API @openai, for fun: butterfi.sh

San Francisco

Joined November 2011

Pinned
Peter Bakkum
@pbbakkum
Aug 28, 2025
Replying to @pbbakkum
Peter Bakkum
@pbbakkum
Oct 23, 2025
A small audio model launch -- gpt-4o-transcribe-diarize. This is a diarization-focused ASR model; it's big and slow, so we recommend running it offline, but it excels at differentiating speakers, and you can provide voice samples for known speakers up front.
Peter Bakkum
@pbbakkum
May 13, 2024
I joined OpenAI at the beginning of the year -- partly because I was excited about the possibility of better voice interaction with computers. So it was *especially* amazing to work with the team here on the gpt-4o model launch. It's hard to grok until you try it how big of a
Peter Bakkum
@pbbakkum
Aug 28, 2025
My favorite demo of the new gpt-realtime model from @matthieulc -- Shoggoth Mini using Realtime API with image input
Peter Bakkum
@pbbakkum
Oct 5, 2025
I’ll be at OpenAI DevDay tomorrow; come find me at the multimodal booth if you want to talk about audio models and the Realtime API.
Peter Bakkum
@pbbakkum
Jun 21, 2025
I can’t overemphasize how good the new realtime speech2speech model is at function calling. It is fast and accurate with native audio input. It exceeded my expectations and those of the post-training researchers. This one: gpt-4o-realtime-preview-2025-06-03
Peter Bakkum
@pbbakkum
Feb 10, 2023
Pretty frustrated with all these Google AI naysayers ignoring that the AI Product org scored 0.833 on their Key Results over the last two quarters
Peter Bakkum
@pbbakkum
Jan 9, 2025
Heads up -- we're shifting the OpenAI model for the Realtime API, gpt-4o-realtime-preview, to point to gpt-4o-realtime-preview-2024-12-17. This model has some valuable improvements; if you use the dateless model, things should get magically better.
Peter Bakkum
@pbbakkum
Aug 28, 2025
We're adding MCP capability to the Realtime API. I'm very excited about how well MCP tools work over voice. Here's a demo using a Notion MCP --
Peter Bakkum
@pbbakkum
Jan 24, 2025
A new feature for the Realtime API -- you can now set "language" and "prompt" for input audio transcription. This was requested by lots of users; it should make a big difference if you rely on transcription accuracy and know the language or expected keywords.
Peter Bakkum
@pbbakkum
Oct 1, 2024
tomorrow 😵
Peter Bakkum
@pbbakkum
Jun 29, 2025
A terrible and embarrassing thing has happened, which is that I’ve become MCP-pilled and now know it to be the future of everything
Peter Bakkum
@pbbakkum
Mar 20, 2025
New feature launching today on the Realtime API: 🟤Semantic VAD🟤. This is a custom turn detection model that uses the *content* of speech to tell if the user is done. This is a huge improvement for cases where the user pauses and the model incorrectly interrupts.
Peter Bakkum
@pbbakkum
Jul 19, 2025
My team is very very good