OpenAI Announces Realtime Voice API

(twitter.com)

15 points | by swyx 9 hours ago

2 comments

  • bottlepalm 4 hours ago

    Looks like even for the non-realtime API they're charging $200/M tokens for output audio. Their current TTS API is $15/M characters for output audio, which equates to about $60/M tokens if each token is around 4 characters. Then add in the manual piping through the 4o LLM, which is $15/M tokens, for around $75/M total.

    So going from $75/M to $200/M is a big premium for the convenience of one model and the quality of multimodal input/output. Will have to test and see if it's worth it.
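
For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The prices are the ones quoted in this comment, and the 4-characters-per-token ratio is the commenter's rough assumption, not an official figure. The constant and function names are made up for illustration.

```python
# Figures as quoted in the comment above (USD); none are taken from an official price list here.
TTS_USD_PER_M_CHARS = 15.0          # existing TTS API, per 1M characters
LLM_USD_PER_M_TOKENS = 15.0         # 4o text output, per 1M tokens
AUDIO_OUT_USD_PER_M_TOKENS = 200.0  # new audio output, per 1M tokens
CHARS_PER_TOKEN = 4                 # rough assumption from the comment


def pipeline_cost_per_m_tokens(
    tts_per_m_chars: float = TTS_USD_PER_M_CHARS,
    llm_per_m_tokens: float = LLM_USD_PER_M_TOKENS,
    chars_per_token: int = CHARS_PER_TOKEN,
) -> float:
    """Cost of the do-it-yourself route (LLM text output, then TTS) per 1M tokens."""
    # 1M tokens is roughly chars_per_token * 1M characters of TTS input.
    tts_per_m_tokens = tts_per_m_chars * chars_per_token
    return tts_per_m_tokens + llm_per_m_tokens


pipeline = pipeline_cost_per_m_tokens()
premium = AUDIO_OUT_USD_PER_M_TOKENS / pipeline

print(f"DIY pipeline: ${pipeline:.0f}/M tokens")
print(f"Single-model audio output: ${AUDIO_OUT_USD_PER_M_TOKENS:.0f}/M tokens")
print(f"Premium: {premium:.2f}x")
```

The result is sensitive to the characters-per-token ratio: at 3 characters per token the DIY route drops to $60/M, so the premium estimate moves with that assumption.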

    Also, is there still no way to connect users directly to OpenAI? Like directly from a user's browser to OpenAI's servers, without the user having to supply their own API key? How does this work with realtime, which needs websockets? Do I need an intermediate proxy server for all my users' conversations? Seems like a waste of bandwidth, an unnecessary failure point, and a privacy problem. I hope I am wrong.

  • babyshake 9 hours ago

    Is this currently only in the playground for DevDay attendees? Not seeing it on my end.