Show HN: Open source framework OpenAI uses for Advanced Voice

(github.com)

95 points | by russ 10 hours ago

17 comments

  • pj_mukh 5 hours ago

    Super cool! Didn't realize OpenAI is just using LiveKit.

    Does the pricing break down to be the same as having an OpenAI Advanced Voice socket open the whole time? It's like $9/hr!

    It would theoretically be cheaper to use this without keeping the Advanced Voice socket open the whole time: use the GPT-4o streaming service [1] only when inference is needed (pay per token), and use LiveKit's other components for the rest (TTS, VAD, etc.).

    What's the trade-off here?

    [1]: https://platform.openai.com/docs/api-reference/streaming

    • davidz 4 hours ago

      Currently it does: all audio is sent to the model.

      However, we are working on turn detection within the framework, so you won't have to send silence to the model when the user isn't talking. It's a fairly straightforward path to cutting the cost by ~50%.
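      The gating idea described above can be sketched with a simple energy-threshold VAD (purely illustrative; this is not LiveKit's actual turn-detection code, and the frame format, threshold, and hangover values are assumptions):

```python
# Sketch: gate audio frames with an energy-based VAD so silence is
# never forwarded to the per-audio-token realtime model.
import struct

def frame_energy(pcm16: bytes) -> float:
    """Mean absolute amplitude of a 16-bit little-endian PCM frame."""
    samples = struct.unpack(f"<{len(pcm16) // 2}h", pcm16)
    return sum(abs(s) for s in samples) / max(len(samples), 1)

def gate_frames(frames, threshold=500.0, hangover=3):
    """Yield frames only during (or shortly after) detected speech.

    `hangover` keeps a few trailing frames so word endings aren't clipped;
    silent frames past that window are dropped instead of sent upstream.
    """
    quiet = hangover
    for frame in frames:
        if frame_energy(frame) >= threshold:
            quiet = 0
        else:
            quiet += 1
        if quiet <= hangover:
            yield frame  # would be forwarded to the model
```

      In a conversation that is roughly half silence, dropping the silent frames is where the ~50% saving comes from.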

  • solarkraft 2 hours ago

    That’s some crazy marketing for an “our library happened to support this relatively simple use case” situation. Impressive!

    By the way: The cerebras voice demo also uses LiveKit for this: https://cerebras.vercel.app/

    • russ 26 minutes ago

      There’s a ton of complexity under the “relatively simple use case” when you get to a global, 200M+ user scale.

  • FanaHOVA 6 hours ago

    Olivier, Michelle, and Romain gave you guys a shoutout like 3 times in our DevDay recap podcast if you need more testimonial quotes :) https://www.latent.space/p/devday-2024

    • russ 4 hours ago

      I had no idea! <3 Thank you for sharing this, made my weekend.

    • shayps 3 hours ago

      You guys are honestly the best

  • mycall 6 hours ago

    I wonder when Azure OpenAI will get this.

    • davidz 4 hours ago

      I'm working on a PR now :)

  • gastonmorixe 6 hours ago

    Nice they have many partners on this. I see Azure as well.

    There is a consensus that the new Realtime API is not actually using the same Advanced Voice model/engine (or however it works), since at least the TTS part doesn’t seem to be as capable as the one shipped with the official OpenAI app.

    Any idea on this?

    Source: https://github.com/openai/openai-realtime-api-beta/issues/2

    • russ 4 hours ago

      It's using the same model/engine. I don't have knowledge of the internals, but there is a different subsystem/set of dedicated resources for API traffic versus first-party apps.

      One thing to note: there is no separate TTS phase here; it happens internally within GPT-4o, in both the Realtime API and Advanced Voice.

  • willsmith72 4 hours ago

    That was cool, but got up to $1 usage real quick

    • russ 4 hours ago

      We had our playground (https://playground.livekit.io) up for a few days using our key. Def racked up a $$$$ bill!

      • wordpad25 2 hours ago

        How much is it per minute of talking?

        • russ 2 hours ago

          50% human speaking at $0.06/minute of tokens

          50% AI speaking at $0.24/minute of tokens

          we (LiveKit Cloud) charge ~$0.0005/minute for each participant (in this case there would be 2)

          So blended is $0.151/minute
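          As a sanity check, the blended figure follows directly from the rates quoted above (the 50/50 speaking split is the assumption stated in the comment):

```python
# Reproducing the blended per-minute cost from the quoted rates.
human_rate = 0.06      # $/min of audio tokens while the human speaks
ai_rate = 0.24         # $/min of audio tokens while the AI speaks
livekit_rate = 0.0005  # $/min per participant on LiveKit Cloud
participants = 2       # the user and the agent

blended = 0.5 * human_rate + 0.5 * ai_rate + participants * livekit_rate
print(f"${blended:.3f}/minute")  # → $0.151/minute
```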

        • shayps 2 hours ago

          It shakes out to around $0.15 per minute for an average conversation. If history is a guide though, this will get a lot cheaper pretty quickly.

          • cdolan 26 minutes ago

            This is cheaper than old cellular calls, inflation adjusted