Gemini 1.5 Flash-8B is now production ready

(developers.googleblog.com)

27 points | by srameshc 9 months ago

13 comments

  • bearjaws 9 months ago

    Damn, I literally just published an article benchmarking flash-1.5 and showing it is very impressive for its cost.

    https://myswamp.substack.com/p/improving-accessibility-using...

    Maybe I'll redo it and add in 1.5-8b, it's so cheap it doesn't hurt to add it lol.

  • Alifatisk 9 months ago

    Why do some people turn to Gemini? I've tried it, and I remember it lacking or being heavily censored. Is it because it's cheap? Or is it better at some tasks that others aren't?

    • druskacik 9 months ago

      Mainly for its long-context abilities. The other major LLMs top out at 128-256K-token context windows, while Gemini models promise 1 million tokens (2 million in the case of 1.5 Pro).

      • Alifatisk 9 months ago

        Oh, you're right! I was about to ask how, because gemini.google.com never allowed me that, but I guess it's accessible through aistudio.google.com.

        But I do wonder, how well does Gemini 1.5 Pro / Flash recall from the context window? For example, back when both ChatGPT and Claude offered an 8k context window, Claude was still far ahead at recalling what you'd said, compared to ChatGPT, which tended to forget tokens after a while, so you had to remind it.

        • druskacik 9 months ago

          Yes, aistudio.google.com is the way to go! You can upload long documents such as books and try the long context. And AI Studio is actually free; not many people know about this.

          As for the recall performance, I can't really speak from experience; you should try it yourself :)
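          If you want to probe long-context recall yourself, the usual trick is a "needle in a haystack" test: bury a unique fact at some depth in a long document and ask the model to retrieve it. A minimal sketch of the prompt construction (the model call is omitted, and all names here are made up for illustration):

```python
# Minimal "needle in a haystack" prompt builder for probing long-context
# recall. Bury a unique fact at a chosen depth in filler text, then ask
# the model to retrieve it. You would send `prompt` to whichever API
# you're testing and check whether the answer contains the passphrase.

FILLER = "The grass is green. The sky is blue. The sun is bright. "
NEEDLE = "The secret passphrase is 'mango-42'. "
QUESTION = "What is the secret passphrase mentioned in the document?"

def build_haystack_prompt(total_chars: int, depth: float) -> str:
    """Return a prompt with the needle inserted `depth` (0.0-1.0) of the
    way into roughly total_chars of filler text."""
    haystack = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    cut = int(total_chars * depth)
    doc = haystack[:cut] + NEEDLE + haystack[cut:]
    return f"{doc}\n\nQuestion: {QUESTION}"

# Repeat over several (total_chars, depth) combinations to map out
# where recall starts to degrade.
prompt = build_haystack_prompt(total_chars=10_000, depth=0.5)
```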

          • Alifatisk 9 months ago

            I actually stayed away from AI Studio (until I completely forgot about it) because I thought it was for enterprise or paying customers only. I'll definitely try it out, thank you!

  • faangguyindia 9 months ago

    It's such a shame the Zed editor cannot use Gemini Flash for code completion; it's stuck on Supermaven or Copilot.

    Most editors could easily support LLMs via a fill-in-the-middle (FIM) operation mode.
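    For context: in FIM, the editor sends the code before and after the cursor, arranged with sentinel tokens, and the model generates the missing middle. A rough sketch of how an editor might build such a prompt; the sentinel names follow the StarCoder-style convention and vary by model, and Gemini's API does not expose FIM this way, so treat this as illustrative only:

```python
# Sketch of fill-in-the-middle (FIM) prompt construction, as a code
# editor might build it from the text around the cursor. Sentinel token
# names vary by model; these follow the StarCoder-style convention.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the cursor so the model
    generates the missing middle."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

before_cursor = "def add(a, b):\n    return "
after_cursor = "\n\nprint(add(1, 2))\n"
prompt = build_fim_prompt(before_cursor, after_cursor)
# The model's completion is then inserted at the cursor position.
```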

  • Havoc 9 months ago

    Does anyone know if the rate limits on Flash and Flash8B are separate?

    • Alifatisk 9 months ago

      It's at the bottom of the post:

      > To make this model as useful as we can, we are doubling the 1.5 Flash-8B rate limits, meaning developers can send up to 4,000 requests per minute (RPM).

      You can even compare the rate limits here: https://ai.google.dev/pricing
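      If you're batching requests against that cap, a small client-side throttle keeps you under it. A generic sliding-window sketch (only the 4,000 RPM figure comes from the post; the limiter itself is an illustration, not part of any SDK):

```python
import time
from collections import deque

class RpmLimiter:
    """Simple sliding-window throttle: wait() blocks until another
    request can be sent without exceeding `rpm` requests per minute."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = deque()  # timestamps of recent requests

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than 60 seconds.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Sleep until the oldest request ages out of the window.
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())

limiter = RpmLimiter(rpm=4000)  # Flash-8B limit quoted in the post
# Call limiter.wait() before each API request.
```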

      • Havoc 9 months ago

        Thanks
