Gemini 1.5 Flash-8B is now production ready

(developers.googleblog.com)

27 points | by srameshc 9 months ago

13 comments

  • bearjaws 9 months ago

    Damn, I literally just published an article benchmarking flash-1.5 and showing it is very impressive for its cost.

    https://myswamp.substack.com/p/improving-accessibility-using...

    Maybe I'll redo it and add in 1.5-8b, it's so cheap it doesn't hurt to add it lol.

  • Alifatisk 9 months ago

    Why do some people turn to Gemini? I've tried it, and I remember it lacking or being heavily censored. Is it because it's cheap? Or is it better at some tasks that others aren't?

    • druskacik 9 months ago

      Mainly for its long-context abilities. The other major LLMs top out at 128-256K-token context windows, while Gemini models promise 1 million tokens (2 million in the case of 1.5 Pro).

      • Alifatisk 9 months ago

        Oh, you're right! I was about to ask how, because gemini.google.com never allowed me that, but I guess it's accessible through aistudio.google.com.

        But I do wonder, how well does Gemini 1.5 Pro / Flash recall from the context window? For example, back when both ChatGPT and Claude offered an 8k context window, Claude was still far ahead at recalling what you'd said, compared to ChatGPT, which tended to forget tokens after a while, so you had to remind it.

        • druskacik 9 months ago

          Yes, aistudio.google.com is the way to go! You can upload long documents such as books and try the long context. And AI Studio is actually free; not many people know about this.

          As for the recall performance, I can't really speak from experience; you should try it yourself :)
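          If you want to probe long-context recall yourself, the usual trick is a "needle in a haystack" test: bury a unique fact at some depth in a long document and ask the model to retrieve it. A minimal sketch of the prompt construction (the model call is omitted, and all names here are made up for illustration):

```python
# Minimal "needle in a haystack" prompt builder for probing long-context
# recall. Bury a unique fact at a chosen depth in filler text, then ask
# the model to retrieve it. You would send `prompt` to whichever API
# you're testing and check whether the answer contains the passphrase.

FILLER = "The grass is green. The sky is blue. The sun is bright. "
NEEDLE = "The secret passphrase is 'mango-42'. "
QUESTION = "What is the secret passphrase mentioned in the document?"

def build_haystack_prompt(total_chars: int, depth: float) -> str:
    """Return a prompt with the needle inserted `depth` (0.0-1.0) of the
    way into roughly total_chars of filler text."""
    haystack = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    cut = int(total_chars * depth)
    doc = haystack[:cut] + NEEDLE + haystack[cut:]
    return f"{doc}\n\nQuestion: {QUESTION}"

# Repeat over several (total_chars, depth) combinations to map out
# where recall starts to degrade.
prompt = build_haystack_prompt(total_chars=10_000, depth=0.5)
```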

          • Alifatisk 9 months ago

            I actually stayed away from AI Studio (until I completely forgot about it) because I thought it was for enterprise or paying customers only. I'll definitely try it out, thank you!

  • faangguyindia 9 months ago

    It's such a shame the Zed editor cannot use Gemini Flash for code completion; it's stuck on Supermaven or Copilot.

    Most editors could easily support LLMs via a fill-in-the-middle (FIM) operation mode.
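    For context: in FIM, the editor sends the code before and after the cursor, arranged with sentinel tokens, and the model generates the missing middle. A rough sketch of how an editor might build such a prompt; the sentinel names follow the StarCoder-style convention and vary by model, and Gemini's API does not expose FIM this way, so treat this as illustrative only:

```python
# Sketch of fill-in-the-middle (FIM) prompt construction, as a code
# editor might build it from the text around the cursor. Sentinel token
# names vary by model; these follow the StarCoder-style convention.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the cursor so the model
    generates the missing middle."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

before_cursor = "def add(a, b):\n    return "
after_cursor = "\n\nprint(add(1, 2))\n"
prompt = build_fim_prompt(before_cursor, after_cursor)
# The model's completion is then inserted at the cursor position.
```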

  • Havoc 9 months ago

    Does anyone know if the rate limits on Flash and Flash8B are separate?

    • Alifatisk 9 months ago

      It's at the bottom of the post:

      > To make this model as useful as we can, we are doubling the 1.5 Flash-8B rate limits, meaning developers can send up to 4,000 requests per minute (RPM).

      You can even compare the rate limits here: https://ai.google.dev/pricing
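      If you're batching requests against that cap, a small client-side throttle keeps you under it. A generic sliding-window sketch (only the 4,000 RPM figure comes from the post; the limiter itself is an illustration, not part of any SDK):

```python
import time
from collections import deque

class RpmLimiter:
    """Simple sliding-window throttle: wait() blocks until another
    request can be sent without exceeding `rpm` requests per minute."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = deque()  # timestamps of recent requests

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than 60 seconds.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Sleep until the oldest request ages out of the window.
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())

limiter = RpmLimiter(rpm=4000)  # Flash-8B limit quoted in the post
# Call limiter.wait() before each API request.
```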

      • Havoc 9 months ago

        Thanks
