SolidStart - Hacker News

tmanchester a day ago

Okay, this is actually pretty cool. Gemma 4 is a nice little model and I've really enjoyed playing around with it. At 1800 tok/s, turns are essentially instant; it's a bit of a trip.

simianwords 16 hours ago

I just tried it on their website and it is extremely fast. I wonder what the value prop of this is. Where would I want:

1. a smaller model

2. also non-local, hosted in the cloud

I can't think of any case.

johntash 12 hours ago

OCR is a decent use case for smaller models. I've had good experience using Gemma for OCR'ing handwritten stuff that Tesseract doesn't do so well on.

But for 2, probably only useful if you have a huge batch workload you want to get done quicker and don't want the local hardware for it?

jamesponddotco 12 hours ago

A voice assistant comes to mind. Ideally, it'd be local, but if you don't have the hardware you'll go with the cloud, in which case, the faster, the better.

anthonypasq 16 hours ago

speed is always better. if you have ever used a coding agent at 1000 tps, going back to 50 feels like walking through sludge. for a simple question i hate waiting 2 minutes for opus to loop 50 times just to read some files and answer it.

it's not necessarily about gemma 4 specifically, but in a year or 2 when we have opus class models at 2000 tps, imagine the productivity.

simianwords 15 hours ago

Of course I think speed is preferable, but I don’t see myself paying for a fast Gemma.

anthonypasq 15 hours ago

i mean, i can imagine a million different apps that use ai that want cheap multimodal capabilities with low latency.

simianwords 16 hours ago

Answering myself: fancy autocomplete in my IDE?

Text autocorrect on my phone? Like give it all the context about me and so on.

Gemma 4 on Cerebras - The Fastest Inference Is Now Multimodal