SolidStart - Hacker News

20 points | by TheMrZZ 2 hours ago

6 comments

arm32 2 hours ago
The title got me, I'll admit it—except that the benchmark is a game where the models are told to lie.
bellowsgulch 2 hours ago
I find it deeply funny and I suppose a bit expected that a Grok model appears at face value to be optimized for supposed truth telling.
And to keep the e-mob off my back, I don't endorse Elon Musk.