SolidStart - Hacker News

throwa356262 a day ago

Since I quit my Claude subscription, every month I spend $20 (the cost of the CC Pro plan) playing around with new models and new providers.

Currently testing M3 for agentic tasks. It works OK and their token plan is very cheap. Highly recommend for claw / hermes type of work.

Tested GLM 5.1 for coding last month and it burned through my tokens a bit too quickly, but it worked well enough.

LUmBULtERA a day ago

I've been testing M3 for agentic tasks on Hermes and it just gets way too confused. I get really poor results from it compared to GPT-5.4 mini/regular or GLM-5.2 (and even 5.1).

stevenhubertron a day ago

This has been my experience as well, to the letter.

ricardobeat a day ago

M3 works best as a 'worker' agent. Create plans with a smarter model (Opus, K2.7, DeepSeek Pro), then use MiniMax to execute them.
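
A minimal sketch of that planner/worker split, assuming two hypothetical callables, `planner` and `worker`, that each take a prompt string and return the model's reply. They stand in for whatever client reaches the smarter model and MiniMax; nothing here is a real provider API.

```python
from typing import Callable, List

# Hypothetical type: any function that sends a prompt to a model and returns its reply.
Model = Callable[[str], str]


def plan_then_execute(task: str, planner: Model, worker: Model) -> List[str]:
    """Ask the planner for one step per line, then run each step on the worker."""
    plan = planner(f"Break this task into short, independent steps, one per line:\n{task}")
    # Drop blank lines and surrounding whitespace from the planner's reply.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    # Each step goes to the cheaper worker model, with the original task as context.
    return [worker(f"Task: {task}\nStep: {step}") for step in steps]


if __name__ == "__main__":
    # Stub models so the sketch runs without network access.
    fake_planner = lambda prompt: "write parser\n\nadd tests\n"
    fake_worker = lambda prompt: "done: " + prompt.splitlines()[-1]
    print(plan_then_execute("build a CSV tool", fake_planner, fake_worker))
```

The expensive model is called once per task and the cheap one once per step, which is where the cost difference would come from.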

adrian_b 2 days ago

The comparison results seem very plausible.

From the conclusion, I agree with:

> I wouldn't make either one the top-level coordinator by default.

But I do not agree with the follow-up sentence:

> The best shape is still a frontier coordinator or judge above them: GPT-5.5 or Claude Opus deciding what to delegate, checking the finished work, and rerunning narrow pieces when the answer looks wrong. These models make the worker layer much more serious, not the coordinator layer unnecessary.

For the coordinator or judge above them I would put myself, not an overly expensive LLM under the control of an external entity, thereby achieving higher quality, lower cost and greater security all at once.

throwa356262 2 days ago

A lot of LLM discussion is driven by people who cannot code themselves.

There are multiple AI influencers on YouTube who can't code 5 lines of Python to save their lives. But they do own 3 DGX Sparks and a stack of maxed-out Mac minis...

(Not complaining, AI is supposed to be democratic)

incrudible a day ago

> For the coordinator or judge above them I would put myself

You will not be able to keep up with the sheer volume, or alternatively you're never gonna ingest as much information as the LLM, so you're gonna miss out. Input tokens are relatively cheap.

Think of yourself as the CTO: a CTO can't possibly make a judgement call on every detail, but an LLM can. If you're gonna let an LLM do that, you might as well go with frontier, and if you're not gonna let an LLM do that, you're stuck with whatever the lower-tier LLMs provided you with.

That doesn't mean you shouldn't read or judge the code at all, but you're still gonna want to use the LLM as the lever.

halJordan 17 hours ago

Yeah, the comment you're responding to doesn't understand the workflow being discussed. And of course that makes the person believe they're genius-level on the topic.

scottchiefbaker 2 days ago

FWIW Opencode Go is giving 3x MiniMax M3 access right now. According to their chart you get almost 10x as much access to MM3 vs GLM 5.2.

Considering how close the models are, the extra free queries may be worth it.

oceanwaves 2 days ago

Yes, that's what I'm finding too. There seems to be a concerted promotional pricing campaign tied to M3's release across providers. Since the differences between the models are subtle, it makes a lot of sense to fan out to M3.

dchftcs a day ago

> I'm comfortable calling MiniMax the more eager model in this set because that claim is backed by the artifacts, not by vibe. It repeatedly reached for locks, persistence, policy objects, fallback paths, decorators, and extensible strategy shapes

What are "extensible strategy shapes" for those who don't speak LLM?
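
One plausible reading (an assumption, since the quoted article does not define the term) is code built around the strategy pattern: behavior is passed in as an interchangeable function or object so new variants can be added later, even when only one is needed today. A minimal Python sketch with made-up names:

```python
from typing import Callable, Dict

# Hypothetical example: retry delay policies as swappable strategies.
RetryStrategy = Callable[[int], float]  # attempt number -> delay in seconds


def fixed_delay(attempt: int) -> float:
    """Always wait one second, regardless of the attempt number."""
    return 1.0


def exponential_backoff(attempt: int) -> float:
    """Double the wait on every attempt."""
    return 2.0 ** attempt


# A registry makes the set of strategies extensible without touching callers.
STRATEGIES: Dict[str, RetryStrategy] = {
    "fixed": fixed_delay,
    "exponential": exponential_backoff,
}


def delay_for(attempt: int, strategy: str = "fixed") -> float:
    """Look up the named strategy and compute the delay for this attempt."""
    return STRATEGIES[strategy](attempt)
```

On this reading, the "eager" version is adding the registry and the second strategy when the task only ever needed `fixed_delay`.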

ValentineC a day ago

Any comparison that brings "frontier-like" claims should have benchmarks against Opus 4.8 (or Fable 5 once we get it back) and GPT 5.5.

Spoiler alert: this article just says that GLM 5.2 is better in quality than MiniMax M3, but worse value for money.

oceanwaves 2 days ago

GLM 5.2 edges ahead as the safer pick when the tasks are more challenging from-scratch builds and the result needs to arrive as a complete, runnable project. MiniMax M3 is the value pick for a lot of worker traffic.

ashenke 2 days ago

I'd love to see a comparison with both DeepSeek v4 models as well.

stevenhubertron a day ago

There are some out there, though I couldn’t find one quickly, but DeepSeek reasons way too much.

killingtime74 2 days ago

I've used both and they are great. It would be better to have a GPT or Opus benchmark, though.

Havoc 20 hours ago

5.2 is noticeably more security-aware than 5.1.

5.1 was happy to log in to a server that has kubectl access to check out why my k8s isn't doing the k8s thing. 5.2 just straight up says nope, can't use those credentials, that's unsafe.

Can't say I'm stoked about this handholding trajectory of LLMs. Yes yes, security, but you're on a local network and all these VMs will get nuked shortly anyway.

halJordan 17 hours ago

It's actually crazy to think about. People have free access to alcohol and guns and cars because that's the legacy, and we all trust everyone with all those things. But God forbid we trust someone with an LLM.

mt42or a day ago

All software benchmarks are bullshit currently because none measure the capacity to do the same tasks after the first 1,000 warm-up commits of random stuff. It's always easier to build something from scratch, but nobody rebuilds their features from zero every day.

MiniMax M3 vs. GLM 5.2: Codegen comparison across autonomous coding tasks