The evals look impressive; we'll see how it performs on Artificial Analysis. Looks like another Chinese lab is joining the race. Better for consumers!
I think this is a little unfair: it's comparing a model that is optimised for pass@2 and self-improves its output against the other models, which is just test-time scaling in a way.
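To put a rough number on why pass@2 vs. single-attempt scores isn't apples to apples, here is a minimal sketch using the standard unbiased pass@k estimator (1 - C(n-c, k) / C(n, k)). The sample counts below (10 samples, 3 correct) are made up purely for illustration and are not from this model's evals.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples, of which c are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical numbers, chosen only for illustration.
p1 = pass_at_k(n=10, c=3, k=1)
p2 = pass_at_k(n=10, c=3, k=2)
print(f"pass@1 = {p1:.3f}, pass@2 = {p2:.3f}")
```

Same model, same samples, and the pass@2 number comes out noticeably higher than pass@1, so a model scored at pass@2 gets a head start over ones scored on a single attempt.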