I tried backing out proprietary model sizes from benchmark scores, inspired by a Latent Space podcast where Artificial Analysis noted their Omniscience Accuracy numbers track parameter count better than anything else they measure.
I trained a bunch of simple linear regressions (rough sketch of the setup below the results). While Omniscience Accuracy had the best fit (R² = 0.98), it predicted absurd sizes (Gemini 3 Pro at ~1,254T total parameters). Artificial Analysis' Intelligence Index provided more plausible results:
- Gemini 3 Pro: 3.4T
- Claude 4.5 Sonnet: 1.4T
- Claude 4.5 Opus: 4.1T
- GPT-5.x series: 2.9-5.3T total parameters
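
For concreteness, here's a minimal sketch of the kind of fit involved. It is not the actual pipeline: the model names, scores, and parameter counts are placeholders I made up, and it assumes a plain fit on raw parameter counts in billions (log-scaled sizes or different features would change the numbers).

```python
# Minimal sketch of the approach, NOT the actual pipeline: fit a linear
# regression of known open-weight model sizes on a single benchmark score,
# then extrapolate to closed models. Every number below is a placeholder.
import numpy as np
from sklearn.linear_model import LinearRegression

# (benchmark score, total parameters in billions) for open-weight models
open_models = {
    "open-model-a": (35.0, 70),
    "open-model-b": (48.0, 405),
    "open-model-c": (52.0, 671),
    "open-model-d": (60.0, 1000),
}

X = np.array([[score] for score, _ in open_models.values()])
y = np.array([size_b for _, size_b in open_models.values()])

reg = LinearRegression().fit(X, y)
print(f"fit on open-weight models, R^2 = {reg.score(X, y):.2f}")

# "back out" sizes for proprietary models from their benchmark scores
closed_scores = {"closed-model-x": 66.0, "closed-model-y": 58.0}
for name, score in closed_scores.items():
    est_b = reg.predict(np.array([[score]]))[0]
    print(f"{name}: ~{est_b / 1000:.1f}T total parameters (extrapolated)")
```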
Interesting notes:
- task benchmarks (Tau²/GDPVal) aren't predictive of model size
- adding price as a feature made the fit worse (rough check sketched at the end)
- neither sparsity nor parameter activation ratio moved the predicted sizes at all
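
On the price point, here's one way to make "made the fit worse" concrete; again a hedged sketch with placeholder numbers, not the real data. In-sample R² can only go up when you add a feature, so the comparison has to be on held-out error (or on how plausible the extrapolated sizes look).

```python
# Rough way to check whether adding price helps: compare leave-one-out
# cross-validated error for a score-only regression vs. score + price.
# All numbers below are placeholders, not real scores, prices, or sizes.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# columns: [benchmark score, price per 1M output tokens in USD] -- placeholders
features = np.array([
    [35.0, 0.6],
    [48.0, 0.9],
    [52.0, 1.1],
    [60.0, 3.0],
    [63.0, 2.0],
])
sizes_b = np.array([70, 405, 671, 1000, 1200])  # known sizes in B params, placeholders

for cols, label in [([0], "score only"), ([0, 1], "score + price")]:
    mse = -cross_val_score(LinearRegression(), features[:, cols], sizes_b,
                           cv=LeaveOneOut(), scoring="neg_mean_squared_error").mean()
    print(f"{label}: leave-one-out RMSE ~ {np.sqrt(mse):.0f}B params")
```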