Llama 405B 506 tokens/second on an H200

(developer.nvidia.com)

18 points | by moondistance 14 hours ago

5 comments

  • EgoIncarnate 12 hours ago

    not "an H200", "In the table above, tensor parallelism is compared to pipeline parallelism with each across eight GPUs"

    • FanaHOVA 12 hours ago

      The title on HN is wrong. The article says GPUs, plural; it's referring to one of their 8xH200 boxes.
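
For context on the comparison this sub-thread quotes: tensor parallelism shards each weight matrix across all eight GPUs, while pipeline parallelism assigns whole layers to each GPU. Below is a minimal single-process sketch with numpy arrays standing in for GPU shards; the sizes and function names are illustrative, not NVIDIA's implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_shards, d = 8, 64                      # eight "GPUs", hidden size 64
    x = rng.standard_normal(d)
    layers = [rng.standard_normal((d, d)) for _ in range(4)]

    def tensor_parallel_matmul(x, W, n):
        # Split W column-wise: every shard participates in every layer,
        # and the output slices are concatenated (an all-gather on real hardware).
        shards = np.split(W, n, axis=1)
        return np.concatenate([x @ s for s in shards])

    def pipeline_forward(x, layers):
        # Each stage owns whole layers; activations flow stage to stage
        # (point-to-point sends on real hardware). Here: one layer per stage.
        for W in layers:
            x = x @ W
        return x

    y_tp = x
    for W in layers:
        y_tp = tensor_parallel_matmul(y_tp, W, n_shards)
    assert np.allclose(y_tp, pipeline_forward(x, layers))

Both paths compute the same result; what differs is when and what the eight GPUs communicate, which is the trade-off the article's table measures.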

  • moondistance 14 hours ago

    Significant further optimizations. FP8!
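
On the FP8 point: 8-bit floating-point weights halve memory traffic relative to FP16, which is a large part of such throughput gains. A hedged sketch of per-tensor FP8 (e4m3) quantization using the ml_dtypes software float8 type; the max-based scaling recipe here is an illustrative assumption, not the article's exact method.

    import numpy as np
    from ml_dtypes import float8_e4m3fn

    w_fp16 = np.random.default_rng(0).standard_normal(4096).astype(np.float16)
    scale = float(np.abs(w_fp16).max()) / 448.0   # 448 = largest finite e4m3fn value
    w_fp8 = (w_fp16 / np.float16(scale)).astype(float8_e4m3fn)

    # Dequantize to see what the 8-bit format costs in accuracy.
    w_back = w_fp8.astype(np.float16) * np.float16(scale)
    print("bytes:", w_fp16.nbytes, "->", w_fp8.nbytes)        # 8192 -> 4096
    print("max abs error:", float(np.abs(w_fp16 - w_back).max()))

This only simulates the storage and rounding side; on H200-class hardware the serving stack runs the matmuls natively in FP8 on the tensor cores.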

  • 7e 12 hours ago

    And this is why nobody submits MLPerf against NVIDIA.