Notable AI Models

(epochai.org)

21 points | by cheptsov 12 hours ago

6 comments

  • slowmovintarget 11 hours ago

    Key takeaways:

    - The training compute of notable AI models is doubling roughly every six months.

    - Training compute costs are doubling every nine months for the largest AI models.

    - Training compute has scaled up faster for language than vision.

    - The size of datasets used to train language models doubles approximately every eight months.

    - The length of time spent training notable models is growing.

    - The power required to train frontier AI models is doubling annually.

    - Leading AI companies have hundreds of thousands of cutting-edge AI chips.

    Those are the conclusions for each section with some explanation for the data in each section. The implications seem to be that production of these LLMs is getting ever more costly, especially in terms of time and energy, even as training and algorithms become more efficient and more effective.
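    As a rough illustration of what those doubling times imply (my own sketch, not from the report, and assuming clean exponential growth, which the report only claims approximately), each doubling period in months converts to an annual growth multiplier of 2^(12/months):

```python
# Convert the reported doubling times into approximate annual growth
# multipliers, assuming clean exponential growth.
doubling_months = {
    "training compute": 6,
    "training cost (largest models)": 9,
    "language dataset size": 8,
    "training power": 12,
}

def annual_multiplier(months_to_double: float) -> float:
    """Growth factor over 12 months, given a doubling period in months."""
    return 2 ** (12 / months_to_double)

for name, months in doubling_months.items():
    print(f"{name}: ~{annual_multiplier(months):.1f}x per year")
```

    So a six-month doubling time means roughly 4x growth per year, which is why even modest-sounding doubling periods compound into steep cost curves.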

    Is this all supply-side with the assumption of demand, or is the demand curve already rising to match?

    • popalchemist 11 hours ago

      The trends around compute increases are on the whole correct, but they are by no means a universal rule. More optimized training procedures and model architectures are coming out all the time. Just in the last week, we got F5-TTS, which is trained on twice as much data as the previous leader in realistic TTS (Tortoise TTS) and is dramatically faster to train, taking only three weeks on an H100. We also got Meissonic, a text-to-image model that is far easier to train than any existing model, i.e., you can train a Stable Diffusion-like model from scratch on consumer hardware, or in the cloud for about $500.

      https://huggingface.co/MeissonFlow/Meissonic

      https://github.com/SWivid/F5-TTS

      The reason the trend shows compute costs doubling is that this is an arms race, and everyone in the corporate space is prioritizing bigger models over better architecture in pursuit of a breakaway. It is not indicative of a law à la Moore's Law.

      • cheptsov 10 hours ago

        Yes, to me, the report is interesting not because of the insights it offers, but because it raises more questions:

        1. What about new AI chip vendors?

        2. How will the price of compute change?

        3. How will the demand for compute change?

        4. How will the overall supply of chips change?

  • graposaymaname 11 hours ago

    Damn, by the year 2028 (the median prediction based on current trends) all human-generated content will have been used to train the models. Baffles me, considering how cold the AI winter really was, even a few years back.

    • slowmovintarget 10 hours ago

      Breakthroughs have to break through something. Transformer architecture broke through that particular metaphorical ice, I suppose.
