Bringing Up DeepSeek-V4-Flash on AMD MI300X

(fergusfinn.com)

117 points | by kkm a day ago

25 comments

  • maCDzP a day ago

    I train on AMD MI250X and managed to get Gemma 4 31B to work - but it took a lot of work on the software side.

    • kkm a day ago

      This is very interesting, planning to write about it?

  • kkm a day ago

    Also, here is the vLLM patch accompanying the blog post: https://github.com/doublewordai/vllm-amd-blog-doubleword

  • mezark a day ago

    We at Doubleword are bullish on AMD for low-interactivity inference; it just takes a bigger lift on the software side...

    • brcmthrowaway a day ago

      Are you long AMD?

      • latchkey a day ago

        Interesting that you ask that as AMD hits another ATH.

        • boxking a day ago

          Hello, sir. I want to place a bulk order for the ASRock BC-250. Is it still available?

          • latchkey a day ago

            lol find the discord!

            • boxking 21 hours ago

              Yes, sir. Is there any possibility of finding 1000 pcs or more?

              • latchkey 21 hours ago

                They are all over eBay. https://www.ebay.com/sch/i.html?_nkw=bc-250

                I'm super curious what you would use them for.

                • boxking 21 hours ago

                  Many people want to buy this now. I am a reseller of the AMD BC-250; it is popular at the moment.

                  • boxking 20 hours ago

                    Thank you, sir. Actually, it is not easy to find on eBay: the sellers there are small, so it is hard to find hundreds of units, and the eBay price is a bit high for me.

        • brcmthrowaway a day ago

          Then you are definitely long on AMD.

          • latchkey a day ago

            More accurately... I'm long on a viable alternative to the current monopoly. We have two OSes for phones (Android and iOS); there is no reason why we shouldn't have the same for all AI hardware and software. The only one even close is AMD.

  • edg5000 a day ago

    Checked out this company about a year ago and they only offered small models. Now I see they have GLM-fp8/Kimi and DeepSeek V4 Pro. Since workloads are predominantly cached input, I'm surprised to see no separate price for cached input vs uncached. I hope the prices will drop significantly; with these prices you'll end up with thousands in monthly costs quickly. Hopefully more hardware companies will be on the market in the coming years. If the Chinese eventually start competing with the current memory makers, maybe that will help.

    • mezark 9 hours ago

      Hi! Co-founder of Doubleword here. We've hugely increased the number of models that we offer, partly thanks to work that we've done on hotswapping: https://blog.doubleword.ai/fast-sglang-starts

      We're kind of known for our low prices: our prices (our main usage is our high-throughput API, the async tier) are significantly below average OpenRouter prices, and cached pricing is coming soon, which will lower them even more :)

  • benlm a day ago

    Nice work! Would DeepSeek V4 Pro on 8xMI300X work with these patches?

    • mezark 9 hours ago

      We think so, but we haven't tested it ourselves.

  • latchkey a day ago

    Nice work and thanks for being a customer.

    (CEO Hot Aisle)

  • alfiedotwtf 20 hours ago

    It’s just weird that DeepSeek released a model that was not compatible with any of the usual engines. Without derez’s new project built just to support DSv4, how long would it be until it’s actually viable in llama? :(
