Teaching an LLM a Niche Diagraming Language

(huy.rocks)

26 points | by todsacerdoti 12 hours ago

3 comments

  • robot-wrangler 3 hours ago

    Big thank you to the author and OP. This is exactly the kind of homebrew recipe post I've been waiting for. I knew this had to be basically cookbook material by now, but really simple examples like this with no fluff are surprisingly hard to find. (Anyone got others?)

    I've been thinking about similar experiments with some obscure esolang for a long time, so more detail on total time/cost would be nice. Also: if it's correct that a model this size is about the right minimal choice for starting such efforts, what are the next steps if you wanted to shrink it to only specialize in the target? Should you go for distillation or ablation?

    • huydotnet 2 hours ago

      Hey, I'm the author of the post. Thank you so much for the kind feedback!

      Speaking of total time/cost, this experiment cost me just $1.01 for 2h30 on a rental GPU. But the actual successful run was less than 10 minutes for both phases; I spent the rest of the time fixing the code, tuning the params, training, and retraining. It took me about 6 hours to build and clean the two datasets, though.

      For the next step, I'm thinking of improving the model accuracy, maybe with RL, but I would not go about shrinking the model size any lower. Prior to this, I've tried a lot of different model sizes on different kinds of tasks, from 135M to 4B. I'm not sure I like the performance of these small models for code generation :D
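      [Editor's note: the post's actual training setup isn't shown in this thread. As a toy illustration of the train/measure loop being discussed (fit a next-token model on a small diagram-DSL corpus, watch the loss drop), here is a minimal numpy sketch. The corpus string, bigram model, and hyperparameters are all illustrative assumptions, not the author's code.]

      ```python
      import numpy as np

      # Illustrative toy corpus in an arrow-style diagram DSL (not from the post).
      corpus = "A-->B;B-->C;C-->A;"
      vocab = sorted(set(corpus))
      stoi = {c: i for i, c in enumerate(vocab)}
      V = len(vocab)

      # Bigram "LM": W[i, j] is the logit of char j following char i.
      W = np.zeros((V, V))
      xs = np.array([stoi[c] for c in corpus[:-1]])  # context chars
      ys = np.array([stoi[c] for c in corpus[1:]])   # next chars to predict

      def loss_and_grad(W):
          logits = W[xs]                                   # (N, V)
          logits = logits - logits.max(axis=1, keepdims=True)
          probs = np.exp(logits)
          probs /= probs.sum(axis=1, keepdims=True)
          nll = -np.log(probs[np.arange(len(ys)), ys]).mean()
          # d(loss)/d(logits) = probs - onehot(target), averaged over examples.
          probs[np.arange(len(ys)), ys] -= 1.0
          grad = np.zeros_like(W)
          np.add.at(grad, xs, probs / len(ys))
          return nll, grad

      first = None
      for step in range(300):
          nll, g = loss_and_grad(W)
          if first is None:
              first = nll
          W -= 1.0 * g  # plain gradient descent, lr = 1.0

      print(f"loss: {first:.3f} -> {nll:.3f}")
      ```

      Even at this scale the same shape applies: the loss floors out at the conditional entropy of the data, so cleaner datasets (the 6-hour part) matter more than extra training steps.
      
      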

  • thomascountz 7 hours ago

       ...I heard many good and bad things about [using RL for training] and I must give it a try.
    
    Great article and great ethos. Thanks for sharing! I had no idea how LLMs worked before, and now I know a bit more.