1 comment

  • SiliconGen 3 hours ago

    I'm Harry, a former CV engineer who switched to LLMs. After months of wading through scattered blog posts, dense papers, and tutorials that skip the hard parts, I decided to build the course I wished existed.

      CookLLM is a hands-on LLM engineering course where you build everything from
      scratch — tokenizer (BPE in Rust), model architecture, GPU kernels
      (CUDA/Triton), Flash Attention, pretraining pipeline, and eventually SFT/RLHF.

      Currently ~40% complete. Topics already shipped: tokenization, RoPE, attention,
      Flash Attention (6 chapters), GPU programming, BentoLM architecture, and the
      full pretrain pipeline. Coming next: training parallelism, modern architectures
      (RMSNorm, SwiGLU, Muon optimizer), and post-training (SFT, DPO, GRPO).
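
    To give a flavor of the from-scratch approach: one BPE training step amounts to counting adjacent token-id pairs and replacing the most frequent pair with a new id. A minimal Rust sketch of that idea (my own illustration here, with a made-up `merge_step` name — not code from the course):

    ```rust
    use std::collections::HashMap;

    // One BPE merge step: count every adjacent pair of token ids,
    // then replace all occurrences of the most frequent pair with `new_id`.
    fn merge_step(tokens: &[u32], new_id: u32) -> Vec<u32> {
        let mut counts: HashMap<(u32, u32), usize> = HashMap::new();
        for pair in tokens.windows(2) {
            *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
        }
        // Nothing to merge for sequences shorter than two tokens.
        let Some((&best, _)) = counts.iter().max_by_key(|(_, &c)| c) else {
            return tokens.to_vec();
        };
        let mut out = Vec::with_capacity(tokens.len());
        let mut i = 0;
        while i < tokens.len() {
            if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == best {
                out.push(new_id); // collapse the pair into the new token
                i += 2;
            } else {
                out.push(tokens[i]);
                i += 1;
            }
        }
        out
    }

    fn main() {
        // Bytes of "aabab": the most frequent pair is (97, 98),
        // so both occurrences collapse into the new id 256.
        let tokens = vec![97, 97, 98, 97, 98];
        println!("{:?}", merge_step(&tokens, 256)); // [97, 256, 256]
    }
    ```

    A real tokenizer repeats this until the vocabulary reaches its target size and records the merge order for encoding; the course version presumably also handles byte-level pre-tokenization and tie-breaking.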