Author here — happy to answer any questions!
One thing that surprised us: on an 8 GB M2 Air, peak RAM never exceeded 330 MB during a full 300-sample finetune (2 epochs) — thanks to gradient checkpointing, which reduces memory usage by recomputing activations instead of storing them.
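For anyone unfamiliar with the trick, here's a minimal PyTorch sketch (not LoFT's actual code, just the general idea): `checkpoint_sequential` keeps only the activations at segment boundaries and recomputes everything inside a segment during the backward pass.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a stack of transformer blocks.
blocks = nn.Sequential(
    *[nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(8)]
)
x = torch.randn(4, 512, requires_grad=True)

# Only the activations at the 2 segment boundaries are stored; the rest
# are recomputed on the backward pass, trading extra compute for a much
# smaller activation footprint.
out = checkpoint_sequential(blocks, segments=2, input=x, use_reentrant=False)
out.sum().backward()
```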
If anyone tries LoFT on Windows or Linux, I’d love to hear your first-token latency and steady-state generation speed with `loft chat`. On macOS we see ~145 ms/token with TinyLlama + GGUF.
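To keep the numbers comparable, here's roughly how you could time it. This is a hypothetical harness, not part of LoFT, and it assumes `loft chat` reads a prompt on stdin and streams its reply to stdout; adjust if your build prints a banner first.

```python
import subprocess
import time

# Hypothetical timing harness: measures time from submitting a prompt
# to the first byte of the streamed reply (first-token latency).
proc = subprocess.Popen(
    ["loft", "chat"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
start = time.monotonic()
proc.stdin.write(b"Hello\n")
proc.stdin.flush()
proc.stdout.read(1)  # blocks until the first output byte arrives
print(f"first token after {(time.monotonic() - start) * 1000:.0f} ms")
proc.terminate()
```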