1 comments

  • dips2umar 20 hours ago

    Author here — happy to answer any questions!

    One thing that surprised us: on an 8 GB M2 Air, peak RAM never exceeded 330 MB during a full 300-sample finetune (2 epochs) — thanks to gradient checkpointing, which reduces memory usage by recomputing activations instead of storing them.
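    For anyone unfamiliar with the trick, here's a toy pure-Python sketch of the idea — illustrative only, not LoFT's actual implementation. Forward stores activations only at every k-th layer; anything in between is recomputed from the nearest checkpoint when it's needed for the backward pass:

    ```python
    def layer(x):
        # Stand-in for a transformer layer; any deterministic function works.
        return 2 * x + 1

    def forward_full(x, n_layers):
        # Baseline: keep every intermediate activation (high memory).
        acts = [x]
        for _ in range(n_layers):
            x = layer(x)
            acts.append(x)
        return acts

    def forward_checkpointed(x, n_layers, k):
        # Checkpointing: keep only every k-th activation (low memory).
        ckpts = {0: x}
        for i in range(n_layers):
            x = layer(x)
            if (i + 1) % k == 0:
                ckpts[i + 1] = x
        return x, ckpts

    def activation_at(i, ckpts, k):
        # Recompute the activation after layer i from the nearest
        # stored checkpoint at or before it (extra compute, less memory).
        base = (i // k) * k
        x = ckpts[base]
        for _ in range(base, i):
            x = layer(x)
        return x
    ```

    Same outputs either way; you're just trading a second forward pass over short segments for not holding every activation in RAM.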

    If anyone tries LoFT on Windows or Linux, I’d love to hear your first-token latency with `loft chat`. On macOS we see ~145 ms/token with TinyLlama + GGUF.