Author here — happy to answer any questions!
One thing that surprised us: on an 8 GB M2 Air, peak RAM never exceeded 330 MB during a full 300-sample finetune (2 epochs) — thanks to gradient checkpointing, which reduces memory usage by recomputing activations instead of storing them.
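For anyone unfamiliar with the trick, here's a minimal PyTorch sketch (not LoFT's actual code, just the general idea): `checkpoint_sequential` keeps only the activations at segment boundaries and recomputes everything inside a segment during the backward pass.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a stack of transformer blocks.
blocks = nn.Sequential(
    *[nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(8)]
)
x = torch.randn(4, 512, requires_grad=True)

# Only the activations at the 2 segment boundaries are stored; the rest
# are recomputed on the backward pass, trading extra compute for a much
# smaller activation footprint.
out = checkpoint_sequential(blocks, segments=2, input=x, use_reentrant=False)
out.sum().backward()
```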
If anyone tries LoFT on Windows or Linux, I’d love to hear your first-token latency and steady-state generation speed with `loft chat`. On macOS we see ~145 ms/token with TinyLlama + GGUF.
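To keep the numbers comparable, here's roughly how you could time it. This is a hypothetical harness, not part of LoFT, and it assumes `loft chat` reads a prompt on stdin and streams its reply to stdout; adjust if your build prints a banner first.

```python
import subprocess
import time

# Hypothetical timing harness: measures time from submitting a prompt
# to the first byte of the streamed reply (first-token latency).
proc = subprocess.Popen(
    ["loft", "chat"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
start = time.monotonic()
proc.stdin.write(b"Hello\n")
proc.stdin.flush()
proc.stdout.read(1)  # blocks until the first output byte arrives
print(f"first token after {(time.monotonic() - start) * 1000:.0f} ms")
proc.terminate()
```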