QuarterBit – Train 70B LLMs on a single GPU

(quarterbit.dev)

3 points | by quarterbit 11 hours ago

3 comments

  • quarterbit 11 hours ago

    I built QuarterBit because AI training costs are insane. Training a 70B model needs about 840GB of memory, which works out to 11 A100 GPUs at $30+/hour.

    QuarterBit AXIOM compresses training memory 15x. Same model. Same quality. Fraction of the hardware.

    RESULTS:

      Llama 70B: 840GB → 53GB (11 GPUs → 1 GPU) = 90% savings
      Llama 13B: 156GB → 9GB (FREE on Kaggle T4) = 100% savings
    
    91% energy reduction vs standard training. 100% trainable weights (not LoRA/adapters). 3 lines of code.
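    A quick sanity check of the quoted figures. One assumption: the 840GB baseline works out to ~12 bytes of training state per parameter (840e9 / 70e9); that ratio is inferred here, not stated anywhere official.

```python
# Back-of-the-envelope check of the memory figures quoted above.
# Assumption (not a QuarterBit number): 840GB baseline ~= 12 bytes
# of training state per parameter for a 70B model.
params = 70e9

baseline_bytes_per_param = 840e9 / params   # 12.0 bytes/param
weights_gb = params * 0.62 / 1e9            # ~43.4 GB at 0.62 bytes/param
compression = 840 / 53                      # vs the quoted 53 GB total

print(f"{baseline_bytes_per_param:.0f} bytes/param baseline")
print(f"{weights_gb:.1f} GB of compressed weights")
print(f"{compression:.1f}x end-to-end reduction")  # ~15.8x, consistent with the 15x claim
```

    The gap between the ~43GB of compressed weights and the quoted 53GB total presumably covers activations and other runtime overhead.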

    HOW IT WORKS:

      from quarterbit import axiom
      model = axiom(model)
      model.cuda()
    
    TRY IT:

      pip install quarterbit
    
    Demo (FREE): https://www.kaggle.com/code/kyleclouthier/quarterbit-axiom-1...

    Benchmarks: https://quarterbit.dev

    AXIOM uses a novel weight representation combining lossless compression with a built-in optimizer. Weights are stored at 0.62 bytes/param instead of 4 bytes for FP32, and gradient updates happen directly in compressed space.
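    AXIOM's codec and optimizer aren't public, so purely as an illustration of "gradient updates in compressed space", here is a toy sketch: weights live as losslessly compressed blocks (zlib stands in for the real codec, plain SGD for the real optimizer, and every name here is hypothetical), and each step decompresses one block, updates it, and recompresses it, so a full uncompressed copy of the model never exists in memory.

```python
import zlib
import numpy as np

class CompressedStore:
    """Toy weight store: losslessly compressed blocks, updated in place."""

    def __init__(self, weights: np.ndarray, block: int = 4):
        self.block = block
        self.dtype = weights.dtype
        # Compress the weights block by block (zlib is illustrative only).
        self.blocks = [zlib.compress(weights[i:i + block].tobytes())
                       for i in range(0, len(weights), block)]

    def sgd_step(self, grads: np.ndarray, lr: float):
        # Decompress one block at a time, apply the update, recompress.
        for bi, i in enumerate(range(0, len(grads), self.block)):
            w = np.frombuffer(zlib.decompress(self.blocks[bi]), self.dtype).copy()
            w -= lr * grads[i:i + len(w)]
            self.blocks[bi] = zlib.compress(w.tobytes())

    def materialize(self) -> np.ndarray:
        return np.concatenate([np.frombuffer(zlib.decompress(b), self.dtype)
                               for b in self.blocks])

w = np.ones(8, dtype=np.float32)
store = CompressedStore(w)
store.sgd_step(np.full(8, 0.5, dtype=np.float32), lr=0.1)
print(store.materialize())  # every entry is now 1 - 0.1 * 0.5 = 0.95
```

    The peak uncompressed working set is one block rather than the whole tensor; the real system presumably does something far more sophisticated to hit 0.62 bytes/param with AdamW-matching convergence.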

    This is not quantization-aware training or LoRA: every parameter is fully trainable, and convergence matches AdamW.

    Solo founder from Canada. Self-taught CUDA/ML. Applying to YC S26.

    Happy to answer questions.

    • smallerize 11 hours ago

      Put two spaces at the beginning of a line for monospace.

        Like this
      • quarterbit 10 hours ago

        Thanks for the tip! It won't let me edit it. I think I'll hide and repost; it does look sloppy.