Q-LoRA + SFTTrainer + Flash Attention v2 means you can fine-tune a 70B-parameter model on 24 GB of VRAM. Here is what that actually looks like end to end, what it costs in quality, and when you should just use the API instead.