AdaFRUGAL: Optimized LLM Training

Training large language models (LLMs) is a highly resource-intensive task, largely due to the memory overhead of the optimizer state. A new framework, called AdaFRUGAL, aims to reduce this cost through dynamic hyperparameter management.

AdaFRUGAL introduces two main dynamic controls:

  • A linear decay for the subspace ratio (ρ), which progressively reduces the memory used.
  • A loss-aware schedule for the update frequency (T), which decreases computational overhead.
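The two controls above can be sketched in a few lines. This is a minimal illustration, not AdaFRUGAL's actual implementation: the function names, the decay endpoints, and the plateau heuristic (comparing the mean loss of two recent windows) are all assumptions made for the example.

```python
def rho_linear_decay(step, total_steps, rho_start=0.5, rho_end=0.1):
    """Linearly decay the subspace ratio rho from rho_start to rho_end.

    Hypothetical endpoints; the paper's actual schedule parameters may differ.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return rho_start + (rho_end - rho_start) * frac


def loss_aware_update_freq(loss_history, T_min=50, T_max=500,
                           window=10, threshold=1e-3):
    """Pick the subspace-update interval T from recent loss behavior.

    Illustrative heuristic: if the mean loss over the last `window` steps
    barely improved over the previous window, the loss is plateauing, so
    the subspace is refreshed less often (larger T) to cut overhead.
    """
    if len(loss_history) < 2 * window:
        return T_min  # not enough history yet: update frequently
    recent = sum(loss_history[-window:]) / window
    previous = sum(loss_history[-2 * window:-window]) / window
    if previous - recent < threshold:
        return T_max  # plateau detected: lengthen the update interval
    return T_min
```

In a training loop, `rho_linear_decay` would be queried every step to size the low-rank subspace, while `loss_aware_update_freq` would be re-evaluated periodically to decide how many steps to wait before the next subspace projection update.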

Experimental results on pre-training (English C4, Vietnamese VietVault) and fine-tuning (GLUE) benchmarks show that AdaFRUGAL achieves a strong trade-off between performance, GPU memory consumption, and training time. The framework proves competitive with AdamW and static FRUGAL, offering a more practical and autonomous solution for LLM training in resource-constrained settings.

In summary, AdaFRUGAL is a step toward more efficient and accessible LLM training, thanks to its ability to adapt dynamically to the needs of the learning process.