2D Early Exit Optimization: New Horizons for On-Premise LLM Inference
A two-dimensional early exit strategy speeds up LLM inference by coordinating layer-wise and sentence-wise exiting. This incremental method yields multiplicative computational savings, exceeding what either optimization achieves on its own. Tested on 3B-8B paramete...
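
To make the "multiplicative" claim concrete, here is a minimal sketch of the arithmetic. It assumes the two exits act independently, so that the fraction of layers executed and the fraction of sentences processed simply multiply. The function names and the 0.5 fractions are hypothetical illustrations, not figures or code from the evaluated method.

```python
def combined_cost_fraction(layer_fraction: float, sentence_fraction: float) -> float:
    """Fraction of full compute remaining when both exits apply.

    Assumption (not from the source): the two exits are independent, so
    the per-dimension fractions multiply.
    """
    for name, value in (("layer_fraction", layer_fraction),
                        ("sentence_fraction", sentence_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return layer_fraction * sentence_fraction


def speedup(cost_fraction: float) -> float:
    """Idealized speedup implied by a remaining-compute fraction."""
    if cost_fraction <= 0.0:
        raise ValueError("cost_fraction must be positive")
    return 1.0 / cost_fraction


# Hypothetical numbers: exit after half the layers, on half the sentences.
layers_only = speedup(combined_cost_fraction(0.5, 1.0))     # layer-wise exit alone
sentences_only = speedup(combined_cost_fraction(1.0, 0.5))  # sentence-wise exit alone
both = speedup(combined_cost_fraction(0.5, 0.5))            # both exits together
print(layers_only, sentences_only, both)
```

Under these illustrative inputs each exit alone halves the work, while the combination quarters it. Real savings would depend on how often each exit fires and on any overhead the exit checks add, which this sketch ignores.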