Orthrus-Qwen3-8B: Up to 7.8x Acceleration for Large Language Models with Unchanged Accuracy
Orthrus-Qwen3-8B targets faster LLM inference, promising up to 7.8x acceleration over the base Qwen3-8B model while maintaining the same output distribution. This approach, which freezes the model's backbone and introduces a dif...