Intel has announced the release of LLM-Scaler-vLLM 1.3, an update that significantly expands the range of supported large language models (LLMs).

Release Details

The new version targets Intel Arc Battlemage graphics cards and ships as a Docker-based stack, simplifying the deployment of vLLM, a library for LLM inference.
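As a rough sketch of what a Docker-based vLLM deployment looks like, the command below starts a container, passes the GPU through, and exposes vLLM's OpenAI-compatible HTTP server. The image name, tag, model path, and flags are assumptions for illustration only; consult Intel's release notes for the exact values.

```shell
# Hypothetical deployment sketch -- image name/tag and model path are
# placeholders, not confirmed values from Intel's release.
docker run -d --net host \
  --device /dev/dri \
  --shm-size 16g \
  -v "$HOME/models:/models" \
  intel/llm-scaler-vllm:latest \
  --model /models/your-model \
  --port 8000
```

Here `--device /dev/dri` passes the Arc GPU into the container, `--shm-size` enlarges shared memory (vLLM relies on it for inter-process tensor transfer), and the mounted volume lets the container reuse a local model cache. Once running, the server answers OpenAI-style requests on port 8000.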

For teams evaluating on-premise deployments, there are trade-offs to weigh. AI-RADAR provides analytical frameworks for assessing them at /llm-onpremise.