Copilot Halted: Football Hallucinations Cost Dearly

West Midlands Police has suspended its use of Microsoft Copilot after the chatbot generated incorrect information about a football match. The episode led to the early retirement of the Chief Constable and sparked a debate about the reliability and risks of deploying large language models (LLMs) in professional contexts.

At the centre of the incident is a Copilot-generated recommendation to bar Israeli fans from entering the stadium during a match in Birmingham. The recommendation rested on fabricated information about clashes that never took place.

The episode underscores the need for careful evaluation and thorough testing before AI-based tools are integrated into critical decision-making processes. Hallucinations, the tendency of language models to generate false or unsupported content, remain a significant obstacle to adoption, especially in sectors where accuracy and reliability are paramount.
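One mitigation pattern often discussed for cases like this is a "grounding gate": model output is not allowed to inform a decision unless every factual claim it makes can be traced to a verified record, and anything uncited is routed to human review. The sketch below is purely illustrative and not connected to any real Copilot interface; the `Claim` type, `grounding_gate` function, and source identifiers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """A single factual assertion extracted from model output (hypothetical type)."""
    text: str
    source_id: str | None  # identifier of the record the model cites, if any

def grounding_gate(claims: list[Claim], verified_sources: set[str]) -> tuple[bool, list[str]]:
    """Approve a recommendation only if every claim cites a known, verified source.

    Returns (approved, issues); any issue means the output is held for
    mandatory human review instead of being acted upon automatically.
    """
    issues: list[str] = []
    for claim in claims:
        if claim.source_id is None:
            issues.append(f"Uncited claim: {claim.text!r}")
        elif claim.source_id not in verified_sources:
            issues.append(f"Unknown source {claim.source_id!r} for: {claim.text!r}")
    return (not issues, issues)

if __name__ == "__main__":
    # Records previously confirmed by a human analyst (illustrative IDs).
    verified = {"incident-log-2024-117"}
    claims = [
        Claim("Clashes occurred between fan groups last season.", source_id=None),
        Claim("Capacity restrictions apply to away fans.", source_id="incident-log-2024-117"),
    ]
    approved, issues = grounding_gate(claims, verified)
    if not approved:
        print("Recommendation held for human review:")
        for issue in issues:
            print(" -", issue)
```

A gate like this does not prevent hallucinations, but it converts them from silent failures into flagged exceptions that a human must resolve before any operational decision is taken.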

For organisations weighing on-premise LLM deployments, the trade-offs deserve careful evaluation. AI-RADAR offers analytical frameworks at /llm-onpremise for assessing these aspects.