Moving to more powerful large language models (LLMs) is producing mixed results for the chatbot under development for the UK government. According to the Government Digital Service (GDS), accuracy has risen significantly, but so have response times.

Accuracy vs. Latency

Public tests have shown a leap in the accuracy of the chatbot's answers, from 76% to 90%. This improvement is directly attributable to the use of more sophisticated LLMs. On the other hand, users must now wait about 11 seconds for an answer, which raises questions about the user experience.
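To put the accuracy figures in perspective, the same change can be read three ways: as a gain in percentage points, as a relative gain over the old accuracy, and as a reduction in the share of inaccurate answers. The short Python sketch below works through that arithmetic using only the 76% and 90% figures reported above. The function name and the dictionary keys are illustrative choices, not anything published by GDS.

```python
def summarize_accuracy_change(before: float, after: float) -> dict:
    """Compare two accuracy figures given as fractions between 0 and 1."""
    error_before = 1.0 - before  # share of inaccurate answers before the change
    error_after = 1.0 - after    # share of inaccurate answers after the change
    return {
        # Absolute gain, in percentage points
        "point_gain": (after - before) * 100,
        # Gain relative to the starting accuracy, in percent
        "relative_gain_pct": (after - before) / before * 100,
        # Reduction in the error rate relative to the starting error rate, in percent
        "error_reduction_pct": (error_before - error_after) / error_before * 100,
    }


# Figures reported in the public tests: 76% before, 90% after
summary = summarize_accuracy_change(0.76, 0.90)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
# point_gain: 14.0
# relative_gain_pct: 18.4
# error_reduction_pct: 58.3
```

Read this way, a 14-point gain means the share of inaccurate answers fell from 24% to 10%, a cut of more than half. That helps explain why the improvement may be judged worth a longer wait.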

Deployment Considerations

For those evaluating on-premises deployments, there are trade-offs between performance and cost. AI-RADAR offers analytical frameworks at /llm-onpremise to evaluate these aspects.