LLMs and Propaganda Resistance: The Estonian Benchmark

The Challenge of Disinformation and Large Language Models

As people increasingly rely on Large Language Models (LLMs) for quick answers to complex questions, governments are understandably concerned that these systems could spread propaganda, especially propaganda promoted by foreign state actors. An LLM's ability to generate coherent and persuasive text makes it a powerful tool, but it also leaves the model vulnerable to manipulation or to unintentionally reproducing distorted narratives. This dynamic raises critical questions about trust, national security, and data sovereignty.

In this context, the need to evaluate and mitigate the risk of LLMs conveying problematic content has become a priority. Organizations and institutions considering LLM deployments, particularly in sensitive or air-gapped environments, must ensure that models are not only performant but also intrinsically resistant to forms of disinformation and ideological manipulation. This is fundamental for maintaining control over information output and ensuring compliance with internal and external regulations.

ELI's "Propaganda Resistance" Benchmark

To address this issue, the Estonian Language Institute (ELI), a government-sponsored institution, has released a new "Propaganda Resistance" benchmark. This tool ranks dozens of LLMs on their ability to avoid "taking positions on topics that the Russian Federation uses in its strategic narratives." Given its history as a former Soviet republic and its relatively recent independence, Estonia is particularly alert to narratives from its large eastern neighbor that it perceives as false.

In collaboration with Propastop, a volunteer-run Estonian defense collective, ELI identified 14 broad categories where Russian influence operations attempt to sway public discussion. These categories range from narratives on the current status of Crimea and justifications for the war in Ukraine, to the history of NATO and justifications for the Soviet Union's annexation of the Baltic states during World War II. For each category, researchers wrote questions in one of three ways: phrased neutrally, built on "false assumptions" drawn from Russian propaganda, or written with malicious intent to explicitly elicit misinformation from the LLM. Questions were posed to the models in English, Estonian, and Russian, to test their linguistic and cultural robustness.
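The question design described above can be pictured as a grid of categories, framings, and languages. The following is a minimal sketch of that idea, not ELI's actual tooling: the framing and language labels, the PromptCase type, and the placeholder category names are all hypothetical, and it assumes every category is tested in every framing and every language, which the benchmark description does not confirm.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical labels: the article describes three framings and three
# languages, but not the identifiers the benchmark itself uses.
FRAMINGS = ("neutral", "false_assumption", "malicious")
LANGUAGES = ("en", "et", "ru")


@dataclass(frozen=True)
class PromptCase:
    """One cell of the assumed category x framing x language grid."""
    category: str
    framing: str
    language: str


def build_grid(categories):
    """Return one PromptCase per (category, framing, language) combination."""
    return [
        PromptCase(category, framing, language)
        for category, framing, language in product(categories, FRAMINGS, LANGUAGES)
    ]


# Placeholder names: the article lists only a few of the 14 topics.
categories = [f"category_{i:02d}" for i in range(1, 15)]
grid = build_grid(categories)
print(len(grid))  # 14 * 3 * 3 = 126
```

Under the full-grid assumption, each model would face at least 126 distinct combinations. The real benchmark may use several questions per cell or skip some cells.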

Evaluation Methodology and Deployment Implications

The evaluation of LLM responses was entrusted to a separate AI model, calibrated to align with the expertise of Propastop specialists. This approach is intended to keep the evaluation consistent and grounded in criteria defined by human experts, although an automated judge remains only as reliable as its calibration. A crucial aspect of the benchmark is that models were judged on their ability to "push back on propaganda narratives, without external help" from web search or other external tools. This means the benchmark measures the model's intrinsic resistance, rather than its ability to filter information through external mechanisms.
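The judge-based evaluation described above might be aggregated along the following lines. This is a hedged sketch only: the function resistance_rates, the binary pass/fail verdict, and the toy answer and judge stand-ins are assumptions made for illustration, since the article does not describe the judge's scoring scale or how ELI computes its rankings.

```python
from collections import defaultdict
from typing import Callable, Iterable


def resistance_rates(
    cases: Iterable[tuple[str, str]],
    answer: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> dict[str, float]:
    """Share of prompts per language whose reply the judge marks as resisting.

    cases  -- (language, prompt) pairs
    answer -- the model under test, called without tools or web search
    judge  -- a separate model returning True when the reply pushes back
              (a binary verdict is an assumption, not the benchmark's scale)
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for language, prompt in cases:
        reply = answer(prompt)
        total[language] += 1
        passed[language] += int(judge(prompt, reply))
    return {language: passed[language] / total[language] for language in total}


# Toy stand-ins so the sketch runs without any model or network access.
def toy_answer(prompt: str) -> str:
    return "pushback" if prompt in {"q1", "q3"} else "echo"


def toy_judge(prompt: str, reply: str) -> bool:
    return reply == "pushback"


cases = [("en", "q1"), ("en", "q2"), ("et", "q3"), ("ru", "q4")]
rates = resistance_rates(cases, toy_answer, toy_judge)
print(rates)  # {'en': 0.5, 'et': 1.0, 'ru': 0.0}
```

Passing the model and the judge in as plain callables keeps the aggregation independent of any particular vendor API, and makes it easy to check the judge's verdicts against a sample of human ratings.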

For organizations considering self-hosted LLM deployments, a model's ability to maintain neutrality and resist biased inputs without complex external filtering mechanisms is crucial. This directly affects data sovereignty, compliance, and overall control over information output, especially in sensitive or air-gapped environments where external checks are limited or impossible. Furthermore, the total cost of ownership (TCO) can be significantly affected: models that require extensive post-processing or human oversight to filter propaganda incur additional operational costs and infrastructure complexity.

Towards More Resilient and Controllable LLMs

The Estonian Language Institute's initiative highlights the growing importance of developing and selecting LLMs that are intrinsically resilient to disinformation and propaganda. In an increasingly polarized digital landscape, a model's ability to provide accurate and impartial information is as crucial as its computational efficiency.

The demand for LLMs that offer intrinsic resistance to propaganda is a key factor for CTOs and decision-makers prioritizing data sovereignty and control in their on-premises or hybrid AI strategies. This type of benchmark provides a valuable tool for evaluating the trade-offs between different models and architectures, guiding choices towards solutions that meet not only performance requirements but also ethical and information security needs. For those evaluating on-premises deployments, AI-RADAR offers analytical frameworks on /llm-onpremise to assess these and other trade-offs and to support informed decisions aligned with strategic requirements.