When a judge asks an AI assistant about a criminal law provision and the system responds with a blanket refusal, confidence in the tool crumbles. This is not a hypothetical scenario: over-alignment, the excessive preventive censorship built into LLMs, affects institutions too. That is why the Swiss Federal Supreme Court has decided to evaluate Heretic, which uses abliteration to strip unjustified refusals from open models.

The paradox of excessive alignment

LLM training builds in ethical filters that, out of caution, often block perfectly legitimate requests. In the judicial field, where sensitive terminology and complex legal references are the norm, the problem becomes systemic. A court trying to analyze precedents or draft documents with an AI assistant can run into a wall of unfounded “I cannot answer” replies. This obstacle stalls adoption and undermines the effectiveness of decision-support tools.
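Over-refusal can be put into numbers before and after any intervention. The sketch below is a deliberately crude illustration, assuming the model's answers to a set of legitimate legal questions have already been collected: it flags refusals by matching a small, hypothetical list of phrases in English and three Swiss national languages, then reports the false refusal rate. It is not the metric used in the study.

```python
from typing import Iterable

# Hypothetical refusal markers (English, German, French, Italian).
# A serious evaluation would use a trained classifier or human review instead.
REFUSAL_MARKERS = (
    "i cannot",
    "i can't",
    "i'm sorry, but",
    "ich kann nicht",
    "je ne peux pas",
    "non posso",
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def false_refusal_rate(responses: Iterable[str]) -> float:
    """Fraction of answers to legitimate prompts that look like refusals."""
    items = list(responses)
    if not items:
        raise ValueError("no responses to score")
    return sum(is_refusal(r) for r in items) / len(items)


# Made-up answers to legitimate criminal-law questions.
answers = [
    "The offence requires intent; negligence is covered by a separate provision.",
    "I cannot help with questions about violent crimes.",
    "Je ne peux pas répondre à cette question.",
    "Der Tatbestand setzt Vorsatz voraus.",
]
print(false_refusal_rate(answers))  # 0.5
```

Substring matching misses paraphrased refusals (and curly apostrophes) and can misfire when a legal answer merely quotes such a phrase, so it only serves as a first approximation.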

Abliteration: removing the handbrake

The technique investigated by the Court's team, described in the paper “Measuring & Mitigating Over-Alignment for LLMs in Multilingual Criminal Law Courts”, relies on abliteration. Unlike fine-tuning, which retrains the model on new data, abliteration intervenes directly on the internal mechanism that produces refusals, editing the weights without any further training. Heretic is the best-known example: it applies abliteration to open-weight models, “freeing” them from most of their safety constraints. According to the original post, Section 5.2 of the study reports favorable results for Heretic in handling legitimate requests in multilingual criminal law.
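The core idea of abliteration, as commonly described in the open-source community, fits in a few lines: estimate a “refusal direction” as the normalized difference between mean activations on prompts the model refuses and prompts it answers, then project that direction out of the weights that write into the residual stream. The NumPy sketch below runs on synthetic data; the array shapes, the single-layer setup and all names are illustrative assumptions, not Heretic's code or the study's procedure.

```python
import numpy as np


def refusal_direction(refused_acts: np.ndarray, answered_acts: np.ndarray) -> np.ndarray:
    """Unit difference-of-means vector between two sets of activations.

    Both arrays have shape (n_prompts, d_model): activations captured at one
    layer for prompts the model refuses and prompts it answers.
    """
    diff = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_activations(acts: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Inference-time variant: remove the component along unit vector r from each row."""
    return acts - np.outer(acts @ r, r)


def orthogonalize_weight(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Weight-editing variant: return (I - r r^T) W.

    W has shape (d_model, d_in) and maps into the residual stream (y = W @ x),
    like a linear layer's (out_features, in_features) weight, so the edited
    layer can no longer write anything along r.
    """
    return W - np.outer(r, r @ W)


# Synthetic demo: "refused" activations share an extra component along one axis.
rng = np.random.default_rng(0)
d_model, d_in = 16, 8
true_dir = np.zeros(d_model)
true_dir[3] = 1.0
answered = rng.normal(size=(200, d_model))
refused = rng.normal(size=(200, d_model)) + 3.0 * true_dir

r = refusal_direction(refused, answered)
W_edited = orthogonalize_weight(rng.normal(size=(d_model, d_in)), r)
x = rng.normal(size=d_in)

print(abs(r @ true_dir))        # close to 1: the planted direction is recovered
print(abs(r @ (W_edited @ x)))  # ~0: the edited layer writes nothing along r
```

In a real model the activations would come from forward passes over two prompt sets, and the edit would be applied across many layers; choosing those layers and the strength of the intervention is outside the scope of this sketch.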

What it means for on-premise adopters

The Swiss case speaks directly to those who design local inference stacks. A court handling sensitive data cannot rely on cloud APIs: it must retain full control over the model, from weights to logs. Evaluating Heretic is not a symbolic gesture but a test of a concrete artifact that trades off utility against risk. Abliteration reduces false refusals but also increases exposure to potential misuse, so deployment in a self-hosted environment with well-defined security perimeters and granular audits becomes the minimum baseline for any serious assessment. AI-RADAR notes that choosing on-premise deployment is not merely a technical preference but the cornerstone of managing less predictable model behavior without delegating responsibility to third parties.
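As an illustration of what granular audits could look like in practice, here is a minimal append-only audit log in JSON Lines format. It assumes a hypothetical local inference service that calls `append_audit_record` after every response; the field names are invented for this sketch, and it stores SHA-256 digests instead of raw text on the assumption that case data should not be duplicated into logs.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def _sha256(text: str) -> str:
    """Hex digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def append_audit_record(log_path: Path, user: str, model_id: str,
                        prompt: str, response: str, refused: bool) -> dict:
    """Append one JSON line describing a single model interaction."""
    record = {
        "ts": time.time(),        # Unix timestamp of the interaction
        "user": user,             # who sent the request
        "model_id": model_id,     # which local model/weights answered
        "prompt_sha256": _sha256(prompt),
        "response_sha256": _sha256(response),
        "refused": refused,       # e.g. the output of a refusal detector
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


# Demo with a throwaway log file and made-up identifiers.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "audit.jsonl"
    rec = append_audit_record(log, "clerk-01", "local-model-v1",
                              "Which provision covers fraud?", "It is ...",
                              refused=False)
    lines = log.read_text(encoding="utf-8").splitlines()

print(len(lines), json.loads(lines[0])["prompt_sha256"] == rec["prompt_sha256"])  # 1 True
```

Digests let auditors check which prompt produced which answer when the originals are held in a separate, access-controlled store; a production log would also need tamper protection and retention rules.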

Beyond the slogan: no ban, but a test of maturity

The provocative title of the discussion (“are they banning abliterated models?”) is bait; the reality is far more pragmatic. The Swiss Federal Supreme Court is not demonizing any technology; on the contrary, it is looking for solutions to a concrete problem. The initiative signals a shift for organizations holding regulated data: digital sovereignty depends on the ability to inspect, adapt, and measure open models in controlled environments. If the experiment yields replicable guidelines, it could inspire other institutions to build their own inference stacks with custom LLMs, moving away from the monolithic offerings of cloud providers.