LLM Personas: Why Fine-tuning and Steering Aren't the Same Thing

Anyone running LLMs in production knows there are three main ways to shape tone and style: conditioning through the prompt, fine-tuning on targeted examples, or intervening directly on activation vectors at inference time (steering). A common assumption, inherited from the persona-vector literature, is that all three routes lead to the same destination: the same direction in latent space should encode the same personality trait, whichever route put it there.
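Of the three routes, steering is the least familiar. A common form is activation addition: a fixed direction is added to a layer's hidden state during the forward pass, and the weights are left untouched. The sketch below shows only that arithmetic, on toy vectors in plain Python. The function names, the 4-dimensional state and the "friendly" direction are hypothetical. A real implementation would typically apply the shift through a forward hook on a chosen transformer layer (in PyTorch, for example).

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Return v scaled to unit L2 norm, so alpha alone sets the strength."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("steering direction must be non-zero")
    return [x / norm for x in v]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation addition: h' = h + alpha * v / ||v||.

    Only this hidden state is shifted; no model weights change,
    which is what separates steering from fine-tuning.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must have the same size")
    unit = normalize(direction)
    return [h + alpha * u for h, u in zip(hidden, unit)]


# Hypothetical toy values: a 4-d hidden state and a "friendly" direction.
h = [0.5, -1.0, 2.0, 0.0]
v_friendly = [0.0, 3.0, 0.0, 4.0]  # norm 5, so the unit vector is [0, 0.6, 0, 0.8]
print(steer(h, v_friendly, alpha=2.0))  # approximately [0.5, 0.2, 2.0, 1.6]
```

Normalizing the direction keeps the strength in a single knob, `alpha`. This is a common convention rather than a requirement, and how large `alpha` must be to change behavior depends on the layer and the model.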

A new theoretical study picks this belief apart. Taking aim at Beckmann and Butlin's (2026) ontological framework for the LLM individuation problem, the authors ran persona-topology experiments on two open-weight models, Qwen3-4B-Instruct and Mistral-7B-Instruct-v0.2, and uncovered four empirical inconsistencies with that picture.

First: vectors extracted via prompting are not collinear with the directions of the attractor basins produced by fine-tuning. In practical terms, if you nudge the model towards a friendly persona with a specific prompt, the resulting direction in latent space does not match the one that emerges when the same trait is learned through weight updates. Second, and counterintuitively: fictional personas push the model further along the directions associated with their real-world anchors than the anchors themselves do.
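The first finding can be phrased as a simple test. One common way to extract a trait direction, assumed here because the summary does not describe the paper's exact procedure, is a difference of mean activations. In the prompt regime, compare the same model with and without the persona prompt. In the fine-tuning regime, compare the fine-tuned and base models on the same inputs, as a rough proxy for the fine-tuning direction. Collinearity then means a cosine similarity near ±1. The toy activations and the 0.9 threshold below are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized activation vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def diff_of_means(with_trait: Sequence[Vector], without_trait: Sequence[Vector]) -> Vector:
    """Trait direction = mean(with trait) - mean(without trait)."""
    return [a - b for a, b in zip(mean(with_trait), mean(without_trait))]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; |cos| close to 1 means the directions are collinear."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Hypothetical toy activations (one vector per input, same inputs in each pair).
# Prompt regime: same model, with vs without a "friendly" persona prompt.
prompted = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]]
neutral = [[0.0, 0.0, 0.0], [0.0, 0.2, 0.0]]
# Fine-tuning regime: fine-tuned vs base model, no persona prompt.
fine_tuned = [[0.6, 0.8, 0.0], [0.6, 0.8, 0.0]]
base = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

v_prompt = diff_of_means(prompted, neutral)  # [1.0, 0.0, 0.0]
v_ft = diff_of_means(fine_tuned, base)       # [0.6, 0.8, 0.0]
sim = cosine(v_prompt, v_ft)                 # about 0.6

COLLINEAR_THRESHOLD = 0.9  # hypothetical cut-off
print(f"cosine = {sim:.2f}, collinear = {abs(sim) >= COLLINEAR_THRESHOLD}")
```

In this toy case the two directions share some of their variance but are far from collinear. That is the shape of the mismatch the paper reports, though the numbers here are invented.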

The third finding concerns mixtures of contradictory valences: when opposite traits are blended, the model gravitates toward an attractor determined by its training history, overriding the user's intended balance. Fourth and last: the compositional algebra of vectors is asymmetric. Merging two directions at inference time yields different behavior from training the model on a chimera persona built from the same two components.
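On the inference side, "merging two directions" usually means a weighted sum of steering vectors. This operation is linear and order-independent by construction. The sketch below, with hypothetical names and toy values, shows what that naive algebra predicts. In it, a 50/50 blend of opposite valences cancels out, which is exactly the "intended balance" that the third finding says the model overrides. Nothing in this arithmetic can represent a training-history attractor or a chimera fine-tune.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def merge(components: Sequence[Tuple[float, Vector]]) -> Vector:
    """Inference-time composition: sum_i w_i * v_i.

    Linear and order-independent: merge([(a, u), (b, v)]) equals
    merge([(b, v), (a, u)]). Nothing here knows about training history.
    """
    dim = len(components[0][1])
    out = [0.0] * dim
    for weight, vec in components:
        if len(vec) != dim:
            raise ValueError("all directions must have the same size")
        for i, x in enumerate(vec):
            out[i] += weight * x
    return out


# Hypothetical directions for two opposite valences of one trait.
warm = [1.0, 0.5, 0.0]
cold = [-1.0, -0.5, 0.0]

# In this naive algebra a 50/50 blend cancels to the zero vector,
# i.e. it predicts a neutral midpoint.
print(merge([(0.5, warm), (0.5, cold)]))  # [0.0, 0.0, 0.0]
print(merge([(0.7, warm), (0.3, cold)]))  # approximately [0.4, 0.2, 0.0]
```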

This leads to a proposed ontological revision: the identity of representational content is not given by the vector alone, but by the (vector, regime) pair. In other words, what we call an LLM's "personality" exists only within a specific regime – prompt, fine-tuning, or steering – and cannot be automatically transported from one to another. Beckmann and Butlin, as well as other philosophers of artificial mind (Mollo and Millière, Chalmers, Cerullo), would thus be describing three different internal objects, not three competing candidates for the same referent.
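For teams that store persona or steering vectors, the proposed revision maps onto a simple data-modelling rule. Record the regime alongside the vector, and treat two entries as the same content only if both match. A minimal sketch, with hypothetical class and field names. Here the trait label is deliberately excluded from equality, so identity rests on the (vector, regime) pair alone.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Tuple


class Regime(Enum):
    PROMPT = "prompt"
    FINE_TUNING = "fine-tuning"
    STEERING = "steering"


@dataclass(frozen=True)
class PersonaContent:
    """Identity is the (vector, regime) pair; the trait name is just a label."""
    trait: str = field(compare=False)
    vector: Tuple[float, ...]
    regime: Regime

    def transport(self, target: Regime) -> "PersonaContent":
        """Refuse implicit transport to another regime.

        Re-using a vector in a different regime needs its own measurement;
        this sketch simply raises instead of silently copying.
        """
        if target is self.regime:
            return self
        raise ValueError(
            f"{self.trait!r} was measured under {self.regime.value}; "
            f"re-extract it under {target.value} instead of reusing it"
        )


v = (0.6, 0.8, 0.0)
a = PersonaContent("friendly", v, Regime.PROMPT)
b = PersonaContent("friendly", v, Regime.STEERING)
print(a == b)  # False: same numbers, different regime
```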

For those managing on-premise models, the stakes are concrete. Many organisations choose self-hosting precisely to have full control over model behavior, often combining enterprise fine-tuning with active guardrails at inference time. If a persona is not guaranteed to carry over from one regime to another, the system becomes less predictable, and the safety measures layered on top become harder to trust. The paper doesn't offer immediate operational fixes, but it signals the need for cross-coherence testing whenever different personalisation methods are combined, a warning worth building into deployment pipelines.
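The paper leaves the form of a cross-coherence test open. One minimal version, assumed here, works as follows. Run the same evaluation prompts through each configured regime. Score every response for the target trait with whatever judge the team already uses; the scores below are made-up placeholders on a 0 to 1 scale. Then flag any pair of regimes whose per-prompt scores diverge by more than a tolerance. The function names and the 0.1 tolerance are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Trait scores in [0, 1] per evaluation prompt, same prompt order for every
# regime. These numbers are hypothetical placeholders, not measurements.
scores: Dict[str, List[float]] = {
    "prompt": [0.80, 0.75, 0.90, 0.70],
    "fine-tuning": [0.82, 0.78, 0.88, 0.72],
    "steering": [0.55, 0.60, 0.85, 0.40],
}


def mean_abs_gap(a: List[float], b: List[float]) -> float:
    """Average per-prompt disagreement between two regimes."""
    if len(a) != len(b):
        raise ValueError("regimes must be scored on the same prompts")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cross_coherence(
    per_regime: Dict[str, List[float]], tolerance: float = 0.1
) -> List[Tuple[str, str, float]]:
    """Return every regime pair whose mean gap exceeds the tolerance."""
    failures = []
    for r1, r2 in combinations(sorted(per_regime), 2):
        gap = mean_abs_gap(per_regime[r1], per_regime[r2])
        if gap > tolerance:
            failures.append((r1, r2, gap))
    return failures


for r1, r2, gap in cross_coherence(scores):
    print(f"incoherent: {r1} vs {r2} (mean gap {gap:.3f})")
```

A mean gap is only one choice. For safety-relevant traits, the worst per-prompt gap may be the more useful signal, since an average can hide a few large divergences.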
