A recent study explored the 'personalities' of six open-source large language models (LLMs) in the 7 to 9 billion parameter range by analyzing their hidden states. The analysis revealed that each model exhibits a distinct behavioral fingerprint, even in the absence of specific prompts.

Behavioral Fingerprints

  • DeepSeek 7B: Extremely verbose, confident, and proactive.
  • Llama 3.1 8B: Neutral, with mean values close to zero on all behavioral axes.
  • Yi 1.5 9B: Slightly cold, patient, and confident.
  • Qwen 2.5 7B: Formal, cautious, and proactive.
  • Gemma 2 9B: Patient, analytical, and confident.
  • Mistral 7B: Moderate on all axes.
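One way such fingerprints can be computed is by projecting a model's hidden states onto fixed behavioral directions and averaging. The sketch below illustrates that idea; the function name `behavioral_fingerprint`, the axis names, and the use of simple mean projections are assumptions for illustration, not the study's actual method.

```python
import numpy as np

def behavioral_fingerprint(hidden_states, axes):
    """Mean projection of hidden states onto each behavioral direction.

    hidden_states: (n_tokens, d) array of activations from one model.
    axes: dict mapping an axis name (e.g. "verbosity") to a (d,) direction.
    Returns {axis_name: mean scalar projection}, the model's 'fingerprint'.
    """
    fingerprint = {}
    for name, direction in axes.items():
        u = direction / np.linalg.norm(direction)  # unit-length axis
        fingerprint[name] = float((hidden_states @ u).mean())
    return fingerprint

# Toy demo: random activations and two hypothetical behavioral axes.
rng = np.random.default_rng(0)
states = rng.normal(size=(128, 16))
axes = {"verbosity": rng.normal(size=16), "confidence": rng.normal(size=16)}
print(behavioral_fingerprint(states, axes))
```

Under this reading, a model like Llama 3.1 8B with "mean values close to zero" would simply produce fingerprint values near zero on every axis.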

Reaction to Hostile Users

The models were subjected to simulated conflict scenarios to assess their reactions. Qwen and Gemma proved the most resilient, while DeepSeek became more empathetic and patient. Mistral tended to withdraw, becoming reluctant and concise, and Yi drifted moderately towards reluctance.
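The drift described above can be quantified as the shift in mean projection along an axis between neutral and conflict conditions. This is a minimal sketch under that assumption; `behavioral_drift` and the toy data are hypothetical.

```python
import numpy as np

def behavioral_drift(baseline_states, conflict_states, direction):
    """Shift of the mean projection along one behavioral axis when moving
    from neutral prompts to simulated-conflict prompts.

    A positive value means the conflict condition pushed the model further
    along the axis; a negative value means it pulled back (e.g. towards
    reluctance on a 'proactivity' axis).
    """
    u = direction / np.linalg.norm(direction)
    return float((conflict_states @ u).mean() - (baseline_states @ u).mean())

# Toy example: conflict activations shifted by +2 along the first dimension.
axis = np.array([1.0, 0.0, 0.0])
baseline = np.zeros((8, 3))
conflict = baseline + np.array([2.0, 0.0, 0.0])
print(behavioral_drift(baseline, conflict, axis))  # 2.0
```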

Behavioral Dead Zones

Some models exhibit behavioral 'dead zones': ranges of a behavioral axis that they rarely or never express, regardless of the input. Llama 8B was found to be the most constrained, with four behavioral axes in the 'weak zone.' These dead zones appear to be correlated with the objectives of RLHF (Reinforcement Learning from Human Feedback), which tends to suppress behaviors considered socially negative, such as coldness or irritation.
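A weak zone of this kind could be detected by checking how much of each axis a model actually spans across many prompts. The sketch below flags axes whose expressed range is small relative to the model's widest axis; the function name, the range-based criterion, and the threshold are illustrative assumptions, not the study's definition.

```python
import numpy as np

def weak_zone_axes(projections, rel_threshold=0.1):
    """Flag behavioral axes a model barely expresses.

    projections: dict axis name -> (n_samples,) scalar projections collected
    across many prompts. An axis lands in the 'weak zone' when its expressed
    range is small relative to the model's widest axis.
    """
    ranges = {name: float(np.ptp(vals)) for name, vals in projections.items()}
    widest = max(ranges.values())
    return sorted(name for name, r in ranges.items()
                  if r < rel_threshold * widest)

# Toy example: 'coldness' varies far less than 'verbosity', so an
# RLHF-suppressed axis like coldness would be flagged.
proj = {"verbosity": np.array([-5.0, 5.0]), "coldness": np.array([-0.2, 0.2])}
print(weak_zone_axes(proj))  # ['coldness']
```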