Locating and Preventing Stereotypes in Large Language Models
A recent study investigates the internal mechanisms of LLMs such as GPT-2 Small and Llama 3.2 to locate where stereotypes arise. The research explores identifying the specific neuron activations and attention heads that contribute to biased outputs. The goal is to ...
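
As a rough illustration of what identifying an attention head that contributes to a biased output can look like, the Python sketch below ranks heads by zero-ablation on a toy model. It is not the study's method. The per-head logit contributions, the two-token vocabulary, and the names `HEAD_CONTRIBUTIONS`, `bias_score`, and `rank_heads` are all hypothetical, and the numbers are invented. The sketch also assumes that each head's output adds linearly into the final logits, so it captures only a head's direct effect.

```python
import math

# Hypothetical per-head contributions to the logits of two candidate tokens
# for a prompt such as "The engineer said that ...". Keys are (layer, head).
# All values are made up for illustration; a real analysis would read them
# from a model's activations.
VOCAB = ["he", "she"]
HEAD_CONTRIBUTIONS = {
    (0, 1): [0.2, 0.1],
    (1, 3): [1.5, -0.5],
    (2, 0): [0.0, 0.3],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_score(contributions, ablated=None):
    """Return p("he") - p("she"), skipping the ablated head if one is given."""
    logits = [0.0] * len(VOCAB)
    for head, vec in contributions.items():
        if head == ablated:
            continue  # zero-ablation: drop this head's output entirely
        for i, v in enumerate(vec):
            logits[i] += v
    probs = softmax(logits)
    return probs[0] - probs[1]


def rank_heads(contributions):
    """Rank heads by how much ablating each one reduces the bias score."""
    baseline = bias_score(contributions)
    effects = {
        head: baseline - bias_score(contributions, ablated=head)
        for head in contributions
    }
    # Largest positive effect first: these heads push hardest toward "he".
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for (layer, head), effect in rank_heads(HEAD_CONTRIBUTIONS):
        print(f"layer {layer} head {head}: effect {effect:+.3f}")
```

A head with a large positive effect is a candidate contributor to the skewed prediction, while a negative effect means the head was counteracting it. In a real model the same loop would typically replace the lookup table with a forward pass in which one head's output is zeroed or mean-ablated.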