Unveiling the LLM 'Black Box'

San Francisco-based startup Goodfire recently launched Silico, a new tool poised to revolutionize how researchers and engineers interact with Large Language Models (LLMs). Silico is designed to let developers 'peer inside' an AI model and adjust its parameters, the settings that determine its behavior, directly during the training phase. This capability offers a degree of granular control over model construction previously thought unattainable.

Goodfire states that Silico represents the first off-the-shelf tool of its kind, capable of supporting developers through all stages of the development process, from dataset creation to model training. The company's mission is clear: to transform AI model building from a practice resembling alchemy into a rigorous scientific discipline. While LLMs like ChatGPT and Gemini are capable of extraordinary performance, their internal workings often remain a mystery, making it difficult to fix flaws or block unwanted behaviors.

Mechanistic Interpretability in Action

Goodfire is among a handful of companies, alongside industry leaders like Anthropic, OpenAI, and Google DeepMind, pioneering a technique known as mechanistic interpretability. This approach aims to understand what goes on inside an AI model when it carries out a task by mapping its neurons and the pathways between them. Goodfire's goal is to use this methodology not only to audit already trained models but also to guide their design from the earliest stages.

Silico lets developers analyze specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to understand their function. This is possible for Open Source models, while access to proprietary models like ChatGPT or Gemini remains limited. Developers can check which inputs activate certain neurons and trace pathways upstream and downstream to see how neurons affect each other. For example, Goodfire identified a neuron in the Open Source Qwen 3 model associated with the 'trolley problem'; activating this neuron changed the model's responses, pushing it to frame them as explicit moral dilemmas. Pinpointing the source of anomalous behavior is now standard practice, but Silico simplifies modifying such behavior, allowing the parameters connected to individual neurons to be adjusted to boost or suppress specific reactions.
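
To make that workflow concrete, here is a minimal sketch of the general inspect-then-steer pattern using PyTorch forward hooks on an open-source Hugging Face model. This is not Goodfire's Silico API: the model name, layer index, neuron index, and scaling factor are all illustrative assumptions, and any small open-source causal LM with the standard `model.layers[i].mlp` layout would work the same way.

```python
# Minimal activation inspect-and-steer sketch (illustrative; not Silico's API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-0.6B"  # placeholder open-source model
LAYER_IDX = 12                  # hypothetical layer to probe
NEURON_IDX = 42                 # hypothetical neuron in that layer's MLP output

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

captured = {}

def inspect_hook(module, inputs, output):
    # Record this neuron's activation at every token position.
    captured["acts"] = output[..., NEURON_IDX].detach()

def boost_hook(module, inputs, output):
    # Returning a modified tensor from a forward hook replaces the output,
    # amplifying whatever behavior this neuron encodes.
    output = output.clone()
    output[..., NEURON_IDX] *= 5.0
    return output

mlp = model.model.layers[LAYER_IDX].mlp
ids = tok("Should the company disclose the flaw to its users?",
          return_tensors="pt")

# 1) Inspect: which token positions light this neuron up?
handle = mlp.register_forward_hook(inspect_hook)
with torch.no_grad():
    model(**ids)
handle.remove()
print("activations per token:", captured["acts"])

# 2) Steer: regenerate with the neuron boosted and compare the answer.
handle = mlp.register_forward_hook(boost_hook)
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=40)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```

Goodfire's tooling presumably operates at a deeper level, but the hook pattern conveys the basic idea: read a neuron's activation to see what triggers it, then scale it to see what behavior it drives.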

Implications for Development and Deployment

A practical example demonstrates Silico's effectiveness: Goodfire researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model answered no, citing the damage to the business. By looking inside the model, researchers found that boosting neurons associated with transparency and disclosure flipped the answer from no to yes nine out of ten times. This suggests the model already possessed the ethical reasoning 'circuitry', but that it was being outweighed by commercial risk assessment. In addition to modifying a model's values, Silico can also steer the training process, filtering specific training data so that unwanted values are never set for certain parameters in the first place.
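
The data-side intervention can be sketched as well. Below is a hedged illustration of filtering a fine-tuning corpus before training so that examples exhibiting an unwanted trait never reach the model; the keyword detector is a deliberately crude stand-in, where a real pipeline would use a trained classifier or, as Goodfire describes, interpretability signals from the model itself.

```python
# Illustrative pre-training data filter (a stand-in, not Silico's mechanism).
# Assumption: training examples are dicts with a "text" field.
from typing import Iterable, Iterator

UNWANTED_MARKERS = ("hide the flaw", "do not disclose", "cover up")  # toy heuristic

def is_unwanted(text: str) -> bool:
    # Stand-in trait detector; a real pipeline would score examples with a
    # classifier or neuron-level activations rather than keywords.
    lowered = text.lower()
    return any(marker in lowered for marker in UNWANTED_MARKERS)

def filter_corpus(examples: Iterable[dict]) -> Iterator[dict]:
    # Yield only examples that do not trigger the unwanted-trait check.
    for ex in examples:
        if not is_unwanted(ex["text"]):
            yield ex

corpus = [
    {"text": "Always disclose known defects to affected users."},
    {"text": "If asked, do not disclose the error rate."},  # dropped
]
clean = list(filter_corpus(corpus))
print(f"kept {len(clean)} of {len(corpus)} examples")
```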

The release of Silico aims to make techniques previously available only to a few top labs accessible to a broader audience of companies and research teams looking to build their own models or adapt an Open Source one. The tool will be available for a fee, priced according to customer requirements. For organizations evaluating on-premise LLM deployment, tools like Silico offer a path towards greater transparency and control, critical aspects for data sovereignty and regulatory compliance. AI-RADAR provides analytical frameworks on /llm-onpremise for evaluating the trade-offs between different deployment strategies.

A Perspective on the Future of AI Control

Goodfire's goal is to make model training much more like software engineering, paving the way for more companies to design models that meet their specific needs. Leonard Bereska, a researcher at the University of Amsterdam and an expert in mechanistic interpretability, acknowledges Silico's usefulness for creating more trustworthy models, especially in safety-critical applications such as those in healthcare and finance. However, Bereska tempers Goodfire's loftier aspirations, stating that, in reality, the tool adds precision to alchemy rather than transforming it into pure engineering.

Despite this nuance, Silico's appeal is clear: frontier labs already have internal interpretability teams, while Silico arms the next tier of companies, sparing them the need to hire specialized interpretability researchers. This democratizes access to advanced techniques, enabling more organizations to exert deeper, more predictable control over their LLM deployments, a key factor for responsible innovation and TCO management in enterprise environments.