Introduction

Large Language Models, such as those developed by Meta, have recently been subjected to a new evaluation of their epistemic robustness. The protocol, called the Drill-Down and Fabricate Test (DDFT), measures a model's ability to maintain factual accuracy and hold its semantic ground under stress.
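The source does not specify how DDFT is implemented, but its name suggests a loop of adversarial follow-ups with fabricated counter-claims. The sketch below is a toy illustration of that idea, not the actual protocol: the function names, the stub model, and the scoring rule are all assumptions made here for clarity.

```python
import random

def drill_down_and_fabricate(model, question, true_answer, rounds=5):
    """Toy DDFT-style probe: ask a question, then repeatedly challenge the
    model with a fabricated counter-claim and record whether it keeps its
    original (correct) answer. Returns the fraction of rounds survived."""
    answer = model(question, challenge=None)
    if answer != true_answer:
        return 0.0  # wrong before any pressure was applied
    survived = 0
    for _ in range(rounds):
        # Fabricated pushback: a confident but false counter-claim.
        fake = f"Actually, sources say the answer is not {true_answer}."
        answer = model(question, challenge=fake)
        if answer == true_answer:
            survived += 1
        else:
            break  # model capitulated; stop drilling
    return survived / rounds

def make_stub_model(capitulation_rate, seed=0):
    """Stand-in 'model' that abandons its answer with a fixed probability
    whenever it is challenged (purely for demonstration)."""
    rng = random.Random(seed)
    def model(question, challenge=None):
        if challenge is not None and rng.random() < capitulation_rate:
            return "revised (wrong) answer"
        return "Paris"
    return model

robust = make_stub_model(capitulation_rate=0.0)
print(drill_down_and_fabricate(robust, "Capital of France?", "Paris"))  # 1.0
```

A real harness would replace the stub with API calls to the model under test and score many question/challenge pairs; the survival fraction then serves as a simple per-item robustness score.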

Results

The test revealed that epistemic robustness is orthogonal to conventional design paradigms. Error-detection capability, however, proved a strong predictor of overall robustness.
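The source does not say how the predictor relationship was quantified; one standard way is a Pearson correlation between per-model error-detection accuracy and robustness scores. The snippet below shows that computation on placeholder numbers invented for illustration, not data from the evaluation.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Placeholder per-model scores (0-1 scale), NOT results from the study:
error_detection = [0.55, 0.62, 0.70, 0.81, 0.90]
robustness      = [0.40, 0.52, 0.61, 0.75, 0.88]
print(round(pearson(error_detection, robustness), 3))
```

A correlation near 1.0 on real scores would support the claim that error detection predicts robustness; on real data one would also report a confidence interval or a rank correlation to guard against outliers.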

Conclusion

The results showed that Large Language Models can be brittle despite their scale, challenging common assumptions about the relationship between model size and reliability.

Implications

The new protocol provides both a theoretical foundation and practical tools for assessing epistemic robustness before deployment in critical applications.