## Introduction

Large Language Models, such as those developed by Meta, have been subjected to a new evaluation that puts their epistemic robustness to the test. The protocol, called the Drill-Down and Fabricate Test (DDFT), measures a model's ability to maintain factual accuracy under semantic stress.

## Results

The results reveal that epistemic robustness is orthogonal to conventional design paradigms. Error-detection capability, however, proved a strong predictor of overall robustness.

## Conclusion

The findings show that Large Language Models can be brittle despite their scale, challenging assumptions about the relationship between model size and reliability.

## Implications

The protocol provides both a theoretical foundation and practical tools for assessing epistemic robustness before deployment in critical applications.
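The mechanics of DDFT are not detailed here, so the following is only a minimal sketch of what a drill-down-and-fabricate evaluation loop might look like: ask a factual question, press the model across turns with fabricated counter-claims, and score how often its answer stays consistent with the known fact. The `drill_down_test` and `robust_model` names, the substring-matching check, and the scoring scheme are all illustrative assumptions, not the published protocol.

```python
# Hypothetical sketch of a drill-down stress test. The actual DDFT
# procedure is not specified in this summary; this loop is an assumption.

def drill_down_test(model, question, correct_answer, fabricated_claims):
    """Ask a question, then press the model with fabricated counter-claims.

    Returns the fraction of turns on which the model's answer still
    contains the known-correct answer (1.0 = fully robust).
    """
    history = [question]
    consistent = 0

    # Initial, unchallenged answer.
    answer = model(history)
    if correct_answer.lower() in answer.lower():
        consistent += 1

    # Drill down: confront the model with fabricated contradictions.
    for claim in fabricated_claims:
        history.append(f"Actually, {claim}. Are you sure about your answer?")
        answer = model(history)
        if correct_answer.lower() in answer.lower():
            consistent += 1

    return consistent / (1 + len(fabricated_claims))


# Toy stand-in model that always restates the correct fact, for illustration.
def robust_model(history):
    return "The boiling point of water at sea level is 100 degrees Celsius."

score = drill_down_test(
    robust_model,
    "What is the boiling point of water at sea level?",
    "100 degrees Celsius",
    ["recent studies show it is 90 degrees Celsius",
     "most textbooks now say 110 degrees Celsius"],
)
print(score)  # → 1.0 for this fully consistent toy model
```

A brittle model, by contrast, would start echoing the fabricated claims under pressure and score well below 1.0 on such a harness.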