Introduction
Neuron explanation tools have become essential for understanding how deep neural networks work. However, despite their empirical success, these tools lack the theoretical foundations needed to ensure trustworthy and reliable explanations. This work presents a first theoretical analysis of two fundamental challenges: faithfulness and stability.
Faithfulness
Faithfulness refers to how well a neuron identification captures the concept the neuron actually encodes, which is essential for accurate and reliable explanations. In this work, we analyze whether neuron identification can be treated as the inverse process of machine learning.
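To make the setting concrete, the sketch below shows one common form of neuron identification: score candidate concepts by a similarity metric between the neuron's binarized activations and concept labels on a probing dataset, then return the best-scoring concept. This is a minimal illustrative example; the function names, IoU scoring, and thresholding are assumptions made for the sketch, not the paper's implementation.

```python
import numpy as np

def iou(neuron_on: np.ndarray, concept_on: np.ndarray) -> float:
    """Intersection over union of two boolean masks over the probing set."""
    inter = np.logical_and(neuron_on, concept_on).sum()
    union = np.logical_or(neuron_on, concept_on).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def identify_concept(activations: np.ndarray, concept_masks: dict,
                     threshold: float = 0.0):
    """Return the candidate concept whose labels best match the neuron's
    firing pattern, together with its similarity score.

    activations:   (n_samples,) neuron activations on a probing dataset
    concept_masks: concept name -> (n_samples,) boolean concept labels
    threshold:     activation level above which the neuron counts as 'on'
    """
    neuron_on = activations > threshold
    scores = {c: iou(neuron_on, m) for c, m in concept_masks.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```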
Stability
Stability refers to the consistency of neuron identifications across different datasets, which is crucial for replicable and reliable explanations. We propose a bootstrap ensemble procedure, Bootstrap Explanation (BE), that quantifies the stability of identifications.
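A minimal sketch of a bootstrap-style stability check, assuming a hypothetical identification routine such as `identify_concept` above: resample the probing data with replacement, rerun the identification on each replicate, and report how often the modal concept is selected. This illustrates the general idea of quantifying stability as agreement across bootstrap replicates; it is not the paper's exact BE procedure.

```python
import numpy as np
from collections import Counter

def bootstrap_stability(identify_fn, activations, concept_masks,
                        n_boot=200, seed=0):
    """Estimate how consistently a neuron receives the same concept label
    across bootstrap resamples of the probing dataset.

    identify_fn: callable (activations, concept_masks) -> (concept, score),
                 e.g. the identify_concept sketch above with a fixed threshold.
    """
    rng = np.random.default_rng(seed)
    n = len(activations)
    picks = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        masks_b = {c: m[idx] for c, m in concept_masks.items()}
        concept, _ = identify_fn(activations[idx], masks_b)
        picks.append(concept)
    top, count = Counter(picks).most_common(1)[0]
    return top, count / n_boot                    # modal concept and its frequency
```

As a rule of thumb, a selection frequency near 1 indicates a stable identification, while a frequency close to 1/(number of candidate concepts) suggests the assignment is essentially arbitrary under resampling.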
Generalization bounds for similarity metrics
We derive generalization bounds for widely used similarity metrics such as accuracy, AUROC, and IoU. These bounds allow us to certify the faithfulness of identifications beyond the finite probing data.
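For intuition, when a similarity metric is a sample mean of bounded per-example scores (as accuracy is), a standard Hoeffding concentration argument already yields a bound of the following illustrative form; this is not the paper's exact statement, and metrics such as AUROC and IoU are not simple sample means, so their bounds take a different shape.

```latex
% Illustrative Hoeffding-style bound for a similarity metric S in [0,1]
% that is a sample mean over n i.i.d. probing examples (e.g. accuracy):
% with probability at least 1 - \delta,
\left| \widehat{S}_n - \mathbb{E}[S] \right|
  \le \sqrt{\frac{\ln(2/\delta)}{2n}}
```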
Proposed method
We propose a method that combines our theoretical analysis with a practical procedure to produce neuron explanations that are both trustworthy and stable.
Experiments
Our experiments on both synthetic and real data validate the theoretical results and demonstrate the practicality of our method. This work represents an important step towards trustworthy neuron interpretation.