Mechanistic Explanation

Mechanistic explanations account for how phenomena arise from the organized causal interactions among parts. A mechanism is a set of entities and activities organized to produce or maintain a phenomenon.

What distinguishes mechanistic explanations from merely descriptive models is that they do not just describe regularities; they show how those regularities emerge from causal structure. They provide “transition theories”: accounts of how one state leads to the next through the operation of parts.

A key virtue: mechanistic explanations support intervention. By revealing the components and activities responsible for a phenomenon, they clarify how it might be changed or controlled.

This philosophical definition (due to Machamer, Darden, and Craver, among others) gives mechanistic interpretability (MI) its technical meaning: MI aspires to explain neural networks in terms of their underlying mechanisms in this specific sense.
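
To make the link to intervention concrete, here is a minimal sketch of one such experiment: ablating a single hidden unit in a toy network and measuring how the output shifts. This is a sketch under stated assumptions (PyTorch, a hypothetical two-layer network, an arbitrarily chosen unit index), not any particular MI method.

```python
# A minimal sketch of intervening on a mechanism's "part", assuming PyTorch.
# The network, input, and ablated unit index are all hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A toy two-layer network standing in for a mechanism with parts.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)
x = torch.randn(1, 4)

UNIT = 3  # hypothetical index of the component under study

def ablate(module, inputs, output):
    # Zero out one hidden unit: an intervention that severs this
    # part's causal contribution to the downstream computation.
    output = output.clone()
    output[:, UNIT] = 0.0
    return output

with torch.no_grad():
    baseline = model(x)

# Register the intervention on the hidden layer, run, then clean up.
handle = model[1].register_forward_hook(ablate)
with torch.no_grad():
    ablated = model(x)
handle.remove()

print("Output shift from ablating unit", UNIT, ":", ablated - baseline)
```

If a mechanistic explanation correctly identifies the unit as responsible for some behavior, ablating it should change that behavior in the predicted way; this is the interventionist test the philosophical account demands.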

Related: [None yet]