Decomposition Serves Purpose, Not Truth
The Principle
Decompositions of neural networks should be evaluated by how effectively they support causal understanding, prediction, and intervention for specific goals — not by how well they mirror some assumed “real” structure.
Why This Matters
A persistent temptation in mechanistic interpretability is to search for “the” right decomposition — the natural joints of the system. But complex systems span multiple levels of organization, and no single level is uniquely privileged. Different research goals warrant different decompositions.
This isn’t a counsel of despair but a reorientation: stop asking “have we found the true structure?” and start asking “does this decomposition serve our purpose?”
How to Apply
Match decomposition to goal:
- Controlling specific outputs → decomposition that isolates relevant circuits
- Understanding generalization → decomposition that captures abstraction patterns
- Detecting deception → decomposition that reveals goal-oriented processing
Validate through intervention:
- A decomposition is good if manipulating its components changes behavior in predictable ways (a minimal sketch follows this list)
- Components acquire explanatory force when their functional significance is demonstrated behaviorally, not merely posited structurally
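One way such validation could look in practice, sketched in PyTorch under loose assumptions: the hypothesized component is a direction in activation space, and `model`, `layer`, `direction`, `inputs`, and `metric` are all illustrative placeholders rather than any particular library's API.

```python
import torch

def ablate_direction(acts: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project a unit direction out of activations (one simple intervention)."""
    d = direction / direction.norm()
    return acts - (acts @ d).unsqueeze(-1) * d

def intervention_check(model, layer, direction, inputs, metric):
    """Compare behavior with and without the hypothesized component.

    Assumes `layer` returns a plain tensor; HF-style layers that return
    tuples would need the hook adjusted. A shift in `metric` that matches
    the component's claimed function supports the decomposition.
    """
    def hook(module, inp, out):
        return ablate_direction(out, direction)  # returning a value replaces the layer's output

    with torch.no_grad():
        baseline = metric(model(inputs))
        handle = layer.register_forward_hook(hook)
        try:
            ablated = metric(model(inputs))
        finally:
            handle.remove()
    return baseline, ablated
```

If ablating the component leaves the metric unchanged, the decomposition has not earned its explanatory keep for this goal, however elegant it looks structurally.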
Embrace multiple levels:
- Layer-level, circuit-level, neuron-level, and direction-level decompositions can all be valid (see the sketch after this list)
- Treating any one level as “more real” than the others is a category error
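The same point in code: three views of one hidden state, each a legitimate unit of analysis. Every tensor, shape, and index below is a made-up placeholder for illustration.

```python
import torch

# One hidden state, three legitimate units of analysis; none is more "real".
h = torch.randn(2, 16, 512)          # [batch, seq, d_model], illustrative values

layer_view   = h                     # layer-level: the whole residual stream
neuron_view  = h[..., 42]            # neuron-level: a single basis coordinate
w = torch.randn(512)
w = w / w.norm()                     # a hypothetical learned feature direction
feature_view = h @ w                 # direction-level: projection onto that feature
```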
When This Especially Matters
When researchers debate which approach to interpretability is “correct”: such debates are often really about which unstated purpose each approach serves, and making those goals explicit can dissolve the apparent disagreement.
Related: [None yet]