Decomposition Serves Purpose, Not Truth

The Principle

Decompositions of neural networks should be evaluated by how effectively they support causal understanding, prediction, and intervention for specific goals — not by how well they mirror some assumed “real” structure.

Why This Matters

A persistent temptation in mechanistic interpretability is to search for “the” right decomposition, the natural joints of the system. But complex systems span multiple levels of organization, and no single level is uniquely privileged. A single attention head may be the right unit for one question, while a direction distributed across many heads is the right unit for another. Different research goals warrant different decompositions.

This isn’t a counsel of despair but a reorientation: stop asking “have we found the true structure?” and start asking “does this decomposition serve our purpose?”

How to Apply

Match decomposition to goal:

  • Controlling specific outputs → decomposition that isolates relevant circuits
  • Understanding generalization → decomposition that captures abstraction patterns
  • Detecting deception → decomposition that reveals goal-oriented processing

Validate through intervention:

  • A decomposition is good if manipulating its components changes behavior in predictable ways (a minimal sketch follows this list)
  • Components acquire explanatory force when their functional significance is demonstrated behaviorally
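
Intervention-based validation can be made concrete in a few lines of code. The following is a minimal sketch, assuming a toy PyTorch model; the layer index, the candidate direction, and the behavioral readout are all hypothetical stand-ins for whatever components a real decomposition proposes.

```python
# Minimal sketch of intervention-based validation (toy model; the layer,
# the candidate "direction", and the readout are hypothetical examples).
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),   # hidden layer we will intervene on
    nn.Linear(32, 4),
)

# Hypothetical component: a unit direction in the hidden layer's activation
# space that the decomposition claims mediates some behavior.
direction = torch.randn(32)
direction = direction / direction.norm()

def ablate_direction(module, inputs, output):
    # Project the candidate direction out of the layer's activations.
    coeff = output @ direction              # (batch,) coefficients
    return output - coeff[:, None] * direction

x = torch.randn(64, 16)
with torch.no_grad():
    baseline = model(x)

# Intervene: run the same inputs with the component removed.
handle = model[2].register_forward_hook(ablate_direction)
with torch.no_grad():
    ablated = model(x)
handle.remove()

# The decomposition earns its keep only if this behavioral change matches
# what the component's proposed function predicts (e.g., one output degrades
# while the others are untouched).
effect = (baseline - ablated).abs().mean(dim=0)
print("mean |delta output| per logit:", effect)
```

The design point is that the test is behavioral: the decomposition is judged by whether the intervention’s effect matches the component’s proposed function, not by how plausible the component looks on inspection.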

Embrace multiple levels:

  • Layer-level, circuit-level, neuron-level, and direction-level decompositions can all be valid (the sketch after this list contrasts two of these views)
  • Treating one as “more real” is a category error
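
To make the multiple-levels point concrete, here is a small sketch, again with hypothetical tensors, reading the same hidden activations as a neuron-level component and as a direction-level component; neither view is “more real” than the other.

```python
# The same activations read at two granularities (stand-in tensors).
import torch

torch.manual_seed(0)
hidden = torch.randn(64, 32)          # hypothetical hidden activations (batch, d)

# Neuron-level component: one basis coordinate of the activation space.
neuron_view = hidden[:, 7]

# Direction-level component: projection onto an arbitrary unit direction.
direction = torch.randn(32)
direction = direction / direction.norm()
direction_view = hidden @ direction

# Both are (64,) signals over the batch; each is a legitimate unit of
# analysis, and which is useful depends on the question being asked.
print(neuron_view.shape, direction_view.shape)
```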

When This Especially Matters

When researchers debate which approach to interpretability is “correct.” Such debates are often really disagreements about unstated purposes: each approach serves a different goal. Making those goals explicit can dissolve the apparent conflict.

Related: [None yet]