When Does an AI State Count as a Belief?

What criteria must a representation satisfy to count as a belief (or belief-like state) in an AI system?

Philosophers Herrmann and Levinstein propose four criteria grounded in the philosophical literature on belief, and note that satisfaction comes in degrees: “the more a representation satisfies these requirements, the more helpful it is to think of the representation as belief-like.”

Key considerations:

  • The state must be used by the system — it must causally drive behavior appropriate to the content
  • Finding a direction in activation space that correlates with truth values is necessary but not sufficient
  • Causal mediation matters: does modulating the state actually change the model’s behavior in ways consistent with changed beliefs? (See the sketch after this list.)
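
A minimal sketch of the distinction the last two points draw, using synthetic activations and a stub readout in place of a real model (the data, directions, and function names here are all hypothetical, not from the source): a linear probe finds a direction that correlates with truth labels, and a separate causal check asks whether steering along that direction actually shifts downstream behavior.

```python
# Hypothetical sketch: correlation (a probe) vs. causal mediation (steering).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for hidden activations of statements labeled true (1) or false (0).
# In a real experiment these would come from a model's residual stream.
d_model, n = 64, 500
truth_dir_true = rng.normal(size=d_model)            # planted ground-truth direction
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d_model)) + np.outer(labels * 2 - 1, truth_dir_true)

# Step 1: correlation — fit a probe. Necessary, but not sufficient for "belief".
probe = LogisticRegression(max_iter=1000).fit(acts, labels)
truth_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
print("probe accuracy:", probe.score(acts, labels))

# Step 2: causal check — modulate the state along the probe direction and ask
# whether downstream behavior shifts the way a changed belief would.
def downstream_behavior(activation: np.ndarray) -> float:
    """Stub for the model's downstream readout (e.g. the probability it asserts 'true')."""
    return float(1 / (1 + np.exp(-activation @ truth_dir_true)))

x = acts[0]
for alpha in (-4.0, 0.0, 4.0):
    steered = x + alpha * truth_direction
    print(f"alpha={alpha:+.1f}  behavior={downstream_behavior(steered):.3f}")
```

If behavior tracks the steering, the direction is at least a candidate for a belief-like state; if it only predicts labels without mediating behavior, it is merely a correlate.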

This opens a research program: rather than asking “does the model have beliefs?” (binary), ask “to what degree does this internal state exhibit belief-like properties?” (graded).
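
A hypothetical illustration of the graded framing (the criterion names and the simple averaging are placeholders drawn from the considerations above, not Herrmann and Levinstein's actual criteria or formalism): score each property in [0, 1] and report a profile rather than a yes/no verdict.

```python
# Illustrative only: a graded "belief-likeness" profile in place of a binary verdict.
from dataclasses import dataclass

@dataclass
class BeliefLikenessProfile:
    probe_accuracy: float     # does some direction track truth values at all?
    causal_mediation: float   # does modulating it change behavior appropriately?
    behavioral_use: float     # is the state actually used to drive behavior?

    def summary(self) -> float:
        scores = (self.probe_accuracy, self.causal_mediation, self.behavioral_use)
        return sum(scores) / len(scores)

profile = BeliefLikenessProfile(probe_accuracy=0.92, causal_mediation=0.40, behavioral_use=0.55)
print(f"belief-likeness (graded, not binary): {profile.summary():.2f}")
```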

The practical stakes are high: if we can identify genuine belief-like states, we have a shot at detecting deception from internals. If we can’t, “lie detection” research may be chasing a mirage.

Related: 05-atom—lying-requires-beliefs, 05-atom—deception-requires-intention