Deception Requires Intention

The standard philosophical definition of deception: the act of intentionally causing another agent to form a false belief.

Two key elements:

  1. The inducement of false belief in another
  2. The presence of an intention or goal on the part of the deceiver

Deception doesn’t always involve lying. One can deceive through actions (feints in sports), through omission, or through misdirection, all without uttering a falsehood.

The challenge for AI systems: this definition demands significant cognitive complexity from the deceiver, specifically intentions. It’s highly controversial whether even frontier models possess intentions in the relevant sense.

One move is to weaken the definition: if a model persistently produces misleading outputs that users predictably misinterpret, call that deception regardless of the model’s internal “motivations.” But this abandons the hope that mechanistic interpretability (MI) can detect deception from internal states. That hope depends on a richer notion, one on which what’s inside the system matters, not just its effects.

Related: [None yet]