The Calibration Detection Gap
Users cannot detect AI confidence miscalibration through normal interaction.
In controlled experiments, the majority of users rated both overconfident and underconfident AI systems as “well-calibrated.” Despite experiencing dozens of interactions where stated confidence systematically diverged from actual accuracy, participants failed to notice the pattern.
This happens because detecting miscalibration requires tracking the AI's accuracy and its stated confidence across many interactions, then judging how well the two correspond, a cognitively demanding bookkeeping task that general users don't naturally perform, as the sketch below illustrates.
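A minimal sketch of that bookkeeping, assuming hypothetical interaction data: log each interaction's stated confidence and whether the answer turned out to be correct, then compare average stated confidence against observed accuracy within each confidence level. The interaction log and bucketing scheme are illustrative assumptions, not data or methods from the experiments described above.

```python
from collections import defaultdict

# Hypothetical log: (stated confidence, answer_was_correct) per interaction.
interactions = [
    (0.95, False), (0.90, True), (0.95, True), (0.90, False),
    (0.70, True), (0.75, False), (0.95, False), (0.90, True),
]

# Group interactions into confidence buckets and tally outcomes.
buckets = defaultdict(lambda: {"conf_sum": 0.0, "correct": 0, "n": 0})
for confidence, correct in interactions:
    b = buckets[round(confidence, 1)]   # bucket by rounded stated confidence
    b["conf_sum"] += confidence
    b["correct"] += int(correct)
    b["n"] += 1

# Compare average stated confidence with observed accuracy per bucket.
for level, b in sorted(buckets.items()):
    avg_conf = b["conf_sum"] / b["n"]
    accuracy = b["correct"] / b["n"]
    gap = avg_conf - accuracy           # positive gap = overconfidence
    print(f"stated ~{level:.1f}: accuracy {accuracy:.2f}, gap {gap:+.2f}")
```

Even this simplified version requires remembering dozens of confidence/outcome pairs and running the per-level comparison, which is exactly the tracking users don't spontaneously do.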
The implication: you cannot rely on users to “figure out” when AI confidence is unreliable. If the system doesn’t surface miscalibration explicitly, users will treat stated confidence as accurate.
Related: 05-atom—uniform-confidence-problem, 01-atom—transparency-reliance-paradox