Li et al. 2024 - Understanding the Effects of Miscalibrated AI Confidence
Citation
Li, J., Yang, Y., Zhang, R., Liao, Q. V., Song, T., Xu, Z., & Lee, Y. (2024). Understanding the Effects of Miscalibrated AI Confidence on User Trust, Reliance, and Decision Efficacy. arXiv:2402.07632.
Core Question
When AI confidence scores don’t accurately reflect correctness likelihood, what happens to user trust, reliance, and decision quality, and can transparency about miscalibration help?
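In notation (a gloss, not stated this way in the paper): a calibrated AI satisfies

$$\Pr(\text{AI is correct} \mid \text{stated confidence} = c) = c,$$

so overconfidence means the stated confidence exceeds the empirical correctness rate, and underconfidence means it falls below it.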
Framing Analysis
The authors position this as addressing a gap in HCI research: prior studies assume AI confidence is well-calibrated, but real-world systems are often miscalibrated. The framing itself reveals how human factors research often builds on idealized technical assumptions that don’t hold in deployment.
Key Findings
Experiment 1 (N=126)
- Users cannot detect miscalibration: Most participants rated both overconfident and underconfident AI as “well-calibrated”
- Overconfident AI → over-reliance: Users switched to AI advice more often, including incorrect advice
- Underconfident AI → under-reliance: Users ignored correct AI advice more often
- Both directions harm decision efficacy: Accuracy gains from AI collaboration decreased with miscalibrated systems
- Trust levels unchanged: Miscalibration didn’t affect stated trust; users couldn’t perceive the problem
Experiment 2 (N=126)
- Transparency helps detection: Telling users about calibration levels helped them recognize miscalibration
- Transparency reduces trust: Users trusted miscalibrated AI less when informed that it was miscalibrated
- But creates under-reliance: Informed users under-relied on both overconfident AND underconfident AI
- No efficacy improvement: Knowing about miscalibration didn’t improve decision outcomes
Transferable Insights
- Transparency can trade one problem for another (over-reliance → under-reliance)
- User awareness doesn’t automatically enable appropriate action
- Displaying a stated confidence score is not the same as making uncertainty visible to users
- Miscalibration creates asymmetric failure modes
Methodological Notes
- Simulated AI with controlled accuracy (70%) and confidence levels (60%, 70%, 80%)
- City image recognition task (minimal domain expertise required)
- Between-subjects design across calibration conditions
- Measured trust (attitude), reliance (behavior), and decision efficacy (accuracy gain)
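To make the setup concrete, a minimal Python sketch of the simulated-AI design and measures listed above. The 70% accuracy and 60/70/80% confidence levels come from these notes; the function names, random-stream details, and example numbers are illustrative assumptions, not the authors’ code.

```python
import random

def simulate_ai_advice(n_trials, accuracy=0.70, stated_confidence=0.80, seed=0):
    """Simulate an AI whose advice is correct with fixed probability `accuracy`
    but which always reports `stated_confidence` (hypothetical setup mirroring
    the controlled-accuracy design described in these notes)."""
    rng = random.Random(seed)
    return [{"correct": rng.random() < accuracy, "confidence": stated_confidence}
            for _ in range(n_trials)]

def calibration_gap(advice):
    """Stated confidence minus empirical accuracy:
    > 0 overconfident, < 0 underconfident, ~0 well-calibrated."""
    empirical_accuracy = sum(a["correct"] for a in advice) / len(advice)
    mean_confidence = sum(a["confidence"] for a in advice) / len(advice)
    return mean_confidence - empirical_accuracy

def decision_efficacy(team_accuracy, human_alone_accuracy):
    """Decision efficacy measured as accuracy gain from collaborating with the AI."""
    return team_accuracy - human_alone_accuracy

if __name__ == "__main__":
    # Underconfident, calibrated, and overconfident conditions vs. 70% true accuracy.
    for conf in (0.60, 0.70, 0.80):
        advice = simulate_ai_advice(1000, accuracy=0.70, stated_confidence=conf)
        print(f"stated={conf:.2f}  calibration gap={calibration_gap(advice):+.3f}")

    # Hypothetical example values: team accuracy 0.78 vs. human-alone 0.72 -> gain of 0.06.
    print(f"efficacy example: {decision_efficacy(0.78, 0.72):+.2f}")
```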
Connections
- 05-atom—uniform-confidence-problem
- 01-atom—transparency-reliance-paradox
- 05-atom—calibration-detection-gap
- 01-atom—trust-reliance-distinction
Extraction Status
- Source file created
- Atoms extracted
- Molecules created
- Organism drafted