Performance Feedback Spectrum
Overview
A framework for understanding the different levels of performance information that AI systems can communicate to users, and the distinct effects each level produces.
The Spectrum
| Level | What’s Communicated | Effect on Performance | Effect on Trust Calibration |
|---|---|---|---|
| None | Just the prediction | Baseline | Poor (users can’t assess reliability) |
| Overall Accuracy | “This model is 80% accurate” | Modest improvement | Minimal; too abstract for case-specific judgments |
| Confidence Score | “87% confident in this prediction” | Significant improvement | Moderate; helps distinguish high/low certainty cases |
| Contextual Awareness | “For similar cases, accuracy is 92%” | Significant improvement | Best; provides a case-relevant calibration signal |
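The four levels above differ only in what is attached to the prediction. A minimal sketch of rendering each one, with all names and example values hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str                     # the model's prediction itself
    confidence: float              # model confidence for this case
    overall_accuracy: float        # accuracy on a held-out set
    similar_case_accuracy: float   # accuracy on cases similar to this one

def feedback_message(pred: Prediction, level: str) -> str:
    """Render performance feedback at one level of the spectrum."""
    if level == "none":
        return pred.label
    if level == "overall":
        return f"{pred.label} (model is {pred.overall_accuracy:.0%} accurate overall)"
    if level == "confidence":
        return f"{pred.label} ({pred.confidence:.0%} confident in this prediction)"
    if level == "contextual":
        return f"{pred.label} (for similar cases, accuracy is {pred.similar_case_accuracy:.0%})"
    raise ValueError(f"unknown level: {level}")

p = Prediction("high income", confidence=0.87,
               overall_accuracy=0.80, similar_case_accuracy=0.92)
print(feedback_message(p, "contextual"))
# high income (for similar cases, accuracy is 92%)
```

Note that only the last two levels vary per case; the first two are constant across all predictions, which is why they calibrate trust poorly.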
Why the Levels Differ
Overall accuracy is too coarse. Knowing a model is “80% accurate” doesn’t help you decide whether this specific case falls in the 80% or the 20%.
Confidence scores are mathematically derived but not grounded in interpretable reasoning. They help users differentiate but don’t explain why a case might be uncertain.
Contextual awareness provides what confidence scores lack: a reference class. “The model struggles with cases like yours” is qualitatively different information from “the model is 73% confident.”
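One way to ground a reference class is to estimate accuracy over the validation cases nearest to the current case. A minimal sketch, assuming numeric feature vectors and Euclidean distance (the function name and interface are illustrative, not the study's method):

```python
import math

def similar_case_accuracy(case, val_cases, val_correct, k=5):
    """Estimate accuracy on the reference class: the k validation
    cases nearest to `case` in feature space.

    case       -- feature vector for the current case
    val_cases  -- list of validation feature vectors
    val_correct-- 1/0 per validation case: was the model right?
    """
    dists = [math.dist(case, v) for v in val_cases]
    nearest = sorted(range(len(val_cases)), key=lambda i: dists[i])[:k]
    return sum(val_correct[i] for i in nearest) / k
```

The returned fraction is the “for similar cases, accuracy is X%” signal; a production system would need a more careful similarity metric and calibration checks.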
When to Use Which Level
| Scenario | Recommended Level | Rationale |
|---|---|---|
| Low-stakes, high-volume decisions | Confidence Score | Fast, good enough |
| High-stakes, human-override expected | Contextual Awareness | Calibration matters more than speed |
| Regulatory/audit requirements | Contextual Awareness + Overall | Explainability requirements |
| User population struggles with probabilities | Contextual Awareness | Reference classes more intuitive than percentages |
Limitations
- The study tested these levels in an income prediction task; generalization to other domains is assumed but not proven
- Contextual awareness requires additional infrastructure (flaw detection model)
- The differences between confidence and awareness, while statistically significant, were modest in magnitude
Design Implications
If your AI system reports only confidence scores, you’re leaving calibration on the table. The marginal cost of adding contextual awareness may be worth the marginal improvement in human-AI teaming outcomes, especially in domains where confidently wrong predictions are costly.
Related: 05-molecule—self-assessing-ai-pattern, 07-molecule—ui-as-ultimate-guardrail, 05-atom—uniform-confidence-problem