Stress-Testing HCI Research Against Technical Constraints
Context
Human-AI interaction research often assumes idealized technical conditions: well-calibrated confidence, accurate explanations, reliable outputs. Real deployed systems frequently violate these assumptions.
Problem
Research findings about “how users respond to AI” may not generalize to production systems where the AI behaves differently from the idealized versions studied in experiments. Recommendations built on idealized assumptions may backfire in deployment.
Solution
Explicitly identify and test against realistic technical limitations:
- Catalog the assumptions: What properties must the AI have for your design recommendations to work? (calibrated confidence, accurate explanations, consistent behavior, etc.)
- Research the prevalence: How often do real systems violate these assumptions? The Li et al. study notes that many ML algorithms produce miscalibrated confidence; this isn’t an edge case. (See the calibration-measurement sketch after this list.)
- Test degraded conditions: Run studies where the AI violates the ideal assumptions. What happens to user behavior when confidence is systematically wrong? When explanations are inaccurate? (See the degraded-confidence sketch after this list.)
- Design for robustness: Prefer interventions that work even when technical ideals aren’t met, or that fail gracefully rather than creating opposite problems.
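The calibration check from the second step can be run on logged model outputs before committing to a design. Below is a minimal sketch of expected calibration error (ECE) in Python, assuming you have per-prediction confidences and correctness labels; the bin count and the illustrative data are assumptions for the example, not figures from any study.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error: the gap between stated confidence and
    observed accuracy, averaged over bins and weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        avg_conf = confidences[in_bin].mean()   # what the model claimed
        avg_acc = correct[in_bin].mean()        # what actually happened
        ece += in_bin.mean() * abs(avg_acc - avg_conf)
    return ece

# Illustrative data: a model that reports ~90% confidence but is right ~70% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.85, 0.95, size=1000)
hit = rng.random(1000) < 0.70
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")  # a large value signals miscalibration
```

A high ECE on production logs is direct evidence that the “calibrated confidence” assumption does not hold for the system your recommendations would ship with.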
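For the degraded-confidence condition in the third step, the AI shown to participants can keep its underlying answers while systematically distorting the stated confidence. This is a minimal sketch, assuming the study backend exposes a probability per answer; the logit-scaling distortion and the strength value are illustrative choices, not a description of any real system.

```python
import math

def distort_confidence(p, mode="overconfident", strength=3.0):
    """Systematically distort a calibrated probability p in (0, 1).

    mode="overconfident": pushes p toward 0 or 1 (the AI sounds surer than it is).
    mode="underconfident": pulls p toward 0.5 (the AI hedges even when it is right).
    strength > 1 controls how severe the distortion is.
    """
    logit = math.log(p / (1.0 - p))
    if mode == "overconfident":
        logit *= strength   # sharpen the stated confidence
    else:
        logit /= strength   # flatten the stated confidence
    return 1.0 / (1.0 + math.exp(-logit))

# Illustrative between-subjects conditions: the control group sees p as-is,
# treatment groups see the distorted value alongside the same answer.
for p in (0.6, 0.8, 0.95):
    print(p,
          round(distort_confidence(p, "overconfident"), 3),
          round(distort_confidence(p, "underconfident"), 3))
```

Because the answers themselves are unchanged, any shift in trust or reliance between conditions can be attributed to the confidence signal rather than to answer quality.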
Consequences
Benefits:
- Research findings more likely to transfer to deployment
- Earlier identification of failure modes
- More honest assessment of intervention effectiveness
- Bridges gap between HCI and ML research communities
Costs:
- Increased study complexity
- May produce messier, less clear-cut findings
- Requires understanding of technical systems, not just human factors
Examples
| Idealized Assumption | Realistic Constraint | Research Implication |
|---|---|---|
| Calibrated confidence | Systematic over/underconfidence | Test trust/reliance under miscalibration |
| Accurate explanations | Plausible but incorrect rationales | Test explanation comprehension with wrong explanations |
| Consistent outputs | Same input → different outputs | Test user adaptation to inconsistency |
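The second and third rows can be operationalized the same way in a Wizard-of-Oz or simulated-AI setup: keep the answer but swap in a plausible-but-wrong rationale, or sample among candidate outputs so identical queries disagree. A minimal sketch follows; `get_candidates`, `DECOY_EXPLANATIONS`, and the manipulation rates are hypothetical study parameters, not anything from the source.

```python
import random

# Hypothetical decoy rationales authored by the research team: fluent, topical, wrong.
DECOY_EXPLANATIONS = [
    "The model focused on the most recent values in the input.",
    "The model matched this case to a very similar past example.",
]

def degraded_response(query, get_candidates, condition, rng=random):
    """Return (answer, explanation) for one trial under a table-row manipulation.

    get_candidates(query) -> list of (answer, explanation) pairs, best first
        (a hypothetical hook into the study's simulated AI).
    condition="wrong_explanation": keep the best answer, but swap in a decoy
        rationale on half of the trials.
    condition="inconsistent": sample among candidates so repeated identical
        queries can disagree.
    """
    candidates = get_candidates(query)
    answer, explanation = candidates[0]
    if condition == "wrong_explanation" and rng.random() < 0.5:
        explanation = rng.choice(DECOY_EXPLANATIONS)
    elif condition == "inconsistent":
        answer, explanation = rng.choice(candidates)
    return answer, explanation
```

Logging which trials were manipulated lets trust, comprehension, and reliance measures be compared per condition, which is what the table’s “Research Implication” column calls for.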
Related: [None yet]