Evaluating Human-AI Collaboration: A Review and Methodological Framework
Citation
Fragiadakis, G., Diou, C., Kousiouris, G., & Nikolaidou, M. (2024). Evaluating Human-AI Collaboration: A Review and Methodological Framework. arXiv preprint arXiv:2407.19098.
Core Argument
Traditional human-machine interaction (HMI) evaluation methods (usability, task performance) are insufficient for Human-AI Collaboration (HAIC) because HAIC involves trust, adaptability, and reciprocal dynamics that efficiency metrics alone cannot capture. The paper proposes a structured framework with three collaboration modes (AI-Centric, Human-Centric, Symbiotic) and associated evaluation metrics.
Key Contributions
- Literature synthesis of HAIC evaluation approaches across domains (healthcare, finance, manufacturing, education, creative arts)
- Three-mode taxonomy for classifying HAIC systems based on task allocation
- Metric framework mapping evaluation factors (Goals, Interaction, Task Allocation) to specific measurable outcomes
- Decision tree for selecting relevant metrics based on collaboration mode (see the sketch after this list)
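To make the taxonomy and metric-selection idea concrete, here is a minimal Python sketch. The mode names come from the paper, but the specific metric lists, the `METRICS_BY_MODE` mapping, and the `select_metrics` branching logic are illustrative assumptions, not the authors' actual decision tree or metric tables.

```python
from enum import Enum

class CollaborationMode(Enum):
    AI_CENTRIC = "AI-Centric"        # AI drives the task, human oversees/corrects
    HUMAN_CENTRIC = "Human-Centric"  # human drives the task, AI assists
    SYMBIOTIC = "Symbiotic"          # reciprocal, dynamically shared task allocation

# Hypothetical mapping of evaluation factors (Goals, Interaction, Task Allocation)
# to example metrics per mode; metric names are placeholders for illustration.
METRICS_BY_MODE = {
    CollaborationMode.AI_CENTRIC: {
        "Goals": ["task accuracy", "error rate"],
        "Interaction": ["human override frequency", "trust calibration"],
        "Task Allocation": ["deferral rate to human"],
    },
    CollaborationMode.HUMAN_CENTRIC: {
        "Goals": ["decision quality", "time to completion"],
        "Interaction": ["suggestion acceptance rate", "perceived usefulness"],
        "Task Allocation": ["AI assistance uptake"],
    },
    CollaborationMode.SYMBIOTIC: {
        "Goals": ["joint performance vs. either agent alone"],
        "Interaction": ["mutual adaptation", "communication overhead"],
        "Task Allocation": ["dynamic handoff balance"],
    },
}

def select_metrics(ai_leads: bool, human_leads: bool) -> dict:
    """Toy decision tree: classify a system's collaboration mode from who
    drives the task, then return the associated factors and metrics."""
    if ai_leads and human_leads:
        mode = CollaborationMode.SYMBIOTIC
    elif ai_leads:
        mode = CollaborationMode.AI_CENTRIC
    else:
        mode = CollaborationMode.HUMAN_CENTRIC
    return {"mode": mode.value, "metrics": METRICS_BY_MODE[mode]}

if __name__ == "__main__":
    # Example: a decision-support tool where both parties shape the outcome.
    print(select_metrics(ai_leads=True, human_leads=True))
```

The point of the sketch is the shape of the framework: mode classification first, then factor-to-metric lookup, which is what the paper's decision tree formalizes.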
Framing Insight
The paper’s framing reveals a methodological gap more than it closes one: current evaluation is fragmented, with quantitative approaches missing subjective experience and qualitative approaches lacking generalizability. The call for “mixed methods” is the authors’ compromise position.
Extracted Content
- 05-atom—haic-three-modes
- 05-atom—trust-calibration-problem
- 05-atom—hmi-to-haic-shift
- 05-atom—learning-to-defer-paradigm
- 05-molecule—haic-evaluation-modes-framework
- 07-molecule—evaluation-methods-tradeoff
Limitations Noted by Authors
- Framework is theoretical, not yet empirically validated
- Behavioral and ethical dimensions excluded from scope
- Weighting mechanism needs empirical refinement
Related Work to Follow
- Fischer (2023) on emotional/social dimensions of HAIC
- Woelfle et al. (2024) on benchmarking human-AI collaboration
- Holstein & Aleven (2022) on human-AI complementarity in education