Evaluating Human-AI Collaboration: A Review and Methodological Framework

Citation

Fragiadakis, G., Diou, C., Kousiouris, G., & Nikolaidou, M. (2024). Evaluating Human-AI Collaboration: A Review and Methodological Framework. arXiv preprint arXiv:2407.19098.

Core Argument

Traditional human-machine interaction (HMI) evaluation methods (usability, task performance) are insufficient for Human-AI Collaboration (HAIC) because HAIC involves trust, adaptability, and reciprocal dynamics that efficiency metrics alone cannot capture. The paper proposes a structured framework with three collaboration modes (AI-Centric, Human-Centric, Symbiotic) and evaluation metrics matched to each mode.

Key Contributions

  1. Literature synthesis of HAIC evaluation approaches across domains (healthcare, finance, manufacturing, education, creative arts)
  2. Three-mode taxonomy for classifying HAIC systems based on task allocation
  3. Metric framework mapping evaluation factors (Goals, Interaction, Task Allocation) to specific measurable outcomes
  4. Decision tree for selecting relevant metrics based on collaboration mode (see the sketch after this list)
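
A minimal sketch of how contributions 2-4 might compose in code: the three mode names come from the paper, but the metric names and the mode-to-metric mapping below are illustrative placeholders, not the paper's actual framework tables.

```python
from enum import Enum


class CollaborationMode(Enum):
    """The paper's three-mode taxonomy, keyed by task allocation."""
    AI_CENTRIC = "ai-centric"        # AI leads; the human monitors or verifies
    HUMAN_CENTRIC = "human-centric"  # the human leads; AI assists
    SYMBIOTIC = "symbiotic"          # control shifts dynamically between both


# Placeholder mapping: the paper ties evaluation factors (Goals,
# Interaction, Task Allocation) to concrete measures; these metric
# names are hypothetical stand-ins for that mapping.
METRICS_BY_MODE = {
    CollaborationMode.AI_CENTRIC: ["model_accuracy", "human_override_rate"],
    CollaborationMode.HUMAN_CENTRIC: ["task_completion_time", "suggestion_uptake"],
    CollaborationMode.SYMBIOTIC: ["trust_calibration", "adaptability", "joint_task_success"],
}


def select_metrics(mode: CollaborationMode) -> list[str]:
    """Decision-tree-style lookup: collaboration mode -> relevant metrics."""
    return METRICS_BY_MODE[mode]


print(select_metrics(CollaborationMode.SYMBIOTIC))
# ['trust_calibration', 'adaptability', 'joint_task_success']
```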

Framing Insight

The paper’s framing reveals a methodological gap more than it closes one: current evaluation practice is fragmented, with quantitative approaches missing subjective experience and qualitative approaches lacking generalizability. The call for “mixed methods” is the authors’ compromise position.
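
As a rough illustration of that compromise, a mixed-methods score might blend a quantitative task metric with a normalized qualitative measure such as the mean of a 1-5 Likert trust survey. This is a hypothetical sketch: the function and its 50/50 weights are placeholders, and the authors themselves note (in the limitations below) that any such weighting mechanism still needs empirical refinement.

```python
def mixed_methods_score(task_success_rate: float,
                        trust_likert_mean: float,
                        w_quant: float = 0.5,
                        w_qual: float = 0.5) -> float:
    """Blend a quantitative metric (0-1) with a qualitative survey score.

    trust_likert_mean is the mean of a 1-5 Likert trust survey,
    rescaled to 0-1 before weighting. The equal weights are
    placeholders, not values from the paper.
    """
    trust_normalized = (trust_likert_mean - 1) / 4  # map 1..5 onto 0..1
    return w_quant * task_success_rate + w_qual * trust_normalized


print(mixed_methods_score(task_success_rate=0.82, trust_likert_mean=3.8))
# 0.76 = 0.5 * 0.82 + 0.5 * 0.70
```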

Extracted Content

Limitations Noted by Authors

  • Framework is theoretical, not yet empirically validated
  • Behavioral and ethical dimensions excluded from scope
  • Weighting mechanism needs empirical refinement

Related References

  • Fischer (2023) on emotional/social dimensions of HAIC
  • Woelfle et al. (2024) on benchmarking human-AI collaboration
  • Holstein & Aleven (2022) on human-AI complementarity in education