Evaluation Methods Tradeoff: Quantitative vs. Qualitative vs. Mixed
Overview
How you measure Human-AI Collaboration (HAIC) shapes what you learn about it. Each approach captures different aspects while missing others.
Quantitative Approaches
What they capture well:
- Objective performance (accuracy, precision, recall, F1; see the sketch at the end of this subsection)
- Efficiency (response time, task completion time, resource utilization)
- Error rates and failure patterns
- Comparative benchmarks across systems
What they miss:
- User experience and satisfaction nuances
- Trust dynamics and how they evolve
- Contextual factors affecting adoption
- Why systems succeed or fail, not just whether they do
Common in: Healthcare diagnostics, fraud detection, manufacturing quality control, and other domains where measurable outcomes matter most.
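As a concrete reference for the metrics listed above, here is a minimal sketch of computing accuracy, precision, recall, and F1 with scikit-learn; the labels and predictions are hypothetical placeholders, not data from any study.

```python
# Minimal sketch: standard quantitative metrics for a hypothetical HAIC task,
# comparing AI-assisted decisions against ground-truth labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical placeholder data: 1 = positive outcome, 0 = negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # AI-assisted decisions

print(f"accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"precision: {precision_score(y_true, y_pred):.2f}")
print(f"recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1:        {f1_score(y_true, y_pred):.2f}")
```

Numbers like these answer the "what" questions well but say nothing about the items in the "What they miss" list above: trust, context, and why users adopt or reject the system.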
Qualitative Approaches
What they capture well:
- User perceptions, concerns, and preferences
- Trust development and calibration
- Workflow integration challenges
- Ethical considerations and bias concerns
- The “why” behind adoption or rejection
What they miss:
- Generalizability (findings may be context-specific)
- Comparative assessment across systems
- Objective performance verification
- Scalable measurement
Common in: Creative tools, mental health applications, early-stage system assessment, and other domains where human experience is central.
Mixed Methods
The promise: Combining quantitative rigor with qualitative depth to capture both “what” and “why.”
The practice: Often executed as parallel studies rather than true integration. Quantitative data shows performance; qualitative interviews explain perceptions. But the synthesis is where the insight lives, and synthesis is hard.
When essential: High-stakes domains (healthcare, finance) where both measurable outcomes and user trust matter. Creative domains where quality is subjective but still assessable.
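To make the synthesis point concrete, here is a minimal sketch of one common integration device, a joint display that links per-participant quantitative metrics to interview-derived codes; all column names, codes, and values are hypothetical.

```python
# Sketch of a mixed-methods "joint display": merge per-participant
# quantitative results with qualitative trust codes so patterns can be
# read across both strands. All names and values are hypothetical.
import pandas as pd

quant = pd.DataFrame({
    "participant": ["P01", "P02", "P03", "P04"],
    "task_accuracy": [0.92, 0.78, 0.88, 0.61],
    "completion_time_s": [145, 210, 160, 300],
})

qual = pd.DataFrame({
    "participant": ["P01", "P02", "P03", "P04"],
    "trust_code": ["calibrated", "over-reliant", "calibrated", "distrustful"],
    "key_quote": [
        "I double-check only the edge cases.",
        "If the model says so, I go with it.",
        "It saves time but I still verify.",
        "I redo most of it by hand.",
    ],
})

# The merged table puts the "why" next to the "what" for each participant.
joint = quant.merge(qual, on="participant")
print(joint.to_string(index=False))
```

The table itself is not the synthesis; it is the artifact that forces the two strands to be interpreted together rather than reported in parallel.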
The Pattern I Keep Seeing
Studies default to whatever method their discipline prefers: engineers measure accuracy; HCI researchers interview users; few do both well. The result is fragmented understanding: we know how systems perform OR how users feel, rarely both.
The call for “mixed methods” acknowledges this gap but doesn’t solve it. True mixed-methods research requires designing studies where quantitative and qualitative findings inform each other, not just coexist.
Practical Implication
If you’re evaluating a HAIC system, start by asking: “What would we need to know to trust this system in production?” The answer usually requires both objective performance data and understanding of how users will actually interact with it.
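One lightweight way to operationalize that question is to pair each production-trust concern with both a quantitative and a qualitative evidence source. The structure and entries below are illustrative assumptions, not a prescribed checklist.

```python
# Illustrative sketch: an evaluation plan pairing each production-trust
# question with quantitative and qualitative evidence sources.
# All questions and sources are hypothetical examples.
evaluation_plan = {
    "Does it perform well enough?": {
        "quantitative": ["accuracy / F1 on a held-out benchmark", "error rates by subgroup"],
        "qualitative": ["expert review of representative failures"],
    },
    "Will users rely on it appropriately?": {
        "quantitative": ["agreement and override rates in pilot use"],
        "qualitative": ["interviews on trust calibration and workflow fit"],
    },
    "What happens when it fails?": {
        "quantitative": ["time to detect and recover from errors"],
        "qualitative": ["observations of how users notice and handle failures"],
    },
}

for question, evidence in evaluation_plan.items():
    print(question)
    for mode, sources in evidence.items():
        print(f"  {mode}: {'; '.join(sources)}")
```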
Related: 05-molecule—haic-evaluation-modes-framework, 05-atom—hmi-to-haic-shift