Annotation Task Suitability Framework
Definition
A framework for evaluating whether a given annotation task is appropriate for crowdsourcing, expert annotation, or AI-assisted labeling, based on task characteristics and quality requirements.
Key Dimensions
- Task Complexity: simple (binary) vs. complex (multi-step reasoning)
- Required Expertise: general knowledge vs. domain specialization
- Subjectivity: objective ground truth vs. inherently subjective
- Disagreement Signal: is annotator disagreement noise or information?
- Scale Requirements: thousands of examples vs. hundreds
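A minimal sketch of how these dimensions could be captured as a task profile; the class and field names (TaskProfile, Complexity, Expertise, and so on) are illustrative, not part of the framework itself.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"    # e.g., binary relevance judgments
    COMPLEX = "complex"  # multi-step reasoning required


class Expertise(Enum):
    GENERAL = "general"
    SPECIALIZED = "specialized"


@dataclass
class TaskProfile:
    """Characteristics used to decide who should annotate a task."""
    complexity: Complexity
    expertise: Expertise
    subjective: bool              # no single objective ground truth
    disagreement_is_signal: bool  # disagreement carries information, not just noise
    n_examples: int               # scale requirement
```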
Suitability Matrix
| Task Type | Crowd | Expert | AI-Assisted |
|---|---|---|---|
| Simple objective | ✓✓ | ✓ | |
| Complex objective | ✓ | ✓ | |
| Subjective | ✓ (aggregate) | ✓ (calibrate) | ✗ |
| Specialized | ✗ | ✓✓ | ✓ (with expert review) |
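The matrix can be encoded as a simple lookup. The classify_task and recommend helpers and the recommendation strings below are assumptions made for illustration; only the mapping itself follows the table above.

```python
def classify_task(*, specialized: bool, subjective: bool, simple: bool) -> str:
    """Map task characteristics onto one of the matrix's rows (illustrative heuristic)."""
    if specialized:
        return "specialized"
    if subjective:
        return "subjective"
    return "simple objective" if simple else "complex objective"


# Recommended annotator pools per task type, following the matrix above.
SUITABILITY = {
    "simple objective":  ["crowd (preferred)", "expert"],
    "complex objective": ["crowd", "expert"],
    "subjective":        ["crowd (aggregate labels)", "expert (with calibration)"],  # AI-assisted not recommended
    "specialized":       ["expert (preferred)", "AI-assisted (with expert review)"],
}


def recommend(*, specialized: bool, subjective: bool, simple: bool) -> list[str]:
    return SUITABILITY[classify_task(specialized=specialized,
                                     subjective=subjective,
                                     simple=simple)]


# Example: a specialized task (e.g., medical coding) routes to experts first.
print(recommend(specialized=True, subjective=False, simple=False))
```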
Quality Considerations
- Inter-annotator agreement thresholds
- Gold standard validation sets
- Annotator calibration procedures
- Error analysis and adjudication
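For the agreement-threshold check, one common choice with two annotators is Cohen's kappa. The sketch below hand-computes it for categorical labels and flags batches below a cutoff; the 0.6 threshold is an illustrative assumption, not a value prescribed by the framework.

```python
from collections import Counter


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa for two annotators labeling the same items (categorical labels)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


def passes_agreement_threshold(labels_a, labels_b, threshold: float = 0.6) -> bool:
    """Flag annotation batches whose agreement falls below the chosen threshold."""
    return cohens_kappa(labels_a, labels_b) >= threshold
```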
Common Mistakes
- Using crowds for expert tasks (quality problems)
- Using experts for simple tasks (cost problems)
- Assuming disagreement is always error
- Insufficient annotator training
Related: 03-research-methods, 05-atom--llm-annotation-reliability-gap