Explicit Evidence vs. Latent Concept Annotation
LLMs perform reliably on annotation tasks where the answer is grounded in explicit textual evidence. They struggle with tasks requiring inference, context, or judgment about latent concepts.
High-reliability tasks (explicit evidence):
- Extracting outcomes stated directly in text (“appeal allowed/denied”)
- Identifying named entities
- Detecting clearly expressed sentiment
Low-reliability tasks (latent concepts):
- Inferring speaker attitudes from indirect cues
- Classifying complex rhetorical strategies (irony, nostalgia)
- Judgments requiring cultural or historical context
The distinction isn’t about task complexity; it’s about whether textual signals map directly onto annotation categories. A complex task with clear surface markers works; a simple task requiring world knowledge doesn’t.
This maps to the broader pattern: LLMs excel at pattern matching in text, not reasoning about what isn’t written.
Related: 05-atom—llm-consensus-quality-proxy, 06-atom—tacit-knowledge