Explicit Evidence vs. Latent Concept Annotation

LLMs perform reliably on annotation tasks where the answer is grounded in explicit textual evidence. They struggle with tasks requiring inference, context, or judgment about latent concepts.

High-reliability tasks (explicit evidence):

  • Extracting outcomes stated directly in text (“appeal allowed/denied”)
  • Identifying named entities
  • Detecting clearly expressed sentiment

Low-reliability tasks (latent concepts):

  • Inferring speaker attitudes from indirect cues
  • Classifying complex rhetorical strategies (irony, nostalgia)
  • Judgments requiring cultural or historical context

The distinction isn’t about task complexity; it’s about whether textual signals map directly onto annotation categories. A complex task with clear surface markers works; a simple task requiring world knowledge doesn’t.

This maps to the broader pattern: LLMs excel at pattern matching in text, not reasoning about what isn’t written.
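The rule above amounts to a triage decision: route explicit-evidence tasks to the LLM, latent-concept tasks to human annotators. A minimal sketch (the task names and the `explicit_markers` flag are illustrative assumptions, not from any real pipeline):

```python
from dataclasses import dataclass

@dataclass
class AnnotationTask:
    name: str
    explicit_markers: bool  # do textual signals map directly onto the labels?

def route(task: AnnotationTask) -> str:
    """Send explicit-evidence tasks to the LLM; latent-concept tasks to humans."""
    return "llm" if task.explicit_markers else "human"

tasks = [
    AnnotationTask("appeal outcome extraction", explicit_markers=True),
    AnnotationTask("named entity recognition", explicit_markers=True),
    AnnotationTask("irony detection", explicit_markers=False),
    AnnotationTask("speaker attitude inference", explicit_markers=False),
]

for t in tasks:
    print(f"{t.name}: {route(t)}")
```

In practice the `explicit_markers` judgment is itself made per task by a human, before any annotation runs.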

Related: 05-atom—llm-consensus-quality-proxy, 06-atom—tacit-knowledge