The Context Specification Gap
AI model performance degrades significantly when tasks are under-specified, even when the core work is identical.
In experiments where prompts were deliberately stripped of context (reduced to 42% of the original token count), GPT-5's performance dropped measurably. The shortened prompts omitted guidance on where to find data within reference files, how to approach the problem, and what output format was expected.
This gap reveals something important about the nature of professional work: much of expertise involves figuring out what to work on and where to get the necessary inputs, not just executing once those are clear.
Current AI evaluations typically provide full context in the prompt. Real work rarely does. The performance difference between well-specified and under-specified tasks may be a better proxy for deployment readiness than headline benchmark scores.
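A minimal sketch of how that difference might be measured, assuming each task carries two prompt variants (a fully specified one and a context-stripped one); the field names, run_model, and score functions here are hypothetical placeholders, not part of any published evaluation harness.

```python
from statistics import mean
from typing import Callable

def specification_gap(
    tasks: list[dict],
    run_model: Callable[[str], str],      # hypothetical: sends a prompt, returns the model's output
    score: Callable[[dict, str], float],  # hypothetical: grades an output against a task, in [0, 1]
) -> float:
    """Mean score on fully specified prompts minus mean score on under-specified prompts.

    Each task dict is assumed to hold a "full_prompt" (with data locations,
    approach hints, and formatting expectations spelled out) and a
    "reduced_prompt" (with that context omitted).
    """
    full_scores = [score(t, run_model(t["full_prompt"])) for t in tasks]
    reduced_scores = [score(t, run_model(t["reduced_prompt"])) for t in tasks]
    return mean(full_scores) - mean(reduced_scores)
```

A small gap would suggest the model can recover missing specification on its own; a large gap would suggest its headline scores depend on context that real deployments rarely provide.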