Expert Parity Trajectory

As of October 2025, the best frontier model (Claude Opus 4.1) achieves a 47.6% win+tie rate against human industry experts on knowledge work tasks requiring an average of 7 hours to complete.

Key data points:

  • Claude Opus 4.1: 47.6% win+tie rate (34.1% wins only)
  • GPT-5 high: 38.8% win+tie rate (27.9% wins only)
  • o3 high: 34.1% win+tie rate
  • Performance of OpenAI frontier models increased roughly linearly over time
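
The arithmetic implied by the figures above can be sketched as follows. This is a minimal illustration, not part of the benchmark's own methodology: it derives the tie share as win+tie minus wins only, and measures the distance to a 50% win+tie rate. Treating 50% as the parity threshold is an assumption made here, and the names `rates`, `PARITY`, and `breakdown` are hypothetical.

```python
# Win+tie and wins-only rates (percent), copied from the data points above.
rates = {
    "Claude Opus 4.1": {"win_or_tie": 47.6, "win": 34.1},
    "GPT-5 high": {"win_or_tie": 38.8, "win": 27.9},
}

# Assumption: "parity" means a 50% win+tie rate against experts.
PARITY = 50.0


def breakdown(win_or_tie: float, win: float) -> dict:
    """Derive the tie share and the remaining gap to parity, in percentage points."""
    return {
        "tie": round(win_or_tie - win, 1),
        "gap_to_parity": round(PARITY - win_or_tie, 1),
    }


summary = {model: breakdown(**r) for model, r in rates.items()}

for model, s in summary.items():
    print(f"{model}: ties {s['tie']} pts, gap to parity {s['gap_to_parity']} pts")
```

o3 high is left out because only its win+tie rate is listed, so its tie share cannot be derived.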

The benchmark tasks were completed by experts with an average of 14 years of professional experience. The occupations covered span finance, healthcare, manufacturing, government, and professional services.

Notably, win rates are highest for shorter tasks (0-2 hours: 45-56% win rate) and decline as task duration increases (8+ hours: 25-37% win rate).

Related: 05-atom—capability-as-leading-indicator