Visual Self-Checking Improves Performance

Prompting AI models to visually inspect their own deliverables before submission dramatically improves output quality.

In GDPval experiments, adding prompts that encouraged GPT-5 to render files as images and check layouts:

  • Fully eliminated black-square artifacts from PDF responses (these previously affected over 50% of PDFs)
  • Reduced egregious formatting errors in PowerPoint files from 86% to 64%
  • Increased the share of agents that used multi-modal capabilities to inspect their deliverables from 15% to 97%
  • Improved human preference win rates by 5 percentage points

The prompt additions were straightforward: instruct the model to convert deliverables to PNGs, display them, check for cut-off text or overlapping graphics, and run programmatic formatting checks.
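
To make this concrete, here is a hypothetical rendering of such prompt additions, expressed as a Python string appended to an agent's system prompt. The wording is illustrative; the actual GDPval prompt text is not reproduced here.

```python
# Hypothetical self-check instructions; illustrative wording only,
# not the actual GDPval prompt additions.
SELF_CHECK_INSTRUCTIONS = """\
Before submitting any file deliverable:
1. Convert it to PNG images (one per page or slide) and display them.
2. Visually inspect each image for cut-off text, overlapping graphics,
   and rendering artifacts such as large black regions.
3. Run programmatic formatting checks, fix any issues, and re-check.
"""

def with_self_checks(system_prompt: str) -> str:
    """Append the self-check scaffolding to an existing system prompt."""
    return system_prompt.rstrip() + "\n\n" + SELF_CHECK_INSTRUCTIONS
```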

This represents low-hanging fruit for agent performance improvement. Models already have the capability to self-verify visual outputs; they just don't use it by default. Explicit scaffolding that leverages existing multi-modal capabilities yields meaningful gains without any model changes.
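
As a sketch of what the programmatic half of that scaffolding could look like, the snippet below renders a PDF deliverable to PNGs and flags pages that are mostly black, a crude proxy for the black-square artifact. It assumes pdf2image (a poppler wrapper) and Pillow are installed; the function name, downsampling size, and thresholds are illustrative, not GDPval's actual checks.

```python
# A minimal sketch, not GDPval's implementation: render each page to a PNG
# (for the agent to inspect visually) and run one crude programmatic check.
from pdf2image import convert_from_path  # returns PIL images; requires poppler

def render_and_check(pdf_path: str, out_prefix: str = "page") -> list[str]:
    """Render each PDF page to a PNG and flag pages that are mostly black."""
    warnings = []
    pages = convert_from_path(pdf_path, dpi=100)
    for i, page in enumerate(pages, start=1):
        page.save(f"{out_prefix}_{i}.png")  # the agent views these images

        # Downsample to grayscale and measure the fraction of near-black
        # pixels; a mostly-black page suggests a rendering artifact.
        gray = page.convert("L").resize((64, 64))
        dark = sum(1 for px in gray.getdata() if px < 32)
        if dark / (64 * 64) > 0.5:
            warnings.append(f"page {i}: over half the page is near-black")
    return warnings
```

Any returned warnings would signal the agent to fix and re-render the file before submission, rather than submitting blind.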

Related: 05-atom—model-failure-mode-distribution