Visual Self-Checking Improves Performance
Prompting AI models to visually inspect their own deliverables before submission dramatically improves output quality.
In GDPval experiments, adding prompts that encouraged GPT-5 to render its files as images and check their layouts had the following effects:
- Eliminated black-square artifacts from PDF responses (previously present in over 50% of PDFs)
- Reduced egregious formatting errors in PowerPoint files from 86% to 64%
- Increased the share of agents that used multimodal capabilities to inspect their deliverables from 15% to 97% (a sketch of this inspection loop follows the list)
- Improved human preference win rates by 5 percentage points
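The inspection loop in the third bullet might look like the following sketch, which sends an already-rendered PNG back to a vision-capable model for critique. It assumes the OpenAI Python SDK; the model name and prompt wording are illustrative choices, not GDPval's actual harness.

```python
# Sketch: send a rendered deliverable image back to a vision-capable model
# and ask it to critique the layout. Assumes the OpenAI Python SDK; the
# model name and prompt wording are illustrative, not GDPval's setup.
import base64
from openai import OpenAI

client = OpenAI()

def inspect_rendered_page(png_path: str) -> str:
    """Return the model's critique of a rendered deliverable page."""
    with open(png_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-5",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Inspect this rendered deliverable. List any cut-off "
                         "text, overlapping graphics, or blank/black regions."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```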
The prompt additions were straightforward: convert deliverables to PNGs, display the images, check for cut-off text or overlapping graphics, and run programmatic formatting checks.
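To make the "programmatic formatting checks" concrete, here is a minimal sketch of one such check, assuming PyMuPDF (fitz), Pillow, and NumPy are available. The near-black heuristic and the 25% threshold are illustrative assumptions, not the checks GDPval actually ran.

```python
# Sketch: render each PDF page to a raster image and flag pages dominated by
# near-black pixels, a crude proxy for the black-square artifact.
# Assumes PyMuPDF (fitz), Pillow, and NumPy; thresholds are illustrative.
import fitz  # PyMuPDF
import numpy as np
from PIL import Image

def flag_dark_pages(pdf_path: str, dark_frac_limit: float = 0.25) -> list[int]:
    """Return 1-based page numbers whose rendered image is suspiciously dark."""
    flagged = []
    with fitz.open(pdf_path) as doc:
        for i, page in enumerate(doc):
            pix = page.get_pixmap(dpi=96)  # rasterize the page
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            arr = np.asarray(img)
            # Fraction of pixels where all RGB channels are near black.
            dark = (arr < 32).all(axis=-1).mean()
            if dark > dark_frac_limit:
                flagged.append(i + 1)
    return flagged

# Usage: pages = flag_dark_pages("deliverable.pdf")
```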
This represents low-hanging fruit for agent performance improvement. Models already have the capability to self-verify visual outputs; they just don't do it by default. Explicit scaffolding that leverages existing multimodal capabilities yields meaningful gains without any model changes.
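As a rough picture of what such scaffolding can look like in practice, here is a hypothetical prompt suffix in the spirit of the additions described above; the wording is a paraphrase, not the actual GDPval prompt.

```python
# Hypothetical self-check scaffolding appended to an agent's task prompt.
# The wording paraphrases the additions described above, not the exact prompt.
SELF_CHECK_SUFFIX = """\
Before submitting, convert every deliverable (PDF, PPTX, XLSX) to PNG images
and inspect them. Look for cut-off text, overlapping graphics, and blank or
black regions. Also run programmatic formatting checks. Fix any problems you
find and re-render before finishing."""

def with_self_check(task_prompt: str) -> str:
    """Append the self-check instructions to a task prompt."""
    return task_prompt.rstrip() + "\n\n" + SELF_CHECK_SUFFIX
```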