Visual Self-Checking Improves Performance
Prompting AI models to visually inspect their own deliverables before submission dramatically improves output quality.
In GDPval experiments, adding prompts that encouraged GPT-5 to render its files as images and check their layouts had the following effects:
- Eliminated black-square artifacts from PDF responses (previously present in over 50% of PDFs)
- Reduced egregious formatting errors in PowerPoint files from 86% to 64%
- Increased the share of agents that used multimodal capabilities to inspect their deliverables from 15% to 97% (a sketch of this inspection loop follows the list)
- Improved human preference win rates by 5 percentage points
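The inspection loop in the third bullet might look like the following sketch, which sends an already-rendered PNG back to a vision-capable model for critique. It assumes the OpenAI Python SDK; the model name and prompt wording are illustrative choices, not GDPval's actual harness.

```python
# Sketch: send a rendered deliverable image back to a vision-capable model
# and ask it to critique the layout. Assumes the OpenAI Python SDK; the
# model name and prompt wording are illustrative, not GDPval's setup.
import base64
from openai import OpenAI

client = OpenAI()

def inspect_rendered_page(png_path: str) -> str:
    """Return the model's critique of a rendered deliverable page."""
    with open(png_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-5",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Inspect this rendered deliverable. List any cut-off "
                         "text, overlapping graphics, or blank/black regions."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```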
The prompt additions were straightforward: convert deliverables to PNGs, display the images, check for cut-off text or overlapping graphics, and run programmatic formatting checks.
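To make the "programmatic formatting checks" concrete, here is a minimal sketch of one such check, assuming PyMuPDF (fitz), Pillow, and NumPy are available. The near-black heuristic and the 25% threshold are illustrative assumptions, not the checks GDPval actually ran.

```python
# Sketch: render each PDF page to a raster image and flag pages dominated by
# near-black pixels, a crude proxy for the black-square artifact.
# Assumes PyMuPDF (fitz), Pillow, and NumPy; thresholds are illustrative.
import fitz  # PyMuPDF
import numpy as np
from PIL import Image

def flag_dark_pages(pdf_path: str, dark_frac_limit: float = 0.25) -> list[int]:
    """Return 1-based page numbers whose rendered image is suspiciously dark."""
    flagged = []
    with fitz.open(pdf_path) as doc:
        for i, page in enumerate(doc):
            pix = page.get_pixmap(dpi=96)  # rasterize the page
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            arr = np.asarray(img)
            # Fraction of pixels where all RGB channels are near black.
            dark = (arr < 32).all(axis=-1).mean()
            if dark > dark_frac_limit:
                flagged.append(i + 1)
    return flagged

# Usage: pages = flag_dark_pages("deliverable.pdf")
```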
This represents low-hanging fruit for agent performance improvement. Models already have the capability to self-verify visual outputs; they just don't do it by default. Explicit scaffolding that leverages existing multimodal capabilities yields meaningful gains without any model changes.
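As a rough picture of what such scaffolding can look like in practice, here is a hypothetical prompt suffix in the spirit of the additions described above; the wording is a paraphrase, not the actual GDPval prompt.

```python
# Hypothetical self-check scaffolding appended to an agent's task prompt.
# The wording paraphrases the additions described above, not the exact prompt.
SELF_CHECK_SUFFIX = """\
Before submitting, convert every deliverable (PDF, PPTX, XLSX) to PNG images
and inspect them. Look for cut-off text, overlapping graphics, and blank or
black regions. Also run programmatic formatting checks. Fix any problems you
find and re-render before finishing."""

def with_self_check(task_prompt: str) -> str:
    """Append the self-check instructions to a task prompt."""
    return task_prompt.rstrip() + "\n\n" + SELF_CHECK_SUFFIX
```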