LLM Self-Critique Tracks External Quality
In knowledge reconstruction tasks, an LLM’s self-critique scores correlated strongly (Spearman’s ρ = 0.73) with the scores it produced when given access to the ground truth for comparison.
The agent’s internal assessment of “how complete is my understanding?” meaningfully tracked actual completeness, even though the agent had no access to the original source during self-evaluation.
This suggests self-critique mechanisms can serve as useful (though imperfect) proxies for knowledge quality when external validation isn’t available. The agent knew what it didn’t know, at least partially.
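For readers who want to run the same kind of check on their own evaluations, here is a minimal sketch of a Spearman rank correlation between self-critique scores and ground-truth-referenced scores. The function names and the score lists are illustrative assumptions, not the data or code behind the 0.73 figure.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for six reconstruction tasks (made up for illustration).
self_scores = [0.9, 0.4, 0.7, 0.6, 0.8, 0.3]       # no access to the source
reference_scores = [0.8, 0.5, 0.6, 0.7, 0.9, 0.2]  # scored against ground truth

rho = spearman(self_scores, reference_scores)
print(f"Spearman rho = {rho:.2f}")
```

Rank correlation fits this question because it only asks whether the two scorings order the tasks similarly, not whether they use the same scale. A high value therefore says nothing about whether the self-critique scores are calibrated in absolute terms.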
Related: 05-atom—uniform-confidence-problem