Self-Assessing AI Pattern

Context

You have a primary model that makes predictions, but users struggle to know when to trust its outputs. Confidence scores help somewhat but don’t explain why the model might fail.

Problem

How do you give an AI system awareness of its own limitations: not just mathematical uncertainty, but a structured understanding of its error patterns?

Solution

Train an interpretable model (typically a decision tree) on the primary model’s mistakes. Use the original dataset but replace labels with binary indicators: did the primary model predict this case correctly or incorrectly?

The resulting “flaw tree” creates clusters of cases with similar error profiles. For any new prediction, you can report: “The AI’s accuracy for similar individuals is X%.” This contextualizes uncertainty in terms of what kind of case this is, not just how certain the math is.
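
A minimal sketch of the core idea using scikit-learn, assuming an already-trained primary_model and its training data X_train, y_train (placeholder names, not part of the pattern itself):

```python
from sklearn.tree import DecisionTreeClassifier

def fit_flaw_tree(primary_model, X_train, y_train, max_depth=4):
    """Train an interpretable tree on where the primary model succeeds or fails."""
    # Replace the original labels with binary correctness indicators:
    # 1 = the primary model predicted this case correctly, 0 = it did not.
    correct = (primary_model.predict(X_train) == y_train).astype(int)

    # class_weight="balanced" compensates for the derived dataset being
    # skewed toward "correct" when the primary model is reasonably accurate.
    flaw_tree = DecisionTreeClassifier(
        max_depth=max_depth, class_weight="balanced", random_state=0
    )
    flaw_tree.fit(X_train, correct)
    return flaw_tree, correct
```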

Implementation Steps

  1. Train your primary model and generate predictions on the training data
  2. Create a derived dataset where labels = (primary model correct? yes/no)
  3. Balance the dataset if needed (a reasonably accurate primary model is correct far more often than it is incorrect)
  4. Train a decision tree on the derived dataset
  5. Use leaf nodes to group cases with similar error profiles
  6. For new predictions, traverse the tree and report the error rate for that leaf’s cases (see the sketch after this list)
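
A minimal sketch of steps 5 and 6, assuming the flaw_tree and correct array from the earlier sketch (hypothetical names): leaf-level accuracy is computed on the derived dataset and looked up for each new case.

```python
import numpy as np

def leaf_accuracy_table(flaw_tree, X_train, correct):
    """Map each leaf of the flaw tree to the primary model's accuracy in that leaf."""
    leaves = flaw_tree.apply(X_train)         # leaf index for every training case
    return {
        leaf: correct[leaves == leaf].mean()  # fraction the primary model got right
        for leaf in np.unique(leaves)
    }

def report_context(flaw_tree, table, x_new):
    """For a new case, report the primary model's accuracy for similar individuals (same leaf)."""
    leaf = flaw_tree.apply(np.asarray(x_new).reshape(1, -1))[0]
    return f"The AI's accuracy for similar individuals is {table[leaf]:.0%}."
```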

Consequences

Benefits:

  • Provides interpretable explanations for uncertainty (tree paths are human-readable)
  • Groups similar cases, enabling “for people like this” framing
  • Separates the uncertainty signal from the primary model’s confidence calculation
  • Can identify systematic error patterns (e.g., “the model struggles when age > 50 and education < bachelor’s”); see the sketch after this list
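
To surface such patterns, the fitted tree’s decision rules can be printed directly. A minimal sketch, assuming the flaw_tree from the Solution sketch and a list of feature names:

```python
from sklearn.tree import export_text

def describe_error_patterns(flaw_tree, feature_names):
    """Print human-readable rules describing where the primary model tends to fail."""
    # Each path in the printout corresponds to a cluster of cases
    # with a similar error profile.
    print(export_text(flaw_tree, feature_names=list(feature_names)))
```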

Limitations:

  • The flaw tree’s accuracy is bounded by how predictable the primary model’s errors are
  • Adds complexity to the system architecture
  • Requires sufficient training data with balanced error cases

Trade-offs:

  • Tree depth vs. interpretability (deeper trees = finer clusters but harder to explain)
  • In the study this pattern is drawn from, flaw tree accuracy was 66%: better than random, but far from perfect

When This Applies

  • Decision-support systems where human override is expected and desirable
  • High-stakes domains where calibrated trust matters more than raw accuracy
  • Systems where users need to understand why they should be skeptical, not just how skeptical to be

Related: 05-atom—confidence-is-not-awareness, 01-atom—calibrated-trust-vs-high-trust, 07-molecule—ui-as-ultimate-guardrail