Goodness-of-Fit vs. Goodness-of-Data

Standard AI metrics (F1, accuracy, AUC) measure how well a model fits the data. They tell us nothing about how well the data represents the actual phenomenon.

A model can achieve a perfect fit on a dataset that poorly captures the real-world situation it's meant to address. High accuracy on the test set says nothing about whether the data itself has phenomenological fidelity (does it accurately represent the underlying reality?) or validity (does it explain what we think it explains?).

This distinction matters because practitioners use goodness-of-fit metrics as proxies for data quality: when the model scores well, they assume the data is fine. But model performance measures the model-plus-data system as a whole; it is not a measurement of the dataset itself.
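A minimal sketch of the gap, using a hypothetical toy setup (the `phenomenon` and `flawed_labeler` rules are invented for illustration): a model that perfectly reproduces a biased labeling rule gets perfect goodness-of-fit on its own dataset, while its agreement with the underlying phenomenon is substantially lower.

```python
import random

random.seed(0)

def phenomenon(x):
    # Ground truth: the real-world outcome is positive iff x > 0.5
    return x > 0.5

def flawed_labeler(x):
    # The dataset's labels come from a biased proxy that
    # systematically mislabels the 0.5-0.7 band as negative
    return x > 0.7

# Build a dataset whose labels reflect the proxy, not the phenomenon
xs = [random.random() for _ in range(10_000)]
dataset = [(x, flawed_labeler(x)) for x in xs]

# A "model" that perfectly fits the dataset (it has learned the proxy rule)
def model(x):
    return x > 0.7

# Goodness-of-fit: accuracy against the dataset's own labels
fit_accuracy = sum(model(x) == y for x, y in dataset) / len(dataset)

# Goodness-of-data proxy: agreement with the actual phenomenon
fidelity = sum(model(x) == phenomenon(x) for x, _ in dataset) / len(dataset)

print(f"goodness-of-fit (test accuracy): {fit_accuracy:.2f}")  # 1.00
print(f"agreement with the phenomenon:   {fidelity:.2f}")      # ~0.80
```

The fit score is perfect because the model and the labels share the same flaw; only a check against the phenomenon itself exposes the roughly 20% of cases the dataset misrepresents.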

Related: 05-atom—uniform-confidence-problem, 04-atom—data-cascades-definition