The Vocabulary Problem in Data Quality

The data quality field doesn’t suffer from a lack of definitions; it suffers from a lack of shared vocabulary for comparing them.

Out of 17,000+ publications found in a systematic literature search, only 35 contained original dimension-based quality definitions. The rest referenced existing work or didn’t make definitions explicit. Yet even among these 35, terminology varies so widely that direct comparison is difficult.

The same concept appears under different names across frameworks. Different concepts share the same name. Classification schemes vary. The result: researchers and practitioners reinvent the wheel because they cannot see when their work aligns with or extends prior definitions.
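
To make the two naming problems concrete, here is a minimal sketch of how they could be detected once each definition has been mapped to a shared concept label. Everything in it is an assumption for illustration: the framework names, dimension names, and concept labels are hypothetical and do not come from the 35 definitions mentioned above, and producing the concept labels is itself the hard taxonomy work that this sketch takes as given.

```python
from collections import defaultdict

# Hypothetical records: (framework, dimension name, shared concept label).
# None of these values come from the surveyed literature.
DEFINITIONS = [
    ("framework_a", "accuracy", "matches_real_world"),
    ("framework_b", "correctness", "matches_real_world"),
    ("framework_b", "accuracy", "numeric_precision"),
    ("framework_c", "completeness", "no_missing_values"),
]


def find_vocabulary_conflicts(definitions):
    """Return (synonyms, homonyms) found across frameworks.

    synonyms: concept label -> sorted names used for that one concept
    homonyms: dimension name -> sorted concepts that share that one name
    """
    names_by_concept = defaultdict(set)
    concepts_by_name = defaultdict(set)
    for _framework, name, concept in definitions:
        names_by_concept[concept].add(name)
        concepts_by_name[name].add(concept)

    # Keep only the entries where more than one name or concept collides.
    synonyms = {c: sorted(n) for c, n in names_by_concept.items() if len(n) > 1}
    homonyms = {n: sorted(c) for n, c in concepts_by_name.items() if len(c) > 1}
    return synonyms, homonyms


synonyms, homonyms = find_vocabulary_conflicts(DEFINITIONS)
print("Same concept, different names:", synonyms)
print("Same name, different concepts:", homonyms)
```

In this toy data, "accuracy" and "correctness" name one concept, while "accuracy" also names a second, unrelated concept. Neither conflict is visible until the definitions sit in a common structure.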

This pattern suggests that taxonomy development (creating a shared vocabulary for comparison) may be more valuable than adding new definitions. Before asking “what is data quality?”, the field needs to answer “how do existing answers differ?”

Related: 04-atom—data-quality-consensus-gap, 03-molecule—foda-taxonomy-methodology