How to Define the Quality of Data? A Feature-Based Literature Survey

Citation

Matoni, M., Kesper, A., & Taentzer, G. (2025). How to Define the Quality of Data? A Feature-Based Literature Survey. arXiv preprint arXiv:2504.01491.

Abstract

A systematic literature review shows that data quality is a multifaceted concept characterized by quality dimensions, but definitions vary widely. The authors use Feature-Oriented Domain Analysis (FODA) to specify a taxonomy of data quality definitions and classify existing approaches, identifying research gaps.

Core Contribution

  • Systematic literature review identifying 35 publications with original dimension-based DQ definitions (from 17,000+ initial results)
  • FODA-based taxonomy for classifying DQ definitions along four features (not to be confused with quality dimensions): data type, contextual relationships, definition type, and provenance
  • Research gap analysis revealing lack of consensus and areas needing development

Key Findings

  1. No consensus exists on how to define data quality, despite decades of research
  2. Most-cited dimensions: accuracy, completeness, consistency, timeliness, accessibility
  3. Contextual relationships matter: DQ relates to data itself (intrinsic), users, systems, and society
  4. Definition types vary: requirements-based vs. attribute-based approaches
  5. Provenance differs: intuitive, theoretical, or empirical foundations
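
Findings 3 to 5 can be read as a small classification schema. The sketch below is an illustration only, not the authors' tooling: the enum values are taken from the list above, while the class names, the `uncovered_combinations` helper, and the two example entries ("definition-A", "definition-B") are hypothetical. It shows how classified definitions could be checked for feature combinations that no definition covers, which is one simple way to surface research gaps.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Context(Enum):
    INTRINSIC = "intrinsic"
    USER = "user"
    SYSTEM = "system"
    SOCIETY = "society"


class DefinitionType(Enum):
    REQUIREMENTS_BASED = "requirements-based"
    ATTRIBUTE_BASED = "attribute-based"


class Provenance(Enum):
    INTUITIVE = "intuitive"
    THEORETICAL = "theoretical"
    EMPIRICAL = "empirical"


@dataclass(frozen=True)
class ClassifiedDefinition:
    label: str                  # hypothetical identifier, not a real publication
    contexts: frozenset         # one or more Context values
    definition_type: DefinitionType
    provenance: Provenance


def uncovered_combinations(definitions):
    """Return (definition type, provenance) pairs that no definition covers."""
    covered = {(d.definition_type, d.provenance) for d in definitions}
    return [pair for pair in product(DefinitionType, Provenance) if pair not in covered]


# Made-up entries for illustration; they do not describe surveyed papers.
examples = [
    ClassifiedDefinition(
        "definition-A",
        frozenset({Context.INTRINSIC, Context.USER}),
        DefinitionType.ATTRIBUTE_BASED,
        Provenance.EMPIRICAL,
    ),
    ClassifiedDefinition(
        "definition-B",
        frozenset({Context.SYSTEM}),
        DefinitionType.REQUIREMENTS_BASED,
        Provenance.INTUITIVE,
    ),
]

gaps = uncovered_combinations(examples)
for definition_type, provenance in gaps:
    print(f"uncovered: {definition_type.value} / {provenance.value}")
```

The data type feature is left out because its possible values are not listed in this note.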

Methodological Note

The survey uses FODA (Feature-Oriented Domain Analysis), a method originally designed to capture commonalities and variabilities across software product lines, as a taxonomy development method. The approach may transfer to other domains with competing conceptual definitions.
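
To make the idea of a feature model concrete, here is a minimal sketch of how feature groups and their selection rules might be encoded and checked. It is not the paper's model: the `FeatureGroup` class, the `validate` function, and the choice of group kinds ("alternative" meaning exactly one option, "or" meaning at least one) are assumptions made for illustration, as is the decision to treat definition type as an alternative group and contextual relationships as an or-group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureGroup:
    name: str
    options: frozenset
    kind: str  # "alternative": exactly one option; "or": at least one option


def validate(groups, selection):
    """Check a selection (group name -> set of chosen options) against the model.

    Returns a list of human-readable violations; an empty list means valid.
    """
    problems = []
    for group in groups:
        chosen = set(selection.get(group.name, ()))
        unknown = chosen - group.options
        if unknown:
            problems.append(f"{group.name}: unknown options {sorted(unknown)}")
        if group.kind == "alternative" and len(chosen) != 1:
            problems.append(f"{group.name}: exactly one option required")
        elif group.kind == "or" and len(chosen) < 1:
            problems.append(f"{group.name}: at least one option required")
    return problems


# Assumed group kinds; the note does not state how the paper constrains them.
model = [
    FeatureGroup(
        "definition type",
        frozenset({"requirements-based", "attribute-based"}),
        "alternative",
    ),
    FeatureGroup(
        "contextual relationships",
        frozenset({"intrinsic", "user", "system", "society"}),
        "or",
    ),
]

valid = validate(model, {
    "definition type": {"attribute-based"},
    "contextual relationships": {"intrinsic", "user"},
})
invalid = validate(model, {
    "definition type": {"attribute-based", "requirements-based"},
})
```

Here `valid` is an empty list, while `invalid` reports two violations: two definition types were chosen, and no contextual relationship was selected.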

Extracted Content

  • Wang & Strong (1996) - foundational DQ dimensions
  • ISO/IEC 25012 - international standard defining a data quality model
  • Zaveri et al. (2015) - linked data quality

Extracted for heyMHK digital garden, 2026-01-03