“In real life, we never see clean data”

“In real life, we never see clean data. Courses and trainings focus on models and tools to use but rarely teach about data cleaning and pipeline gaps.”

  • AI practitioner working on healthcare in West Africa

This quote captures the gap between AI education and AI practice. Curricula optimize for model development on curated benchmarks, while the job requires wrestling with data that is incomplete, inconsistent, context-dependent, and constantly shifting.

The practitioner’s perspective matters: this isn’t a complaint about difficult work. It’s an observation that the field’s training doesn’t prepare people for the work that actually determines whether systems succeed or fail.

Related: 05-atom—toy-dataset-training-gap, 04-atom—data-cascades-definition, 05-atom—model-valorization