Three Fundamental ML Engineering Differences
Machine learning introduces three fundamental differences from prior software engineering domains:
1. Data Primacy: ML is all about data. Discovering, sourcing, managing, and versioning data is inherently different from, and more complex than, managing code. Software engineers design code for elegance, abstraction, and modularity, but ML data is voluminous, context-specific, heterogeneous, and hard to describe. During rapid iteration, a data schema can change multiple times per day.
2. Customization Skills Gap: Reusing and customizing ML models requires fundamentally different skills than software reuse. With code, you can fork a library and modify it using the same skills you use to write your own software. With models, you often can’t just change parameters; you may need to retrain or replace the model entirely, which requires ML expertise and additional training data.
3. Module Boundary Erosion: Traditional software engineering relies on strict module boundaries. ML components resist this pattern. Models can be “entangled” in complex ways, affecting each other during training and tuning even when teams intend isolation. One model’s effectiveness can change when another model changes, regardless of how cleanly their code is separated.
These differences don’t make ML engineering impossible; they make it a genuinely different discipline that requires adapted practices.
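The entanglement described in point 3 can be made concrete with a toy example. The following is a minimal Python sketch, not taken from the source: `upstream_v1`, `upstream_v2`, `fit_threshold`, and `accuracy` are hypothetical names, and the "models" are stand-in arithmetic functions. It assumes a downstream component whose only learned parameter is a cutoff tuned on an upstream model's scores.

```python
# Toy illustration of model entanglement. All names and numbers are
# hypothetical; the "models" are simple functions standing in for trained ones.

def upstream_v1(x: float) -> float:
    """Upstream model, version 1: maps a raw feature to a score."""
    return x / 10.0

def upstream_v2(x: float) -> float:
    """Retrained upstream model: same ranking of inputs, shifted score scale."""
    return x / 20.0 + 0.5

def accuracy(scores, labels, threshold):
    """Fraction of examples where (score >= threshold) matches the label."""
    preds = [s >= threshold for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def fit_threshold(scores, labels):
    """'Train' the downstream model: pick the cutoff with the best accuracy."""
    return max(sorted(set(scores)), key=lambda t: accuracy(scores, labels, t))

inputs = list(range(10))            # raw feature values 0..9
labels = [x >= 5 for x in inputs]   # ground truth

# The downstream model is tuned against upstream v1's outputs.
scores_v1 = [upstream_v1(x) for x in inputs]
threshold = fit_threshold(scores_v1, labels)
acc_before = accuracy(scores_v1, labels, threshold)

# Upstream is swapped for v2; the downstream code and threshold are untouched.
scores_v2 = [upstream_v2(x) for x in inputs]
acc_after = accuracy(scores_v2, labels, threshold)

# Re-tuning the downstream model against v2 recovers the lost accuracy.
acc_refit = accuracy(scores_v2, labels, fit_threshold(scores_v2, labels))

print(acc_before, acc_after, acc_refit)  # prints: 1.0 0.5 1.0
```

In this sketch the two components share no code, yet the downstream accuracy drops when the upstream model is replaced, because the coupling lives in the learned threshold rather than in any interface a compiler or code review could check.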
Related: 05-atom—model-entanglement, 05-atom—non-monotonic-error-propagation, 04-atom—data-versioning-complexity