Hidden Technical Debt in Machine Learning Systems
Citation
Sculley, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NIPS 2015).
Core Contribution
Applies the software engineering concept of technical debt to ML systems. Only a small fraction of a real-world ML system is composed of actual ML code; the surrounding infrastructure is vast and complex.
Key Concepts
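Boundary Erosion: ML models erode the strict abstraction boundaries that keep conventional software modular, because the desired behavior is learned from external data rather than expressed in code. Entanglement among inputs is one result.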
CACE Principle: Changing Anything Changes Everything. In ML, no inputs are truly independent.
Hidden Feedback Loops: Systems influence each other’s training data indirectly through the world over time, without any explicit connection between them.
Undeclared Consumers: Other systems may silently depend on your model’s outputs.
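Data Dependencies: Costlier than code dependencies because they are harder to detect. Compilers and linkers expose code dependencies through static analysis, but comparable tooling for data dependencies is rare, so long dependency chains can build up unnoticed and become hard to untangle.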
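The CACE principle can be made concrete with a toy linear model. The sketch below is an illustration under stated assumptions, not the paper's setup: the data are hypothetical, the model is plain least squares without an intercept, and the helper names fit_one and fit_two are invented here. Adding a correlated feature x2 changes the learned weight on x1 even though nothing about x1 itself changed.

```python
# Minimal CACE illustration (hypothetical toy data, not from the paper):
# the learned weight on x1 depends on which other features are present.

def fit_one(x1, y):
    """Least-squares weight for y ~ w1*x1 (no intercept)."""
    return sum(a * b for a, b in zip(x1, y)) / sum(a * a for a in x1)

def fit_two(x1, x2, y):
    """Least-squares weights for y ~ w1*x1 + w2*x2 (no intercept),
    solving the 2x2 normal equations with Cramer's rule."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are perfectly collinear")
    w1 = (s1y * s22 - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2

# Toy data: x2 is strongly correlated with x1, and y = 2*x1 + 3*x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

w1_alone = fit_one(x1, y)                # x1 absorbs most of x2's effect
w1_joint, w2_joint = fit_two(x1, x2, y)  # recovers the true weights

print(f"x1 weight without x2: {w1_alone:.3f}")  # 5.273
print(f"x1 weight with x2:    {w1_joint:.3f}")  # 2.000
```

Removing x2 from a deployed model would shift the weight on x1 back the other way. The paper extends the same point beyond features to hyper-parameters, learning settings, sampling methods, and data selection.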
Key Visual
The famous diagram (Figure 1) showing ML code as a small black box in the middle, surrounded by far larger infrastructure: configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, serving infrastructure, and monitoring.
Related: 00-source—sambasivan-2021-data-cascades, 05-molecule—ml-technical-debt-taxonomy