Hidden Technical Debt in Machine Learning Systems

Citation

Sculley, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NIPS 2015).

Core Contribution

Applies the software engineering concept of technical debt to ML systems. Only a small fraction of a real-world ML system is actual ML code; the surrounding infrastructure is vast and complex.

Key Concepts

Boundary Erosion: Strict abstraction boundaries are hard to enforce in ML systems because entanglement couples components that look separate.

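CACE Principle: Changing Anything Changes Everything. In ML, no inputs are truly independent: changing one feature's distribution, or adding or removing a feature, can shift how the model uses all the others.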
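A minimal sketch of CACE, assuming a two-feature least-squares model without an intercept. The data, the helper names fit_one and fit_two, and the choice of a linear model are illustrative assumptions, not taken from the paper. The point is only that dropping one correlated feature changes the weight learned for the other.

```python
def fit_one(x, y):
    """Least-squares weight for y ~ w * x (no intercept)."""
    return sum(a * t for a, t in zip(x, y)) / sum(a * a for a in x)


def fit_two(x1, x2, y):
    """Least-squares weights for y ~ w1*x1 + w2*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    b1 = sum(a * t for a, t in zip(x1, y))
    b2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system.
    return (b1 * s22 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det


# Toy data (assumed for illustration): x2 is correlated with x1.
x1 = [1.0, 2.0, 3.0, 4.0]
residual = [1.0, -1.0, -1.0, 1.0]            # orthogonal to x1 by construction
x2 = [0.5 * a + e for a, e in zip(x1, residual)]
y = [a + b for a, b in zip(x1, x2)]          # true relation: y = x1 + x2

weights_both = fit_two(x1, x2, y)            # (1.0, 1.0)
weight_alone = fit_one(x1, y)                # 1.5

print("with x2:   ", weights_both)
print("without x2:", weight_alone)
```

With both features the model recovers a weight of 1.0 on x1; once x2 is removed, x1 absorbs the correlated part of x2 and its weight becomes 1.5, even though nothing about x1 itself changed.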

Hidden Feedback Loops: Separate systems influence each other indirectly through the world, shaping each other’s training data over time without any explicit connection.

Undeclared Consumers: Other systems may silently depend on your model’s outputs.
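One guard the paper mentions against undeclared consumers is access restriction. The sketch below shows one way that could look, assuming a hypothetical PredictionService wrapper; the class, its methods, and the consumer names are invented for illustration and do not come from the paper or any library.

```python
class PredictionService:
    """Hypothetical wrapper that serves predictions only to declared consumers."""

    def __init__(self, model):
        self._model = model
        self._consumers = set()

    def register_consumer(self, name):
        """Record a downstream system as a known consumer of the model's outputs."""
        self._consumers.add(name)

    def predict(self, consumer, features):
        """Serve a prediction, refusing callers that never declared themselves."""
        if consumer not in self._consumers:
            raise PermissionError(f"undeclared consumer: {consumer!r}")
        return self._model(features)

    def consumers(self):
        """List every declared consumer, e.g. to notify before a model change."""
        return sorted(self._consumers)


# Stand-in model (assumed): a simple threshold on the feature sum.
service = PredictionService(model=lambda features: sum(features) > 1.0)
service.register_consumer("ranking-frontend")

allowed = service.predict("ranking-frontend", [0.7, 0.6])

try:
    service.predict("ad-hoc-dashboard", [0.7, 0.6])
    blocked = False
except PermissionError:
    blocked = True

print(allowed, blocked, service.consumers())
```

Forcing registration makes the dependency graph explicit: before changing the model, the owner can enumerate the consumers that would be affected instead of discovering them after something breaks.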

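Data Dependencies: More insidious than code dependencies because they’re less visible; compilers and linkers expose code dependencies through static analysis, while comparable tooling for data dependencies is rare.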

Key Visual

The paper's well-known diagram shows ML code as a tiny black box surrounded by vast infrastructure, including configuration, data collection, feature extraction, analysis tools, process management, serving infrastructure, and monitoring.

Related: 00-source—sambasivan-2021-data-cascades, 05-molecule—ml-technical-debt-taxonomy