Hidden Technical Debt in Machine Learning Systems

Sculley et al., Google (2015)

Core Argument

Machine learning systems incur massive ongoing maintenance costs that are easy to underestimate. Using Ward Cunningham’s “technical debt” metaphor from software engineering, the authors argue that ML systems have all the maintenance problems of traditional code plus an additional set of ML-specific issues.

The central tension: ML systems are relatively fast and cheap to develop and deploy, but difficult and expensive to maintain over time.

Key Concepts Introduced

CACE Principle ("Changing Anything Changes Everything"): ML systems entangle their input signals, so no change is ever truly isolated; improving or breaking one signal shifts the effective behavior of everything else.
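
A minimal sketch (not from the paper; the feature names and numbers are illustrative): two correlated features fit jointly by ordinary least squares. When an upstream change breaks one feature, the optimal weight on the other also moves, so the change is not confined to the signal it touched.

```python
# Illustrative only: joint fitting entangles x1 and x2 (CACE).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + 0.3 * rng.normal(size=n)              # x2 overlaps with x1
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.1, size=n)

def fit(a, b):
    """Least-squares weights for y on the two given feature columns."""
    X = np.column_stack([a, b])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.round(w, 2)

print("weights before:", fit(x1, x2))                        # ~[2.0, 1.0]
# An upstream team "changes only x2": it now carries no signal. The optimal
# weight on x1 moves too, from ~2.0 to ~2.7.
print("weights after x2 breaks:", fit(x1, rng.normal(size=n)))
```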

Correction Cascades: Stacking models to fix other models creates improvement deadlocks.
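
A minimal sketch (illustrative, not from the paper): a correction model learns to cancel the base model's systematic bias. Fixing the base model in isolation then makes the cascaded system worse, because the stale correction still compensates for an error that no longer exists; this coupling is what produces the improvement deadlock.

```python
# Illustrative only: a stale correction penalizes an improved base model.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000)
truth = 3.0 * x

def base_v1(x):
    return 3.0 * x - 1.0   # original base model: systematically low by 1.0

def correction(pred):
    return pred + 1.0      # learned against base_v1: adds the missing 1.0

def base_v2(x):
    return 3.0 * x         # base model later fixed upstream

def mae(pred):
    return float(np.mean(np.abs(pred - truth)))

print("cascade on base_v1:", mae(correction(base_v1(x))))  # 0.0
print("cascade on base_v2:", mae(correction(base_v2(x))))  # 1.0, worse
```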

Undeclared Consumers: Systems silently using model outputs create hidden coupling.

Hidden Feedback Loops: Two systems influence each other indirectly through the world.

The ML Code Fraction: Only a tiny fraction of production ML systems is actual ML code; surrounding infrastructure dominates.

Debt Categories

  1. Boundary erosion (entanglement, correction cascades, undeclared consumers)
  2. Data dependencies (unstable, underutilized, untracked; see the versioning sketch after this list)
  3. Feedback loops (direct and hidden)
  4. System anti-patterns (glue code, pipeline jungles, dead experimental codepaths)
  5. Configuration debt
  6. External world changes
  7. Other debt (data testing, reproducibility, process management, cultural)
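
A minimal sketch of the paper's suggested mitigation for unstable data dependencies: consumers pin a frozen, versioned copy of an upstream signal instead of silently tracking "latest". The registry, signal names, and paths below are hypothetical.

```python
# Illustrative only: an explicit version pin for an upstream feature signal.
SIGNAL_VERSIONS = {
    ("user_topic_affinity", "v3"): "gs://features/user_topic_affinity/v3/",  # frozen snapshot
    ("user_topic_affinity", "v4"): "gs://features/user_topic_affinity/v4/",  # still changing
}

def resolve_signal(name: str, version: str) -> str:
    """Return the storage path for an explicitly pinned signal version."""
    return SIGNAL_VERSIONS[(name, version)]

# The model keeps training on the version it was validated against (v3) and
# migrates to v4 deliberately, with vetting, rather than absorbing upstream
# semantic changes without notice.
TRAINING_INPUT = resolve_signal("user_topic_affinity", "v3")
print(TRAINING_INPUT)
```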

Enduring Relevance

Though the paper predates the transformer era, the patterns it identifies are amplified, not resolved, by foundation models. LLM-based systems inherit these debt patterns while adding new ones around prompt engineering, fine-tuning fragility, and API dependencies.

Extracted Content

- 05-atom—cace-principle
- 05-atom—ml-code-fraction
- 05-atom—correction-cascades
- 04-atom—unstable-data-dependencies
- 05-atom—hidden-feedback-loops
- 05-molecule—ml-technical-debt-taxonomy
- 05-molecule—ml-system-anti-patterns