How Does ML Technical Debt Manifest in LLM-Based Systems?
The Sculley et al. taxonomy was developed from traditional ML systems circa 2015 (feature engineering, supervised learning, batch retraining). What new debt categories emerge when the core model is a foundation model accessed via API or fine-tuning?
Some patterns likely amplify: glue code around prompt engineering, hidden dependencies on model versions that change without notice, feedback loops through RLHF and user interaction data.
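A rough sketch of the amplified "hidden dependency on model versions" pattern, assuming an OpenAI-style provider and a hypothetical `call_model` client (none of this is from the original taxonomy):

```python
# Hypothetical glue code around prompt engineering. The unpinned alias is a
# hidden dependency: the provider can repoint it at a new model and silently
# change the behavior of every prompt this wrapper builds.

UNPINNED_MODEL = "gpt-4o"            # alias the provider may repoint at any time
PINNED_MODEL = "gpt-4o-2024-08-06"   # dated snapshot: behavior changes require an explicit edit

def build_support_prompt(ticket_text: str) -> str:
    # Prompt-engineering glue: formatting assumptions live here, not in the model.
    return (
        "You are a support triage assistant.\n"
        "Classify the ticket below as BUG, BILLING, or OTHER.\n\n"
        f"Ticket: {ticket_text}\n"
        "Answer with a single word."
    )

def triage(ticket_text: str, call_model) -> str:
    # `call_model(model, prompt) -> str` is an assumed provider client.
    # Pinning the model turns an invisible upstream change into a visible diff.
    return call_model(PINNED_MODEL, build_support_prompt(ticket_text)).strip().upper()
```

The pinned constant is the cheap part; the debt shows up in everything downstream that was tuned against one snapshot's quirks.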
Some patterns may transform: if the “model” is an API call, where does the boundary between your system and the model provider’s system lie? Correction cascades might now span organizational boundaries.
Some patterns may be genuinely new: prompt injection as a debt category, multi-model orchestration complexity, the challenge of versioning when the model knows things you didn’t train it on.
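A minimal sketch of the prompt-injection surface, again with a hypothetical `call_model` client and illustrative model name:

```python
# Untrusted document text is concatenated straight into the instruction stream,
# so the document itself can override the system's intent ("Ignore the above
# and ...") -- a debt surface traditional ML pipelines did not have.

def summarize_untrusted(doc_text: str, call_model) -> str:
    prompt = (
        "Summarize the following document in two sentences.\n\n"
        f"{doc_text}"
    )
    return call_model("gpt-4o-2024-08-06", prompt)

# Each mitigation (delimiters, content filters, a second screening model) adds
# orchestration code that itself accrues glue-code and configuration debt.
```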
The original paper’s questions for assessing debt remain useful: How easily can an entirely new approach be tested? What is the transitive closure of all data dependencies? How precisely can the impact of a change be measured? The answers for LLM-based systems may be worse than for traditional ML.
Related: 05-molecule—ml-technical-debt-taxonomy, 05-atom—deploy-maintain-dichotomy