Underutilized Data Dependencies
Input signals that provide little incremental modeling benefit but remain in the system, making it unnecessarily vulnerable to change.
These dependencies creep in through several paths:
Legacy features: Included early in development, made redundant by later additions, but never removed.
Bundled features: Evaluated as a group, found beneficial, and added together under deadline pressure, including features that individually add little value.
ε-features: Added for a very small accuracy gain that seemed worth the complexity at the time.
Correlated features: Two features are strongly correlated, but one is more directly causal. The model often can't tell them apart and credits both, so the system turns brittle if the correlation later shifts.
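The correlated-features case can be made concrete with a toy sketch. Everything here is an assumption for illustration: a two-weight linear model, an L2 penalty, plain gradient descent, and made-up data in which only x1 is causal. Under those assumptions the fit splits credit evenly between the two inputs, and the error grows sharply once x2 stops tracking x1.

```python
# Toy data (hypothetical): x2 duplicates x1, but only x1 drives y.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = list(x1)
y = [2.0 * v for v in x1]


def fit_ridge(x1, x2, y, lam=1.0, lr=0.005, steps=5000):
    """Gradient descent on squared error plus lam * (w1^2 + w2^2)."""
    w1, w2 = 1.5, 0.0  # deliberately asymmetric starting point
    for _ in range(steps):
        g1 = 2.0 * lam * w1
        g2 = 2.0 * lam * w2
        for a, b, t in zip(x1, x2, y):
            err = w1 * a + w2 * b - t
            g1 += 2.0 * err * a
            g2 += 2.0 * err * b
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2


def max_abs_error(w1, w2, x1, x2, y):
    """Largest absolute prediction error over the data set."""
    return max(abs(w1 * a + w2 * b - t) for a, b, t in zip(x1, x2, y))


w1, w2 = fit_ridge(x1, x2, y)
err_before = max_abs_error(w1, w2, x1, x2, y)

# The world changes: x2 no longer tracks x1, while y still depends on x1 only.
x2_shifted = [4.0, 3.0, 2.0, 1.0]
err_after = max_abs_error(w1, w2, x1, x2_shifted, y)

print(f"w1={w1:.3f} w2={w2:.3f}")
print(f"max error before shift={err_before:.3f}, after shift={err_after:.3f}")
```

The two weights end up essentially equal even though the starting point favored x1, which is the "credits both" behavior in miniature. Other model families may split credit differently or even prefer the non-causal input.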
Exhaustive leave-one-feature-out evaluations can detect these dependencies. Run them regularly to identify and remove unnecessary features before they become liabilities.
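A minimal sketch of a leave-one-feature-out pass, assuming a caller-supplied `train_and_score` function that retrains on a given feature subset and returns a higher-is-better metric. The function names, the feature names, the `min_gain` threshold, and the toy scorer are all hypothetical stand-ins, not part of any particular library.

```python
from typing import Callable, Dict, List, Sequence


def leave_one_feature_out(
    features: Sequence[str],
    train_and_score: Callable[[Sequence[str]], float],
) -> Dict[str, float]:
    """Return, per feature, the score lost when that feature is removed."""
    baseline = train_and_score(list(features))
    drops = {}
    for name in features:
        remaining = [f for f in features if f != name]
        drops[name] = baseline - train_and_score(remaining)
    return drops


def underutilized(drops: Dict[str, float], min_gain: float) -> List[str]:
    """Features whose removal costs less than min_gain."""
    return sorted(name for name, drop in drops.items() if drop < min_gain)


def toy_score(selected: Sequence[str]) -> float:
    """Stand-in for a real train-and-evaluate run (made-up numbers)."""
    chosen = set(selected)
    score = 0.70
    if "ctr_v2" in chosen:
        score += 0.10
    elif "legacy_ctr" in chosen:  # only helps when ctr_v2 is absent
        score += 0.08
    if "user_age" in chosen:
        score += 0.05
    if "eps_feature" in chosen:
        score += 0.001
    return score


features = ["legacy_ctr", "ctr_v2", "user_age", "eps_feature"]
drops = leave_one_feature_out(features, toy_score)
print(underutilized(drops, min_gain=0.005))
```

In this toy setup the legacy feature and the ε-feature fall below the threshold. Note that removing `ctr_v2` alone looks cheap-ish only because `legacy_ctr` partly covers for it, so candidates are best removed one at a time and the pass rerun. Each pass costs one retraining per feature, which is why it suits a scheduled job rather than an ad hoc check.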
Related: 04-atom—unstable-data-dependencies, 05-molecule—ml-technical-debt-taxonomy