Unstable Data Dependencies

Some input signals qualitatively or quantitatively change behavior over time, creating maintenance risk for any system that consumes them.

Instability can be implicit, as when the input comes from another ML model that is itself updated over time, or from a data-dependent lookup table that is periodically recomputed. It can also be explicit, as when engineering ownership of the signal is separate from ownership of the consuming model, so the signal's owners may update it at any time.

The danger: even “improvements” to an input signal can have arbitrary, detrimental effects on the systems that consume it. If a signal was previously miscalibrated, the consuming model has likely learned to compensate for the miscalibration; silently correcting the calibration breaks that compensation and degrades the consumer.
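To make the compensation failure concrete, here is a minimal numpy sketch with hypothetical numbers: a regression consumer trained on a signal that reads 2x too high learns a weight near 0.5, and a silent upstream recalibration then roughly halves its predictions.

```python
# Minimal sketch (hypothetical data and names) of a consumer that has
# "baked in" compensation for a miscalibrated upstream signal.
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: target = 1.0 * latent_feature + noise.
latent = rng.normal(size=1000)
target = latent + rng.normal(scale=0.1, size=1000)

# The upstream signal is miscalibrated: it reports 2x the latent value.
miscalibrated_signal = 2.0 * latent

# A least-squares fit learns a weight of ~0.5, silently compensating.
w = np.linalg.lstsq(miscalibrated_signal[:, None], target, rcond=None)[0]
print(f"learned weight: {w[0]:.2f}")  # ~0.50

# Upstream "fixes" the calibration without telling the consumer.
fixed_signal = latent  # now correctly calibrated, 1x

# The consumer still applies its compensating weight, so its
# predictions are now half what they should be and error jumps.
mse_before = np.mean((miscalibrated_signal * w - target) ** 2)
mse_after = np.mean((fixed_signal * w - target) ** 2)
print(f"MSE before fix: {mse_before:.3f}, after fix: {mse_after:.3f}")
```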

One mitigation is to keep versioned copies of input signals: freeze a given mapping and consume that frozen version until an updated one has been fully vetted. Versioning carries its own costs, though: frozen signals go stale, and maintaining multiple versions adds operational overhead.
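A minimal sketch of the versioning discipline, assuming an in-process registry; the class, method names, and version labels below are hypothetical and only illustrate the pattern:

```python
# Hypothetical signal registry: consumers pin to a frozen version
# rather than reading whatever the signal's owners last published.
from dataclasses import dataclass, field

@dataclass
class SignalRegistry:
    # version label -> frozen mapping (e.g., a vetted lookup table)
    _versions: dict[str, dict[str, float]] = field(default_factory=dict)

    def publish(self, version: str, mapping: dict[str, float]) -> None:
        """Freeze a new immutable copy of the signal."""
        self._versions[version] = dict(mapping)  # defensive copy

    def get(self, version: str) -> dict[str, float]:
        """Consumers read an explicit version; no silent updates."""
        return self._versions[version]

registry = SignalRegistry()
registry.publish("v1", {"user_123": 0.42})  # vetted, in production
registry.publish("v2", {"user_123": 0.58})  # candidate, under review

# The consuming model keeps reading v1 until v2 is fully vetted, then
# switches over deliberately rather than absorbing a silent change.
signal = registry.get("v1")
```

The trade-off named above shows up directly here: v1 grows staler the longer vetting takes, and every live version is one more artifact to store, monitor, and eventually retire.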

Related: 04-atom—underutilized-data-dependencies, 05-molecule—ml-technical-debt-taxonomy