Boundary Erosion in ML Systems
Definition
Traditional software engineering relies on strong abstraction boundaries: encapsulation and modular design let changes be made in isolation. ML systems systematically erode these boundaries, because ML is used precisely where the desired behavior cannot be expressed in software logic without a dependency on external data.
Why It Matters
The real world doesn’t fit into tidy encapsulation. When you bring the world’s complexity into a system through data, you import its messiness too. Boundaries that work for deterministic code fail for probabilistic models.
This isn’t a solvable problem; it’s a fundamental characteristic of ML systems. The question is how to manage it, not how to eliminate it.
How It Works
Entanglement (CACE, "Changing Anything Changes Everything"): Changing any input changes the effective use of all other inputs. The model learns to use signals in combination, so you can’t change one in isolation.
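A minimal sketch of entanglement, assuming scikit-learn and a synthetic dataset (none of the data or feature names come from the note): after "improving" only one feature and retraining, the learned weights shift on every feature.

```python
# CACE in miniature: retraining after altering a single feature moves
# the learned weights on *all* features, not just the one we touched.
# The data and the log-rescaling of f2 are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                      # features f0, f1, f2
y = (X @ np.array([1.0, -2.0, 0.5])
     + rng.normal(scale=0.5, size=1000) > 0).astype(int)

base = LogisticRegression().fit(X, y)

# "Improve" only f2, e.g. switch it to a log-scaled version.
X_new = X.copy()
X_new[:, 2] = np.log1p(np.abs(X[:, 2]))
retrained = LogisticRegression().fit(X_new, y)

print("weights before:", base.coef_.round(3))       # f0 and f1 move too,
print("weights after: ", retrained.coef_.round(3))  # even though only f2 changed
```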
Correction Cascades: Building quick fixes on top of existing models creates hidden dependencies. Each corrective layer makes the layers below it harder to improve.
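A hedged sketch of how a cascade forms, with hypothetical models m_a and c_b: a correction model fitted on top of an existing model’s output ties the new problem to the old model’s exact behavior, so later improvements to the old model silently de-calibrate the fix.

```python
# Correction cascade in miniature. Model m_a solves problem A; rather
# than building a separate model for the related problem B, a small
# correction c_b is trained on m_a's output. From then on, retraining
# m_a changes c_b's inputs, so m_a becomes harder to improve.
# All names and data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y_a = X @ np.array([2.0, 0.0, 1.0, -1.0]) + rng.normal(scale=0.1, size=500)
y_b = y_a + 0.3 * X[:, 1]                 # problem B: a slight variant of A

m_a = LinearRegression().fit(X, y_a)      # the original model

# Quick fix: learn B as a correction on top of m_a's predictions.
features_b = np.column_stack([m_a.predict(X), X[:, 1]])
c_b = LinearRegression().fit(features_b, y_b)

# Later, someone "improves" m_a (new data, new features, new loss).
# c_b's inputs shift underneath it, and the improvement is either
# blocked or quietly degrades problem B.
```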
Undeclared Consumers: Without access controls, other systems silently depend on model outputs, and the model’s maintainers don’t know who will break when they change something.
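One possible mitigation, sketched with a hypothetical PredictionService class (not an API from the note or any real library): require consumers to register before they can read predictions, so the dependency graph is at least visible even if it can’t be prevented from growing.

```python
# A toy access-controlled prediction service: consumers must declare
# themselves before reading model outputs, so maintainers can answer
# "who breaks if this model changes?". The class and its methods are
# illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class PredictionService:
    model: object                                     # any object with .predict()
    _consumers: dict = field(default_factory=dict)    # consumer_id -> owning team

    def register_consumer(self, consumer_id: str, owner: str) -> None:
        """Every system that reads predictions must declare itself first."""
        self._consumers[consumer_id] = owner

    def predict(self, consumer_id: str, features):
        if consumer_id not in self._consumers:
            raise PermissionError(
                f"Unregistered consumer '{consumer_id}': declare the dependency first."
            )
        return self.model.predict(features)

    def known_consumers(self) -> dict:
        """The now-visible list of downstream dependents."""
        return dict(self._consumers)
```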
Implications
Design for change from the start. Assume boundaries will be violated and plan monitoring accordingly. Invest in tooling that makes dependencies visible even when they can’t be prevented.
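A rough sketch of the kind of monitoring this implies, with illustrative thresholds and statistics: record per-feature training statistics and flag live batches whose means drift far from them, on the assumption that input drift is one visible symptom of an eroded boundary.

```python
# Compare live feature means against training-time statistics and flag
# large shifts. The z-score threshold and mean-only check are simplifying
# assumptions; real monitoring would track richer distribution summaries.
import numpy as np

def drift_alerts(train_stats, live_batch, z_threshold=4.0):
    """train_stats: list of (mean, std) per feature; live_batch: 2-D array."""
    alerts = []
    for i, (mu, sigma) in enumerate(train_stats):
        live_mu = float(np.mean(live_batch[:, i]))
        # z-score of the live batch mean under the training distribution
        z = abs(live_mu - mu) / (sigma / np.sqrt(len(live_batch)) + 1e-12)
        if z > z_threshold:
            alerts.append((i, mu, live_mu, z))
    return alerts

# Usage sketch (X_train and X_live are assumed arrays):
# train_stats = [(X_train[:, i].mean(), X_train[:, i].std()) for i in range(X_train.shape[1])]
# for idx, train_mu, live_mu, z in drift_alerts(train_stats, X_live):
#     print(f"feature {idx}: train mean {train_mu:.2f} -> live mean {live_mu:.2f} (z={z:.1f})")
```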
Accept that some “improvements” will be blocked by entanglement. Factor this into timelines and expectations.
When This Especially Matters
Any system where models will be updated over time. Any system where multiple teams contribute to the overall pipeline. Any system that will be maintained by people other than those who built it.
Related: 05-molecule—ml-technical-debt-taxonomy, 05-atom—hidden-feedback-loops