Configuration Debt
In mature ML systems, the number of lines of configuration can far exceed the number of lines of traditional code. Each configuration line is a potential failure point.
Configuration encompasses: which features are used, data selection logic, algorithm-specific learning settings, pre/post-processing steps, verification methods, and countless interdependencies between them.
Example complexities: Feature A was incorrectly logged from 9/14 to 9/17. Feature B isn’t available on data before 10/7. Feature D isn’t available in production, so substitutes D′ and D″ must be used. Feature Z requires extra memory due to lookup tables. Feature Q precludes Feature R due to latency constraints.
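One way to keep constraints like these from living only in people's heads is to write them down as data and check every run configuration against them. The following is a minimal sketch, not a prescribed design: the table names, the `RunConfig` class, and the `validate` function are all hypothetical, and dates are modeled as (month, day) tuples within a single assumed year because the examples above give no year.

```python
from dataclasses import dataclass

# Hypothetical constraint tables mirroring the examples above.
# Dates are (month, day) tuples within one assumed calendar year.
BAD_LOG_WINDOWS = {"A": ((9, 14), (9, 17))}  # feature logged incorrectly in this window
AVAILABLE_FROM = {"B": (10, 7)}              # feature has no data before this date
TRAINING_ONLY = {"D": ("D′", "D″")}          # not in production; substitutes listed
MUTUALLY_EXCLUSIVE = [("Q", "R")]            # cannot be enabled together (latency)


@dataclass(frozen=True)
class RunConfig:
    features: frozenset
    data_start: tuple
    data_end: tuple
    production: bool = False


def validate(cfg):
    """Return a list of human-readable violations; an empty list means the config passes."""
    errors = []
    for name, (bad_start, bad_end) in BAD_LOG_WINDOWS.items():
        # Two closed intervals overlap when each starts before the other ends.
        if name in cfg.features and cfg.data_start <= bad_end and bad_start <= cfg.data_end:
            errors.append(f"{name}: data range overlaps a bad logging window")
    for name, first_day in AVAILABLE_FROM.items():
        if name in cfg.features and cfg.data_start < first_day:
            errors.append(f"{name}: not available before {first_day[0]}/{first_day[1]}")
    for name, substitutes in TRAINING_ONLY.items():
        if cfg.production and name in cfg.features:
            errors.append(f"{name}: not available in production; use {' and '.join(substitutes)}")
    for a, b in MUTUALLY_EXCLUSIVE:
        if a in cfg.features and b in cfg.features:
            errors.append(f"{a} and {b}: cannot be enabled together")
    return errors
```

Running `validate` before a training job starts turns a silent misconfiguration into an explicit failure, for example `validate(RunConfig(frozenset({"Q", "R"}), (10, 7), (11, 1)))` reports the Q/R conflict.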
Both researchers and engineers tend to treat configuration as an afterthought, and verifying or testing a configuration may not even be seen as important. Yet configuration mistakes are costly: they lead to serious loss of time, wasted compute, and production incidents.
Configuration should be version-controlled, code-reviewed, and designed so that differences between versions are visually obvious.
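Making differences "visually obvious" can be as simple as serializing configurations in a canonical form before comparing them. The sketch below assumes configurations are plain JSON-serializable dictionaries; the helper names `canonical_lines` and `config_diff` are illustrative, and it relies only on the standard-library `json` and `difflib` modules.

```python
import difflib
import json


def canonical_lines(config):
    """Serialize with sorted keys and one value per line so diffs are stable."""
    return json.dumps(config, sort_keys=True, indent=2).splitlines()


def config_diff(old, new):
    """Return a unified diff between two config dicts as a single string."""
    return "\n".join(
        difflib.unified_diff(
            canonical_lines(old),
            canonical_lines(new),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
```

Because keys are sorted, two configurations that differ only in key order produce an empty diff, so a reviewer sees only the settings that actually changed.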
Related: 05-molecule—ml-technical-debt-taxonomy, 05-atom—deploy-maintain-dichotomy