Software Engineering for Machine Learning: A Case Study

Citation

Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., Nagappan, N., Nushi, B., & Zimmermann, T. (2019). Software Engineering for Machine Learning: A Case Study. Proceedings of the 41st International Conference on Software Engineering (ICSE).

Core Framing

The authors position ML as the latest in a series of disruptive shifts in software engineering (personal computing → internet → web → mobile → cloud → AI), arguing that each shift forces organizations to evolve their development practices. The study examines how Microsoft teams have adapted Agile processes to integrate ML workflows.

Key Contributions

Nine-stage ML workflow description
Best practices for ML-centric software development
ML process maturity model (Activity Maturity Index)
Three fundamental differences between ML and traditional software engineering

Three Fundamental Differences

Data primacy: Discovering, managing, and versioning data is far more complex than managing code
Customization/reuse skills gap: Customizing and reusing models requires skills quite different from those typically found on software teams
Entanglement: ML components are harder to isolate than traditional modules; models affect each other in non-obvious ways

Methodology

14 semi-structured interviews with Microsoft engineers (snowball sampling)
Survey of 4,195 Microsoft employees on AI/ML mailing lists (551 responses, 13.6% response rate)
Card sorting analysis of open-response items

Nine-Stage ML Workflow

Model Requirements → Data Collection → Data Cleaning → Data Labeling → Feature Engineering → Model Training → Model Evaluation → Model Deployment → Model Monitoring

Key observation: This workflow is highly non-linear with multiple feedback loops. Model evaluation and monitoring can loop back to any previous stage.
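
The non-linearity can be made concrete with a small sketch that treats the nine stages as an ordered enumeration and lists which stages are reachable from each one. This is an illustration of the observation above, not code from the paper: the names `Stage`, `FEEDBACK_SOURCES`, and `allowed_next` are hypothetical, and the sketch models only the two feedback loops named here (evaluation and monitoring back to any earlier stage).

```python
from enum import IntEnum


class Stage(IntEnum):
    """The nine workflow stages, numbered in their nominal forward order."""
    MODEL_REQUIREMENTS = 1
    DATA_COLLECTION = 2
    DATA_CLEANING = 3
    DATA_LABELING = 4
    FEATURE_ENGINEERING = 5
    MODEL_TRAINING = 6
    MODEL_EVALUATION = 7
    MODEL_DEPLOYMENT = 8
    MODEL_MONITORING = 9


# Assumption: only these two stages loop back, matching the observation above.
FEEDBACK_SOURCES = {Stage.MODEL_EVALUATION, Stage.MODEL_MONITORING}


def allowed_next(stage):
    """Return the stages reachable in one step from `stage`, in stage order."""
    targets = set()
    # Forward edge: every stage except the last leads to its successor.
    if stage < Stage.MODEL_MONITORING:
        targets.add(Stage(stage + 1))
    # Feedback edges: evaluation and monitoring may return to any earlier stage.
    if stage in FEEDBACK_SOURCES:
        targets.update(s for s in Stage if s < stage)
    return sorted(targets)


if __name__ == "__main__":
    for stage in Stage:
        names = ", ".join(s.name for s in allowed_next(stage))
        print(f"{stage.name} -> {names}")
```

Under these assumptions, most stages have a single successor, while Model Evaluation and Model Monitoring fan out to every earlier stage, which is what makes the process hard to schedule as a simple pipeline.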
