Nine-Stage ML Workflow

The canonical machine learning development workflow consists of nine stages:

Model Requirements: Deciding which features are feasible with ML and selecting appropriate model types
Data Collection: Finding, integrating, or creating datasets (including transfer learning from generic datasets)
Data Cleaning: Removing inaccurate or noisy records
Data Labeling: Assigning ground truth labels (via engineers, domain experts, or crowd workers)
Feature Engineering: Extracting and selecting informative features
Model Training: Training and tuning models on prepared data
Model Evaluation: Testing against held-out datasets using predefined metrics
Model Deployment: Deploying inference code to target devices
Model Monitoring: Watching for errors during real-world execution
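
As a rough illustration of how three of these stages hand off to one another, the sketch below runs cleaning, training, and held-out evaluation on a tiny synthetic dataset. Everything in it is an assumption made for illustration: the data, the one-feature threshold "model", the 80/20 split, and the function names are hypothetical and do not come from any particular library or from the workflow's original description.

```python
import random

def clean(records):
    """Data cleaning: drop records whose feature value is missing."""
    return [(x, y) for x, y in records if x is not None]

def train_threshold(train):
    """Model training: place a threshold midway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, data):
    """Model evaluation: fraction of records classified correctly."""
    correct = sum((x > threshold) == (y == 1) for x, y in data)
    return correct / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (assumption): class 1 centred at 2.0, class 0 at -2.0,
# plus two records with a missing feature for the cleaning step to remove.
raw = (
    [(rng.gauss(2.0, 0.5), 1) for _ in range(50)]
    + [(rng.gauss(-2.0, 0.5), 0) for _ in range(50)]
    + [(None, 1), (None, 0)]
)
rng.shuffle(raw)

records = clean(raw)

# Hold out 20% of the cleaned data; the model never sees it during training.
split = int(0.8 * len(records))
train, held_out = records[:split], records[split:]

threshold = train_threshold(train)
score = accuracy(threshold, held_out)  # accuracy is the metric chosen up front
print(f"threshold={threshold:.2f} held-out accuracy={score:.2f}")
```

The point of the split is that the metric is computed only on records the training step never touched, which is what "held-out" means in the evaluation stage.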

The workflow is highly non-linear. Evaluation and monitoring can trigger loops back to any previous stage. For example, discovering distribution shift between training and production data might require returning to data collection, while new algorithms might prompt revisiting model requirements.
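
The feedback loops described above can be sketched as a small state machine. This is a minimal sketch, not a definitive model of the workflow: the `Stage` enum, the `FEEDBACK` table, and the `next_stage` and `run` helpers are hypothetical names, and the rule that only evaluation and monitoring may loop back is a simplifying assumption taken from the two examples in the text.

```python
from enum import IntEnum

class Stage(IntEnum):
    MODEL_REQUIREMENTS = 1
    DATA_COLLECTION = 2
    DATA_CLEANING = 3
    DATA_LABELING = 4
    FEATURE_ENGINEERING = 5
    MODEL_TRAINING = 6
    MODEL_EVALUATION = 7
    MODEL_DEPLOYMENT = 8
    MODEL_MONITORING = 9

# Hypothetical mapping from a finding to the stage it sends the team back to.
FEEDBACK = {
    "distribution_shift": Stage.DATA_COLLECTION,
    "new_algorithm": Stage.MODEL_REQUIREMENTS,
}

def next_stage(current, finding=None):
    """Return the stage to run after `current`.

    A finding is honoured only at evaluation or monitoring (an assumption);
    at any other stage it is ignored and the workflow simply advances.
    """
    if finding is not None and current in (Stage.MODEL_EVALUATION,
                                           Stage.MODEL_MONITORING):
        return FEEDBACK[finding]
    if current is Stage.MODEL_MONITORING:
        return Stage.MODEL_MONITORING  # nothing found: keep monitoring
    return Stage(current + 1)

def run(findings_at_monitoring, max_steps=40):
    """Walk the workflow, consuming one finding per visit to monitoring."""
    findings = list(findings_at_monitoring)
    trace, stage = [], Stage.MODEL_REQUIREMENTS
    for _ in range(max_steps):
        trace.append(stage)
        if stage is Stage.MODEL_MONITORING:
            if not findings:
                break  # no outstanding findings: stop the simulation here
            stage = next_stage(stage, findings.pop(0))
        else:
            stage = next_stage(stage)
    return trace

# One pass through all nine stages, then a loop back to data collection.
trace = run(["distribution_shift"])
print(" -> ".join(s.name for s in trace))
```

With a single distribution-shift finding, the trace visits all nine stages once and then repeats stages two through nine, which is the kind of loop a strictly linear pipeline diagram hides.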

This iterative, experimental nature distinguishes ML development from traditional software workflows, even though both claim to be “Agile.”

>heyMHK
