AI Risk Measurement Challenges
Seven categories of challenges complicate AI risk measurement:
Third-party complexity: Risk metrics or methodologies used by third-party developers may not align with those of the organizations deploying the system. Customers may also integrate third-party data, software, or components without sufficient internal governance.
Emergent risk tracking: New risks emerge as systems interact with real-world contexts, and approaches for assessing their impact remain immature.
Metric limitations: There is no consensus on robust, verifiable measurement methods. Metrics can be oversimplified, gamed, or blind to differences in how risks fall across affected groups; the first sketch after this list shows how an aggregate figure can mask a subgroup disparity.
Lifecycle variation: Risk measured early in development may differ from risk measured at deployment, and latent risks can grow as systems adapt and evolve.
Real-world divergence: Laboratory measurements often differ from results in operational settings, and controlled testing misses interaction effects that only appear in deployment; the second sketch after this list shows one way to quantify such divergence.
Inscrutability: Opaque systems complicate measurement. Limited explainability, poor documentation, and inherent uncertainties all contribute.
Human baseline absence: For AI that augments or replaces human decision-making, appropriate baselines for comparison are difficult to establish because humans and AI perform tasks differently; the third sketch after this list makes the resulting comparison uncertainty explicit.
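A minimal sketch of the subgroup-disaggregation point, in Python with synthetic data; the group labels, counts, and accuracy figures are illustrative assumptions, not measurements from any real system:

```python
# Synthetic illustration: an aggregate accuracy can mask a subgroup gap.
from collections import defaultdict

def accuracy(pairs):
    """Fraction of (prediction, label) pairs that match."""
    return sum(p == y for p, y in pairs) / len(pairs)

def disaggregated_accuracy(records):
    """Overall and per-group accuracy; records are (group, pred, label)."""
    by_group = defaultdict(list)
    for group, pred, label in records:
        by_group[group].append((pred, label))
    overall = accuracy([(p, y) for _, p, y in records])
    return overall, {g: accuracy(pairs) for g, pairs in by_group.items()}

# 150 synthetic records: group "a" at 90% accuracy, group "b" at 70%.
records = ([("a", 1, 1)] * 90 + [("a", 1, 0)] * 10 +
           [("b", 0, 0)] * 35 + [("b", 0, 1)] * 15)
overall, per_group = disaggregated_accuracy(records)
print(f"overall={overall:.2f}")                         # 0.83 looks fine...
print({g: round(v, 2) for g, v in per_group.items()})   # ...but "b" is 0.70
```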
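A minimal sketch of quantifying lab-versus-production divergence as distribution shift, using the Population Stability Index (PSI) on a model's input or score distribution; the bin count, the PSI > 0.25 rule of thumb, and the synthetic distributions are all assumptions for illustration:

```python
# Synthetic illustration: flag drift between development-time data and
# production traffic with the Population Stability Index (PSI).
import numpy as np

def psi(reference, current, bins=10):
    """PSI between two 1-D samples, with bins fit on the reference data.
    Production values outside the reference range are simply dropped here;
    a fuller version would add open-ended edge bins."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid log(0) and division by zero in sparse bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
dev_scores = rng.normal(0.0, 1.0, 5_000)    # distribution seen in testing
prod_scores = rng.normal(0.4, 1.2, 5_000)   # shifted production traffic
print(f"PSI = {psi(dev_scores, prod_scores):.3f}")  # > 0.25 flags drift
```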
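A minimal sketch of comparing a model against a human baseline on the same cases, using a paired bootstrap to make the comparison's uncertainty explicit; the case count, correctness rates, and resample count are hypothetical:

```python
# Synthetic illustration: the model-minus-human accuracy gap comes with
# uncertainty, which a paired bootstrap interval makes visible.
import random

random.seed(0)
n = 200
# Per-case correctness on the same 200 cases (hypothetical rates):
human_correct = [random.random() < 0.82 for _ in range(n)]
model_correct = [random.random() < 0.86 for _ in range(n)]

def acc_diff(idx):
    """Model accuracy minus human accuracy over the resampled cases."""
    return (sum(model_correct[i] for i in idx) -
            sum(human_correct[i] for i in idx)) / len(idx)

# Resample cases (not humans/models separately) to keep the pairing.
diffs = sorted(acc_diff([random.randrange(n) for _ in range(n)])
               for _ in range(2_000))
lo, hi = diffs[50], diffs[-51]  # ~95% percentile interval
print(f"model - human accuracy: 95% CI [{lo:+.3f}, {hi:+.3f}]")
```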
Note that the inability to measure a risk does not, by itself, indicate that a system is either high or low risk.
Related: 05-atom—ai-risk-definition, 05-atom—trustworthy-ai-characteristics, 05-atom—tevv-throughout-lifecycle