AI Risk Measurement Challenges

Seven categories of challenges complicate AI risk measurement:

Third-party complexity: Risk metrics or methodologies used by the developers of an AI system may not align with those of the organization deploying it. Deployers may also integrate third-party software, hardware, or data without sufficient internal governance.

Emergent risk tracking: New risks emerge as AI systems interact with real-world contexts, and approaches for assessing their impact remain immature.

Metric limitations: There is no consensus on robust, verifiable measurement methods. Metrics can be oversimplified, gamed, or fail to account for differences across affected groups, as the sketch below illustrates.
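
A single aggregate score can hide exactly the group-level differences this challenge describes. The sketch below is a minimal illustration, not a prescribed method: it computes an accuracy metric both overall and disaggregated by affected group, assuming records arrive as (group, prediction, label) tuples, a layout chosen only for this example.

```python
from collections import defaultdict

def disaggregated_accuracy(records):
    """Accuracy overall and per subgroup.

    `records`: iterable of (group, prediction, label) tuples -- the
    record layout is an assumption made for this example.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [correct, seen]
    for group, pred, label in records:
        totals[group][0] += int(pred == label)
        totals[group][1] += 1
    overall = sum(c for c, _ in totals.values()) / sum(n for _, n in totals.values())
    per_group = {g: c / n for g, (c, n) in totals.items()}
    return overall, per_group

# overall = 0.75, but per_group = {"A": 0.5, "B": 1.0}: the aggregate
# masks a twofold accuracy gap between groups.
overall, per_group = disaggregated_accuracy(
    [("A", 1, 1), ("A", 0, 1), ("B", 1, 1), ("B", 1, 1)]
)
```

A large gap between the overall figure and any per-group figure is one signal that the aggregate metric is oversimplified.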

Lifecycle variation: Risk measured early in development may differ from risk measured at deployment, and latent risks can grow as systems adapt and evolve in use; one way to keep this drift visible is sketched below.
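
One lightweight response to lifecycle variation is to record the same risk metric at each stage so that movement away from design-time values is visible rather than latent. The stage names and the structure below are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class RiskMetricLog:
    """Track one risk metric across lifecycle stages (names assumed)."""
    metric_name: str
    readings: dict[str, float] = field(default_factory=dict)

    def record(self, stage: str, value: float) -> None:
        self.readings[stage] = value

    def drift_from(self, baseline: str = "design") -> dict[str, float]:
        # Difference of each later reading from the baseline stage;
        # raises KeyError if the baseline was never recorded.
        base = self.readings[baseline]
        return {s: v - base for s, v in self.readings.items() if s != baseline}

log = RiskMetricLog("false_positive_rate")
log.record("design", 0.02)
log.record("pilot", 0.04)
log.record("deployment", 0.09)
print(log.drift_from())  # pilot: +0.02, deployment: +0.07 (up to float rounding)
```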

Real-world divergence: Measurements taken in laboratory or controlled settings often differ from those taken in operational settings, and controlled testing misses interaction effects that only appear in deployment (a simple distribution-shift check follows).
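
A concrete way to catch lab-versus-field divergence on at least one axis is to compare the distribution a feature had during evaluation with what the deployed system actually receives. The sketch below is a plain-Python Population Stability Index; the equal-width binning and the common "PSI > 0.2 means material shift" rule of thumb are assumptions for illustration, not a standard.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a lab-time sample (`expected`) and a production
    sample (`actual`) of a single numeric feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Floor empty bins at a small epsilon so log(0) cannot occur.
        return [max(c / len(xs), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples yield a PSI of zero; growing values flag that laboratory measurements no longer describe the operational population, though the index says nothing about which interaction effects caused the shift.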

Inscrutability: Opaque systems complicate measurement. Limited explainability, poor documentation, and inherent uncertainties in system behavior all contribute to this opacity.

Human baseline absence: For AI systems that augment or replace human decision-making, appropriate human baselines for comparison are difficult to establish because humans and AI perform tasks differently; a paired-evaluation sketch follows.
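
Where a paired study is feasible at all, one hedged starting point is to score AI and human decisions on the same cases and report agreement alongside accuracy, since equal accuracy can coexist with the two erring on entirely different cases. The (ai, human, truth) record layout is an assumption for the example:

```python
def compare_to_human_baseline(cases):
    """Paired comparison of AI and human decisions on shared cases.

    `cases`: iterable of (ai_decision, human_decision, ground_truth)
    tuples. Even this paired design is imperfect: humans may see
    context the model does not, and vice versa.
    """
    ai_ok = human_ok = agree = n = 0
    for ai, human, truth in cases:
        ai_ok += int(ai == truth)
        human_ok += int(human == truth)
        agree += int(ai == human)
        n += 1
    return {
        "ai_accuracy": ai_ok / n,
        "human_accuracy": human_ok / n,
        "agreement_rate": agree / n,
    }
```

A low agreement rate despite similar accuracy is itself evidence that a single human baseline number cannot summarize the comparison.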

The inability to measure a risk does not imply that the system is either high or low risk; an unmeasured risk is an unquantified one, not an absent one.

Related: 05-atom—ai-risk-definition, 05-atom—trustworthy-ai-characteristics, 05-atom—tevv-throughout-lifecycle