The Taxonomy-to-Detection Gap
Context
Classification systems and detection tools are often developed in parallel but advance at different speeds: academic researchers identify and categorize phenomena faster than engineers can build reliable detection systems for them.
The Problem
A comprehensive taxonomy defines 68 types. Available detection tools cover 31 of them (45.6%); available training datasets cover 30 (44.1%). Knowledge of what exists far exceeds the ability to identify it automatically.
This creates an operationalization gap: we can describe phenomena we cannot yet detect at scale.
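The arithmetic behind these coverage figures can be sketched directly (a minimal illustration using the counts from this note; the constant and function names are ours, not from any tool):

```python
# Counts from the note: 68 taxonomy types, 31 with detection
# tooling, 30 with training datasets.
TOTAL_TYPES = 68
WITH_DETECTION = 31  # types covered by at least one detection tool
WITH_DATASETS = 30   # types covered by at least one training dataset

def coverage_pct(covered: int, total: int) -> float:
    """Coverage as a percentage, rounded to one decimal place."""
    return round(100 * covered / total, 1)

# The operationalization gap: types we can describe but not detect.
detection_gap = TOTAL_TYPES - WITH_DETECTION

print(f"Detection coverage: {coverage_pct(WITH_DETECTION, TOTAL_TYPES)}%")  # 45.6%
print(f"Dataset coverage:   {coverage_pct(WITH_DATASETS, TOTAL_TYPES)}%")   # 44.1%
print(f"Undetectable types: {detection_gap}")                               # 37
```

Over half the taxonomy (37 of 68 types) has no automated detection at all.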
Why This Happens
Three factors reinforce the gap:
- Classification is cheaper than detection. Human experts can identify new pattern types through observation; building reliable automated detection requires labeled data, trained models, and validated accuracy.
- Dataset creation lags taxonomy growth. Each new type requires collected instances. Many patterns are rare, context-dependent, or manifest across multiple screens, making instance collection difficult.
- Detection drives taxonomic refinement. Categories with active detection research get subdivided into finer types, while understudied categories remain coarse. Attention accumulates rather than distributes.
Consequences
- Automated tools systematically miss entire categories of concern
- Research converges on detectable patterns rather than impactful ones
- Regulatory frameworks reference taxonomies that cannot be operationalized
When This Pattern Appears
Any domain where classification precedes automation: security threat taxonomies vs. detection tools, medical diagnostic categories vs. screening tests, compliance violation types vs. audit systems.
Related: [None yet]