The Reference Data Multiplier
The Concept
Proprietary data isn’t the AI advantage most organizations think it is. What that data connects to is.
Reference data (taxonomies, ontologies, entity registries, relationship schemas) multiplies the value of proprietary data by making it connectable and computable.
Why This Matters
Organizations assume their proprietary data is their moat: “We have twenty years of customer data.” “Our operational data is unique.”
This assumption is often incomplete. On its own, proprietary data tends to have problems that limit its use:
- Inconsistent formats across systems
- Implicit assumptions that made sense to creators but aren’t documented
- Gaps that weren’t problems for original use cases but matter for AI applications
Proprietary data is often an advantage only in potential. Reference data converts that potential into something usable.
What Reference Data Provides
Taxonomies and ontologies: Standard categorization schemes enabling comparison across different data sources.
Entity registries: Disambiguation, for example resolving “IBM” and “International Business Machines” to the same entity.
Relationship schemas: Standard ways of expressing connections between entities.
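As a rough illustration of the entity-registry idea, the sketch below resolves different spellings of a company name to one canonical ID. The registry contents, the ID format ("org:ibm"), and the function names are assumptions made up for this example, not a real registry or API.

```python
import re
from typing import Optional

# Hypothetical registry: canonical entity ID -> known surface forms.
ENTITY_REGISTRY = {
    "org:ibm": ["IBM", "International Business Machines", "I.B.M."],
    "org:ge": ["GE", "General Electric"],
}


def normalize(name: str) -> str:
    """Drop punctuation, collapse whitespace, and lowercase."""
    without_punct = re.sub(r"[^\w\s]", "", name)
    return re.sub(r"\s+", " ", without_punct).strip().lower()


# Reverse index: normalized alias -> canonical ID.
ALIAS_INDEX = {
    normalize(alias): entity_id
    for entity_id, aliases in ENTITY_REGISTRY.items()
    for alias in aliases
}


def resolve(name: str) -> Optional[str]:
    """Return the canonical ID for a name, or None if the registry has no match."""
    return ALIAS_INDEX.get(normalize(name))
```

Here resolve("IBM") and resolve("International Business Machines") return the same ID, so records from different systems can be joined on it. Exact alias lookup is the simplest possible approach; a real pipeline would likely also need fuzzy matching and a review step for names that return None.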
The Multiplier Mechanism
The pattern:
- Proprietary data provides isolated facts
- Reference data provides relationships
- Relationships enable computation
- Computation creates insight
Without reference data, you have facts. With reference data, you have a knowledge structure that can answer questions the original data creators never anticipated.
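The four steps above can be sketched in a few lines. In this minimal example the sales figures, product names, and category hierarchy are all invented for illustration: the facts alone only support per-product totals, while the parent relationships allow totals at any level of the hierarchy.

```python
from collections import defaultdict

# Proprietary data: isolated facts as (product, revenue). Values are made up.
sales = [
    ("espresso-machine", 400.0),
    ("drip-brewer", 150.0),
    ("blender", 90.0),
]

# Reference data: child -> parent relationships in a hypothetical taxonomy.
PARENT = {
    "espresso-machine": "coffee-makers",
    "drip-brewer": "coffee-makers",
    "coffee-makers": "kitchen-appliances",
    "blender": "kitchen-appliances",
}


def ancestors(node):
    """Yield each category above `node`, nearest first."""
    while node in PARENT:
        node = PARENT[node]
        yield node


def rollup(facts):
    """Sum revenue into every category above each product."""
    totals = defaultdict(float)
    for product, revenue in facts:
        for category in ancestors(product):
            totals[category] += revenue
    return dict(totals)
```

Calling rollup(sales) answers "how much revenue came from coffee makers?", a question the sales records alone cannot answer because they never mention categories.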
Practical Implications
The mapping work (connecting proprietary data to reference standards) is where value gets created. This work is often undervalued because it feels like infrastructure rather than innovation.
But the organization that has mapped its customer data to standard industry taxonomies, linked its product data to external registries, and connected its operational data to reference ontologies can do things competitors can’t.
Example
Raw: “Customer bought Product X on Date Y”
With reference data:
- Product X → category hierarchy → competitor products → market segment
- Customer → industry classification → company size tier → geographic region
- Date Y → fiscal period → seasonality patterns → market conditions
The same transaction becomes queryable across dimensions that weren’t in the original data.
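One way to picture this enrichment in code is shown below. It is a sketch under stated assumptions: the reference tables, their field names, the sample values, and the calendar-year fiscal default are all hypothetical placeholders, not a real taxonomy or registry.

```python
from datetime import date

# Hypothetical reference tables keyed by the IDs used in the raw data.
PRODUCT_REF = {
    "product-x": {"category": "route-planning-software", "segment": "mid-market"},
}
CUSTOMER_REF = {
    "cust-001": {"industry": "logistics", "size_tier": "enterprise", "region": "EMEA"},
}


def fiscal_quarter(d: date, fiscal_year_start_month: int = 1) -> str:
    """Map a date to a fiscal quarter label (start month is an assumption)."""
    months_into_year = (d.month - fiscal_year_start_month) % 12
    return f"Q{months_into_year // 3 + 1}"


def enrich(txn: dict) -> dict:
    """Attach product, customer, and date dimensions to a raw transaction."""
    return {
        **txn,
        **PRODUCT_REF.get(txn["product"], {}),
        **CUSTOMER_REF.get(txn["customer"], {}),
        "fiscal_quarter": fiscal_quarter(txn["date"]),
    }


# Raw fact: "Customer bought Product X on Date Y".
raw = {"customer": "cust-001", "product": "product-x", "date": date(2024, 5, 14)}
enriched = enrich(raw)

# Query on dimensions that were not in the original record.
matches = [
    t for t in [enriched]
    if t.get("industry") == "logistics" and t["fiscal_quarter"] == "Q2"
]
```

Unknown products or customers fall through with no added fields rather than raising an error, which keeps unmapped records visible so the mapping gaps can be found and fixed.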
Related: 06-molecule—knowledge-graph-construction, 06-atom—entity-linking