Open Source Data Assets for Enterprise AI Enrichment

Internal whitepaper prepared for Legal, Privacy, and Global Trade review. December 2025.

Purpose

This whitepaper provides a license assessment and compliance framework for integrating open source datasets with enterprise data systems in support of Microsoft Copilot deployment.

Key Contribution

A three-tier license framework for evaluating open source data:

Tier	License Type	Risk	Compliance
1	Public Domain (CC0, US Gov, PDDL)	Lowest	None required
2	Attribution Required (CC BY, Apache, BSD)	Low	Simple workflow
3	ShareAlike (CC BY-SA, ODbL)	Requires assessment	Derivative work determination
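As an illustration, the tier table could be encoded as a small lookup so that a license label resolves to its tier and compliance step. This is a minimal sketch, not the whitepaper's implementation: the names `Tier`, `LICENSE_TIERS`, `classify`, and `compliance_step` are hypothetical, the string keys simply mirror the labels in the table, and the "escalate for review" fallback for unlisted licenses is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    PUBLIC_DOMAIN = 1
    ATTRIBUTION = 2
    SHARE_ALIKE = 3


# Keys mirror the license labels in the tier table (illustrative strings only).
LICENSE_TIERS = {
    "CC0": Tier.PUBLIC_DOMAIN,
    "US Gov": Tier.PUBLIC_DOMAIN,
    "PDDL": Tier.PUBLIC_DOMAIN,
    "CC BY": Tier.ATTRIBUTION,
    "Apache": Tier.ATTRIBUTION,
    "BSD": Tier.ATTRIBUTION,
    "CC BY-SA": Tier.SHARE_ALIKE,
    "ODbL": Tier.SHARE_ALIKE,
}

# Compliance column of the tier table, keyed by tier.
COMPLIANCE = {
    Tier.PUBLIC_DOMAIN: "None required",
    Tier.ATTRIBUTION: "Simple workflow",
    Tier.SHARE_ALIKE: "Derivative work determination",
}


def classify(license_label: str) -> Optional[Tier]:
    """Return the tier for a license label, or None if it is not in the table."""
    return LICENSE_TIERS.get(license_label.strip())


def compliance_step(license_label: str) -> str:
    """Return the compliance requirement for a license label."""
    tier = classify(license_label)
    if tier is None:
        # Assumption: anything outside the three tiers goes to manual review.
        return "Unclassified: escalate for review"
    return COMPLIANCE[tier]
```

For example, `compliance_step("ODbL")` resolves to the Tier 3 requirement, while a label that is not in the table falls through to the assumed manual-review path.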

Connection to Garden Content

This whitepaper is the practical implementation of 04-molecule—reference-data-multiplier:

“Integrating permissively-licensed open source datasets with proprietary business data creates a semantic enrichment layer that enhances AI system performance.”

The datasets inventoried connect to earlier work:

Knowledge graphs (Wikidata, DBpedia) → 06-molecule—qualitative-research-knowledge-graph, 06-atom—entity-linking-dimensionality
Skills taxonomies (O*NET, ESCO) → workforce analytics applications
Technical ontologies → domain-specific enrichment

Datasets Evaluated

Tier 1 (Public Domain): Wikidata, BLS OEWS, BLS ORS, SOC System, ISCO-08

Tier 2 (Attribution): O*NET, ESCO, Canadian SCT, OSMT/RSDs, Common Core Ontologies, WordNet, GraphGen4Code, CodeOntology, ATOMIC 2020, Freebase

Tier 3 (ShareAlike): DBpedia, YAGO 4.5, ConceptNet

Excluded: BabelNet (non-commercial), SFIA (commercial license), Lightcast (subscription), OpenCyc (discontinued), NELL (no license)
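The inventory above could be kept as structured data so that a dataset name resolves to its tier or its exclusion reason. The sketch below is one possible layout under assumed names (`DATASETS_BY_TIER`, `EXCLUDED`, and `lookup` are hypothetical); the dataset names and exclusion reasons are copied from the lists above.

```python
from typing import Optional, Tuple, Union

# Dataset names grouped by tier, as listed in the inventory.
DATASETS_BY_TIER = {
    1: ["Wikidata", "BLS OEWS", "BLS ORS", "SOC System", "ISCO-08"],
    2: [
        "O*NET", "ESCO", "Canadian SCT", "OSMT/RSDs", "Common Core Ontologies",
        "WordNet", "GraphGen4Code", "CodeOntology", "ATOMIC 2020", "Freebase",
    ],
    3: ["DBpedia", "YAGO 4.5", "ConceptNet"],
}

# Excluded datasets with the stated reason for exclusion.
EXCLUDED = {
    "BabelNet": "non-commercial",
    "SFIA": "commercial license",
    "Lightcast": "subscription",
    "OpenCyc": "discontinued",
    "NELL": "no license",
}


def lookup(name: str) -> Tuple[str, Optional[Union[int, str]]]:
    """Return ("tier", n), ("excluded", reason), or ("unknown", None)."""
    if name in EXCLUDED:
        return ("excluded", EXCLUDED[name])
    for tier, names in DATASETS_BY_TIER.items():
        if name in names:
            return ("tier", tier)
    # A dataset that was never evaluated is reported as unknown, not as safe.
    return ("unknown", None)
```

Returning an explicit "unknown" status rather than a default tier keeps unevaluated datasets from being treated as cleared for use.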

Extracted Content

Atoms:

Molecules:

04-molecule—open-source-data-evaluation

Key Recommendations

Prioritize Tier 1 (public domain) for immediate deployment
Establish attribution workflow for Tier 2
Get legal guidance on ShareAlike before using Tier 3
Maintain provenance documentation in data catalog
Review licenses quarterly as datasets update
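The recommendations above could be reflected in a provenance record kept per dataset in the data catalog. This is a hedged sketch only: the `ProvenanceRecord` class, its field names, and the 91-day approximation of "quarterly" are all assumptions introduced for illustration, not part of the whitepaper.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Assumption: "quarterly" is approximated as a 91-day review interval.
REVIEW_INTERVAL = timedelta(days=91)


@dataclass
class ProvenanceRecord:
    dataset: str
    license_label: str
    tier: int                      # 1, 2, or 3 per the tier framework
    last_reviewed: date
    attribution_text: str = ""     # needed for Tier 2
    legal_approved: bool = False   # needed for Tier 3 (ShareAlike)

    def review_due(self, today: date) -> bool:
        """True once the review interval has elapsed since the last review."""
        return today - self.last_reviewed >= REVIEW_INTERVAL

    def deployment_blockers(self) -> List[str]:
        """List the open compliance items that block deployment."""
        blockers = []
        if self.tier == 2 and not self.attribution_text:
            blockers.append("missing attribution")
        if self.tier == 3 and not self.legal_approved:
            blockers.append("legal guidance on ShareAlike required")
        return blockers


# Usage: a Tier 1 record has no blockers; a Tier 3 record waits on legal review.
wikidata = ProvenanceRecord("Wikidata", "CC0", 1, date(2025, 12, 1))
dbpedia = ProvenanceRecord("DBpedia", "CC BY-SA", 3, date(2025, 9, 1))
print(wikidata.deployment_blockers())
print(dbpedia.deployment_blockers(), dbpedia.review_due(date(2025, 12, 15)))
```

Keeping the blockers as a computed list rather than a stored status means the record cannot drift out of sync with its own attribution and approval fields.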
