The ShareAlike Derivative Work Ambiguity
ShareAlike licenses (CC BY-SA, ODbL) require that derivative works be distributed under the same license. In software, “derivative work” has a comparatively established legal interpretation. In data, it’s murky.
Unanswered questions for enterprise data use:
- Does joining ShareAlike data with proprietary data create a derivative?
- Does internal enrichment without external distribution trigger obligations?
- Does querying a database and using results constitute derivation?
- If you train a model on ShareAlike data, is the model a derivative?
The legal theory hasn’t caught up with common data practices. Different lawyers give different answers.
Practical implication: ShareAlike-licensed datasets (DBpedia, YAGO, ConceptNet) may be valuable, but shouldn’t be put into production use without explicit legal guidance. Confirm the license of the specific release you plan to use, since terms can differ between versions. Internal-only use might not trigger ShareAlike, but “might” isn’t good enough for compliance.
The safe path: Prioritize public domain and attribution-only datasets. Use ShareAlike datasets only after formal legal assessment of your specific use case.
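The safe path above can be expressed as a simple intake gate. The following is a minimal sketch, not a definitive implementation: the `Tier` enum, the `LICENSE_TIERS` mapping, the `Dataset` record, and the `may_use` function are all hypothetical names introduced for illustration, and the license identifiers are SPDX-style strings chosen as examples. Treating unrecognized licenses the same as ShareAlike (blocked until reviewed) is an assumption, not something this note prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PUBLIC_DOMAIN = "public-domain"
    ATTRIBUTION = "attribution-only"
    SHARE_ALIKE = "share-alike"
    UNKNOWN = "unknown"


# Hypothetical mapping from SPDX-style license identifiers to tiers.
# Extend or correct it to match your own license inventory.
LICENSE_TIERS = {
    "CC0-1.0": Tier.PUBLIC_DOMAIN,
    "PDDL-1.0": Tier.PUBLIC_DOMAIN,
    "CC-BY-4.0": Tier.ATTRIBUTION,
    "ODC-By-1.0": Tier.ATTRIBUTION,
    "CC-BY-SA-4.0": Tier.SHARE_ALIKE,
    "ODbL-1.0": Tier.SHARE_ALIKE,
}


@dataclass(frozen=True)
class Dataset:
    name: str
    license_id: str
    # Set to True only after a formal legal assessment of the specific use case.
    legal_review_approved: bool = False


def may_use(dataset: Dataset) -> bool:
    """Return True if the dataset passes the intake gate."""
    tier = LICENSE_TIERS.get(dataset.license_id, Tier.UNKNOWN)
    if tier in (Tier.PUBLIC_DOMAIN, Tier.ATTRIBUTION):
        return True
    # ShareAlike (and, by assumption, unknown licenses) need explicit sign-off.
    return dataset.legal_review_approved


if __name__ == "__main__":
    candidates = [
        Dataset("example-public-domain-set", "CC0-1.0"),
        Dataset("example-sharealike-set", "CC-BY-SA-4.0"),
        Dataset("example-reviewed-sharealike-set", "ODbL-1.0", legal_review_approved=True),
    ]
    for ds in candidates:
        print(ds.name, "allowed" if may_use(ds) else "blocked pending legal review")
```

The gate records only whether a review happened. It does not attempt to answer the derivative-work questions themselves, which stay with legal counsel.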
Related: 04-atom—license-tier-framework, 04-atom—data-governance