Representation Learning for Taxonomy Maintenance

Learned embeddings can assist with taxonomy evolution, gap identification, and consistency checking. The approach augments knowledge engineering tasks with AI rather than automating them.

Applications

Gap Detection: Identify concepts that should exist but don’t

  • Embed existing taxonomy terms
  • Find clusters in usage data that no taxonomy term covers
  • Suggest new terms or categories
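
The gap-detection steps above can be sketched as follows. This is a minimal illustration, not a definitive method: it assumes embeddings have already been computed elsewhere and are passed in as plain lists of floats. The function names (`uncovered_items`, `group_gaps`), both thresholds, and the toy 3-dimensional vectors are hypothetical choices made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def uncovered_items(term_vecs, usage_vecs, coverage_threshold=0.8):
    """Usage items whose best match among taxonomy terms falls below the threshold."""
    gaps = []
    for item, vec in usage_vecs.items():
        best = max((cosine(vec, t) for t in term_vecs.values()), default=0.0)
        if best < coverage_threshold:
            gaps.append(item)
    return gaps

def group_gaps(gaps, usage_vecs, cluster_threshold=0.9):
    """Greedy grouping: an item joins the first cluster whose seed item is similar enough."""
    clusters = []
    for item in gaps:
        for cluster in clusters:
            if cosine(usage_vecs[item], usage_vecs[cluster[0]]) >= cluster_threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters

# Toy vectors standing in for real embeddings (assumption for illustration).
term_vecs = {"laptops": [1.0, 0.0, 0.0], "phones": [0.0, 1.0, 0.0]}
usage_vecs = {
    "notebook computer": [0.95, 0.05, 0.0],
    "smart watch": [0.1, 0.2, 0.97],
    "fitness tracker": [0.0, 0.1, 0.99],
}

gaps = uncovered_items(term_vecs, usage_vecs)
clusters = group_gaps(gaps, usage_vecs)
print(clusters)
```

Each resulting cluster is a candidate for a new term or category. Greedy grouping is used here only to keep the sketch dependency-free; a real pipeline would more likely use a proper clustering algorithm.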

Inconsistency Detection: Find structural problems

  • Similar terms in distant taxonomy branches
  • Parent-child pairs with low semantic similarity
  • Synonyms not linked as equivalents
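
Two of the checks above, weak parent-child links and look-alike terms in distant branches, might look like the sketch below. It assumes the taxonomy is a single tree stored as a child-to-parent mapping and that every term already has an embedding. The function names, the thresholds, and the toy data are hypothetical.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def ancestors(term, parent_of):
    """Path from a term up to the root, including the term itself."""
    path = [term]
    while parent_of.get(path[-1]) is not None:
        path.append(parent_of[path[-1]])
    return path

def tree_distance(a, b, parent_of):
    """Number of edges between a and b via their lowest common ancestor."""
    depth_in_b = {node: i for i, node in enumerate(ancestors(b, parent_of))}
    for i, node in enumerate(ancestors(a, parent_of)):
        if node in depth_in_b:
            return i + depth_in_b[node]
    return None  # no common ancestor

def weak_parent_links(parent_of, vecs, min_sim=0.3):
    """Parent-child pairs whose embeddings are less similar than min_sim."""
    return [(child, parent) for child, parent in parent_of.items()
            if parent is not None and cosine(vecs[child], vecs[parent]) < min_sim]

def distant_lookalikes(parent_of, vecs, min_sim=0.95, min_distance=4):
    """Term pairs that are close in embedding space but far apart in the tree."""
    flagged = []
    for a, b in combinations(vecs, 2):
        dist = tree_distance(a, b, parent_of)
        if dist is not None and dist >= min_distance and cosine(vecs[a], vecs[b]) >= min_sim:
            flagged.append((a, b))
    return flagged

# Toy taxonomy: "notebooks" is deliberately misfiled under "clothing".
parent_of = {"products": None, "electronics": "products", "clothing": "products",
             "laptops": "electronics", "notebooks": "clothing"}
vecs = {"products": [1.0, 1.0, 1.0], "electronics": [1.0, 0.2, 0.0],
        "clothing": [0.0, 0.2, 1.0], "laptops": [0.9, 0.1, 0.0],
        "notebooks": [0.9, 0.15, 0.0]}

weak = weak_parent_links(parent_of, vecs)
lookalikes = distant_lookalikes(parent_of, vecs)
print(weak, lookalikes)
```

On the toy data, both checks point at the same misfiled term from different directions. Pairs flagged by the second check are also reasonable candidates for the unlinked-synonym review.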

Evolution Assistance: Support taxonomy updates

  • Suggest where new terms belong
  • Identify candidates for merging or splitting
  • Predict impact of structural changes
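
For the first item above, suggesting where a new term belongs, one simple approach is a nearest-neighbour lookup: rank existing terms by similarity to the new term's embedding and offer the top few as candidate parents or siblings. The sketch below assumes precomputed embeddings. `suggest_placements`, the value of `k`, and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def suggest_placements(new_vec, term_vecs, k=3):
    """Return the k existing terms most similar to the new term, best first."""
    scored = [(term, cosine(new_vec, vec)) for term, vec in term_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real embeddings (assumption for illustration).
term_vecs = {
    "laptops": [0.9, 0.1, 0.0],
    "phones": [0.1, 0.9, 0.0],
    "wearables": [0.0, 0.2, 0.9],
}
new_term_vec = [0.05, 0.3, 0.9]  # e.g. an embedding for a new term such as "smart ring"

suggestions = suggest_placements(new_term_vec, term_vecs, k=2)
for term, score in suggestions:
    print(f"{term}: {score:.2f}")
```

The ranked list is a starting point for an editor, not a placement decision. A high score says only that the terms are used in similar contexts, not that one is a subtype of the other.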

Method Patterns

  1. Embed taxonomy terms using language models
  2. Embed usage data (queries, tagged content)
  3. Analyze alignment and gaps between the two sets of embeddings
  4. Have humans review the resulting suggestions
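
The four steps can be wired together roughly as shown below. This sketch makes several assumptions. The embedder is passed in as a plain callable so any language model can be plugged in, and `toy_embed` is a letter-frequency stand-in, not a language model. The `Suggestion` record, its `status` field, and the threshold are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Suggestion:
    usage_item: str
    nearest_term: str
    similarity: float
    status: str = "pending"  # a human reviewer later sets "accepted" or "rejected"

def toy_embed(text):
    """Stand-in embedder: 26-d letter-frequency vector. NOT a language model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_review_queue(terms, usage_items, embed, threshold=0.8):
    """Embed both sides, compare them, and queue poorly covered items for review."""
    term_vecs = {term: embed(term) for term in terms}           # step 1
    queue = []
    for item in usage_items:
        vec = embed(item)                                       # step 2
        nearest, sim = max(                                     # step 3
            ((term, cosine(vec, tv)) for term, tv in term_vecs.items()),
            key=lambda pair: pair[1],
        )
        if sim < threshold:
            queue.append(Suggestion(item, nearest, sim))        # step 4: hand off to a human
    return queue

queue = build_review_queue(["laptops", "phones"], ["laptop", "jazz"], toy_embed)
for suggestion in queue:
    print(suggestion)
```

Nothing in the queue changes the taxonomy by itself. Every entry starts as `pending`, which keeps the final decision with a reviewer.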

Limitations

  • Embeddings capture statistical patterns, not domain logic
  • Novel concepts may have poor embeddings because the model saw little or no relevant text in training
  • Human expertise is still required for final decisions

Related: 02-molecule—taxonomy-design, 06-molecule—ontology-design-patterns