AI-Assisted Taxonomy Maintenance
Context
Maintaining large taxonomies is expensive. New concepts emerge, relationships shift, terminology evolves. Expert review of thousands of concepts doesn’t scale. Yet purely automated maintenance introduces errors that compound over time.
Problem
How do you keep a taxonomy current and consistent when manual review is too slow but autonomous AI updates are too unreliable?
Solution
Use representation learning to flag concepts that warrant expert attention, rather than to make decisions autonomously.
The approach (a minimal sketch follows the list):
- Train a language model on domain data (job ads, existing taxonomy, related documents)
- Learn embeddings that place semantically similar concepts near each other
- Use the embedding space to identify anomalies:
  - Concepts whose position doesn’t match their taxonomy location
  - Potential new relationships the taxonomy doesn’t capture
  - Mapping candidates for new terminology
- Surface flagged cases for expert review
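As a rough illustration of the flagging step, the sketch below embeds concept labels with an off-the-shelf sentence-transformers model and flags concepts that sit far from their assigned category's centroid. The model name, the taxonomy slice, and the threshold are all illustrative assumptions, not the ESCO setup:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical taxonomy slice: concept label -> assigned category.
taxonomy = {
    "python programming": "it",
    "network administration": "it",
    "sql databases": "healthcare",  # deliberately misfiled for the demo
    "wound care": "healthcare",
    "patient triage": "healthcare",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for a domain-trained model
labels = list(taxonomy)
embs = model.encode(labels, normalize_embeddings=True)  # rows are unit vectors

# Normalized centroid of each category in embedding space.
centroids = {}
for cat in set(taxonomy.values()):
    rows = np.array([embs[i] for i, lab in enumerate(labels) if taxonomy[lab] == cat])
    centroid = rows.mean(axis=0)
    centroids[cat] = centroid / np.linalg.norm(centroid)

# Outlier check: flag concepts whose similarity to their own category
# centroid is low. The threshold is an arbitrary illustration.
THRESHOLD = 0.3
for i, lab in enumerate(labels):
    sim = float(embs[i] @ centroids[taxonomy[lab]])
    if sim < THRESHOLD:
        print(f"review: {lab!r} in {taxonomy[lab]!r} (sim={sim:.2f})")

# Consistency check: categories whose members sit far from their own
# centroid on average are candidates for restructuring.
for cat, centroid in centroids.items():
    members = np.array([embs[i] for i, lab in enumerate(labels) if taxonomy[lab] == cat])
    print(f"{cat}: mean intra-category similarity {(members @ centroid).mean():.2f}")
```

Note that nothing here edits the taxonomy; the output is a review queue for experts.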
What the model does:
- Suggests likely mappings (e.g., “top 5 ESCO occupations for this job title”; see the sketch after this list)
- Identifies outliers (e.g., “this skill clusters with IT skills but is classified under healthcare”)
- Validates consistency (e.g., “occupations in this category cluster tightly vs. scattered”)
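The mapping suggestion could look something like the following sketch, which ranks occupation labels by cosine similarity to an incoming job title. The occupation list, model name, and `suggest` helper are illustrative assumptions, not ESCO's actual interface:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

occupations = [  # stand-ins for ESCO occupation labels
    "software developer", "registered nurse", "data analyst",
    "web developer", "systems administrator", "care assistant",
    "database administrator", "ICT security specialist",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
occ_embs = model.encode(occupations, normalize_embeddings=True)

def suggest(job_title: str, k: int = 5) -> list[tuple[str, float]]:
    """Return the k occupations closest to a job title, with scores."""
    query = model.encode([job_title], normalize_embeddings=True)[0]
    scores = occ_embs @ query               # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]      # indices of the k best scores
    return [(occupations[i], float(scores[i])) for i in top]

# Suggestions only; a human expert makes the final mapping decision.
print(suggest("full-stack engineer"))
```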
What the model doesn’t do:
- Make final decisions about taxonomy changes
- Add or remove concepts without human approval
- Override expert judgment
Consequences
Benefits:
- Dramatically reduces expert review burden
- Catches inconsistencies humans miss at scale
- Creates a feedback loop between the expert-built taxonomy and real-world usage patterns
Tradeoffs:
- Requires substantial training data
- Model quality varies by language/domain coverage
- False positives create review overhead
Results (ESCO): 75-94% of expert-validated mappings appear in the model’s top-5 suggestions.
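One plausible way to compute a coverage figure of this kind, reusing the hypothetical `suggest` helper from the sketch above with made-up gold pairs rather than real ESCO validation data:

```python
# Assumes the hypothetical `suggest` helper from the sketch above is in scope.
gold = [  # (job title, expert-validated occupation) -- made-up pairs
    ("full-stack engineer", "web developer"),
    ("DB admin", "database administrator"),
    ("staff nurse", "registered nurse"),
]

# Count gold mappings whose validated occupation shows up in the top 5.
hits = sum(
    1 for title, occupation in gold
    if occupation in {label for label, _ in suggest(title, k=5)}
)
print(f"top-5 coverage: {hits / len(gold):.0%}")
```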
Related: 01-atom--human-in-the-loop