AI-Assisted Taxonomy Maintenance
Context
Maintaining large taxonomies is expensive. New concepts emerge, relationships shift, terminology evolves. Expert review of thousands of concepts doesn’t scale. Yet purely automated maintenance introduces errors that compound over time.
Problem
How do you keep a taxonomy current and consistent when manual review is too slow but autonomous AI updates are too unreliable?
Solution
Use representation learning to flag concepts that warrant expert attention, rather than to make decisions autonomously.
The approach (a minimal sketch follows the list):
- Train a language model on domain data (job ads, existing taxonomy, related documents)
- Learn embeddings that place semantically similar concepts near each other
- Use the embedding space to identify anomalies:
  - Concepts whose position doesn’t match their taxonomy location
  - Potential new relationships the taxonomy doesn’t capture
  - Mapping candidates for new terminology
- Surface flagged cases for expert review
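As a rough illustration of the flagging step, the sketch below embeds concept labels with an off-the-shelf sentence-transformers model and flags concepts that sit far from their assigned category's centroid. The model name, the taxonomy slice, and the threshold are all illustrative assumptions, not the ESCO setup:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical taxonomy slice: concept label -> assigned category.
taxonomy = {
    "python programming": "it",
    "network administration": "it",
    "sql databases": "healthcare",  # deliberately misfiled for the demo
    "wound care": "healthcare",
    "patient triage": "healthcare",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for a domain-trained model
labels = list(taxonomy)
embs = model.encode(labels, normalize_embeddings=True)  # rows are unit vectors

# Normalized centroid of each category in embedding space.
centroids = {}
for cat in set(taxonomy.values()):
    rows = np.array([embs[i] for i, lab in enumerate(labels) if taxonomy[lab] == cat])
    centroid = rows.mean(axis=0)
    centroids[cat] = centroid / np.linalg.norm(centroid)

# Outlier check: flag concepts whose similarity to their own category
# centroid is low. The threshold is an arbitrary illustration.
THRESHOLD = 0.3
for i, lab in enumerate(labels):
    sim = float(embs[i] @ centroids[taxonomy[lab]])
    if sim < THRESHOLD:
        print(f"review: {lab!r} in {taxonomy[lab]!r} (sim={sim:.2f})")

# Consistency check: categories whose members sit far from their own
# centroid on average are candidates for restructuring.
for cat, centroid in centroids.items():
    members = np.array([embs[i] for i, lab in enumerate(labels) if taxonomy[lab] == cat])
    print(f"{cat}: mean intra-category similarity {(members @ centroid).mean():.2f}")
```

Note that nothing here edits the taxonomy; the output is a review queue for experts.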
What the model does:
- Suggests likely mappings (e.g., “top 5 ESCO occupations for this job title”; see the sketch after this list)
- Identifies outliers (e.g., “this skill clusters with IT skills but is classified under healthcare”)
- Validates consistency (e.g., “occupations in this category cluster tightly vs. scattered”)
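The mapping suggestion could look something like the following sketch, which ranks occupation labels by cosine similarity to an incoming job title. The occupation list, model name, and `suggest` helper are illustrative assumptions, not ESCO's actual interface:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

occupations = [  # stand-ins for ESCO occupation labels
    "software developer", "registered nurse", "data analyst",
    "web developer", "systems administrator", "care assistant",
    "database administrator", "ICT security specialist",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
occ_embs = model.encode(occupations, normalize_embeddings=True)

def suggest(job_title: str, k: int = 5) -> list[tuple[str, float]]:
    """Return the k occupations closest to a job title, with scores."""
    query = model.encode([job_title], normalize_embeddings=True)[0]
    scores = occ_embs @ query               # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]      # indices of the k best scores
    return [(occupations[i], float(scores[i])) for i in top]

# Suggestions only; a human expert makes the final mapping decision.
print(suggest("full-stack engineer"))
```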
What the model doesn’t do:
- Make final decisions about taxonomy changes
- Add or remove concepts without human approval
- Override expert judgment
Consequences
Benefits:
- Dramatically reduces expert review burden
- Catches inconsistencies humans miss at scale
- Creates a feedback loop between the expert-built taxonomy and real-world usage patterns
Tradeoffs:
- Requires substantial training data
- Model quality varies by language/domain coverage
- False positives create review overhead
Results (ESCO): 75-94% of expert-validated mappings appear in the model’s top-5 suggestions.
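One plausible way to compute a coverage figure of this kind, reusing the hypothetical `suggest` helper from the sketch above with made-up gold pairs rather than real ESCO validation data:

```python
# Assumes the hypothetical `suggest` helper from the sketch above is in scope.
gold = [  # (job title, expert-validated occupation) -- made-up pairs
    ("full-stack engineer", "web developer"),
    ("DB admin", "database administrator"),
    ("staff nurse", "registered nurse"),
]

# Count gold mappings whose validated occupation shows up in the top 5.
hits = sum(
    1 for title, occupation in gold
    if occupation in {label for label, _ in suggest(title, k=5)}
)
print(f"top-5 coverage: {hits / len(gold):.0%}")
```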
Related: 01-atom--human-in-the-loop