Annotation Quality Bias in Semantic Matching

When using embedding-based similarity to find relevant ontology concepts, ontologies with richer textual annotations systematically score higher, regardless of whether they’re actually better fits for the task.

An ontology with detailed labels, definitions, and comments gives the embedding model more signal to match against. A structurally superior ontology with sparse annotations gets penalized.
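The effect is easy to reproduce even with a toy similarity measure. A minimal sketch, using bag-of-words cosine similarity as a stand-in for a neural embedding, with hypothetical annotations for the same concept drawn from two ontologies (all names and texts here are invented for illustration):

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a sentence encoder.

    A real pipeline would use a neural model, but the bias is the same:
    more annotation text means more tokens available to match.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "measurement of blood glucose concentration"

# Hypothetical annotations for the *same* concept in two ontologies:
rich = ("glucose measurement: a laboratory assay that quantifies the "
        "concentration of glucose in blood serum or plasma")
sparse = "glucose assay"

# The richly annotated concept scores higher even though both denote
# the same thing; annotation volume, not design quality, drives the score.
print(f"rich:   {cosine(embed(query), embed(rich)):.3f}")
print(f"sparse: {cosine(embed(query), embed(sparse)):.3f}")
```

Nothing about the sparsely annotated concept is semantically worse; it simply exposes less surface area for the matcher.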

This means semantic similarity approaches favor well-documented knowledge structures over well-designed ones. The two aren’t the same thing.

The implication: automated knowledge extraction methods inherit the documentation habits of their sources. Poor documentation doesn't just make things hard for humans to find; it makes them invisible to AI-assisted discovery.

Related: 07-atom—documentation-as-findability