Large Language Models as Oracles for Ontology Alignment

Lushnei, S., Shumskyi, D., Shykula, S., Jiménez-Ruiz, E., & d’Avila Garcez, A. (2025)

Core Framing

The paper reframes the question from “Can LLMs do ontology alignment?” to “Where in an existing alignment pipeline can LLMs add the most value for the least cost?” This shifts the emphasis from comprehensive LLM automation to targeted intervention.

Key insight: Use LLMs only for the subset of mappings where the traditional matching system (here, LogMap) is uncertain. This is cost-effective and plays to LLM strengths without requiring the model to handle the entire task.

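As a rough sketch of this routing idea (the function names, thresholds, and the `llm_oracle` callable below are hypothetical, not taken from the paper), the matcher keeps its own confident decisions and defers only the uncertain band to an LLM oracle:

```python
from typing import Callable

# Hypothetical confidence band: the paper does not publish these thresholds;
# they only illustrate deferring an uncertain middle band to the LLM oracle.
ACCEPT_THRESHOLD = 0.90
REJECT_THRESHOLD = 0.40

def triage_mappings(candidates, llm_oracle: Callable[[dict], bool]):
    """Keep confident mappings locally; ask the LLM only about the uncertain band.

    candidates: iterable of dicts like {"source": ..., "target": ..., "confidence": float}
    llm_oracle: callable returning True if the LLM judges the mapping correct
    """
    accepted = []
    for mapping in candidates:
        conf = mapping["confidence"]
        if conf >= ACCEPT_THRESHOLD:
            accepted.append(mapping)      # system is confident: accept without LLM cost
        elif conf <= REJECT_THRESHOLD:
            continue                      # system is confident it is wrong: discard
        elif llm_oracle(mapping):         # uncertain band: defer to the LLM oracle
            accepted.append(mapping)
    return accepted
```
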
Method

The authors extended LogMap to route its “M_ask” subset, the mappings the system itself flags as uncertain, to LLM-based oracles. They tested GPT-4o Mini and Gemini Flash models (1.5, 2.0, 2.0 Lite, 2.5) with six prompt-template variations across nine OAEI benchmark tasks.

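The six concrete templates are defined in the paper; the sketch below only illustrates the contrast between a terse structured prompt and a natural-language-friendly prompt enriched with synonyms (in the spirit of P^NLF_S), with wording and function names invented for illustration:

```python
def structured_prompt(src_label: str, tgt_label: str) -> str:
    # Terse, schema-like framing (illustrative only, not the paper's template).
    return (
        "Task: ontology mapping verification\n"
        f"Entity1: {src_label}\n"
        f"Entity2: {tgt_label}\n"
        "Output: true | false"
    )

def nl_friendly_prompt_with_synonyms(src_label, tgt_label, src_syns, tgt_syns) -> str:
    # Conversational framing with explicit synonyms, in the spirit of P^NLF_S.
    return (
        f"Is the concept '{src_label}' (also known as: {', '.join(src_syns)}) "
        f"equivalent to the concept '{tgt_label}' (also known as: {', '.join(tgt_syns)})? "
        "Please answer with a single word: yes or no."
    )
```
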
Key Findings

  1. Performance: LLM oracles achieved an average Youden’s Index of ~0.55 (the formula is sketched after this list), comparable to a simulated human expert with a 20% error rate
  2. Best configuration: Gemini Flash 2.5 with natural-language-friendly prompts that include synonyms (P^NLF_S)
  3. Prompt design: Natural-language-friendly prompts outperformed structured prompts; adding synonyms helped more than adding extended hierarchical context
  4. Cost: roughly 0.04 USD per 1,000 mapping assessments (100–250 tokens per query)
  5. Model evolution: Clear improvement across Gemini Flash versions (1.5 → 2.0 → 2.5)

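For reference, Youden’s Index combines sensitivity and specificity as J = sensitivity + specificity − 1, so a perfect oracle scores 1 and a purely random one 0. A minimal computation (the counts below are made up for illustration):

```python
def youdens_index(tp: int, fp: int, tn: int, fn: int) -> float:
    sensitivity = tp / (tp + fn)   # fraction of correct mappings the oracle accepts
    specificity = tn / (tn + fp)   # fraction of wrong mappings the oracle rejects
    return sensitivity + specificity - 1

# A simulated oracle that errs 20% of the time on both positives and negatives
# scores 0.8 + 0.8 - 1 = 0.6, the kind of reference point the ~0.55 average is
# compared against.
print(youdens_index(tp=80, fp=20, tn=80, fn=20))  # ≈ 0.6
```
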
Transferable Insights

  • Targeted AI intervention on uncertain cases beats comprehensive AI automation for cost and quality
  • LLMs perform best with conversational prompt framing, not technical/structured formats
  • Explicit inclusion of domain synonyms improves performance more than structural context
  • Binary classification prompts yield more reliable LLM responses than open-ended generation (a parsing sketch follows this list)

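To illustrate that last point, a yes/no-constrained reply is trivially machine-checkable, while free-form output needs fragile post-processing. The helper below is a hypothetical sketch, not code from the paper:

```python
from typing import Optional

def parse_oracle_answer(raw_response: str) -> Optional[bool]:
    """Map a constrained yes/no reply to a boolean; None flags an unusable reply."""
    answer = raw_response.strip().lower().rstrip(".!")
    if answer in {"yes", "true"}:
        return True
    if answer in {"no", "false"}:
        return False
    return None  # off-format reply: fall back to the matcher's own decision
```
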
Limitations Noted

  • Potential training data leakage (LLMs may have seen OAEI benchmarks)
  • Evaluation limited to biomedical domain ontologies
  • Only tested proprietary models due to performance gaps with open-source alternatives
