OntoRAG: Tiwari et al. 2025

Citation

Tiwari, Y., Lone, O. A., & Pal, M. (2025). OntoRAG: Enhancing Question-Answering through Automated Ontology Derivation from Unstructured Knowledge Bases. arXiv:2506.00664.

Core Argument

Traditional RAG uses vector similarity, which fails at “global sensemaking” (synthesizing information dispersed across documents). GraphRAG improves on this with knowledge graphs but loses “ontological integrity” by clustering entities without preserving hierarchical structure. OntoRAG automates ontology derivation to preserve those categorical relationships.

Key Findings

  • OntoRAG achieved 85% comprehensiveness win rate against vector RAG
  • OntoRAG achieved 75% comprehensiveness win rate against GraphRAG’s best configuration
  • Critical tradeoff: Vector RAG won 92% on “directness” (specific, targeted answers)
  • Empowerment scores were comparable across all methods
  • Processing cost: 300 minutes for a 1M-token corpus (vs. 281 minutes for GraphRAG)

Method Overview

Six-stage pipeline: web scraping → PDF parsing → hybrid chunking → information extraction → knowledge graph construction → ontology creation via Leiden community detection.
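
A minimal Python sketch of this pipeline is below, assuming igraph and leidenalg for the Leiden step. The stage functions, the toy triples, and the entity names are illustrative placeholders under those assumptions, not the paper’s implementation.

```python
# Sketch of the six-stage pipeline: stages 1-3 (scraping, parsing,
# chunking) are stubbed; stages 4-6 show the shape of extraction,
# graph construction, and Leiden-based ontology derivation.
import igraph as ig
import leidenalg


def extract_triples(chunks):
    """Stage 4 (information extraction): placeholder returning
    (subject, relation, object) triples, e.g. produced by an LLM."""
    return [
        ("LED", "is_a", "diode"),
        ("photodiode", "is_a", "diode"),
        ("diode", "is_a", "semiconductor device"),
        ("LED", "emits", "light"),
    ]


def build_graph(triples):
    """Stage 5 (knowledge graph construction): entities become
    vertices, relations become edges."""
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    index = {name: i for i, name in enumerate(entities)}
    g = ig.Graph()
    g.add_vertices(entities)  # sets the "name" vertex attribute
    g.add_edges([(index[s], index[o]) for s, _, o in triples])
    return g


def derive_ontology(graph):
    """Stage 6 (ontology creation): Leiden community detection groups
    entities into candidate classes; a hierarchy would be layered on top."""
    partition = leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition)
    return [[graph.vs[v]["name"] for v in community] for community in partition]


if __name__ == "__main__":
    chunks = ["...text from the scraping/parsing/chunking stages..."]
    kg = build_graph(extract_triples(chunks))
    print(derive_ontology(kg))  # lists of co-clustered entities, i.e. candidate classes
```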

Evaluation Metrics

  • Comprehensiveness: Coverage of question aspects
  • Diversity: Variety of perspectives
  • Empowerment: Enabling informed judgments
  • Directness: Specificity in addressing the question
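
The win rates under Key Findings come from head-to-head comparisons on these four metrics. A small sketch of that aggregation is below, assuming an LLM judge (not shown) that names a winning system per question and per metric; the judge protocol and the example verdicts are hypothetical, since this summary does not reproduce the paper’s evaluation setup.

```python
# Aggregating per-question, per-metric judge verdicts into win rates.
# The verdict data below is invented for illustration only.
from collections import Counter

METRICS = ["comprehensiveness", "diversity", "empowerment", "directness"]


def win_rates(judgments, system="OntoRAG"):
    """judgments: list of dicts mapping metric -> name of the winning system.
    Returns the fraction of questions won by `system` for each metric."""
    wins, totals = Counter(), Counter()
    for verdict in judgments:
        for metric in METRICS:
            totals[metric] += 1
            if verdict[metric] == system:
                wins[metric] += 1
    return {m: wins[m] / totals[m] for m in METRICS if totals[m]}


# Four hypothetical questions judged against vector RAG.
example = [
    {"comprehensiveness": "OntoRAG", "diversity": "OntoRAG",
     "empowerment": "OntoRAG", "directness": "VectorRAG"},
    {"comprehensiveness": "OntoRAG", "diversity": "VectorRAG",
     "empowerment": "VectorRAG", "directness": "VectorRAG"},
    {"comprehensiveness": "OntoRAG", "diversity": "OntoRAG",
     "empowerment": "OntoRAG", "directness": "VectorRAG"},
    {"comprehensiveness": "VectorRAG", "diversity": "OntoRAG",
     "empowerment": "OntoRAG", "directness": "OntoRAG"},
]
print(win_rates(example))
# {'comprehensiveness': 0.75, 'diversity': 0.75, 'empowerment': 0.75, 'directness': 0.25}
```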

Limitations Acknowledged

  • Computational cost scales poorly
  • Domain-specific prompts reduce generalizability
  • Multiple similarity searches add retrieval overhead

Extracted For