Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning

Marković et al., 2025 | Cognee Inc.

Core Framing

The paper positions hyperparameter optimization as an underexplored but critical lever in GraphRAG systems. While architectural advances receive most attention, configuration choices across chunking, retrieval, and prompting have outsized impact on performance.

The transferable insight: In complex modular systems, tuning matters as much as architecture. Default configurations, even when “reasonable,” leave significant performance on the table.

Key Findings

Performance gains from systematic tuning:

  • Correctness scores improved 62-71% over baseline across three benchmarks
  • F1 scores improved 320-400%
  • Exact match improved dramatically (baselines were near zero due to style mismatch)

Configuration sensitivity:

  • No single configuration performed best across all tasks
  • High-performing configurations shared some parameters (chunk size, retrieval method)
  • Most effects were nonlinear and task-specific

Evaluation metric limitations:

  • Exact match and F1 frequently penalized semantically correct but differently phrased answers
  • LLM-based correctness scores were more tolerant but introduced their own inconsistencies
  • Near-verbatim answers sometimes received less than full credit from LLM graders
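The style-mismatch penalty is easy to see in the metrics themselves. A minimal sketch of SQuAD-style exact match and token-overlap F1 (normalization here is simplified to lowercasing and whitespace splitting):

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> int:
    # 1 only if the strings are identical after trivial normalization
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    # token-overlap F1, as used in SQuAD-style QA evaluation
    pred_tokens = pred.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A semantically correct but verbose answer is punished: against the gold answer "Paris", the prediction "The capital of France is Paris" scores 0 on exact match and only 2/7 on F1, which is exactly the near-zero-baseline effect described above.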

The Cognee Framework

Cognee uses an Extract-Cognify-Load (ECL) pipeline:

  • Extract: Ingest heterogeneous inputs (text, images, audio)
  • Cognify: Transform unstructured input into structured, semantically grounded graph representations
  • Load: Write to graph, relational, or vector stores

The term “cognify” (from Kevin Kelly) describes adding intelligence to already digitized systems.
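The ECL stages can be sketched as three small functions. This is an illustrative toy, not the actual Cognee API: the class and function names are hypothetical, and the "cognify" step here uses naive sentence splitting where the real system would use LLM-guided extraction.

```python
from dataclasses import dataclass, field

@dataclass
class Triplet:
    subject: str
    relation: str
    obj: str

@dataclass
class KnowledgeGraph:
    triplets: list = field(default_factory=list)

def extract(raw_inputs):
    # Extract: normalize heterogeneous inputs into plain text documents
    return [str(item) for item in raw_inputs]

def cognify(documents):
    # Cognify: turn unstructured text into semantically grounded triplets
    # (toy heuristic: first word = subject, second = relation, rest = object)
    graph = KnowledgeGraph()
    for doc in documents:
        for sentence in doc.split("."):
            words = sentence.split()
            if len(words) >= 3:
                graph.triplets.append(
                    Triplet(words[0], words[1], " ".join(words[2:]))
                )
    return graph

def load(graph, store):
    # Load: write triplets into a target store (a dict stands in
    # for a graph, relational, or vector backend)
    for t in graph.triplets:
        store.setdefault(t.subject, []).append((t.relation, t.obj))
    return store
```

Running `load(cognify(extract(["Cognee builds knowledge graphs."])), {})` yields a store mapping "Cognee" to the relation-object pair, which is the shape a downstream retriever would query.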

Tunable Parameters Studied

  • Chunk size: 200-2000 tokens per document segment
  • Retriever type: text chunks via vector search vs. graph triplets
  • Top-k: number of retrieved items (1-20)
  • QA prompt: instruction template for answer generation
  • Graph prompt: template guiding entity/relation extraction
  • Task getter: whether summaries are generated during graph construction
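The parameters above form a joint search space. A minimal random-search sketch over that space follows; the config fields and value ranges mirror the table, but the sampler is a generic stand-in (the paper's actual optimizer may well be a smarter Bayesian/TPE-style method), and the prompt-template identifiers are invented for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class GraphRAGConfig:
    chunk_size: int      # tokens per document segment (200-2000)
    retriever: str       # "chunks" (vector search) or "triplets" (graph)
    top_k: int           # number of retrieved items (1-20)
    qa_prompt: str       # answer-generation template id (hypothetical)
    graph_prompt: str    # entity/relation extraction template id (hypothetical)
    use_summaries: bool  # generate summaries during graph construction

def sample_config(rng: random.Random) -> GraphRAGConfig:
    return GraphRAGConfig(
        chunk_size=rng.choice([200, 500, 1000, 2000]),
        retriever=rng.choice(["chunks", "triplets"]),
        top_k=rng.randint(1, 20),
        qa_prompt=rng.choice(["base", "strict_format"]),
        graph_prompt=rng.choice(["base", "detailed"]),
        use_summaries=rng.choice([True, False]),
    )

def random_search(evaluate, n_trials=20, seed=0):
    # evaluate: callable scoring a config on a benchmark (higher is better)
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Because the findings below show nonlinear, task-specific effects, this kind of joint search over all parameters is the natural fit; tuning one knob at a time would miss interactions such as chunk size combining with retriever type.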

Retrieval Strategy Distinction

cognee_completion: Retrieves text chunks via vector search and passes them directly to the LLM.

cognee_graph_completion: Retrieves knowledge graph nodes and their associated triplets by combining vector similarity with graph structure. Triplets are formatted as structured text, emphasizing relational context.
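The contrast between the two strategies can be sketched as context builders. This is a hypothetical simplification, not the real Cognee retrievers: vector similarity is reduced to a keyword-overlap proxy, and the triplet rendering format is invented for illustration.

```python
def score(query: str, text: str) -> int:
    # crude keyword-overlap stand-in for vector similarity
    return len(set(query.lower().split()) & set(text.lower().split()))

def completion_context(query, chunks, top_k=2):
    # cognee_completion analogue: top-k raw text chunks go straight to the LLM
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return "\n".join(ranked[:top_k])

def graph_completion_context(query, triplets, top_k=2):
    # cognee_graph_completion analogue: top-k triplets rendered as
    # structured text, making the relational context explicit
    ranked = sorted(
        triplets, key=lambda t: score(query, " ".join(t)), reverse=True
    )
    return "\n".join(f"({s}) --[{r}]--> ({o})" for s, r, o in ranked[:top_k])
```

The same query thus produces either a block of prose chunks or a list of explicit subject-relation-object lines; the paper's finding is that which form works better depends on the task.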

Implications

  1. Task-specific optimization generalizes reasonably well to unseen examples
  2. Retrieval-augmented systems benefit from targeted, task-aware tuning
  3. Performance-overfitting tradeoffs can be managed without architectural change
  4. Standard evaluation measures (EM, F1) may not capture what matters in practice