Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning
Marković et al., 2025 | Cognee Inc.
Core Framing
The paper positions hyperparameter optimization as an underexplored but critical lever in GraphRAG systems. While architectural advances receive most attention, configuration choices across chunking, retrieval, and prompting have outsized impact on performance.
The transferable insight: In complex modular systems, tuning matters as much as architecture. Default configurations, even when “reasonable,” leave significant performance on the table.
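The tuning loop the paper describes can be sketched as a plain grid search over a small configuration space. The search space, the `evaluate` stub, and its scoring rules below are all illustrative assumptions, not the paper's actual optimizer or numbers:

```python
from itertools import product

# Hypothetical search space; names mirror the parameters studied in the paper
search_space = {
    "chunk_size": [200, 1000, 2000],
    "retriever": ["chunks", "graph_triplets"],
    "top_k": [1, 5, 10],
}

def evaluate(config):
    # Stand-in for running the full GraphRAG pipeline on a benchmark
    # split and returning a correctness score in [0, 1]. The bonuses
    # below are invented purely so the example has a deterministic winner.
    score = 0.5
    if config["chunk_size"] == 1000:
        score += 0.1
    if config["retriever"] == "graph_triplets":
        score += 0.15
    if config["top_k"] == 5:
        score += 0.05
    return score

def grid_search(space, evaluate):
    # Exhaustively score every combination and keep the best one
    keys = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, score = grid_search(search_space, evaluate)
```

In practice each `evaluate` call is expensive (it rebuilds the graph and runs the benchmark), which is why systematic tuning of this kind is usually done with a sample-efficient optimizer rather than a full grid.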
Key Findings
Performance gains from systematic tuning:
- Correctness scores improved 62-71% over baseline across three benchmarks
- F1 scores improved 320-400%
- Exact match improved dramatically, largely because baseline scores were near zero due to stylistic mismatch between generated and gold answers
Configuration sensitivity:
- No single configuration performed best across all tasks
- High-performing configurations shared some parameter values (e.g., chunk size, retrieval method)
- Most effects were nonlinear and task-specific
Evaluation metric limitations:
- Exact match and F1 frequently penalized semantically correct but differently phrased answers
- LLM-based correctness scores were more tolerant but introduced their own inconsistencies
- Near-verbatim answers sometimes received less than full credit from LLM graders
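The exact-match and F1 limitations above are easy to reproduce with the standard SQuAD-style token metrics. This is a generic sketch of those metrics (the normalization rules are the conventional ones, not necessarily the paper's exact evaluation code):

```python
import re
from collections import Counter

def normalize(text):
    # Conventional QA normalization: lowercase, strip punctuation and articles
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def token_f1(pred, gold):
    # Harmonic mean of token-level precision and recall
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

# A semantically correct but verbose answer is heavily penalized:
pred = "He was born in the city of Paris."
gold = "Paris"
```

Here `exact_match(pred, gold)` is 0.0 and `token_f1(pred, gold)` is 0.25, even though the answer is correct, illustrating why near-zero baseline EM scores can reflect style mismatch rather than wrong answers.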
The Cognee Framework
Cognee uses an Extract-Cognify-Load (ECL) pipeline:
- Extract: Ingest heterogeneous inputs (text, images, audio)
- Cognify: Transform unstructured input into structured, semantically grounded graph representations
- Load: Write to graph, relational, or vector stores
The term “cognify” (from Kevin Kelly) describes adding intelligence to already digitized systems.
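The ECL stages compose naturally as three functions. This is a minimal toy sketch of the pattern, not Cognee's implementation: the "parser" in `cognify` is a deliberately trivial stand-in for LLM-driven entity/relation extraction, and the "graph store" is just an adjacency dict:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    obj: str

def extract(raw_inputs):
    # Extract: ingest heterogeneous inputs; this toy version keeps only text
    return [doc.strip() for doc in raw_inputs if isinstance(doc, str)]

def cognify(documents):
    # Cognify: turn unstructured text into graph triplets. Real systems use
    # LLM-driven extraction; here each "subject relation object" line is parsed.
    triplets = []
    for doc in documents:
        for line in doc.splitlines():
            parts = line.split(" ", 2)
            if len(parts) == 3:
                triplets.append(Triplet(*parts))
    return triplets

def load(triplets, graph_store):
    # Load: write triplets into a store (an adjacency dict standing in for
    # a graph, relational, or vector backend)
    for t in triplets:
        graph_store.setdefault(t.subject, []).append((t.relation, t.obj))
    return graph_store

store = load(cognify(extract(["Cognee implements ECL"])), {})
```

The point of the pattern is that each stage has a narrow contract, so the backends in `load` (graph vs. relational vs. vector) can vary without touching extraction.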
Tunable Parameters Studied
| Parameter | Description |
|---|---|
| Chunk size | 200-2000 tokens per document segment |
| Retriever type | Text chunks via vector search vs. graph triplets |
| Top-k | Number of retrieved items (1-20) |
| QA prompt | Instruction template for answer generation |
| Graph prompt | Template guiding entity/relation extraction |
| Task getter | Whether summaries are generated during construction |
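The search space in the table can be captured as a typed configuration object with range checks. Field names and defaults below are illustrative, not Cognee's actual config schema; only the ranges come from the table:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class PipelineConfig:
    # Ranges mirror the table above; names and defaults are assumptions
    chunk_size: int = 1000          # 200-2000 tokens per segment
    retriever: Literal["cognee_completion", "cognee_graph_completion"] = (
        "cognee_completion"
    )
    top_k: int = 5                  # 1-20 retrieved items
    qa_prompt: str = "default_qa"
    graph_prompt: str = "default_graph"
    include_summaries: bool = False  # the "task getter" toggle

    def __post_init__(self):
        # Fail fast on out-of-range values instead of silently degrading
        if not 200 <= self.chunk_size <= 2000:
            raise ValueError("chunk_size must be in [200, 2000]")
        if not 1 <= self.top_k <= 20:
            raise ValueError("top_k must be in [1, 20]")

config = PipelineConfig(chunk_size=400, top_k=10)
```

Validating bounds at construction time keeps an automated tuner from wasting runs on configurations the pipeline cannot actually execute.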
Retrieval Strategy Distinction
cognee_completion: Retrieves text chunks via vector search and passes them directly to the LLM.
cognee_graph_completion: Retrieves knowledge graph nodes and their associated triplets by combining vector similarity with graph structure. Retrieved triplets are formatted as structured text that emphasizes relational context.
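The "triplets formatted as structured text" step might look like the following. Both helper names and the arrow notation are hypothetical, chosen only to show how relational context reaches the LLM prompt:

```python
def format_triplets(triplets):
    # Render (subject, relation, object) triplets as one fact per line,
    # so the LLM sees explicit relations rather than raw prose chunks
    return "\n".join(f"{s} --{r}--> {o}" for s, r, o in triplets)

def build_graph_context(question, triplets):
    # Hypothetical prompt assembly for a graph-completion retriever
    return (
        "Answer using the knowledge graph facts below.\n"
        "Facts:\n" + format_triplets(triplets) + "\n"
        f"Question: {question}"
    )

ctx = build_graph_context(
    "Who develops cognee?",
    [
        ("Cognee", "developed_by", "Cognee Inc."),
        ("Cognee", "uses", "ECL pipeline"),
    ],
)
```

Compared with dumping retrieved chunks verbatim, this framing makes multi-hop relations explicit in the context window, which is the motivation for the graph-completion retriever.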
Implications
- Task-specific optimization generalizes reasonably well to unseen examples
- Retrieval-augmented systems benefit from targeted, task-aware tuning
- Performance-overfitting tradeoffs can be managed without architectural change
- Standard evaluation measures (EM, F1) may not capture what matters in practice