Survey on Graph and RAG

GraphRAG综述

Posted by Kylin on August 5, 2024

[TOC]

RAG

RAG的应用背景:

  • 提高回答准确率

    “increase LLM accuracy”

  • 减少幻觉

    “reducing LLM hallucination”

  • 无需训练的实时信息更新和专业知识增强

    “handling training-free up-to-date information and niche information”

但是对于RAG性能来说,检索的precision要比recall重要的多1

KGs+RAG

Motivation:

  • query-focused summarization问题RAG回答不了2

    “RAG does not consider broader contextual semantics, e.g., correlations among documents within corpora. (query-focused summarization)”

  • 无效召回对RAG性能损害大

    “RAG is a means of LLM optimization, and as such, RAG document selection must be precise, not general.”

而KGs是一种可能能够实现non-noisy RAG的方法[7, 8, 9]

“KGs may provide concise and understandable sources of subject matter expert knowledge[6] and non-noisy RAG” [7, 8, 9]

Graph RAG research的两个方面:

  • 如何(准确,高效,trade-off)利用 KGs 做生成增强

  • 如何(准确,高效,trade-off)从 document corpora 中构建 KGs(这是一种更general的方法1

Challenge:Graph RAG的一个基本定性是:穷举不可行。

“At the core of Graph RAG, research is the problem of relevant sub-graph identification from larger KGs, which is computationally infeasible (NP-hard) if done exhaustively.”

GraphRAG

主要还是把图分成了community,对每一个community保存了摘要用于总结。

Reference

  1. Procko T. Graph Retrieval-Augmented Generation for Large Language Models: A Survey[J]. Available at SSRN, 2024.  2

  2. Edge D, Trinh H, Cheng N, et al. From local to global: A graph rag approach to query-focused summarization[J]. arXiv preprint arXiv:2404.16130, 2024.