
Survey on Graph and RAG

A survey of GraphRAG

Posted by Kylin on August 5, 2024


RAG

Why RAG is used:

Higher answer accuracy

“increase LLM accuracy”

Fewer hallucinations

“reducing LLM hallucination”

Training-free access to up-to-date information and specialized knowledge

“handling training-free up-to-date information and niche information”

For RAG performance, however, retrieval precision matters far more than recall.¹
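To make the precision/recall distinction concrete, here is a minimal sketch. The document IDs and the two retrieval results are made up for illustration; only the standard definitions of precision and recall are assumed.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: four documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow retriever returns few documents, all of them relevant.
narrow = ["d1", "d2"]
# A broad retriever returns every relevant document plus four noisy ones.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

narrow_pr = precision_recall(narrow, relevant)  # high precision, low recall
broad_pr = precision_recall(broad, relevant)    # low precision, high recall
print(narrow_pr, broad_pr)
```

Under the claim above, the narrow result is the better context for the LLM: the broad result covers everything, but half of what it passes to the model is noise.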

KGs+RAG

Motivation：

RAG cannot answer query-focused summarization questions.²

“RAG does not consider broader contextual semantics, e.g., correlations among documents within corpora. (query-focused summarization)”

Irrelevant retrievals severely hurt RAG performance.

“RAG is a means of LLM optimization, and as such, RAG document selection must be precise, not general.”

KGs are one possible way to achieve non-noisy RAG [7, 8, 9].

“KGs may provide concise and understandable sources of subject matter expert knowledge[6] and non-noisy RAG” [7, 8, 9]

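Graph RAG research has two sides:

How to use KGs for augmented generation (accuracy, efficiency, and the trade-off between them)
How to build KGs from document corpora (accuracy, efficiency, and the trade-off between them); this is the more general approach¹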

Challenge: a defining property of Graph RAG is that exhaustive search is infeasible.

“At the core of Graph RAG, research is the problem of relevant sub-graph identification from larger KGs, which is computationally infeasible (NP-hard) if done exhaustively.”
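A rough sketch of why exhaustive search does not scale: a graph with n nodes has 2^n node subsets, each inducing a candidate sub-graph. The toy graph and the k-hop neighbourhood heuristic below are my own illustration, not a method taken from the survey. They only show one simple way to bound the search around seed entities.

```python
from collections import deque


def k_hop_subgraph(adj, seeds, k):
    """Collect the nodes within k hops of the seed nodes using breadth-first search."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


# Hypothetical toy KG, stored as an undirected adjacency list.
adj = {
    "RAG": ["LLM", "retrieval"],
    "LLM": ["RAG", "hallucination"],
    "retrieval": ["RAG", "precision"],
    "hallucination": ["LLM"],
    "precision": ["retrieval"],
    "KG": ["sub-graph"],
    "sub-graph": ["KG"],
}

# Every subset of nodes induces a candidate sub-graph, so the count doubles with each node.
num_candidate_node_sets = 2 ** len(adj)

# A bounded local search touches only the neighbourhood of the seed entity.
local = k_hop_subgraph(adj, ["RAG"], k=1)
print(num_candidate_node_sets, sorted(local))
```

Even this seven-node graph has 128 candidate node sets, while the 1-hop search around "RAG" inspects three nodes. The price is that a bounded search can miss relevant nodes that lie further away.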

GraphRAG

The core idea is to partition the graph into communities and store a summary for each community, which is then used for summarization.
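The partition-then-summarize idea can be sketched as follows. This is a simplified illustration under stated assumptions: connected components stand in for a real community detection algorithm, and `summarize` is a hypothetical placeholder where an LLM call would normally go. None of the names come from the GraphRAG codebase.

```python
def connected_components(adj):
    """Group nodes into connected components (a crude stand-in for community detection)."""
    seen, communities = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        communities.append(sorted(members))
    return communities


def summarize(members):
    """Hypothetical placeholder for an LLM-written community summary."""
    return "Community about: " + ", ".join(members)


# Hypothetical toy KG, stored as an undirected adjacency list.
adj = {
    "RAG": ["LLM", "retrieval"],
    "LLM": ["RAG", "hallucination"],
    "retrieval": ["RAG", "precision"],
    "hallucination": ["LLM"],
    "precision": ["retrieval"],
    "KG": ["sub-graph"],
    "sub-graph": ["KG"],
}

# Indexing time: partition the graph and store one summary per community.
communities = connected_components(adj)
summaries = {i: summarize(members) for i, members in enumerate(communities)}

# Query time: a corpus-wide question is answered from the stored summaries
# rather than from individually retrieved chunks.
context = "\n".join(summaries[i] for i in sorted(summaries))
print(context)
```

The design point is that the summaries are computed once, ahead of time, so a question about the whole corpus only needs to read a handful of summaries at query time.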

Reference

1. Procko T. Graph Retrieval-Augmented Generation for Large Language Models: A Survey[J]. Available at SSRN, 2024.
2. Edge D, Trinh H, Cheng N, et al. From local to global: A graph rag approach to query-focused summarization[J]. arXiv preprint arXiv:2404.16130, 2024.