There are many challenges when working with LLMs such as domain knowledge gaps, factuality issues, and hallucination. Retrieval Augmented Generation (RAG) provides a solution to mitigate some of these issues by augmenting LLMs with external knowledge such as databases. RAG is particularly useful in knowledge-intensive scenarios or domain-specific applications that require knowledge that's continually updating. A key advantage of RAG over other approaches is that the LLM doesn't need to be retrained for task-specific applications. RAG has been popularized recently with its application in conversational agents.
In this summary, we highlight the main findings and practical insights from the recent survey titled Retrieval-Augmented Generation for Large Language Models: A Survey (Gao et al., 2023). In particular, we focus on existing approaches, state-of-the-art RAG, evaluation, applications, and the technologies surrounding the different components that make up a RAG system (retrieval, generation, and augmentation techniques).
As introduced in more detail elsewhere, RAG can be defined as follows:
RAG takes an input and retrieves a set of relevant/supporting documents from a given source (e.g., Wikipedia). The documents are concatenated as context with the original input prompt and fed to the text generator, which produces the final output. This makes RAG adaptive to situations where facts can evolve over time. This is very useful because an LLM's parametric knowledge is static. RAG allows language models to bypass retraining, enabling access to the latest information for generating reliable outputs via retrieval-based generation.
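The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the survey: `keyword_retriever` is a hypothetical toy retriever that ranks documents by shared words (a real system would typically use embeddings and a vector store), and `generate` is any callable standing in for an LLM call.

```python
import re
from typing import Callable, List

def _tokens(text: str) -> set:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def keyword_retriever(corpus: List[str]) -> Callable[[str, int], List[str]]:
    """Hypothetical toy retriever: rank documents by how many words they share with the query."""
    def retrieve(query: str, k: int) -> List[str]:
        q = _tokens(query)
        return sorted(corpus, key=lambda doc: len(q & _tokens(doc)), reverse=True)[:k]
    return retrieve

def rag_answer(question: str,
               retrieve: Callable[[str, int], List[str]],
               generate: Callable[[str], str],
               k: int = 2) -> str:
    """Retrieve k supporting documents, concatenate them with the input prompt, then generate."""
    docs = retrieve(question, k)
    context = "\n".join(f"- {doc}" for doc in docs)
    prompt = (
        "Answer the question using the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)

if __name__ == "__main__":
    corpus = [
        "The capital of France is Paris.",
        "Mixtral is a sparse mixture-of-experts language model.",
    ]
    # Stand-in generator that echoes the augmented prompt; swap in a real LLM call here.
    echo = lambda prompt: prompt
    print(rag_answer("What is the capital of France?", keyword_retriever(corpus), echo, k=1))
```

Because the knowledge lives in `corpus`, adding or replacing documents changes what the generator sees on the next query, with no retraining of the generator itself.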
In short, the evidence retrieved in RAG can serve as a way to enhance the accuracy, controllability, and relevance of the LLM's response. This is why RAG can help reduce hallucination and performance problems when addressing tasks in rapidly evolving environments.
While RAG has also involved the optimization of pre-training methods, current approaches have largely shifted to combining the strengths of RAG with powerful fine-tuned models like ChatGPT and Mixtral. The chart below shows the evolution of RAG-related research:
Below is a typical RAG application workflow:
We can explain the different steps/components as follows:
In the example provided, using the model directly fails to respond to the question due to a lack of knowledge of current events. On the other hand, when using RAG, the system can pull the relevant information needed for the model to answer the question appropriately.
Over the past few years, RAG systems have evolved from Naive RAG to Advanced RAG and Modular RAG. This evolution has occurred to address certain limitations around performance, cost, and efficiency.
Naive RAG follows the traditional process of indexing, retrieval, and generation described above. In short, the user input is used to query relevant documents, which are then combined with a prompt and passed to the model to generate a final response. Conversational history can be integrated into the prompt if the application involves multi-turn dialogue interactions.
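To make the three Naive RAG stages concrete, here is a minimal sketch under stated assumptions: `embed` is a hypothetical stand-in that uses term-frequency vectors instead of a real embedding model, the index is an in-memory list rather than a vector store, and the prompt layout (including where conversation history goes) is illustrative only.

```python
import math
import re
from collections import Counter
from typing import List, Sequence, Tuple

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class NaiveRAGIndex:
    def __init__(self, chunks: List[str]):
        # Indexing: embed every chunk once and keep (chunk, vector) pairs in memory.
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Retrieval: embed the query the same way and rank chunks by cosine similarity.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[1]), reverse=True)
        return [chunk for chunk, _vec in ranked[:k]]

def build_prompt(query: str, chunks: List[str],
                 history: Sequence[Tuple[str, str]] = ()) -> str:
    # Generation input: retrieved chunks, optional multi-turn history, then the current query.
    lines = ["Context:"] + [f"- {chunk}" for chunk in chunks]
    if history:
        lines.append("Conversation so far:")
        lines += [f"{role}: {text}" for role, text in history]
    lines += [f"User: {query}", "Assistant:"]
    return "\n".join(lines)

if __name__ == "__main__":
    index = NaiveRAGIndex([
        "RAG retrieves documents before generation.",
        "Bananas are a yellow fruit.",
    ])
    top = index.retrieve("How does RAG use documents?", k=1)
    history = [("User", "What is RAG?"), ("Assistant", "Retrieval Augmented Generation.")]
    print(build_prompt("How does it use documents?", top, history))
```

The returned prompt would then be passed to the LLM; that call is deliberately left out so the sketch runs without any model or API key.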
Naive RAG has limitations such as low precision (misaligned retrieved chunks) and low recall (failure to retrieve all relevant chunks). It's also possible that the LLM is passed outdated information, which is one of the main issues a RAG system is meant to solve in the first place. These problems lead to hallucinations and poor, inaccurate responses.
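Low precision and low recall can be measured directly when you have labeled relevant chunks for a query. A minimal sketch, assuming chunks are identified by hypothetical string IDs:

```python
from typing import Iterable, Set, Tuple

def retrieval_precision_recall(retrieved: Iterable[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision: share of retrieved chunks that are relevant.
    Recall: share of relevant chunks that were retrieved."""
    retrieved_set = set(retrieved)  # ignore duplicate IDs
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    # Two of the three retrieved chunks are off-topic (low precision),
    # and relevant chunk c2 was never retrieved (reduced recall).
    print(retrieval_precision_recall(["c1", "c4", "c7"], {"c1", "c2"}))
```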
When augmentation is applied, there can also be issues with redundancy and repetition. When multiple retrieved passages are used, ranking them and reconciling their style and tone are also key. Another challenge is ensuring that the generation task doesn't depend too heavily on the augmented information, which can lead to the model simply reiterating the retrieved content.
Advanced RAG helps deal with issues present in Naive RAG, for example by improving retrieval quality through optimizing the pre-retrieval, retrieval, and post-retrieval processes.
The pre-retrieval process involves optimizing data indexing, which aims to enhance the quality of the data being indexed through five stages: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval.
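Two of these stages, enhancing data granularity and adding metadata, often come down to how documents are split before indexing. The sketch below assumes word-based chunks with a fixed overlap. `chunk_document`, the default chunk size and overlap, and the metadata fields are illustrative choices, not values prescribed by the survey.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Chunk:
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)

def chunk_document(text: str, source: str, chunk_size: int = 50, overlap: int = 10) -> List[Chunk]:
    """Split text into overlapping word windows and attach metadata to each chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    # Stop once the previous window has already reached the end of the text.
    for i, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        chunks.append(Chunk(
            text=" ".join(words[start:start + chunk_size]),
            metadata={"source": source, "chunk_index": i, "start_word": start},
        ))
    return chunks

if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(120))
    for c in chunk_document(doc, source="handbook.txt", chunk_size=50, overlap=10):
        print(c.metadata, len(c.text.split()))
```

Smaller chunks tend to make matches more precise, while the overlap keeps sentences that straddle a boundary retrievable. Metadata such as `source` can later be used to filter results or cite them in the final answer.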
The retrieval stage can be further improved by optimizing the embedding model itself, which directly impacts the quality of the chunks that make up the context. This can be done by fine-tuning the embedding model to optimize retrieval relevance, or by employing dynamic embeddings that better capture contextual understanding (e.g., OpenAI's text-embedding-ada-002 model).
Optimizing post-retrieval focuses on avoiding context window limits and dealing with noisy or potentially distracting information. A common approach to these issues is re-ranking, which can involve relocating relevant context to the edges of the prompt or recalculating the semantic similarity between the query and the relevant text chunks. Prompt compression may also help in dealing with these issues.
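Moving the most relevant context to the edges of the prompt can be sketched as a small reordering step. This sketch assumes each chunk already carries a relevance score, for example one recomputed by a re-ranking model after initial retrieval. `reorder_for_edges` and `max_chunks` are hypothetical names, and truncating to `max_chunks` is only a crude stand-in for real prompt compression.

```python
from typing import List, Tuple

def reorder_for_edges(scored_chunks: List[Tuple[str, float]], max_chunks: int = 5) -> List[str]:
    """Keep the top-scoring chunks, then put the strongest at the start and end of the list
    and the weakest in the middle, where the model is assumed to pay the least attention."""
    ranked = sorted(scored_chunks, key=lambda item: item[1], reverse=True)[:max_chunks]
    front, back = [], []
    for i, (text, _score) in enumerate(ranked):
        # Alternate: rank 1 -> front, rank 2 -> back, rank 3 -> front, ...
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]

if __name__ == "__main__":
    chunks = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5), ("f", 0.1)]
    print(reorder_for_edges(chunks))  # ['a', 'c', 'e', 'd', 'b']
```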
As the name implies, Modular RAG enhances functional modules, for example by incorporating a search module for similarity retrieval and applying fine-tuning in the retriever. Both Naive RAG and Advanced RAG are special cases of Modular RAG and are made up of fixed modules. Extended RAG modules, which solve different problems, include search, memory, fusion, routing, predict, and task adapter. These modules can be rearranged to suit specific problem contexts. Modular RAG therefore benefits from greater diversity and flexibility: you can add or replace modules, or adjust the flow between modules, based on task requirements.
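Because modules are interchangeable, a Modular RAG pipeline can be modeled as plain functions that are composed per task. The sketch below combines several search modules with a fusion module. The fusion step uses reciprocal rank fusion (RRF), which is one common choice and not necessarily the method described in the survey, and `k=60` is a conventional default rather than a tuned value. The two retrievers in the example are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fusion module: score each document by the sum of 1 / (k + rank) across rankings."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def fused_search(query: str, retrievers: List[Retriever], top_k: int = 3) -> List[str]:
    """Run every search module, then fuse their rankings; modules can be added or swapped freely."""
    return reciprocal_rank_fusion([retrieve(query) for retrieve in retrievers])[:top_k]

if __name__ == "__main__":
    # Hypothetical search modules returning ranked document IDs.
    keyword_search = lambda q: ["d1", "d2", "d3"]
    dense_search = lambda q: ["d2", "d4", "d1"]
    print(fused_search("example query", [keyword_search, dense_search]))  # ['d2', 'd1', 'd4']
```

Swapping `dense_search` for another retriever, or adding a routing step that picks which retrievers to call, changes the flow without touching the fusion code.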
Given the increased flexibility in building RAG systems, other important optimization techniques have been proposed to optimize RAG pipelines including:
In this section, we summarize the key developments of the components of a RAG system, which include Retrieval, Generation, and Augmentation.
Retrieval is the component of RAG that deals with retrieving highly relevant context from a retriever. A retriever can be enhanced in many ways, including:
Enhancing Semantic Representations
This process involves directly improving the semantic representations that power the retriever. Here are a few considerations:
Aligning Queries and Documents
This process deals with aligning users' queries with documents in the semantic space. This may be needed when a user's query lacks semantic information or contains imprecise phrasing. Here are some approaches:
Aligning Retriever and LLM
This process deals with aligning the retriever's outputs with the preferences of the LLM.
The generator in a RAG system is responsible for converting retrieved information into coherent text that forms the final output of the model. This process involves diverse input data, which sometimes requires extra effort to adapt the language model to the input derived from queries and documents. This can be addressed using post-retrieval processing and fine-tuning:
Augmentation involves the process of effectively integrating context from retrieved passages with the current generation task. Before discussing more on the augmentation process, augmentation stages, and augmentation data, here is a taxonomy of RAG's core components:
Retrieval augmentation can be applied in many different stages such as pre-training, fine-tuning, and inference.
Augmentation Stages: RETRO is an example of a system that leverages retrieval augmentation for large-scale pre-training from scratch; it uses an additional encoder built on top of external knowledge. Fine-tuning can also be combined with RAG to help develop and improve the effectiveness of RAG systems. At the inference stage, many techniques are applied to effectively incorporate retrieved content to meet specific task demands and further refine the RAG process.
Augmentation Source: A RAG model's effectiveness is heavily impacted by the choice of augmentation data source. Data can be categorized into unstructured, structured, and LLM-generated data.
Augmentation Process: For many problems (e.g., multi-step reasoning), a single retrieval isn't enough, so a few methods have been proposed:
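One such approach, iterative retrieval, alternates between retrieving and generating so that each partial answer can steer the next query. A minimal sketch, assuming `retrieve` and `generate` are supplied by the caller. The toy knowledge base and the "generator" that simply returns the latest retrieved fact are hypothetical stand-ins, used only to show a two-hop question being resolved.

```python
from typing import Callable, List

def iterative_retrieval(question: str,
                        retrieve: Callable[[str], List[str]],
                        generate: Callable[[str, List[str]], str],
                        max_steps: int = 3) -> str:
    """Alternate retrieval and generation, feeding each partial answer back into the next query."""
    context: List[str] = []
    query = question
    answer = ""
    for _ in range(max_steps):
        new_docs = [doc for doc in retrieve(query) if doc not in context]
        context.extend(new_docs)
        answer = generate(question, context)
        if not new_docs:
            break  # nothing new was retrieved, so further rounds would not change the context
        query = f"{question} {answer}"  # refine the next query with what was learned so far
    return answer

if __name__ == "__main__":
    # Hypothetical two-hop knowledge base: the answer needs facts from both entries.
    kb = {
        "film x": "Film X was directed by Alice.",
        "alice": "Alice was born in Lisbon.",
    }
    retrieve = lambda q: [doc for key, doc in kb.items() if key in q.lower()]
    # Stand-in generator: returns the most recently retrieved fact (a real system would call an LLM).
    generate = lambda q, ctx: ctx[-1] if ctx else "I don't know."
    print(iterative_retrieval("Where was the director of Film X born?", retrieve, generate))
```

A single retrieval round (`max_steps=1`) only finds the director, while the second round uses that partial answer to find the birthplace.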
The figure below depicts a detailed representation of RAG research with different augmentation aspects, including the augmentation stages, source, and process.
There are many open discussions about the difference between RAG and fine-tuning and the scenarios in which each is appropriate. Research in these two areas suggests that RAG is useful for integrating new knowledge, while fine-tuning can be used to improve model performance and efficiency by improving internal knowledge and output format and by teaching the model to follow complex instructions. These approaches are not mutually exclusive and can complement each other in an iterative process that aims to improve the use of LLMs in complex, knowledge-intensive, and scalable applications that require access to quickly evolving knowledge and customized responses that follow a certain format, tone, and style. In addition, prompt engineering can help optimize results by leveraging the inherent capabilities of the model. Below is a figure showing the different characteristics of RAG compared with other model optimization methods:
Here is a table from the survey paper that compares the features of RAG and fine-tuned models:
Similar to measuring the performance of LLMs on different aspects, evaluation plays a key role in understanding and optimizing the performance of RAG models across diverse application scenarios. Traditionally, RAG systems have been assessed on the performance of downstream tasks using task-specific metrics like F1 and exact match (EM). RaLLe is a notable example of a framework used to evaluate retrieval-augmented large language models for knowledge-intensive tasks.
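For extractive question answering, EM and F1 are usually computed on normalized answer strings. The sketch below follows the widely used SQuAD-style normalization (lowercasing, then removing punctuation and the articles "a", "an", and "the"). Treat it as one common convention rather than the exact scoring used by any particular RAG benchmark.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall against the gold answer."""
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(token_f1("Paris, France", "Paris"))
```

EM gives no credit to the partially correct "Paris, France", while F1 rewards the overlapping token.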
RAG evaluation targets both retrieval and generation, with the goal of assessing the quality of the retrieved context and the quality of the generated content. To evaluate retrieval quality, metrics from other knowledge-intensive domains such as recommendation systems and information retrieval are used, for example NDCG and Hit Rate. To evaluate generation quality, you can assess aspects like relevance and harmfulness for unlabeled content, or accuracy for labeled content. Overall, RAG evaluation can involve either manual or automatic evaluation methods.
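Both retrieval metrics can be computed from ranked lists of document IDs and sets of IDs judged relevant. This sketch assumes binary relevance (a document is either relevant or not) and no duplicate IDs within a ranking. Graded relevance would replace the constant gain of 1.0 with a per-document score.

```python
import math
from typing import List, Sequence, Set

def hit_rate(results: Sequence[List[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for ranked, rel in zip(results, relevant) if any(d in rel for d in ranked[:k]))
    return hits / len(results) if results else 0.0

def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    # DCG: each relevant hit at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    # IDCG: the best possible DCG, with all relevant documents ranked first.
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0

if __name__ == "__main__":
    print(hit_rate([["x", "a"], ["y", "z"]], [{"a"}, {"q"}], k=2))
    print(ndcg_at_k(["x", "a", "b"], {"a"}, k=3))
```

Hit Rate only asks whether something relevant was found, while NDCG also penalizes relevant documents that appear lower in the ranking.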
Evaluating a RAG framework focuses on three primary quality scores and four abilities. Quality scores include measuring context relevance (i.e., the precision and specificity of retrieved context), answer faithfulness (i.e., the faithfulness of answers to the retrieved context), and answer relevance (i.e., the relevance of answers to posed questions). In addition, there are four abilities that help measure the adaptability and efficiency of a RAG system: noise robustness, negative rejection, information integration, and counterfactual robustness. Below is a summary of metrics used for evaluating different aspects of a RAG system:
Several benchmarks, such as RGB and RECALL, are used to evaluate RAG models. Many tools, such as RAGAS, ARES, and TruLens, have been developed to automate the process of evaluating RAG systems. Some of these tools rely on LLMs to determine some of the quality scores defined above.
In this overview, we discussed several aspects of RAG research and different approaches for enhancing the retrieval, augmentation, and generation components of a RAG system. Here are several challenges emphasized by Gao et al., 2023 as we continue developing and improving RAG systems:
Some popular comprehensive tools for building RAG systems include LangChain, LlamaIndex, and DSPy. There is also a range of specialized tools that serve different purposes, such as Flowise AI, which offers a low-code solution for building RAG applications. Other notable technologies include Haystack, Meltano, Cohere Coral, and others. Software and cloud service providers are also offering RAG-centric services. For instance, Verba from Weaviate is useful for building personal assistant applications, and Amazon's Kendra offers intelligent enterprise search services.
In conclusion, RAG systems have evolved rapidly including the development of more advanced paradigms that enable customization and further the performance and utility of RAG across a wide range of domains. There is a huge demand for RAG applications, which has accelerated the development of methods to improve the different components of a RAG system. From hybrid methodologies to self-retrieval, these are some of the currently explored research areas of modern RAG models. There is also increasing demand for better evaluation tools and metrics. The figure below provides a recap of the RAG ecosystem, techniques to enhance RAG, challenges, and other related aspects covered in this overview:
Figure source: Retrieval-Augmented Generation for Large Language Models: A Survey