Published: 2024-04-17 20:26:55

Author: 詹大叔有话说

Retrieval Augmented Generation (RAG) for LLMs

There are many challenges when working with LLMs such as domain knowledge gaps, factuality issues, and hallucination. Retrieval Augmented Generation (RAG) provides a solution to mitigate some of these issues by augmenting LLMs with external knowledge such as databases. RAG is particularly useful in knowledge-intensive scenarios or domain-specific applications that require knowledge that's continually updating. A key advantage of RAG over other approaches is that the LLM doesn't need to be retrained for task-specific applications. RAG has been popularized recently with its application in conversational agents.

In this summary, we highlight the main findings and practical insights from the recent survey titled Retrieval-Augmented Generation for Large Language Models: A Survey (Gao et al., 2023). In particular, we focus on the existing approaches, state-of-the-art RAG, evaluation, applications, and technologies surrounding the different components that make up a RAG system (retrieval, generation, and augmentation techniques).

Introduction to RAG

As better introduced here, RAG can be defined as:

RAG takes an input and retrieves a set of relevant/supporting documents from a given source (e.g., Wikipedia). The documents are concatenated as context with the original input prompt and fed to the text generator, which produces the final output. This makes RAG adaptive to situations where facts could evolve over time, which is very useful because an LLM's parametric knowledge is static. RAG allows language models to bypass retraining, enabling access to the latest information for generating reliable outputs via retrieval-based generation.

In short, the retrieved evidence obtained in RAG can serve as a way to enhance the accuracy, controllability, and relevancy of the LLM's response. This is why RAG can help reduce issues of hallucination or performance when addressing problems in a highly evolving environment.

While RAG has also involved the optimization of pre-training methods, current approaches have largely shifted to combining the strengths of RAG and powerful fine-tuned models like ChatGPT and Mixtral. The chart below shows the evolution of RAG-related research:

Below is a typical RAG application workflow:

We can explain the different steps/components as follows:

Input: The question to which the LLM system responds is referred to as the input. If no RAG is used, the LLM is used directly to respond to the question.
Indexing: If RAG is used, a series of related documents is indexed by first chunking them, then generating embeddings of the chunks, and finally storing those embeddings in a vector store. At inference time, the query is embedded in the same way.
Retrieval: The relevant documents (denoted "Relevant Documents") are obtained by comparing the query embedding against the indexed vectors.
Generation: The relevant documents are combined with the original prompt as additional context. The combined text and prompt are then passed to the model for response generation, and the response is returned to the user as the final output of the system.
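
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector store, and all function names are hypothetical. The final call to an LLM is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (assumed stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Indexing: embed every chunk and keep (chunk, vector) pairs as a tiny in-memory vector store."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Retrieval: embed the query the same way and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Generation input: concatenate the retrieved chunks with the original question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"


chunks = [
    "The vector store holds one embedding per chunk.",
    "Paris is the capital of France.",
    "Chunking splits long documents into smaller passages.",
]
index = build_index(chunks)
top = retrieve(index, "What is the capital of France?", k=1)
prompt = build_prompt("What is the capital of France?", top)
print(prompt)
```

In a real system the prompt would be passed to the generator model, and `embed` would be replaced by whichever embedding model the vector store was built with.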

In the example provided, using the model directly fails to respond to the question due to a lack of knowledge of current events. On the other hand, when using RAG, the system can pull the relevant information needed for the model to answer the question appropriately.

RAG Paradigms

Over the past few years, RAG systems have evolved from Naive RAG to Advanced RAG and Modular RAG. This evolution has occurred to address certain limitations around performance, cost, and efficiency.

Naive RAG

Naive RAG follows the traditional aforementioned process of indexing, retrieval, and generation. In short, a user input is used to query relevant documents which are then combined with a prompt and passed to the model to generate a final response. Conversational history can be integrated into the prompt if the application involves multi-turn dialogue interactions.

Naive RAG has limitations such as low precision (misaligned retrieved chunks) and low recall (failure to retrieve all relevant chunks). The LLM may also be passed outdated information, which is one of the main issues a RAG system is meant to solve in the first place. These problems lead to hallucination and to poor, inaccurate responses.

When augmentation is applied, there could also be issues with redundancy and repetition. When using multiple retrieved passages, ranking and reconciling style/tone are also key. Another challenge is ensuring that the generation task doesn't overly depend on the augmented information which can lead to the model just reiterating the retrieved content.

Advanced RAG

Advanced RAG helps deal with issues present in Naive RAG, chiefly by improving retrieval quality. This can involve optimizing the pre-retrieval, retrieval, and post-retrieval processes.

The pre-retrieval process involves optimizing data indexing which aims to enhance the quality of the data being indexed through five stages: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval.

The retrieval stage can be further improved by optimizing the embedding model itself, which directly impacts the quality of the chunks that make up the context. This can be done by fine-tuning the embedding to optimize retrieval relevance or by employing dynamic embeddings that better capture contextual understanding (e.g., OpenAI's text-embedding-ada-002 model).

Optimizing post-retrieval focuses on avoiding context window limits and dealing with noisy or potentially distracting information. A common approach to address these issues is re-ranking which could involve approaches such as relocation of relevant context to the edges of the prompt or recalculating the semantic similarity between the query and relevant text chunks. Prompt compression may also help in dealing with these issues.
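
As a rough sketch of the "relocate relevant context to the edges of the prompt" idea, the function below takes chunks already sorted from most to least relevant and alternates them between the front and the back, so the weakest chunks end up in the middle. The function name and the alternating scheme are assumptions chosen for illustration, not a method prescribed by the survey.

```python
def reorder_to_edges(chunks_by_relevance):
    """Place the most relevant chunks at the start and end, least relevant in the middle.

    `chunks_by_relevance` must be sorted from most to least relevant.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Even ranks (0, 2, 4, ...) go to the front, odd ranks to the back.
        (front if i % 2 == 0 else back).append(chunk)
    # Reverse the back half so the second-best chunk lands at the very end.
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # A is most relevant, E least
reordered = reorder_to_edges(ranked)
print(reordered)  # ['A', 'C', 'E', 'D', 'B']
```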

Modular RAG

As the name implies, Modular RAG enhances functional modules such as incorporating a search module for similarity retrieval and applying fine-tuning in the retriever. Both Naive RAG and Advanced RAG are special cases of Modular RAG and are made up of fixed modules. Extended RAG modules include search, memory, fusion, routing, predict, and task adapter which solve different problems. These modules can be rearranged to suit specific problem contexts. Therefore, Modular RAG benefits from greater diversity and flexibility in that you can add or replace modules or adjust the flow between modules based on task requirements.

Given the increased flexibility in building RAG systems, other important optimization techniques have been proposed to optimize RAG pipelines including:

Hybrid Search Exploration: This approach leverages a combination of search techniques like keyword-based search and semantic search to retrieve relevant and context-rich information; this is useful when dealing with different query types and information needs.
Recursive Retrieval and Query Engine: Involves a recursive retrieval process that might start with small semantic chunks and subsequently retrieve larger chunks that enrich the context; this is useful to balance efficiency and context-rich information.
StepBack-prompt: A prompting technique that enables LLMs to perform abstraction that produces concepts and principles that guide reasoning; this leads to better-grounded responses when adopted in a RAG framework because the LLM moves away from specific instances and is allowed to reason more broadly if needed.
Sub-Queries: There are different query strategies, such as tree queries or sequential querying of chunks, that can be used for different scenarios. LlamaIndex offers a sub question query engine that allows a query to be broken down into several questions that use different relevant data sources.
Hypothetical Document Embeddings: HyDE generates a hypothetical answer to a query, embeds it, and uses it to retrieve documents similar to the hypothetical answer as opposed to using the query directly.
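
To make the hybrid search idea concrete, here is a minimal sketch that merges a keyword-based ranking and a semantic ranking into one list. It uses reciprocal rank fusion, which is one common way to combine rankings and is swapped in here for illustration; the survey does not prescribe a specific fusion method. The document ids and the two input rankings are made up, and `k=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1),
    and the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical outputs of a keyword search and a semantic (embedding) search.
keyword_ranking = ["doc_b", "doc_a", "doc_c"]
semantic_ranking = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)  # ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

Documents found by both searches (`doc_a`, `doc_b`) rise to the top, while documents found by only one search are kept but ranked lower.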

RAG Framework

In this section, we summarize the key developments of the components of a RAG system, which include Retrieval, Generation, and Augmentation.

Retrieval

Retrieval is the component of RAG that deals with retrieving highly relevant context from a retriever. A retriever can be enhanced in many ways, including:

Enhancing Semantic Representations

This process involves directly improving the semantic representations that power the retriever. Here are a few considerations:

Chunking: One important step is choosing the right chunking strategy, which depends on the content you are dealing with and the application you are generating responses for. Different models also display different strengths on varying block sizes. Sentence transformers will perform better on single sentences, but text-embedding-ada-002 will perform better with blocks containing 256 or 512 tokens. Other aspects to consider include the length of user questions, the application, and token limits, but it's common to experiment with different chunking strategies to help optimize retrieval in your RAG system.
Fine-tuned Embedding Models: Once you have determined an effective chunking strategy, you may need to fine-tune the embedding model if you are working with a specialized domain. Otherwise, user queries may be completely misunderstood in your application. You can fine-tune on broad domain knowledge (i.e., domain knowledge fine-tuning) and for specific downstream tasks. BGE-large-EN, developed by BAAI, is a notable embedding model that can be fine-tuned to optimize retrieval relevance.
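
A minimal sketch of fixed-size chunking with overlap is shown below, so that different block sizes can be compared. It counts whitespace-separated words as "tokens", which is an assumption made to keep the example self-contained; a real pipeline would count tokens with the tokenizer of its embedding model. The function name and the default sizes are hypothetical.

```python
def chunk_tokens(text, chunk_size=256, overlap=32):
    """Split text into chunks of at most `chunk_size` whitespace tokens.

    Consecutive chunks share `overlap` tokens so that sentences cut at a
    boundary still appear intact in at least one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
small = chunk_tokens(text, chunk_size=4, overlap=1)
print(small)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```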

Aligning Queries and Documents

This process deals with aligning users' queries with documents in the semantic space. This may be needed when a user's query lacks semantic information or contains imprecise phrasing. Here are some approaches:

Query Rewriting: Focuses on rewriting queries using a variety of techniques such as Query2Doc, ITER-RETGEN, and HyDE.
Embedding Transformation: Optimizes the representation of query embeddings and aligns them to a latent space that is more closely aligned with a task.

Aligning Retriever and LLM

This process deals with aligning the retriever outputs with the preferences of the LLMs.

Fine-tuning Retrievers: Uses an LLM's feedback signals to refine the retrieval models. Examples include augmentation adapted retriever (AAR), REPLUG, and UPRISE, to name a few.
Adapters: Incorporates external adapters to help with the alignment process. Examples include PRCA, RECOMP, and PKG.

Generation

The generator in a RAG system is responsible for converting retrieved information into coherent text that will form the final output of the model. This process involves diverse input data, which sometimes requires effort to adapt the language model to the input data derived from queries and documents. This can be addressed using post-retrieval processing and fine-tuning:

Post-retrieval with Frozen LLM: Post-retrieval processing leaves the LLM untouched and instead focuses on enhancing the quality of retrieval results through operations like information compression and result reranking. Information compression helps with reducing noise, addressing an LLM's context length restrictions, and enhancing generation effects. Reranking aims at reordering documents to prioritize the most relevant items at the top.
Fine-tuning LLM for RAG: To improve the RAG system, the generator can be further optimized or fine-tuned to ensure that the generated text is natural and effectively leverages the retrieved documents.

Augmentation

Augmentation involves the process of effectively integrating context from retrieved passages with the current generation task. Before discussing more on the augmentation process, augmentation stages, and augmentation data, here is a taxonomy of RAG's core components:

Retrieval augmentation can be applied in many different stages such as pre-training, fine-tuning, and inference.

Augmentation Stages: RETRO is an example of a system that leverages retrieval augmentation for large-scale pre-training from scratch; it uses an additional encoder built on top of external knowledge. Fine-tuning can also be combined with RAG to help develop and improve the effectiveness of RAG systems. At the inference stage, many techniques are applied to effectively incorporate retrieved content to meet specific task demands and further refine the RAG process.
Augmentation Source: A RAG model's effectiveness is heavily impacted by the choice of augmentation data source. Data can be categorized into unstructured, structured, and LLM-generated data.
Augmentation Process: For many problems (e.g., multi-step reasoning), a single retrieval isn't enough, so a few methods have been proposed:

Iterative retrieval enables the model to perform multiple retrieval cycles to enhance the depth and relevance of information. Notable approaches that leverage this method include RETRO and GAR-meets-RAG.
Recursive retrieval recursively iterates on the output of one retrieval step as the input to another retrieval step; this enables delving deeper into relevant information for complex and multi-step queries (e.g., academic research and legal case analysis). Notable approaches that leverage this method include IRCoT and Tree of Clarifications.
Adaptive retrieval tailors the retrieval process to specific demands by determining optimal moments and content for retrieval. Notable approaches that leverage this method include FLARE and Self-RAG.
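
The multi-cycle idea can be illustrated with a toy loop in which each retrieved passage is folded back into the query before the next retrieval step. This is a sketch under heavy assumptions: the retriever is plain term overlap, the corpus is made up, and all names are hypothetical. The approaches named above typically place an LLM generation step inside the loop, which is omitted here.

```python
import re


def terms(text):
    """Lowercased word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(corpus, query_terms, seen):
    """Return the unseen passage sharing the most terms with the query, or None."""
    best, best_overlap = None, 0
    for passage in corpus:
        if passage in seen:
            continue
        overlap = len(terms(passage) & query_terms)
        if overlap > best_overlap:
            best, best_overlap = passage, overlap
    return best


def iterative_retrieve(corpus, query, max_rounds=3):
    """Run several retrieval cycles, feeding each retrieved passage back into the query."""
    query_terms = terms(query)
    collected = []
    for _ in range(max_rounds):
        passage = retrieve(corpus, query_terms, collected)
        if passage is None:
            break  # nothing new matches, so stop early
        collected.append(passage)
        query_terms |= terms(passage)  # the next round can follow newly found entities
    return collected


corpus = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Bananas contain potassium.",
]
hops = iterative_retrieve(corpus, "Where was Marie Curie born?", max_rounds=2)
print(hops)
```

The second passage shares no words with the original question and is only found because "Warsaw" entered the query after the first cycle, which is the behavior a single retrieval step cannot provide.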

The figure below depicts a detailed representation of RAG research with different augmentation aspects, including the augmentation stages, source, and process.

RAG vs. Fine-tuning

There is a lot of open discussion about the difference between RAG and fine-tuning and about which scenarios each is appropriate for. Research in these two areas suggests that RAG is useful for integrating new knowledge, while fine-tuning can be used to improve model performance and efficiency by improving internal knowledge, output format, and complex instruction following. These approaches are not mutually exclusive and can complement each other in an iterative process. Such a process aims to improve the use of LLMs for complex, knowledge-intensive, and scalable applications that require access to quickly evolving knowledge and customized responses that follow a certain format, tone, and style. In addition, prompt engineering can also help to optimize results by leveraging the inherent capabilities of the model. Below is a figure showing the different characteristics of RAG compared with other model optimization methods:

Here is a table from the survey paper that compares the features of RAG and fine-tuned models:

RAG Evaluation

Similar to measuring the performance of LLMs on different aspects, evaluation plays a key role in understanding and optimizing the performance of RAG models across diverse application scenarios. Traditionally, RAG systems have been assessed based on the performance of the downstream tasks using task-specific metrics like F1 and EM. RaLLe is a notable example of a framework used to evaluate retrieval-augmented large language models for knowledge-intensive tasks.
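
For readers unfamiliar with the two downstream metrics mentioned above, here is a minimal sketch of exact match (EM) and token-level F1 for a single question-answering prediction. It assumes a very simple normalization (lowercasing and whitespace splitting); benchmark implementations usually also strip punctuation and articles, so scores from this sketch may differ from official scripts.

```python
from collections import Counter


def normalize(text):
    """Assumed minimal normalization: lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction, reference):
    """1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    common = sum((Counter(pred) & Counter(ref)).values())  # overlapping tokens, with multiplicity
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"))           # 1.0
print(token_f1("the city of Paris", "Paris"))  # 0.4
```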

RAG evaluation targets are determined for both retrieval and generation where the goal is to evaluate both the quality of the context retrieved and the quality of the content generated. To evaluate retrieval quality, metrics used in other knowledge-intensive domains like recommendation systems and information retrieval are used such as NDCG and Hit Rate. To evaluate generation quality, you can evaluate different aspects like relevance and harmfulness if it's unlabeled content or accuracy for labeled content. Overall, RAG evaluation can involve either manual or automatic evaluation methods.
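
The two retrieval metrics named above can be sketched for a single query as follows. The document ids and relevance labels are made up for illustration, and the function names are hypothetical; in practice both scores are averaged over a set of queries. NDCG here uses the common formulation with linear gains and a log2 position discount, which is an assumption since the text does not fix a variant.

```python
import math


def hit_rate_at_k(retrieved, relevant, k):
    """1.0 if any of the top-k retrieved ids is relevant, else 0.0."""
    return float(any(doc in relevant for doc in retrieved[:k]))


def ndcg_at_k(retrieved, relevance, k):
    """NDCG@k: DCG of the actual ranking divided by the DCG of an ideal ranking.

    `relevance` maps document id to a graded relevance (missing ids count as 0).
    """
    def dcg(gains):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [relevance.get(doc, 0) for doc in retrieved[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


retrieved = ["d3", "d1", "d7"]   # ranking returned by a hypothetical retriever
relevance = {"d1": 1, "d2": 1}   # ground-truth relevant documents

hit1 = hit_rate_at_k(retrieved, relevance, k=1)
hit3 = hit_rate_at_k(retrieved, relevance, k=3)
ndcg3 = ndcg_at_k(retrieved, relevance, k=3)
print(hit1, hit3, round(ndcg3, 3))
```

Hit Rate only asks whether something relevant appears in the top k, while NDCG also rewards placing relevant documents earlier in the ranking.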

Evaluating a RAG framework focuses on three primary quality scores and four abilities. Quality scores include measuring context relevance (i.e., the precision and specificity of retrieved context), answer faithfulness (i.e., the faithfulness of answers to the retrieved context), and answer relevance (i.e., the relevance of answers to posed questions). In addition, there are four abilities that help measure the adaptability and efficiency of a RAG system: noise robustness, negative rejection, information integration, and counterfactual robustness. Below is a summary of metrics used for evaluating different aspects of a RAG system:

Several benchmarks like RGB and RECALL are used to evaluate RAG models. Many tools like RAGAS, ARES, and TruLens have been developed to automate the process of evaluating RAG systems. Some of these systems rely on LLMs to determine some of the quality scores defined above.

Challenges & Future of RAG

In this overview, we discussed several aspects of RAG research and different approaches for enhancing the retrieval, augmentation, and generation components of a RAG system. Here are several challenges emphasized by Gao et al., 2023 as we continue developing and improving RAG systems:

Context length: LLMs continue to extend context window size, which presents challenges to how RAG needs to be adapted to ensure highly relevant and important context is captured.
Robustness: Dealing with counterfactual and adversarial information is important to measure and improve in RAG.
Hybrid approaches: There is an ongoing research effort to better understand how to best optimize the use of both RAG and fine-tuned models.
Expanding LLM roles: Increasing the role and capabilities of LLMs to further enhance RAG systems is of high interest.
Scaling laws: LLM scaling laws and how they apply to RAG systems are still not properly understood.
Production-ready RAG: Production-grade RAG systems demand engineering excellence across performance, efficiency, data security, privacy, and more.
Multimodal RAG: While there has been a lot of research effort around RAG systems, it has mostly centered on text-based tasks. There is increasing interest in extending the modalities of a RAG system to support tackling problems in more domains such as image, audio and video, code, and more.
Evaluation: The interest in building complex applications with RAG requires special attention to developing nuanced metrics and assessment tools that can more reliably assess different aspects such as contextual relevance, creativity, content diversity, factuality, and more. In addition, there is also a need for better interpretability research and tools for RAG.

RAG Tools

Some popular comprehensive tools to build RAG systems include LangChain, LlamaIndex, and DSPy. There is also a range of specialized tools that serve different purposes, such as Flowise AI, which offers a low-code solution for building RAG applications. Other notable technologies include HayStack, Meltano, Cohere Coral, and others. Software and cloud service providers are also including RAG-centric services. For instance, Verba from Weaviate is useful for building personal assistant applications and Amazon's Kendra offers intelligent enterprise search services.

Conclusion

In conclusion, RAG systems have evolved rapidly, including the development of more advanced paradigms that enable customization and further improve the performance and utility of RAG across a wide range of domains. There is a huge demand for RAG applications, which has accelerated the development of methods to improve the different components of a RAG system. Hybrid methodologies and self-retrieval are among the research areas currently being explored for modern RAG models. There is also increasing demand for better evaluation tools and metrics. The figure below provides a recap of the RAG ecosystem, techniques to enhance RAG, challenges, and other related aspects covered in this overview: