A condensed, high-level summarization of RAG.
In Part I, we will focus on the concept and components of Modular RAG, covering 6 module types, 14 modules, and 40+ operators.
Over the past year, the concept of Retrieval-Augmented Generation (RAG) as a method for implementing LLM applications has garnered considerable attention. We have authored a comprehensive survey on RAG, delving into the shift from Naive RAG to Advanced RAG and Modular RAG. However, the survey primarily scrutinized RAG technology through the lens of Augmentation (e.g., Augmentation Source/Stage/Process).
This piece will specifically center on the Modular RAG paradigm. We further defined a three-tier Modular RAG paradigm, comprising Module Type, Module, and Operator. Under this paradigm, we expound upon the core technologies within the current RAG system, encompassing 6 major Module Types, 14 Modules, and 40+ Operators, aiming to provide a comprehensive understanding of RAG.
By orchestrating different operators, we can derive various RAG Flows, a concept we aim to elucidate in this article. Drawing from extensive research, we have distilled and summarized typical patterns, several specific implementation cases, and best practices from industry. (Due to space constraints, this part will be addressed in Part II.)
The objective of this article is to offer a more sophisticated comprehension of the present state of RAG development and to pave the way for future advancements. Modular RAG presents plenty of opportunities, facilitating the definition of new operators and modules and the configuration of new Flows.
The Figures in our RAG Survey
The progress of RAG has brought about a more diverse and flexible process, as evidenced by the following crucial aspects:
Definition of Modular RAG
Above, we can see that the rapid development of RAG has surpassed the Chain-style Advanced RAG paradigm, showcasing a modular characteristic. To address the current lack of organization and abstraction, we propose a Modular RAG approach that seamlessly integrates the development paradigms of Naive RAG and Advanced RAG.
Modular RAG presents a highly scalable paradigm, dividing the RAG system into a three-layer structure of Module Type, Modules, and Operators. Each Module Type represents a core process in the RAG system, containing multiple functional modules. Each functional module, in turn, includes multiple specific operators. The entire RAG system becomes a permutation and combination of multiple modules and corresponding operators, forming what we refer to as RAG Flow. Within the Flow, different functional modules can be selected in each module type, and within each functional module, one or more operators can be chosen.
The relationship with the previous paradigm
The Modular RAG organizes the RAG system in a multi-tiered modular form. Advanced RAG is a modular form of RAG, and Naive RAG is a special case of Advanced RAG. The relationship between the three paradigms is one of inheritance and development.
Opportunities in Modular RAG
The benefits of Modular RAG are evident, providing a fresh and comprehensive perspective on existing RAG-related work. Through modular organization, relevant technologies and methods are clearly summarized.
The Framework of Modular RAG
In this chapter, we will delve into the three-tier structure and construct a technical roadmap for RAG. Due to space constraints, we will refrain from delving into technical specifics; however, comprehensive references will be provided for further reading.
Indexing, the process of breaking down text into manageable chunks, is a crucial step in organizing the system, facing three main challenges:
Larger chunks can capture more context, but they also generate more noise, requiring longer processing time and incurring higher costs. Smaller chunks may not fully convey the necessary context, but they do contain less noise.
One simple way to balance these demands is to use overlapping chunks. By employing a sliding window, semantic transitions between chunks are preserved. However, limitations remain, including imprecise control over context size, the risk of truncating words or sentences, and a lack of semantic considerations.
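To make the trade-off concrete, here is a minimal sketch of sliding-window chunking. It counts characters for simplicity, whereas production pipelines typically count tokens, and the chunk_size and overlap values are purely illustrative.

```python
def sliding_window_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks so context carries across boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```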
The key idea is to separate the chunks used for retrieval from the chunks used for synthesis. Using smaller chunks can improve the accuracy of retrieval, while larger chunks can provide more context information.
Specifically, one approach could involve retrieving smaller chunks and then referencing parent IDs to return larger chunks. Alternatively, individual sentences could be retrieved, and the surrounding text window of the sentence returned.
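A minimal sketch of the small-to-big idea is shown below, assuming an in-memory index and a hypothetical embed() function (any sentence-embedding model would do): retrieval is scored against the small child chunks, while the larger parent chunk is what gets returned for synthesis.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function; swap in any sentence-embedding model."""
    raise NotImplementedError

def small_to_big_retrieve(query: str, child_index, parent_chunks, top_k: int = 3):
    """child_index: list of (child_embedding, parent_id) pairs built at indexing time.
    parent_chunks: dict mapping parent_id -> larger chunk text."""
    q = embed(query)
    scored = sorted(((float(np.dot(q, emb)), pid) for emb, pid in child_index), reverse=True)
    seen, results = set(), []
    for _, pid in scored:
        if pid not in seen:              # deduplicate parents
            seen.add(pid)
            results.append(parent_chunks[pid])
        if len(results) == top_k:
            break
    return results
```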
Detailed information and LlamaIndex implementation:
Link: https://llamahub.ai/l/llama_packs-recursive_retriever-small_to_big?from=all
Advanced RAG 01: Small-to-Big Retrieval
Link: https://towardsdatascience.com/advanced-rag-01-small-to-big-retrieval-172181b396d4
It is akin to the Small-to-Big concept, where a summary of larger chunks is generated first, and the retrieval is performed on the summary. Subsequently, a secondary retrieval can be conducted on the larger chunks.
Chunks can be enriched with metadata information such as page number, file name, author, timestamp, summary, or the questions that the chunk can answer. Subsequently, retrieval can be filtered based on this metadata, limiting the scope of the search. See the implementation in LlamaIndex.
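As a rough illustration (not the LlamaIndex API), the sketch below assumes each chunk carries a plain metadata dict and simply narrows the candidate set before similarity search.

```python
def filter_by_metadata(chunks: list[dict], **criteria) -> list[dict]:
    """Keep only chunks whose metadata matches every given key/value pair.

    Each chunk is assumed to look like:
    {"text": "...", "metadata": {"file_name": "report.pdf", "page": 12}}
    """
    return [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in criteria.items())
    ]

# Example: restrict retrieval to one document before running vector search.
# candidates = filter_by_metadata(all_chunks, file_name="report.pdf")
```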
One effective method for enhancing information retrieval is to establish a hierarchical structure for the documents. By constructing chunks structure, RAG system can expedite the retrieval and processing of pertinent data.
In the hierarchical structure of documents, nodes are arranged in parent-child relationships, with chunks linked to them. Data summaries are stored at each node, aiding in the swift traversal of data and assisting the RAG system in determining which chunks to extract. This approach can also mitigate hallucinations caused by chunk-extraction issues.
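A minimal sketch of such a summary tree is given below; the similarity() helper is a hypothetical stand-in for embedding-based comparison, and retrieval simply descends from parent summaries toward the most relevant leaf chunks.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                                      # summary stored at this node
    chunks: list[str] = field(default_factory=list)   # leaf-level text chunks
    children: list["Node"] = field(default_factory=list)

def similarity(query: str, text: str) -> float:
    """Hypothetical similarity score (e.g. cosine of embeddings)."""
    raise NotImplementedError

def traverse(node: Node, query: str, branch: int = 2) -> list[str]:
    """Descend the tree by node-summary similarity, returning leaf chunks."""
    if not node.children:
        return node.chunks
    ranked = sorted(node.children, key=lambda c: similarity(query, c.summary), reverse=True)
    results: list[str] = []
    for child in ranked[:branch]:
        results.extend(traverse(child, query, branch))
    return results
```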
The methods for constructing a structured index primarily include:
Check Arcus’s hierarchical index at large-scale.
The utilization of Knowledge Graphs (KGs) in constructing the hierarchical structure of documents contributes to maintaining consistency. It delineates the connections between different concepts and entities, markedly reducing the potential for hallucinations.
Another advantage is the transformation of the information retrieval process into instructions that LLM can comprehend, thereby enhancing the accuracy of knowledge retrieval and enabling LLM to generate contextually coherent responses, thus improving the overall efficiency of the RAG system.
Check the Neo4j implementation and the LlamaIndex Neo4j query engine.
For organizing multiple documents using KG, you can refer to the research paper KGP: Knowledge Graph Prompting for Multi-Document Question Answering.
Link: https://arxiv.org/abs/2308.11730
Knowledge Graph Prompting: A New Approach for Multi-Document Question Answering
Link: https://medium.com/@alcarazanthony1/knowledge-graph-prompting-a-new-approach-for-multi-document-question-answering-ab5c4006a429
One of the primary challenges with Naive RAG is its direct reliance on the user's original query as the basis for retrieval. Formulating a precise and clear question is difficult, and imprudent queries result in subpar retrieval effectiveness.
The primary challenges in this stage include:
Expanding a single query into multiple queries enriches the content of the query, providing further context to address any lack of specific nuances, thereby ensuring the optimal relevance of the generated answers.
By employing prompt engineering to expand queries via LLMs, these queries can then be executed in parallel. The expansion of queries is not random, but rather meticulously designed. Two crucial criteria for this design are the diversity and coverage of the queries.
One of the challenges of using multiple queries is the potential dilution of the user’s original intent. To mitigate this, we can instruct the model to assign greater weight to the original query in prompt engineering.
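A minimal sketch of prompt-based query expansion is shown below, assuming a hypothetical llm_complete() call (replace it with your provider's completion API); the original query is kept first so that downstream merging can weight it more heavily.

```python
def llm_complete(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's completion API."""
    raise NotImplementedError

EXPANSION_PROMPT = """You are helping with document retrieval.
Rewrite the user question into {n} diverse search queries that together
cover different facets of the question. Return one query per line.

Question: {question}"""

def expand_query(question: str, n: int = 4) -> list[str]:
    raw = llm_complete(EXPANSION_PROMPT.format(n=n, question=question))
    expansions = [line.strip() for line in raw.splitlines() if line.strip()]
    # Keep the original query first so downstream fusion can up-weight it.
    return [question] + expansions[:n]
```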
Sub-question planning generates the sub-questions needed to contextualize the original question and, when combined, fully answer it. This process of adding relevant context is, in principle, similar to query expansion. Specifically, a complex question can be decomposed into a series of simpler sub-questions using the least-to-most prompting method.
Sub Question Query Engine - LlamaIndex 0.9.36
Link: https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine/
Another approach to query expansion involves the use of Chain-of-Verification (CoVe), proposed by Meta AI. The expanded queries are validated by the LLM to reduce hallucinations. Validated expanded queries typically exhibit higher reliability.
Retrieve and generate using a transformed query instead of the user’s original query.
The original queries are not always optimal for LLM retrieval, especially in real-world scenarios. Therefore, we can prompt the LLM to rewrite the queries. In addition to using an LLM for query rewriting, specialized smaller language models, such as RRR (Rewrite-Retrieve-Read), can also be utilized.
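A minimal prompt-based rewrite sketch, reusing the hypothetical llm_complete() helper from the expansion example above; the prompt wording is illustrative only.

```python
REWRITE_PROMPT = """Rewrite the following question as a clear, self-contained
search query suitable for document retrieval. Resolve pronouns, drop chit-chat,
and keep all key entities.

Question: {question}
Rewritten query:"""

def rewrite_query(question: str) -> str:
    # Reuses the hypothetical llm_complete() defined in the expansion sketch.
    return llm_complete(REWRITE_PROMPT.format(question=question)).strip()
```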
The implementation of the query rewrite method in the Taobao promotion system, known as BEQUE: Query Rewriting for Retrieval-Augmented Large Language Models, has notably enhanced recall effectiveness for long-tail queries, resulting in a rise in GMV.
When responding to queries, the LLM constructs hypothetical documents (assumed answers) instead of directly searching the query and its computed vectors in the vector database. It focuses on embedding similarity from answer to answer rather than seeking embedding similarity for the problem or query. In addition, it also includes Reverse HyDE, which focuses on retrieval from query to query.
The core idea of both HyDE and Reverse HyDE is to bridge the mapping between query and answer.
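A minimal HyDE sketch, reusing the hypothetical llm_complete() and embed() helpers from the earlier sketches: the LLM first drafts a hypothetical passage, and it is that passage's embedding, rather than the raw query's, that drives the nearest-neighbour search.

```python
import numpy as np

def hyde_retrieve(question: str, doc_index, top_k: int = 5):
    """doc_index: list of (doc_embedding, doc_text) pairs built at indexing time."""
    # 1. Draft a hypothetical answer passage for the question.
    hypothetical = llm_complete(f"Write a short passage that answers:\n{question}")
    # 2. Embed the hypothetical passage instead of the raw question.
    q = embed(hypothetical)
    # 3. Standard nearest-neighbour search, now in "answer space".
    scored = sorted(doc_index, key=lambda item: float(np.dot(q, item[0])), reverse=True)
    return [text for _, text in scored[:top_k]]
```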
Advanced RAG — Improving retrieval using Hypothetical Document Embeddings(HyDE)
Link: https://medium.aiplanet.com/advanced-rag-improving-retrieval-using-hypothetical-document-embeddings-hyde-1421a8ec075a
Using the Step-Back Prompting method proposed by Google DeepMind, the original query is abstracted to generate a high-level concept question (the step-back question). In the RAG system, both the step-back question and the original query are used for retrieval, and both sets of results are used as the basis for answer generation by the language model.
A New Prompt Engineering Technique Has Been Introduced Called Step-Back Prompting
Link: https://cobusgreyling.medium.com/a-new-prompt-engineering-technique-has-been-introduced-called-step-back-prompting-b00e8954cacb
Based on varying queries, route to distinct RAG pipelines; this is suitable for a versatile RAG system designed to accommodate diverse scenarios.
The first step involves extracting keywords (entity) from the query, followed by filtering based on the keywords and metadata within the chunks to narrow down the search scope.
Another method of routing involves leveraging the semantic information of the query; for a specific approach, see Semantic Router. Certainly, a hybrid routing approach can also be employed, combining both semantic and metadata-based methods for enhanced query routing.
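A minimal sketch of keyword/metadata routing is shown below; the route table, the pipelines it points to, and the keyword matching are all assumptions for illustration, and a semantic router would replace the keyword match with embedding similarity.

```python
# Hypothetical registry of RAG pipelines, keyed by route name.
PIPELINES = {
    "legal": lambda query: f"run legal pipeline on: {query}",
    "finance": lambda query: f"run finance pipeline on: {query}",
    "default": lambda query: f"run default pipeline on: {query}",
}

KEYWORD_ROUTES = {
    "contract": "legal",
    "statute": "legal",
    "invoice": "finance",
    "revenue": "finance",
}

def route_query(query: str) -> str:
    """Pick a pipeline from keywords in the query; fall back to the default."""
    lowered = query.lower()
    for keyword, route in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return PIPELINES[route](query)
    return PIPELINES["default"](query)
```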
Check the Semantic Router repo.
Link: https://github.com/aurelio-labs/semantic-router/
Beyond Basic Chatbots: How Semantic Router is Changing the Game
Link: https://medium.com/ai-insights-cobet/beyond-basic-chatbots-how-semantic-router-is-changing-the-game-783dd959a32d
Converting a user’s query into another query language for accessing alternative data sources. Common methods include:
In many scenarios, structured query languages (e.g., SQL, Cypher) are often used in conjunction with semantic information and metadata to construct more complex queries. For specific details, please refer to the Langchain blog.
Query Construction
Link: https://blog.langchain.dev/query-construction/
The retrieval process plays a crucial role in RAG. Leveraging powerful PLMs enables the effective representation of queries and text in latent spaces, facilitating the establishment of semantic similarity between questions and documents to support retrieval.
Three main considerations need to be taken into account:
Since the release of ChatGPT, there has been a frenzy of development in embedding models. Hugging Face's MTEB leaderboard evaluates nearly all available embedding models across 8 tasks (Clustering, Classification, Bitext Mining, Pair Classification, Reranking, Retrieval, Semantic Textual Similarity (STS), and Summarization), covering 58 datasets. Additionally, C-MTEB focuses on evaluating the capabilities of Chinese embedding models, covering 6 tasks and 35 datasets.
When constructing RAG applications, there is no one-size-fits-all answer to “which embedding model to use.” However, you may notice that specific embeddings are better suited for particular use cases.
Check the MTEB/C-MTEB Leaderboard.
MTEB Leaderboard - a Hugging Face Space by mteb
Link: https://huggingface.co/spaces/mteb/leaderboard
While sparse encoding models may be considered a somewhat antiquated technique, often based on statistical methods such as word-frequency statistics, they still hold a certain place due to their higher encoding efficiency and stability. Common sparse encoding models include BM25 and TF-IDF.
Neural network-based dense encoding models encompass several types:
Two embedding approaches capture different relevance features and can benefit from each other by leveraging complementary relevance information. For instance, sparse retrieval models can be used to provide initial search results for training dense retrieval models. Additionally, PLMs can be utilized to learn term weights to enhance sparse retrieval. Specifically, it also demonstrates that sparse retrieval models can enhance the zero-shot retrieval capability of dense retrieval models and assist dense retrievers in handling queries containing rare entities, thereby improving robustness.
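A minimal sketch of hybrid retrieval as a weighted blend of sparse and dense scores; the bm25_score and dense_score callables are assumed to be supplied by your sparse and dense retrievers, and alpha controls the mix.

```python
def hybrid_rank(query: str, docs: list[str], bm25_score, dense_score,
                alpha: float = 0.5) -> list[tuple[float, str]]:
    """Blend min-max-normalized scores: alpha * dense + (1 - alpha) * sparse."""
    sparse = [bm25_score(query, d) for d in docs]
    dense = [dense_score(query, d) for d in docs]

    def normalize(xs: list[float]) -> list[float]:
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    sparse, dense = normalize(sparse), normalize(dense)
    blended = [(alpha * dn + (1 - alpha) * sp, doc)
               for sp, dn, doc in zip(sparse, dense, docs)]
    return sorted(blended, reverse=True)
```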
Image from IVAN ILIN: Advanced RAG Techniques: an Illustrated Overview
In cases where the context may diverge from what the pre-trained model deems similar in the embedding space, particularly in highly specialized fields like healthcare, law, and other domains abundant in proprietary terminology, adjusting the embedding model can address this issue. While this adjustment demands additional effort, it can substantially enhance retrieval efficiency and domain alignment.
You can construct your own fine-tuning dataset based on domain-specific data, a task that can be swiftly accomplished using LlamaIndex.
In contrast to directly constructing a fine-tuning dataset from the data itself, LSR (LM-Supervised Retrieval) utilizes the LM-generated results as supervisory signals to fine-tune the embedding model during the RAG process.
Inspired by RLHF (Reinforcement Learning from Human Feedback), LM-based feedback can be used to reinforce the retriever through reinforcement learning.
At times, fine-tuning an entire retriever can be costly, especially when dealing with API-based retrievers that cannot be directly fine-tuned. In such cases, we can mitigate this by incorporating an adapter module and fine-tuning it. Another benefit of adding an adapter is the ability to achieve better alignment with specific downstream tasks.
Retrieving entire document chunks and feeding them directly into the LLM’s contextual environment is not an optimal choice. Post-processing the documents can aid LLM in better leveraging the contextual information.
The primary challenges include:
Rerank the retrieved document chunks without altering their content or length, to enhance the visibility of the more crucial document chunks for LLM. In specific terms:
According to certain rules, metrics are calculated to rerank chunks. Common metrics include:
The idea behind MMR is to reduce redundancy and increase result diversity, and it is used for text summarization. MMR selects phrases in the final key phrase list based on a combined criterion of query relevance and information novelty.
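A minimal MMR sketch over precomputed chunk embeddings; lam trades query relevance against redundancy with the chunks already selected.

```python
import numpy as np

def mmr(query_emb: np.ndarray, chunk_embs: list[np.ndarray],
        k: int = 5, lam: float = 0.7) -> list[int]:
    """Return indices of chunks selected by Maximal Marginal Relevance."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    selected: list[int] = []
    candidates = list(range(len(chunk_embs)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cos(query_emb, chunk_embs[i])
            redundancy = max((cos(chunk_embs[i], chunk_embs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```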
Check the rerank implementation in Haystack:
Enhancing RAG Pipelines in Haystack: Introducing DiversityRanker and LostInTheMiddleRanker
Link: https://towardsdatascience.com/enhancing-rag-pipelines-in-haystack-45f14e2bc9f5
Utilize a language model to reorder the document chunks, with options including:
A common misconception in the RAG process is the belief that retrieving as many relevant documents as possible and concatenating them to form a lengthy retrieval prompt is beneficial. However, excessive context can introduce more noise, diminishing the LLM's perception of key information and leading to issues such as "lost in the middle". A common approach to address this is to compress and select the retrieved content.
By utilizing aligned and trained small language models, such as GPT-2 Small or LLaMA-7B, the detection and removal of unimportant tokens from the prompt is achieved, transforming it into a form that is challenging for humans to comprehend but well understood by LLMs. This approach presents a direct and practical method for prompt compression, eliminating the need for additional training of LLMs while balancing language integrity and compression ratio.
Check the LLMLingua project.
LLMLingua | Explore the special language for LLMs via Prompt Compression
Link: https://wyydsb.xin/NLP/LLMLingua_en.html
Recomp introduces two types of compressors: an extractive compressor that selects pertinent sentences from retrieved documents, and an abstractive compressor that produces concise summaries by amalgamating information from multiple documents. Both compressors are trained to enhance the performance of language models on end tasks when the generated summaries are prepended to the language models’ input, while ensuring the conciseness of the summary. In cases where the retrieved documents are irrelevant to the input or do not provide additional information to the language model, compressor can return an empty string, thereby implementing selective augmentation.
By identifying and removing redundant content in the input context, the input can be streamlined, thus improving the language model’s reasoning efficiency. Selective Context is akin to a “stop-word removal” strategy. In practice, selective context assesses the information content of lexical units based on the self-information computed by the base language model. By retaining content with higher self-information, this method offers a more concise and efficient textual representation for language model processing, without compromising their performance across diverse applications. However, it overlooks the interdependence between compressed content and the alignment between the targeted language model and the small language model utilized for prompting compression.
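A rough sketch of the self-information idea behind Selective Context, assuming a hypothetical token_logprobs() that returns per-token log-probabilities from a small causal LM such as GPT-2; tokens whose self-information falls in the bottom fraction are dropped. A faithful implementation works on lexical units and handles tokenizer whitespace more carefully.

```python
def token_logprobs(text: str) -> list[tuple[str, float]]:
    """Hypothetical: (token, log-probability) pairs from a small causal LM."""
    raise NotImplementedError

def selective_context(text: str, keep_ratio: float = 0.6) -> str:
    pairs = token_logprobs(text)
    if not pairs:
        return text
    # Self-information = -log p(token | context); higher means more informative.
    scored = [(-logprob, token) for token, logprob in pairs]
    cutoff = sorted(si for si, _ in scored)[int(len(scored) * (1 - keep_ratio))]
    kept = [token for si, token in scored if si >= cutoff]
    return "".join(kept)  # assumes tokens carry their own leading whitespace
```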
Tagging is a relatively intuitive and straightforward approach. Specifically, the documents are first labeled, and then filtered based on the metadata of the query.
Tagging | LangChain
Link: https://python.langchain.com/docs/use_cases/tagging
Another straightforward and effective approach involves having the LLM evaluate the retrieved content before generating the final answer. This allows the LLM to filter out documents with poor relevance through LLM critique. For instance, in Chatlaw, the LLM is prompted to apply self-suggestion to the referenced legal provisions to assess their relevance.
Utilize the LLM to generate answers based on the user’s query and the retrieved context information.
Depending on the scenario, the choice of LLM can be categorized into the following two types:
Cloud API-based: Utilize third-party LLMs by invoking their APIs, such as OpenAI's ChatGPT and GPT-4, and Anthropic's Claude, among others. Benefits:
Drawbacks:
Locally deployed open-source or self-developed LLMs, such as the Llama series, GLM, and others. The advantages and disadvantages are the opposite of those of cloud API-based models: locally deployed models offer greater flexibility and better privacy protection but require higher computational resources.
In addition to direct LLM usage, targeted fine-tuning based on the scenario and data characteristics can yield better results. This is also one of the greatest advantages of an on-premise setup. Common fine-tuning methods include the following:
When LLMs lack data in a specific domain, additional knowledge can be provided to the LLM through fine-tuning. Huggingface’s fine-tuning data can also be used as an initial step.
Another benefit of fine-tuning is the ability to adjust the model’s input and output. For example, it can enable LLM to adapt to specific data formats and generate responses in a particular style as instructed.
Aligning LLM outputs with human or retriever preferences through reinforcement learning is a potential approach. For instance, manually annotating the final generated answers and then providing feedback through reinforcement learning. In addition to aligning with human preferences, it is also possible to align with the preferences of fine-tuned models and retrievers.
When circumstances prevent access to powerful proprietary models or open-source models with larger parameter counts, a simple and effective method is to distill a more powerful model (e.g., GPT-4).
Fine-tuning both Generator and Retriever to align their preferences. A typical approach, such as RA-DIT, aligns the scoring functions between Retriever and Generator using KL divergence.
Orchestration refers to the modules used to control the RAG process. RAG no longer follows a fixed process, and it involves making decisions at key points and dynamically selecting the next step based on the results. This is also one of the key features of modularized RAG compared to Naive RAG.
The Judge module assesses critical points in the RAG process, determining whether external document repositories need to be retrieved, whether the answer is satisfactory, and whether further exploration is necessary. It is typically used in recursive, iterative, and adaptive retrieval. Specifically, it mainly includes the following two operators:
The next course of action is determined based on predefined rules. Typically, the generated answers are scored, and then the decision to continue or stop is made based on whether the scores meet predefined thresholds. Common thresholds include confidence levels for tokens.
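A minimal sketch of a rule-based judge using token confidence; the token probabilities are assumed to come from the generator, and the threshold is illustrative.

```python
def needs_more_retrieval(token_probs: list[float], threshold: float = 0.8) -> bool:
    """Rule-based judge: retrieve again if the draft answer's average token
    confidence falls below a predefined threshold."""
    if not token_probs:
        return True
    return sum(token_probs) / len(token_probs) < threshold

# Sketch of how this slots into an iterative loop (generate/retrieve are assumed):
# answer, token_probs = generate(query, context)
# if needs_more_retrieval(token_probs):
#     context = retrieve(rewrite_query(query))   # go another round
```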
The LLM autonomously determines the next course of action. There are primarily two approaches to achieve this. The first involves prompting the LLM to reflect or make judgments based on the conversation history, as seen in the ReACT framework. The benefit here is the elimination of the need for fine-tuning the model. However, the output format of the judgment depends on the LLM's adherence to instructions. A prompt-based example is FLARE.
The second approach entails LLM generating specific tokens to trigger particular actions, a method that can be traced back to Toolformer and is applied in RAG, such as in Self-RAG.
This concept originates from RAG Fusion. As mentioned in the previous section on Query Expansion, the current RAG process is no longer a singular pipeline. It often requires the expansion of retrieval scope or diversity through multiple branches. Therefore, following the expansion to multiple branches, the Fusion module is relied upon to merge multiple answers.
The fusion method is based on the weighted values of different tokens generated from multiple branches, leading to a comprehensive selection of the final output. Weighted averaging is predominantly employed. See REPLUG.
RRF (Reciprocal Rank Fusion) is a technique that combines the rankings of multiple search result lists to generate a single unified ranking. Developed in collaboration with the University of Waterloo (CAN) and Google, RRF produces results that are more effective than reordering chunks under any single branch.
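A minimal sketch of Reciprocal Rank Fusion over several ranked result lists; k is the usual smoothing constant (60 in the original formulation), and each document's fused score is the sum of 1 / (k + rank) across the lists it appears in.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists of document ids into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```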
Forget RAG, the Future is RAG-Fusion
Link: https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1
Advanced RAG Techniques: an Illustrated Overview
Link: https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6
Conclusion
The upcoming content on RAG Flow will be introduced in PART II, to be published soon.
As this is my first time publishing an article on Medium, I am still getting familiar with many features. Any feedback and criticism are welcome.