

RAG Optimization Strategies: Semantic Chunking
Published: 2024-04-30


Chunking is the process of breaking text down into smaller, meaningful units, or chunks, so that it can be understood and processed more effectively.

RAG (Retrieval Augmented Generation) is a technique that combines a retrieval step with text generation so that a language model can ground its answers in external documents. In RAG, we encode a collection of documents/document chunks into numeric representations called vector embeddings, where one vector embedding represents one document chunk, and store them in a database called a vector store. The model used to encode these chunks into embeddings is called the encoding model or bi-encoder. These encoders are trained on large amounts of data, which makes them capable of representing a document chunk as a single vector embedding.

Retrieval depends heavily on how chunks are represented and stored in the vector store. For a given text, finding the right chunk size is often a genuinely hard problem.

Retrieval can be improved with different retrieval methods, but also with a better chunking strategy.

The different chunking methods:

1. Fixed-size chunking
2. Recursive chunking
3. Document-specific chunking
4. Semantic chunking
5. Agentic chunking

Fixed-size chunking: this is the most common and most straightforward approach. We simply decide how many tokens go into each chunk and whether there should be any overlap between chunks. In general, we keep some overlap between chunks so that semantic context is not lost at the boundaries. In most cases, fixed-size chunking is the best path to start with: compared with other forms of chunking it is computationally cheap and simple to use, since it does not require any NLP library.
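As a rough illustration of the idea (this is not code from the original post; the helper name and the whitespace "tokenizer" are assumptions made for this sketch):

# A minimal sketch of fixed-size chunking with overlap: slide a window of
# `chunk_size` tokens over the text, stepping by chunk_size - overlap so that
# consecutive chunks share some context.
def fixed_size_chunks(tokens, chunk_size=200, overlap=20):
    step = chunk_size - overlap
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]

text = "..."  # your document text
chunks = fixed_size_chunks(text.split(), chunk_size=200, overlap=20)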

Recursive chunking: recursive chunking uses a set of separators to divide the input text into smaller chunks in a hierarchical, iterative way. If the first attempt at splitting does not produce chunks of the desired size or structure, the method calls itself recursively on the resulting chunks with a different separator or criterion, until the desired chunk size or structure is reached. This means the chunks will not all be exactly the same size, but they will still "aspire" to a similar size. It combines the advantages of fixed-size chunking and overlap.
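A brief sketch, assuming LangChain's RecursiveCharacterTextSplitter (the same splitter is used later in this post with its default settings); the explicit separators list simply makes the paragraph-then-line-then-word fallback visible:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Try paragraphs first, then lines, then words, then characters, recursing
# until every chunk fits under chunk_size.
recursive_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=1000,
    chunk_overlap=100,
)
text = "..."  # your document text
recursive_chunks = recursive_splitter.split_text(text)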

Document-specific chunking: this takes the structure of the document into account. Instead of using a fixed number of characters or a recursive process, it creates chunks that align with the logical sections of the document, such as paragraphs or subsections. By doing so it preserves the author's organization of the content and keeps the text coherent, which makes the retrieved information more relevant and useful, especially for structured documents with clearly defined sections. It can handle formats such as Markdown and HTML.
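A minimal sketch, assuming a Markdown source and LangChain's MarkdownHeaderTextSplitter (the sample text is invented for illustration):

from langchain.text_splitter import MarkdownHeaderTextSplitter

markdown_text = "# Title\nIntro paragraph.\n\n## Method\nDetails about the method."

# Split on the logical sections defined by the headers, so each chunk
# corresponds to one authored section and carries its header as metadata.
headers_to_split_on = [("#", "Header 1"), ("##", "Header 2")]
md_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
md_chunks = md_splitter.split_text(markdown_text)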

Semantic chunking: semantic chunking considers the relationships within the text. It divides the text into meaningful, semantically complete chunks. This approach preserves the integrity of information during retrieval, leading to more accurate and contextually appropriate results. It is slower than the previous chunking strategies.

Agentic chunking: the hypothesis here is to process the document the way a human would. We start at the top of the document and treat the first part as a chunk. We then keep reading down the document, deciding whether each new sentence or piece of information belongs with the current chunk or should start a new one, and continue until we reach the end of the document. This approach is still being tested; because of the time taken by multiple LLM calls and the cost of those calls, it is not yet suitable for large-scale use, and no public library currently provides an implementation.
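Since no public implementation exists, the following is purely a hypothetical sketch of the loop described above (the helper name, the prompt wording, and the use of a LangChain chat model's invoke method are all assumptions); it also makes clear why the cost scales with one LLM call per sentence:

def agentic_chunks(sentences, llm):
    # `llm` is any LangChain chat model, e.g. the ChatGroq model used later in this post.
    chunks, current = [], []
    for sentence in sentences:
        if not current:
            current.append(sentence)
            continue
        decision = llm.invoke(
            "Current chunk:\n" + " ".join(current)
            + "\n\nNext sentence:\n" + sentence
            + "\n\nDoes the next sentence belong in the current chunk? Answer YES or NO."
        ).content.strip().upper()
        if decision.startswith("YES"):
            current.append(sentence)
        else:
            chunks.append(" ".join(current))
            current = [sentence]
    if current:
        chunks.append(" ".join(current))
    return chunks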

Here, we will experiment with semantic chunking and a recursive retriever.

Comparison of the two approaches, step by step: load the document; chunk it with the two methods, semantic chunking and the recursive retriever; use RAGAS to evaluate the qualitative and quantitative improvement.

Semantic chunks: semantic chunking involves taking the embedding of every sentence in the document, comparing the similarity of all sentences, and then grouping the sentences with the most similar embeddings together.

By focusing on the meaning and context of the text, semantic chunking significantly improves retrieval quality. It is a top choice whenever preserving the semantic integrity of the text is critical.

The assumption here is that we can use the embeddings of individual sentences to build more meaningful chunks. The basic idea is as follows (a rough code sketch appears after the list):

  • Split the document into sentences based on separators (., ?, !).

  • Index each sentence by its position.

  • Group: choose how many sentences to keep on either side, i.e., add a buffer of sentences around each selected sentence.

  • Compute the distances between the groups of sentences.

  • Merge groups based on similarity, keeping similar sentences together.

  • Split apart the sentences that are not similar.
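A simplified sketch of that procedure (this is not the SemanticChunker internals; the function name, the buffered-window trick, and the generic embed_fn are assumptions made for illustration, where embed_fn is any function that maps a list of strings to a list of vectors, e.g. FastEmbedEmbeddings().embed_documents):

import re
import numpy as np

def semantic_breakpoints(text, embed_fn, buffer=1, percentile=95):
    # 1. Split into sentences on ., ?, !
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    # 2. Build a buffered window around each sentence for a more stable embedding
    windows = [" ".join(sentences[max(0, i - buffer): i + buffer + 1])
               for i in range(len(sentences))]
    # 3. Embed each window and normalise
    vecs = np.array(embed_fn(windows))
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    # 4. Cosine distance between consecutive windows
    dists = 1 - np.sum(vecs[:-1] * vecs[1:], axis=1)
    # 5. Break wherever the distance exceeds the chosen percentile
    threshold = np.percentile(dists, percentile)
    chunks, start = [], 0
    for i, d in enumerate(dists):
        if d > threshold:
            chunks.append(" ".join(sentences[start:i + 1]))
            start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks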

Technology stack:

  • Orchestration, LangChain: LangChain is an open-source framework designed to simplify building applications with large language models (LLMs). It provides a standard interface for chains, integrations with many other tools, and end-to-end chains for common applications.

  • LLM, Groq: Groq's Language Processing Unit (LPU) is a cutting-edge technology designed to significantly improve AI compute performance, particularly for large language models (LLMs). The main goal of the Groq LPU system is to deliver a real-time, low-latency experience with excellent inference performance.

  • Embedding model, FastEmbed: FastEmbed is a lightweight, fast Python library for generating embeddings.

  • Evaluation, Ragas: Ragas provides metrics tailored to evaluating each component of your RAG pipeline individually.

Code implementation: install the required dependencies.

!pip install -qU langchain_experimental langchain_openai langchain_community langchain ragas chromadb langchain-groq fastembed pypdf openai
langchain==0.1.16
langchain-community==0.0.34
langchain-core==0.1.45
langchain-experimental==0.0.57
langchain-groq==0.1.2
langchain-openai==0.1.3
langchain-text-splitters==0.0.1
langcodes==3.3.0
langsmith==0.1.49
chromadb==0.4.24
ragas==0.1.7
fastembed==0.2.6


Download the data

! wget "https://arxiv.org/pdf/1810.04805.pdf"

Parse the PDF document

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("1810.04805.pdf")
documents = loader.load()

print(len(documents))

Perform naive chunking (recursive character text splitting)

from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False,
)

naive_chunks = text_splitter.split_documents(documents)
for chunk in naive_chunks[10:15]:
    print(chunk.page_content + "\n")
###########################RESPONSE###############################BERT BERT E[CLS] E1 E[SEP] ... ENE1’... EM’CT1T[SEP] ... TNT1’... TM’[CLS] Tok 1 [SEP] ... Tok NTok 1 ... TokM Question Paragraph Start/End Span BERT E[CLS] E1 E[SEP] ... ENE1’... EM’CT1T[SEP] ... TNT1’... TM’[CLS] Tok 1 [SEP] ... Tok NTok 1 ... TokM Masked Sentence A Masked Sentence B Pre-training Fine-Tuning NSP Mask LM Mask LM Unlabeled Sentence A and B Pair SQuAD Question Answer Pair NER MNLI Figure 1: Overall pre-training and fine-tuning procedures for BERT. Apart from output layers, the same architec-tures are used in both pre-training and fine-tuning. The same pre-trained model parameters are used to initializemodels for different down-stream tasks. During fine-tuning, all parameters are fine-tuned. [CLS] is a specialsymbol added in front of every input example, and [SEP] is a special separator token (e.g. separating ques-tions/answers).ing and auto-encoder objectives have been usedfor pre-training such models (Howard and Ruder,
2018; Radford et al., 2018; Dai and Le, 2015).2.3 Transfer Learning from Supervised DataThere has also been work showing effective trans-fer from supervised tasks with large datasets, suchas natural language inference (Conneau et al.,2017) and machine translation (McCann et al.,2017). Computer vision research has also demon-strated the importance of transfer learning fromlarge pre-trained models, where an effective recipeis to fine-tune models pre-trained with Ima-geNet (Deng et al., 2009; Yosinski et al., 2014).3 BERTWe introduce BERT and its detailed implementa-tion in this section. There are two steps in ourframework: pre-training and fine-tuning . Dur-ing pre-training, the model is trained on unlabeleddata over different pre-training tasks. For fine-tuning, the BERT model is first initialized withthe pre-trained parameters, and all of the param-eters are fine-tuned using labeled data from thedownstream tasks. Each downstream task has sep-
arate fine-tuned models, even though they are ini-tialized with the same pre-trained parameters. Thequestion-answering example in Figure 1 will serveas a running example for this section.A distinctive feature of BERT is its unified ar-chitecture across different tasks. There is mini-mal difference between the pre-trained architec-ture and the final downstream architecture.Model Architecture BERT’s model architec-ture is a multi-layer bidirectional Transformer en-coder based on the original implementation de-scribed in Vaswani et al. (2017) and released inthetensor2tensor library.1Because the useof Transformers has become common and our im-plementation is almost identical to the original,we will omit an exhaustive background descrip-tion of the model architecture and refer readers toVaswani et al. (2017) as well as excellent guidessuch as “The Annotated Transformer.”2In this work, we denote the number of layers(i.e., Transformer blocks) as L, the hidden size as
H, and the number of self-attention heads as A.3We primarily report results on two model sizes:BERT BASE (L=12, H=768, A=12, Total Param-eters=110M) and BERT LARGE (L=24, H=1024,A=16, Total Parameters=340M).BERT BASE was chosen to have the same modelsize as OpenAI GPT for comparison purposes.Critically, however, the BERT Transformer usesbidirectional self-attention, while the GPT Trans-former uses constrained self-attention where everytoken can only attend to context to its left.41https://github.com/tensorflow/tensor2tensor2http://nlp.seas.harvard.edu/2018/04/03/attention.html3In all cases we set the feed-forward/filter size to be 4H,i.e., 3072 for the H= 768 and 4096 for the H= 1024 .4We note that in the literature the bidirectional Trans-
Input/Output Representations To make BERThandle a variety of down-stream tasks, our inputrepresentation is able to unambiguously representboth a single sentence and a pair of sentences(e.g.,⟨Question, Answer⟩) in one token sequence.Throughout this work, a “sentence” can be an arbi-trary span of contiguous text, rather than an actuallinguistic sentence. A “sequence” refers to the in-put token sequence to BERT, which may be a sin-gle sentence or two sentences packed together.We use WordPiece embeddings (Wu et al.,2016) with a 30,000 token vocabulary. The firsttoken of every sequence is always a special clas-sification token ( [CLS] ). The final hidden statecorresponding to this token is used as the ag-gregate sequence representation for classificationtasks. Sentence pairs are packed together into asingle sequence. We differentiate the sentences intwo ways. First, we separate them with a specialtoken ( [SEP] ). Second, we add a learned embed-

Instantiate the embedding model
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

embed_model = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")


Set the API key for the LLM

from google.colab import userdata
from groq import Groq
from langchain_groq import ChatGroq

groq_api_key = userdata.get("GROQ_API_KEY")

Perform semantic chunking

We will use the percentile threshold as the example here, but with semantic chunking you can choose from three different strategies:

  • percentile (default): in this method all differences between sentences are calculated, and any difference greater than the Xth percentile is split on.

  • standard_deviation: in this method any difference greater than X standard deviations is split on.

  • interquartile: in this method the interquartile range is used to decide where to split chunks.

Note: this method is currently experimental and is not in its stable, final form; expect updates and improvements in the coming months.
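Switching strategies is just a constructor argument; a minimal sketch, assuming the same embed_model instantiated above (the walkthrough below sticks with percentile):

from langchain_experimental.text_splitter import SemanticChunker

# Same embedding model as before; only the breakpoint strategy changes.
sd_chunker = SemanticChunker(embed_model, breakpoint_threshold_type="standard_deviation")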

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

semantic_chunker = SemanticChunker(embed_model, breakpoint_threshold_type="percentile")

semantic_chunks = semantic_chunker.create_documents([d.page_content for d in documents])

for semantic_chunk in semantic_chunks:
    if "Effect of Pre-training Tasks" in semantic_chunk.page_content:
        print(semantic_chunk.page_content)
        print(len(semantic_chunk.page_content))
#############################RESPONSE###############################Dev SetTasks MNLI-m QNLI MRPC SST-2 SQuAD(Acc) (Acc) (Acc) (Acc) (F1)BERT BASE 84.4 88.4 86.7 92.7 88.5No NSP 83.9 84.9 86.5 92.6 87.9LTR & No NSP 82.1 84.3 77.5 92.1 77.8+ BiLSTM 82.1 84.1 75.7 91.6 84.9Table 5: Ablation over the pre-training tasks using theBERT BASE architecture. “No NSP” is trained withoutthe next sentence prediction task. “LTR & No NSP” istrained as a left-to-right LM without the next sentenceprediction, like OpenAI GPT. “+ BiLSTM” adds a ran-domly initialized BiLSTM on top of the “LTR + NoNSP” model during fine-tuning. ablation studies can be found in Appendix C. 5.1 Effect of Pre-training TasksWe demonstrate the importance of the deep bidi-rectionality of BERT by evaluating two pre-training objectives using exactly the same pre-training data, fine-tuning scheme, and hyperpa-rameters as BERT BASE :No NSP : A bidirectional model which is trainedusing the “masked LM” (MLM) but without thenext sentence prediction” (NSP) task. LTR & No NSP : A left-context-only model whichis trained using a standard Left-to-Right (LTR)LM,


Instantiate the vector store

from langchain_community.vectorstores import Chroma

semantic_chunk_vectorstore = Chroma.from_documents(semantic_chunks, embedding=embed_model)

We will "limit" our semantic retriever to k = 1 to showcase the strength of the semantic chunking strategy, while keeping the token counts of the semantic and naive retrieval contexts comparable.

Instantiate the retrieval step

semantic_chunk_retriever = semantic_chunk_vectorstore.as_retriever(search_kwargs={"k": 1})
semantic_chunk_retriever.invoke("Describe the Feature-based Approach with BERT?")
########################RESPONSE###################################[Document(page_content='The right part of the paper represents the\nDev set results. For the feature-based approach,\nwe concatenate the last 4 layers of BERT as the\nfeatures, which was shown to be the best approach\nin Section 5.3. From the table it can be seen that fine-tuning is\nsurprisingly robust to different masking strategies. However, as expected, using only the M ASK strat-\negy was problematic when applying the feature-\nbased approach to NER. Interestingly, using only\nthe R NDstrategy performs much worse than our\nstrategy as well.')]

Instantiate the augmentation step (for content augmentation)

from langchain_core.prompts import ChatPromptTemplate
rag_template = """\
Use the following context to answer the user's query. If you cannot answer, please respond with 'I don't know'.

User's Query:
{question}

Context:
{context}
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)


Instantiate the generation step

chat_model = ChatGroq(
    temperature=0,
    model_name="mixtral-8x7b-32768",
    api_key=userdata.get("GROQ_API_KEY"),
)

Create a RAG pipeline that uses semantic chunking

from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

semantic_rag_chain = (
    {"context": semantic_chunk_retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | chat_model
    | StrOutputParser()
)

Q&A 1:

semantic_rag_chain.invoke("Describe the Feature-based Approach with BERT?")
################ RESPONSE ###################################The feature-based approach with BERT, as mentioned in the context, involves using BERT as a feature extractor for a downstream natural language processing task, specifically Named Entity Recognition (NER) in this case.
To use BERT in a feature-based approach, the last 4 layers of BERT are concatenated to serve as the features for the task. This was found to be the most effective approach in Section 5.3 of the paper.
The context also mentions that fine-tuning BERT is surprisingly robust to different masking strategies. However, when using the feature-based approach for NER, using only the MASK strategy was problematic. Additionally, using only the RND strategy performed much worse than the proposed strategy.
In summary, the feature-based approach with BERT involves using the last 4 layers of BERT as features for a downstream NLP task, and fine-tuning these features for the specific task. The approach was found to be robust to different masking strategies, but using only certain strategies was problematic for NER.

Q&A 2:

semantic_rag_chain.invoke("What is SQuADv2.0?")################ RESPONSE ###################################SQuAD v2.0, or Squad Two Point Zero, is a version of the Stanford Question Answering Dataset (SQuAD) that extends the problem definition of SQuAD 1.1 by allowing for the possibility that no short answer exists in the provided paragraph. This makes the problem more realistic, as not all questions have a straightforward answer within the provided text. The SQuAD 2.0 task uses a simple approach to extend the SQuAD 1.1 BERT model for this task, by treating questions that do not have an answer as having an answer span with start and end at the [CLS] token, and comparing the score of the no-answer span to the score of the best non-null span for prediction. The document also mentions that the BERT ensemble, which is a combination of 7 different systems using different pre-training checkpoints and fine-tuning seeds, outperforms all existing systems by a wide margin in SQuAD 2.0, even when excluding entries that use BERT as one of their components.

Q&A 3:

semantic_rag_chain.invoke("What is the purpose of Ablation Studies?")################ RESPONSE ###################################Ablation studies are used to understand the impact of different components or settings of a machine learning model on its performance. In the provided context, ablation studies are used to answer questions about the effect of the number of training steps and masking procedures on the performance of the BERT model. By comparing the performance of the model under different conditions, researchers can gain insights into the importance of these components or settings and how they contribute to the overall performance of the model.

Implement a RAG pipeline with the naive chunking strategy

naive_chunk_vectorstore = Chroma.from_documents(naive_chunks, embedding=embed_model)
naive_chunk_retriever = naive_chunk_vectorstore.as_retriever(search_kwargs={"k": 5})
naive_rag_chain = (
    {"context": naive_chunk_retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | chat_model
    | StrOutputParser()
)

Note: here we use k = 5; this is to keep the comparison between the two strategies "fair".

Q&A 1:

naive_rag_chain.invoke("Describe the Feature-based Approach with BERT?")
#############################RESPONSE##########################The Feature-based Approach with BERT involves extracting fixed features from the pre-trained BERT model, as opposed to the fine-tuning approach where all parameters are jointly fine-tuned on a downstream task. The feature-based approach has certain advantages, such as being applicable to tasks that cannot be easily represented by a Transformer encoder architecture, and providing major computational benefits by pre-computing an expensive representation of the training data once and then running many experiments with cheaper models on top of this representation. In the context provided, the feature-based approach is compared to the fine-tuning approach on the CoNLL-2003 Named Entity Recognition (NER) task, with the feature-based approach using a case-preserving WordPiece model and including the maximal document context provided by the data. The results presented in Table 7 show the performance of both approaches on the NER task.

Q&A 2:

naive_rag_chain.invoke("What is SQuADv2.0?")#############################RESPONSE##########################SQuAD v2.0, or the Stanford Question Answering Dataset version 2.0, is a collection of question/answer pairs that extends the SQuAD v1.1 problem definition by allowing for the possibility that no short answer exists in the provided paragraph. This makes the problem more realistic. The SQuAD v2.0 BERT model is extended from the SQuAD v1.1 model by treating questions that do not have an answer as having an answer span with start and end at the [CLS] token, and extending the probability space for the start and end answer span positions to include the position of the [CLS] token. For prediction, the score of the no-answer span is compared to the score of the best non-null span.

Q&A 3:

naive_rag_chain.invoke("What is the purpose of Ablation Studies?")
#############################RESPONSE##########################Ablation studies are used to evaluate the effect of different components or settings in a machine learning model. In the provided context, ablation studies are used to understand the impact of certain aspects of the BERT model, such as the number of training steps and masking procedures, on the model's performance.
For instance, one ablation study investigates the effect of the number of training steps on BERT's performance. The results show that BERT BASE achieves higher fine-tuning accuracy on MNLI when trained for 1M steps compared to 500k steps, indicating that a larger number of training steps contributes to better performance.
Another ablation study focuses on different masking procedures during pre-training. The study compares BERT's masked language model (MLM) with a left-to-right strategy. The results demonstrate that the masking strategies aim to reduce the mismatch between pre-training and fine-tuning, as the [MASK] symbol does not appear during the fine-tuning stage. The study also reports Dev set results for both MNLI and Named Entity Recognition (NER) tasks, considering fine-tuning and feature-based approaches for NER.

Ragas evaluation of the semantic chunker

Split the document with the recursive character text splitter

synthetic_data_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False,
)

synthetic_data_chunks = synthetic_data_splitter.create_documents([d.page_content for d in documents])
print(len(synthetic_data_chunks))


Create the following dataset

1. Question: synthetically generated (groq mixtral-8x7b-32768)
2. Context: created above (the synthetic data chunks)
3. Ground truth: synthetically generated (groq mixtral-8x7b-32768)
4. Answer: generated from our semantic RAG chain

questions = []
ground_truths_semantic = []
contexts = []
answers = []

question_prompt = """\
You are a teacher preparing a test. Please create a question that can be answered by referencing the following context.

Context:
{context}
"""
question_prompt = ChatPromptTemplate.from_template(question_prompt)

ground_truth_prompt = """\
Use the following context and question to answer this question using *only* the provided context.

Question:
{question}

Context:
{context}
"""
ground_truth_prompt = ChatPromptTemplate.from_template(ground_truth_prompt)

question_chain = question_prompt | chat_model | StrOutputParser()
ground_truth_chain = ground_truth_prompt | chat_model | StrOutputParser()

for chunk in synthetic_data_chunks[10:20]:
    questions.append(question_chain.invoke({"context": chunk.page_content}))
    contexts.append([chunk.page_content])
    ground_truths_semantic.append(ground_truth_chain.invoke({"question": questions[-1], "context": contexts[-1]}))
    answers.append(semantic_rag_chain.invoke(questions[-1]))


Note: for the purposes of this experiment we only considered 10 samples.

Format the generated content into the HuggingFace Dataset format.

from datasets import load_dataset, Dataset
qagc_list = []
for question, answer, context, ground_truth in zip(questions, answers, contexts, ground_truths_semantic):
    qagc_list.append({
        "question": question,
        "answer": answer,
        "contexts": context,
        "ground_truth": ground_truth,
    })

eval_dataset = Dataset.from_list(qagc_list)
eval_dataset
###########################RESPONSE###########################Dataset({ features: ['question', 'answer', 'contexts', 'ground_truth'], num_rows: 10})

Implement the Ragas metrics and evaluate the dataset we created.

from ragas.metrics import (    answer_relevancy,    faithfulness,    context_recall,    context_precision,)
from ragas import evaluate

result = evaluate(
    eval_dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
    llm=chat_model,
    embeddings=embed_model,
    raise_exceptions=False,
)

Here I tried to use Groq's open-source LLM, but hit a rate-limit error:

groq.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `mixtral-8x7b-32768` in organization `org_01htsyxttnebyt0av6tmfn1fy6` on tokens per minute (TPM): Limit 4500, Used 3867, Requested ~1679. Please try again in 13.940333333s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}

So I switched the LLM to OpenAI, which the RAGAS framework uses by default.

Set the OpenAI API key.

import os
from google.colab import userdata
import openai

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
openai.api_key = os.environ['OPENAI_API_KEY']

from ragas import evaluate

result = evaluate(
    eval_dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)
result
#########################RESPONSE##########################{'context_precision': 1.0000, 'faithfulness': 0.8857, 'answer_relevancy': 0.9172, 'context_recall': 1.0000}
# Extract the details into a dataframe
results_df = result.to_pandas()
results_df

Ragas evaluation of the naive chunker

import tqdm

questions = []
ground_truths_semantic = []
contexts = []
answers = []

for chunk in tqdm.tqdm(synthetic_data_chunks[10:20]):
    questions.append(question_chain.invoke({"context": chunk.page_content}))
    contexts.append([chunk.page_content])
    ground_truths_semantic.append(ground_truth_chain.invoke({"question": questions[-1], "context": contexts[-1]}))
    answers.append(naive_rag_chain.invoke(questions[-1]))


Build the naive-chunking evaluation dataset

qagc_list = []
for question, answer, context, ground_truth in zip(questions, answers, contexts, ground_truths_semantic):
    qagc_list.append({
        "question": question,
        "answer": answer,
        "contexts": context,
        "ground_truth": ground_truth,
    })

naive_eval_dataset = Dataset.from_list(qagc_list)
naive_eval_dataset
############################RESPONSE########################Dataset({ features: ['question', 'answer', 'contexts', 'ground_truth'], num_rows: 10})

Evaluate the dataset we created with the RAGAS framework

naive_result = evaluate(
    naive_eval_dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)

naive_result
############################RESPONSE#######################
{'context_precision': 1.0000, 'faithfulness': 0.9500, 'answer_relevancy': 0.9182, 'context_recall': 1.0000}
naive_results_df = naive_result.to_pandas()
naive_results_df

Conclusion

Here we can see that the results of semantic chunking and naive chunking are almost identical, except for the faithfulness of the answers, where the naive chunker (0.95) outperformed the semantic chunker (0.88).

Overall, semantic chunking groups contextually similar information together, enabling the creation of independent, meaningful segments. By giving a large language model focused inputs, this approach improves the efficiency and effectiveness with which it understands and processes natural language data.


