Knowledge Graph Updates
Phase 1: Adding a Fact
Suppose we want to add the following sentence to the knowledge graph:
Brandon loves coffee
These are the steps we need to carry out:
Step 1: Identify the concepts
The sentence is first run through an NLP pipeline; the resulting part-of-speech tags, dependency relations, and named entities look like this:
[
[
{
"id": 1,
"text": "Brandon",
"lemma": "Brandon",
"upos": "PROPN",
"xpos": "NNP",
"feats": "Number=Sing",
"head": 2,
"deprel": "nsubj",
"start_char": 0,
"end_char": 7,
"ner": "S-PERSON",
"multi_ner": [
"S-PERSON"
]
},
{
"id": 2,
"text": "loves",
"lemma": "love",
"upos": "VERB",
"xpos": "VBZ",
"feats": "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
"head": 0,
"deprel": "root",
"start_char": 8,
"end_char": 13,
"ner": "O",
"multi_ner": [
"O"
]
},
{
"id": 3,
"text": "coffee",
"lemma": "coffee",
"upos": "NOUN",
"xpos": "NN",
"feats": "Number=Sing",
"head": 2,
"deprel": "obj",
"start_char": 14,
"end_char": 20,
"ner": "O",
"multi_ner": [
"O"
],
"misc": "SpaceAfter=No"
}
]
]
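The annotations above match the format produced by the Stanza NLP library. The key-concept step then keeps nouns and proper nouns, stems them, and lowercases them. Here is a minimal sketch of that extraction, assuming Stanza for tagging and NLTK's Porter stemmer; the exact pipeline used by RecallM may differ:
# Minimal sketch: extract key concepts from a sentence.
# Assumes Stanza for tagging and NLTK's Porter stemmer (an assumption,
# not necessarily RecallM's exact pipeline).
import stanza
from nltk.stem import PorterStemmer

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse,ner")
stemmer = PorterStemmer()

doc = nlp("Brandon loves coffee")
concepts = [
    stemmer.stem(word.text)            # Porter stemming also lowercases
    for sentence in doc.sentences
    for word in sentence.words
    if word.upos in ("PROPN", "NOUN")  # keep proper nouns and nouns as concepts
]
print(concepts)  # ['brandon', 'coffe']
Note how stemming turns "coffee" into "coffe" and (later) "Paris" into "pari", which is why the node names in the graph below look truncated.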
Step 2: Find the related concepts for each concept
# distance is set to 1 by default
def fetch_neighbouring_concepts(concepts, distance=1):
    concepts = sorted(concepts, key=lambda c: c.name)
    for i in range(len(concepts)):
        concepts[i].related_concepts = []
        # Consider neighbours whose index is within `distance` of the current concept
        for j in range(-distance, distance + 1):
            if 0 <= i + j < len(concepts):  # Index is in bounds
                if j == 0:
                    continue
                # Ensure that we only create one connection between nodes in the Neo4j graph
                if concepts[i].name < concepts[i + j].name:
                    concepts[i].related_concepts.append(concepts[i + j])
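For illustration, here is how the function might be exercised with a minimal, hypothetical Concept class (the real RecallM concept objects carry more fields):
from dataclasses import dataclass, field

# Hypothetical stand-in for the real concept objects
@dataclass
class Concept:
    name: str
    related_concepts: list = field(default_factory=list)

concepts = [Concept("brandon"), Concept("coffe")]
fetch_neighbouring_concepts(concepts, distance=1)
for c in sorted(concepts, key=lambda c: c.name):
    print(c.name, [r.name for r in c.related_concepts])
# brandon ['coffe']
# coffe []
Because connections are only created in one alphabetical direction, "brandon" points to "coffe" but not vice versa, matching the single RELATED edge created in the graph below.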
Step 3: Build the concept nodes and their connections
A graph database can be used to store these concepts and the connections between them. After this step, the knowledge graph in the Neo4j database looks roughly like this:
Figure 1: The knowledge graph at t=1
The red nodes represent concepts. They are connected by a relationship called RELATED (the white arrows). The blue node represents the global time point, which plays an important role later during temporal reasoning. Its value is 1, indicating that we are building and updating the knowledge graph for the first time. Below is the Cypher query used to build the graph:
MERGE (c00:Concept {name: 'brandon'})
MERGE (c01:Concept {name: 'coffe'})
WITH c00, c01
MERGE (c00)-[rc00c01:RELATED]->(c01)
WITH c00, c01, rc00c01, COALESCE(rc00c01.strength, 0) + 1 AS rc00c01ic
SET c00.t_index = 1, c01.t_index = 1
SET rc00c01.strength = rc00c01ic
SET rc00c01.t_index = 1
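To run this query from Python, one could use the official Neo4j driver; a sketch, assuming a local database and placeholder credentials:
from neo4j import GraphDatabase

# Placeholder URI and credentials; adjust for your deployment
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

build_graph = """
MERGE (c00:Concept {name: 'brandon'})
MERGE (c01:Concept {name: 'coffe'})
WITH c00, c01
MERGE (c00)-[rc00c01:RELATED]->(c01)
WITH c00, c01, rc00c01, COALESCE(rc00c01.strength, 0) + 1 AS rc00c01ic
SET c00.t_index = 1, c01.t_index = 1
SET rc00c01.strength = rc00c01ic
SET rc00c01.t_index = 1
"""

with driver.session() as session:
    session.run(build_graph)
driver.close()
The MERGE clauses make the update idempotent at the node level, while COALESCE(rc00c01.strength, 0) + 1 increments the edge strength each time the same pair of concepts co-occurs.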
Step 4: Add context information to the concepts
A concept's context is the set of sentences that mention it. So, for each concept, we first fetch its existing context from the knowledge graph:
MATCH (n:Concept)
WHERE n.name IN ['brandon', 'coffe']
RETURN n.name, n.context, n.revision_count
Neither concept has any stored context yet, so we set the new context and revision count with a second query:
MATCH (n:Concept)
WHERE n.name IN ['brandon', 'coffe']
WITH n,
CASE n.name
WHEN 'brandon' THEN '. Brandon loves coffee'
WHEN 'coffe' THEN '. Brandon loves coffee'
ELSE n.context
END AS newContext,
CASE n.name
WHEN 'brandon' THEN 0
WHEN 'coffe' THEN 0
ELSE 0
END AS revisionCount
SET n.context = newContext
SET n.revision_count = revisionCount
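The CASE expressions above are presumably generated in code. A hypothetical sketch of that generation step: given the contexts fetched by the first query, append the new sentence and bump the revision count (the function name and dict shapes here are illustrative, not RecallM's actual API):
def build_context_updates(fetched, sentence):
    """Compute the new context and revision count for each concept.

    `fetched` maps a concept name to (context, revision_count) as returned
    by the first query; concepts not yet in the graph have (None, None).
    """
    updates = {}
    for name, (context, revision_count) in fetched.items():
        if context is None:
            # First mention: start a fresh context with revision count 0
            updates[name] = (". " + sentence, 0)
        else:
            # Known concept: append the sentence, bump the revision count
            updates[name] = (context + ". " + sentence, revision_count + 1)
    return updates

print(build_context_updates(
    {"brandon": (None, None), "coffe": (None, None)},
    "Brandon loves coffee",
))
# {'brandon': ('. Brandon loves coffee', 0), 'coffe': ('. Brandon loves coffee', 0)}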
Figure 2: The knowledge graph at t=1, after adding the concept contexts
Phase 2: Extending an Existing Fact with Another Fact
Now imagine we want to add the following sentence to the knowledge graph:
Brandon wants to travel to Paris
As you can see, the key concepts in this sentence are "Brandon" and "Paris". After stemming and lowercasing, we get "brandon" and "pari".
Figure 3: The knowledge graph at t=2, after adding the "brandon" and "pari" concepts
Notice that the global time point has been updated to 2 and the new concept "pari" has been added to the knowledge graph. A RELATED relationship has been created between "brandon" and "pari". Because "brandon" was updated at time step 2, its t_index has also been updated to 2. The Cypher query follows the same pattern as before:
MERGE (c00:Concept {name: 'brandon'})
MERGE (c01:Concept {name: 'pari'})
WITH c00, c01
MERGE (c00)-[rc00c01:RELATED]->(c01)
WITH c00, c01, rc00c01, COALESCE(rc00c01.strength, 0) + 1 AS rc00c01ic
SET c00.t_index = 2, c01.t_index = 2
SET rc00c01.strength = rc00c01ic
SET rc00c01.t_index = 2
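Since the pattern is identical to the t=1 update, it generalizes naturally to a small helper parameterized over the two concepts and the global time index; a sketch (the function name is my own, not from the paper):
def add_fact(session, name_a, name_b, t):
    """MERGE both concepts, strengthen their RELATED edge, and stamp t_index."""
    session.run(
        """
        MERGE (a:Concept {name: $a})
        MERGE (b:Concept {name: $b})
        MERGE (a)-[r:RELATED]->(b)
        SET a.t_index = $t, b.t_index = $t,
            r.strength = COALESCE(r.strength, 0) + 1,
            r.t_index = $t
        """,
        a=name_a, b=name_b, t=t,
    )

# t=1: "Brandon loves coffee"; t=2: "Brandon wants to travel to Paris"
# add_fact(session, "brandon", "coffe", 1)
# add_fact(session, "brandon", "pari", 2)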
But what about the contexts of the concept nodes? As before, fetching each concept's context from the knowledge graph yields the following:
{
"brandon": {
"context": ". Brandon loves coffee",
"revision_count": 0
},
"pari": {
"context": null,
"revision_count": null
}
}
As expected, "brandon" has a context while "pari" does not. The current sentence is then appended to the contexts retrieved from the knowledge graph:
MATCH (n:Concept)
WHERE n.name IN ['brandon', 'pari']
WITH n,
CASE n.name
WHEN 'brandon' THEN '. Brandon loves coffee. Brandon wants to travel to Paris'
WHEN 'pari' THEN '. Brandon wants to travel to Paris'
ELSE n.context
END AS newContext,
CASE n.name
WHEN 'brandon' THEN 1
WHEN 'pari' THEN 0
ELSE 0
END AS revisionCount
SET n.context = newContext
SET n.revision_count = revisionCount
Note that "brandon" receives a new context containing both its previous context and the new sentence. Because its context was updated, its revision_count has been incremented to 1. After this step, the knowledge graph looks like this:
Figure 4: The knowledge graph at t=2, after adding the concept contexts
To answer a question such as "Who wants to travel to Paris?", the key concept "pari" is first extracted from the question, and its neighbourhood is retrieved from the graph:
MATCH (startNode:Concept{name: 'pari'})
// Expand paths up to 2 hops from the start node
CALL apoc.path.spanningTree(startNode, {relationshipFilter: "", minLevel: 0, maxLevel: 2}) YIELD path
WITH path, nodes(path) as pathNodes, startNode.t_index as current_t
UNWIND range(0, size(pathNodes)-1) AS index
WITH path, pathNodes[index] as node, current_t
ORDER BY node.t_index DESC
// Keep only nodes updated within 15 time steps of 'pari'
WHERE node.t_index <= current_t AND node.t_index >= current_t - 15
WITH DISTINCT node LIMIT 800
MATCH ()-[relation]->()
RETURN node, relation
The query above finds all paths up to depth 2 starting from the "pari" concept node, filters the resulting nodes by t_index to ignore concepts more than 15 time steps older than "pari", takes up to 800 distinct nodes, and returns those nodes and relationships from the graph. This yields the following concept nodes:
{
"pari": ". Brandon wants to travel to Paris",
"brandon": ". Brandon loves coffee. Brandon wants to travel to Paris",
"coffe": ". Brandon loves coffee"
}
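Running the retrieval query from Python and collecting each node's context might look like this; a sketch reusing the earlier driver session, producing a dict shaped like the graph_concept_nodes used below:
def fetch_graph_concept_contexts(session, query):
    """Run the retrieval query and map each concept name to its context."""
    contexts = {}
    for record in session.run(query):
        node = record["node"]
        contexts[node["name"]] = node["context"]  # duplicate rows collapse in the dict
    return contexts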
Each related concept is then ranked by its t_index and by the strength of its relationships to its own related concepts. Here is the code snippet that does this:
graph_concepts = graph_concept_nodes.values()
for concept in graph_concepts:
    concept.sort_val = 0
    for relation in concept.related_concepts:
        concept.sort_val += (relation.t_index * 3) + relation.strength  # TODO: 3 is a hyperparameter, move this to header
graph_concepts = sorted(graph_concepts, key=lambda c: c.sort_val)
graph_concepts.reverse()  # We want them in descending order, so that highest temporal relations are first
In this snippet, the score is called sort_val: the higher a concept's sort_val, the more temporally relevant it is. Each concept found to be related to "pari" is scored in this way.
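As a quick, purely illustrative check of the scoring rule, consider "brandon" at t=2: per the Cypher queries above, its RELATED edges were written with strength 1 at t_index 1 (shared with "coffe") and t_index 2 (shared with "pari"):
from types import SimpleNamespace

# Stub relations matching the edge properties written by the Cypher above
brandon_relations = [
    SimpleNamespace(t_index=1, strength=1),  # edge shared with 'coffe', from t=1
    SimpleNamespace(t_index=2, strength=1),  # edge shared with 'pari', from t=2
]
sort_val = sum(r.t_index * 3 + r.strength for r in brandon_relations)
print(sort_val)  # (1*3 + 1) + (2*3 + 1) = 11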
We start from the contexts of the concepts that appear in the question itself, which we call the "essential concepts". In this case that is just "pari", whose context is simply ". Brandon wants to travel to Paris". Then, in order of decreasing sort_val, the contexts of the concepts found in the previous step are prepended. This produces the context that is passed to the LLM:
. Brandon loves coffee # from related concept: coffee
. Brandon loves coffee. Brandon wants to travel to Paris # from related concept: brandon
. Brandon wants to travel to Paris # from essential concepts
Step 5: Generate the response
All that remains is to prompt the LLM with the context built in the previous step and let it generate a response. The prompt looks like this:
Using the following statements when necessary, answer the question that follows. Each sentence in the following statements is true when read in chronological order:
statements:
. Brandon loves coffee
. Brandon loves coffee. Brandon wants to travel to Paris
. Brandon wants to travel to Paris
question:
Who wants to travel to Paris?
Answer:
With such a comprehensive, well-constructed prompt, the LLM (gpt-3.5-turbo in this case) is able to generate the correct response: "Brandon".
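The final call itself is ordinary; a minimal sketch using the OpenAI Python client, assuming OPENAI_API_KEY is set in the environment and `prompt` holds the text above:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],  # `prompt` holds the text above
    temperature=0,  # deterministic answer for factual QA
)
print(response.choices[0].message.content)  # expected: "Brandon"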
To evaluate RecallM's temporal understanding and memory-updating ability, the researchers designed a simple experiment. They built a dataset containing a series of chronologically ordered statements, where each subsequent statement reflects an updated ground truth that overrides the previous ones. The system is questioned at regular intervals to probe its grasp of temporal concepts, including the order of events and knowledge from earlier statements. The experiment includes two kinds of questions: conventional temporal questions and long-range temporal questions. The former test the system's ability to understand time, update its beliefs, and revise its memory; the latter require the system to recall information from hundreds of earlier statements. To answer the long-range temporal questions correctly after 25 rounds of repeated questioning, the system must be able to recall and reason over knowledge that was updated more than 1,500 statements earlier. The paper [1] also shares several examples demonstrating the effectiveness of the RecallM architecture in maintaining and using temporal knowledge:
Figure 5: RecallM vs. a vector DB on temporal reasoning
Limitations
In practice, however, the graph relations that actually get created look like this:
Figure 7: The actual knowledge graph
However, even for simple questions such as the following, this imprecise graph structure can get in the way:
Who wants to travel to Paris?
Who likes cats?
Who loves coffee?
RecallM proposes a way to give large language models (LLMs) long-term memory using only a graph database. It has proven effective at updating stored knowledge and understanding temporal relationships, but challenges remain, such as building an accurate knowledge graph in the first place. Even so, it is a meaningful step forward for AI systems, and ongoing research offers opportunities to refine and improve it.
Original article: https://pub.towardsai.net/how-to-do-rag-without-vector-databases-45fd4f6ced06
[1] https://arxiv.org/abs/2307.02738