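Our brains store knowledge as graphs of information.
In this article, I'll walk you through what knowledge graphs are and how to build one from your own text data.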
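A knowledge graph, also called a semantic network, is a structure for storing data efficiently. Data is stored as nodes and edges: as shown in Figure 1, nodes represent objects and edges represent the relationships between them. A common data model for knowledge graphs is the Resource Description Framework (RDF), a W3C standard that describes resources on the web as subject-predicate-object triples.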
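To make the nodes-and-edges idea concrete, here is a minimal plain-Python sketch (no graph library assumed) that turns a few hand-written triples into a set of nodes and a list of labeled, directed edges. The triples are illustrative, borrowed from the Paris example used later.

```python
# A tiny knowledge graph as (subject, predicate, object) triples.
# Nodes are the distinct subjects/objects; each triple is one labeled, directed edge.
triples = [
    ("Paris", "is the capital of", "France"),
    ("Eiffel Tower", "is in", "Paris"),
    ("Eiffel Tower", "is a", "famous landmark"),
]

nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
edges = [(s, o, {"label": p}) for s, p, o in triples]

print(nodes)
# ['Eiffel Tower', 'France', 'Paris', 'famous landmark']
```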
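In most bodies of data, only a few data points capture the essence of the whole dataset. A knowledge graph therefore stores only those key facts, which can substantially reduce both retrieval time and storage space compared with keeping the full raw data.
Among my favorite use cases for knowledge graphs are drug discovery and RAG-based virtual assistant chatbots.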
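One way to see the retrieval benefit is to index the stored triples by subject once, after which looking up everything known about an entity is a dictionary lookup rather than a scan over all the text. This is a generic illustration with a hypothetical `facts_about` helper, not how any particular graph database works internally.

```python
from collections import defaultdict

# Illustrative triples (same shape the LLM will produce later)
triples = [
    ("Paris", "is the capital of", "France"),
    ("Eiffel Tower", "is in", "Paris"),
    ("Eiffel Tower", "is a", "famous landmark"),
]

# Build the index once: subject -> list of (predicate, object) pairs
index = defaultdict(list)
for subject, predicate, obj in triples:
    index[subject].append((predicate, obj))

def facts_about(entity):
    """Return every (predicate, object) pair stored for `entity`.

    dict.get on the index is an average O(1) lookup, and it does not
    insert missing keys the way index[entity] would on a defaultdict.
    """
    return index.get(entity, [])

print(facts_about("Eiffel Tower"))
# [('is in', 'Paris'), ('is a', 'famous landmark')]
```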
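(Note: we'll use OpenAI's GPT-3.5 to generate the entities and relationships, so make sure you have your OpenAI API key ready.)
Install the packages with your preferred package manager. Here I use pip to install and manage the dependencies.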
pip install -q langchain openai pyvis gradio==3.39.0
Import the installed packages.
from langchain.prompts import PromptTemplate
from langchain.llms.openai import OpenAI
from langchain.chains import LLMChain
from langchain.graphs.networkx_graph import KG_TRIPLE_DELIMITER
from pprint import pprint
from pyvis.network import Network
import networkx as nx
import gradio as gr
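Load the API key you copied from the OpenAI platform dashboard. Here I pass it through Colab secrets, so before running the cell, make sure you have added a secret named OPENAI_API_KEY that holds your key.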
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
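If you run the notebook outside Colab, `google.colab` won't be importable. Here is a minimal sketch of a fallback to an ordinary environment variable, using a hypothetical `get_openai_api_key` helper:

```python
import os

def get_openai_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read the key from Colab secrets when available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        return userdata.get(name)
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} as a Colab secret or an environment variable")
    return key
```

Then `OPENAI_API_KEY = get_openai_api_key()` works in both environments.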
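Asking LLMs the right question is crucial to getting the content we need. Here we add a few examples alongside the instructions to reduce hallucinations during inference; this style of prompting is called few-shot prompting. Read through the prompt to see exactly how it works.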
# Prompt template for extracting knowledge triples
_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
    "You are a networked intelligence helping a human track knowledge triples"
    " about all relevant people, things, concepts, etc. and integrating"
    " them with your knowledge stored within your weights"
    " as well as that stored in a knowledge graph."
    " Extract all of the knowledge triples from the text."
    " A knowledge triple is a clause that contains a subject, a predicate,"
    " and an object. The subject is the entity being described,"
    " the predicate is the property of the subject that is being"
    " described, and the object is the value of the property.\n\n"
    "EXAMPLE\n"
    "Nevada is a state in the US. It's also the number 1 producer of gold in the US.\n\n"
    f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
    f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "I'm going to the store.\n\n"
    "Output: NONE\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
    f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}\n\n"
    "Output:"
)

KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
    input_variables=["text"],
    template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
)
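Stripped of LangChain, a few-shot prompt is just instructions, a handful of worked input/output pairs, and then the new input with an empty `Output:` slot for the model to fill. The sketch below builds that string by hand; `build_few_shot_prompt` and `EXAMPLES` are hypothetical names, and `DELIMITER` is assumed to match the value of LangChain's `KG_TRIPLE_DELIMITER` (`"<|>"`).

```python
DELIMITER = "<|>"  # assumption: same value as LangChain's KG_TRIPLE_DELIMITER

# (input text, expected output) pairs shown to the model before the real input
EXAMPLES = [
    ("Nevada is a state in the US.", f"(Nevada, is a, state){DELIMITER}(Nevada, is in, US)"),
    ("I'm going to the store.", "NONE"),
]

def build_few_shot_prompt(text, examples=EXAMPLES):
    """Instructions, then worked examples, then the new input with an empty Output slot."""
    parts = ["Extract all of the knowledge triples from the text.\n\n"]
    for example_text, example_output in examples:
        parts.append(f"EXAMPLE\n{example_text}\n\nOutput: {example_output}\nEND OF EXAMPLE\n\n")
    parts.append(f"EXAMPLE\n{text}\n\nOutput:")
    return "".join(parts)

print(build_few_shot_prompt("The Eiffel Tower is in Paris."))
```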
Initialize the chain with the LLMChain class, using the descriptive prompt.
llm = OpenAI(
    api_key=OPENAI_API_KEY,
    temperature=0.9  # a lower value (e.g. 0) gives more repeatable extractions
)

# Create an LLMChain with the knowledge-triple extraction prompt
chain = LLMChain(llm=llm, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)
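Conceptually, for a single input the chain fills the prompt template and sends the result to the model. The rough stand-in below (not LangChain's actual implementation) makes that explicit and lets you exercise the parsing and graph code offline with a stubbed model; `run_extraction` and `fake_llm` are hypothetical.

```python
from typing import Callable

def run_extraction(text: str, llm: Callable[[str], str], template: str) -> str:
    """Rough stand-in for one chain call: fill the template, then query the model."""
    prompt = template.format(text=text)  # template must contain a {text} placeholder
    return llm(prompt)

def fake_llm(prompt: str) -> str:
    """Offline stub that ignores the prompt and returns a canned, well-formed answer."""
    return "(Paris, is the capital of, France)<|>(Eiffel Tower, is in, Paris)"

print(run_extraction("Paris is the capital of France.", fake_llm, "Text: {text}\nOutput:"))
```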
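To build a knowledge graph, all you need is some relevant text. Here I load the text from a string, but note that you can also use data loaders in Python [3] to load data from popular formats such as PDF, JSON, and Markdown.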
# Run the chain with the specified text
text = "The city of Paris is the capital and most populous city of France. The Eiffel Tower is a famous landmark in Paris."
triples = chain.invoke({'text': text}).get('text')
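For long documents (for example, text pulled in by a loader), the whole text may not fit in one prompt, so a common approach is to split it into chunks and run the chain on each. Here is a stdlib-only sketch with a hypothetical `chunk_text` helper that packs paragraphs into chunks of at most `max_chars` characters; LangChain's own text splitters are the more complete option.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= max_chars.

    A single paragraph longer than max_chars becomes its own (oversized) chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            current = para  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `chain.invoke({'text': chunk})` and the resulting triples concatenated.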
Then parse the retrieved triples with this user-defined function:
def parse_triples(response, delimiter=KG_TRIPLE_DELIMITER):
    if not response:
        return []
    return response.split(delimiter)

triples_list = parse_triples(triples)
pprint(triples_list)
Output:
[' (Paris, is the capital of, France)', '(Paris, is the most populous city in, France)', '(Eiffel Tower, is a, famous landmark)', '(Eiffel Tower, is in, Paris)']
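Splitting on the delimiter leaves each triple as a string with its parentheses (and sometimes stray whitespace, as in the first item above). If you want proper `(subject, predicate, object)` tuples, a regular expression is one option. This sketch with a hypothetical `parse_triple_tuples` assumes the model follows the `(a, b, c)` format from the prompt and allows only the object to contain commas.

```python
import re

# "(subject, predicate, object)": subject and predicate may not contain commas;
# the object may, so "(Paris, is in, Ile-de-France, France)" still parses.
TRIPLE_RE = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^()]+?)\s*\)")

def parse_triple_tuples(response):
    """Return (subject, predicate, object) tuples; 'NONE' or empty input yields []."""
    if not response:
        return []
    return [match.groups() for match in TRIPLE_RE.finditer(response)]

response = " (Paris, is the capital of, France)<|>(Eiffel Tower, is in, Paris)"
print(parse_triple_tuples(response))
# [('Paris', 'is the capital of', 'France'), ('Eiffel Tower', 'is in', 'Paris')]
```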
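Here we'll use PyVis to create a good-looking visualization of the knowledge graph and display it interactively with the Gradio framework.
Here are some user-defined helper functions to make the task easier: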
def create_graph_from_triplets(triplets):
    G = nx.DiGraph()
    for triplet in triplets:
        # Remove the surrounding "( ... )" and split into at most 3 parts,
        # so a comma inside the object does not break the unpacking
        parts = triplet.strip().strip("()").split(",", 2)
        if len(parts) != 3:
            continue  # skip "NONE" or malformed lines
        subject, predicate, obj = (p.strip() for p in parts)
        G.add_edge(subject, obj, label=predicate)
    return G

def nx_to_pyvis(networkx_graph):
    pyvis_graph = Network(notebook=True, cdn_resources='remote')
    for node in networkx_graph.nodes():
        pyvis_graph.add_node(node)
    for edge in networkx_graph.edges(data=True):
        pyvis_graph.add_edge(edge[0], edge[1], label=edge[2]["label"])
    return pyvis_graph

def generateGraph():
    triplets = [t.strip() for t in triples_list if t.strip()]
    graph = create_graph_from_triplets(triplets)
    pyvis_network = nx_to_pyvis(graph)
    pyvis_network.toggle_hide_edges_on_drag(True)
    pyvis_network.toggle_physics(False)
    pyvis_network.set_edge_smooth('discrete')
    html = pyvis_network.generate_html()
    # Swap single quotes for double quotes so the page fits inside srcdoc='...'
    html = html.replace("'", '"')
    return f"""<iframe style="width: 100%; height: 600px;margin:0 auto" name="result" allow="midi; geolocation; microphone; camera;
display-capture; encrypted-media;" sandbox="allow-modals allow-forms
allow-scripts allow-same-origin allow-popups
allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""
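Replacing every single quote with a double quote keeps the page from terminating the `srcdoc='...'` attribute early, but it also rewrites quotes inside the page's JavaScript and text, which can break strings such as `it's`. An alternative, sketched here with a hypothetical `to_srcdoc_iframe` helper and a trimmed-down `sandbox` list, is to HTML-escape the whole page; the browser decodes the entities before rendering the `srcdoc` content.

```python
import html

def to_srcdoc_iframe(page: str, height_px: int = 600) -> str:
    """Embed a full HTML page in an iframe via an HTML-escaped srcdoc attribute."""
    # quote=True turns & < > " ' into entities, so nothing in the page can
    # close the double-quoted attribute early; the browser decodes them again.
    escaped = html.escape(page, quote=True)
    return (
        f'<iframe style="width: 100%; height: {height_px}px" '
        f'sandbox="allow-scripts allow-same-origin" srcdoc="{escaped}"></iframe>'
    )

print(to_srcdoc_iframe("<p class=\"note\">it's a graph</p>"))
```

Inside `generateGraph` you could then return `to_srcdoc_iframe(pyvis_network.generate_html())` instead of doing the quote replacement.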
Display the PyVis-generated HTML with Gradio:
demo = gr.Interface(
    fn=generateGraph,  # the function whose HTML output is displayed
    inputs=None,
    outputs=gr.HTML(),
    title="Knowledge Graph",
    allow_flagging='never',
    live=True,
)
demo.launch()
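Final output: we displayed our knowledge graph with the Gradio framework, so the page can also be shared with anyone online through the generated link. Just call demo.launch(share=True) instead of demo.launch() to make the app visible to anyone.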