
Building a Knowledge Graph from Text Data with LangChain

Published: 2024-04-10 20:48:16

Author: 二师兄talks


The brain stores knowledge in the form of an information graph.

In this article, I will walk you through what a knowledge graph is and how to build one from your own text data.

What is a knowledge graph?

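A knowledge graph, also called a semantic graph, is a structure for storing data efficiently as nodes and edges. As shown in Figure 1, nodes represent objects and edges represent the relationships between them. Knowledge graphs are often expressed with the Resource Description Framework (RDF), a W3C standard data model that describes resources on the web as subject-predicate-object triples.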
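To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. The sample triples and the helper name `build_adjacency` are illustrative assumptions, not part of the original walkthrough.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("Paris", "is the capital of", "France"),
    ("Eiffel Tower", "is in", "Paris"),
]


def build_adjacency(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return dict(adjacency)


graph = build_adjacency(TRIPLES)
print(graph["Paris"])
```

Each subject is a node, and each `(predicate, object)` pair is a labeled edge pointing to another node.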

Why do we need knowledge graphs?

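In any data story, only a handful of data points really represent the whole dataset. A knowledge graph stores only these important data points and the relationships between them, which makes retrieval faster and reduces the storage required.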

Among my favorite use cases for knowledge graphs are drug discovery and RAG-based virtual assistant chatbots.

Implementation

1. Install and import the packages

(Note: we will use OpenAI's GPT-3.5 to generate the entities and relationships, so make sure you have your OpenAI API key ready.)

Install the packages with your preferred package manager. Here I use pip to install and manage the dependencies.

pip install -q langchain openai pyvis gradio==3.39.0

Import the installed packages.

from langchain.prompts import PromptTemplate
from langchain.llms.openai import OpenAI
from langchain.chains import LLMChain
from langchain.graphs.networkx_graph import KG_TRIPLE_DELIMITER
from pprint import pprint
from pyvis.network import Network
import networkx as nx
import gradio as gr

2. Set the API key

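Set the API key variable using the key copied from the OpenAI platform dashboard. Here I pass it in through Colab secrets, so before running the cell, make sure you have created a secret that holds your API key.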
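If you are not running in Colab, one common alternative is to read the key from an environment variable. The following is a minimal sketch under that assumption; the helper name `get_openai_api_key` is hypothetical and not part of the original code.

```python
import os


def get_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, `OPENAI_API_KEY = get_openai_api_key()` could stand in for the Colab-specific lookup.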

from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

3. Define the prompt

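Asking LLMs the right question is essential for getting the output we need. Here we add a few examples alongside the instructions to reduce hallucination during inference. This technique is called few-shot prompting. Feel free to read through the prompt to get a clear idea of how it works.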

# Prompt template for knowledge triple extraction
_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
    "You are a networked intelligence helping a human track knowledge triples"
    " about all relevant people, things, concepts, etc. and integrating"
    " them with your knowledge stored within your weights"
    " as well as that stored in a knowledge graph."
    " Extract all of the knowledge triples from the text."
    " A knowledge triple is a clause that contains a subject, a predicate,"
    " and an object. The subject is the entity being described,"
    " the predicate is the property of the subject that is being"
    " described, and the object is the value of the property.\n\n"
    "EXAMPLE\n"
    "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
    f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
    f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "I'm going to the store.\n\n"
    "Output: NONE\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
    f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}"
    "Output:"
)

KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
    input_variables=["text"],
    template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
)
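The same few-shot layout (instruction, worked examples, then the new input) can also be assembled programmatically. The sketch below uses only the standard library; the `"<|>"` delimiter, the sample examples, and the helper names are assumptions made for illustration, and it is not how LangChain builds its own template.

```python
# Assumed delimiter for this sketch; LangChain exposes its own as KG_TRIPLE_DELIMITER.
DELIMITER = "<|>"

# Hypothetical worked examples: (input text, list of triples or None).
EXAMPLES = [
    ("I'm going to the store.", None),
    ("Nevada is a state in the US.",
     [("Nevada", "is a", "state"), ("Nevada", "is in", "US")]),
]


def format_triples(triples):
    """Render triples as "(s, p, o)<|>(s, p, o)", or NONE when there are none."""
    if not triples:
        return "NONE"
    return DELIMITER.join(f"({s}, {p}, {o})" for s, p, o in triples)


def build_few_shot_prompt(instruction, examples, text):
    """Concatenate the instruction, each worked example, and the new input."""
    parts = [instruction, ""]
    for example_text, triples in examples:
        parts += ["EXAMPLE", example_text,
                  f"Output: {format_triples(triples)}", "END OF EXAMPLE", ""]
    parts += ["EXAMPLE", text, "Output:"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Extract all knowledge triples from the text.",
    EXAMPLES,
    "Paris is the capital of France.",
)
print(prompt)
```

Ending the prompt on an open `Output:` line nudges the model to continue in the same format as the worked examples.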

4. Initialize the chain

Initialize the chain with the LLMChain class, using the descriptive prompt defined above.

llm = OpenAI(api_key=OPENAI_API_KEY, temperature=0.9)

# Create an LLMChain using the knowledge triple extraction prompt
chain = LLMChain(llm=llm, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)

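To build a knowledge graph, all you need is some relevant text data. Here I load the text from a string. Note, however, that you can also use Python data loaders [3] to load data from popular formats such as PDF, JSON, and Markdown.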

# Run the chain with the specified text
text = "The city of Paris is the capital and most populous city of France. The Eiffel Tower is a famous landmark in Paris."
triples = chain.invoke({'text' : text}).get('text')

Then parse the returned triples with this user-defined function:

def parse_triples(response, delimiter=KG_TRIPLE_DELIMITER):
    if not response:
        return []
    return response.split(delimiter)

triples_list = parse_triples(triples)
pprint(triples_list)

Output:

[' (Paris, is the capital of, France)', '(Paris, is the most populous city in, France)', '(Eiffel Tower, is a, famous landmark)', '(Eiffel Tower, is in, Paris)']
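Note that each item in this list is still a raw string with parentheses and stray whitespace. If you would rather work with clean `(subject, predicate, object)` tuples, a regular expression is one way to get them. This is a sketch under stated assumptions: the `"<|>"` delimiter, the pattern, and the name `parse_triple_strings` are illustrative and not part of the original code.

```python
import re

# Assumed delimiter; swap in KG_TRIPLE_DELIMITER when using LangChain.
DELIMITER = "<|>"

# Matches "(subject, predicate, object)"; the object may itself contain commas.
TRIPLE_PATTERN = re.compile(r"^\(\s*([^,]+?)\s*,\s*([^,]+?)\s*,\s*(.+?)\s*\)$")


def parse_triple_strings(response, delimiter=DELIMITER):
    """Split a raw LLM response into clean (subject, predicate, object) tuples."""
    triples = []
    for chunk in (response or "").split(delimiter):
        match = TRIPLE_PATTERN.match(chunk.strip())
        if match:  # chunks such as "NONE" or malformed text are skipped
            triples.append(match.groups())
    return triples


raw = " (Paris, is the capital of, France)<|>(Eiffel Tower, is in, Paris)<|>NONE"
print(parse_triple_strings(raw))
```

Skipping chunks that do not match keeps a stray `NONE` or a half-formed triple from breaking later steps.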

5. Visualize the constructed knowledge graph

Here we will use PyVis to create a great-looking visualization of the knowledge graph and display it interactively with the Gradio framework.

Here are a few user-defined functions that make the task easier:

def create_graph_from_triplets(triplets):
    G = nx.DiGraph()
    for triplet in triplets:
        # Drop the surrounding parentheses before splitting into three fields
        parts = [p.strip() for p in triplet.strip().strip('()').split(',')]
        if len(parts) != 3:
            continue  # skip malformed entries such as "NONE"
        subject, predicate, obj = parts
        G.add_edge(subject, obj, label=predicate)
    return G

def nx_to_pyvis(networkx_graph):
    pyvis_graph = Network(notebook=True, cdn_resources='remote')
    for node in networkx_graph.nodes():
        pyvis_graph.add_node(node)
    for edge in networkx_graph.edges(data=True):
        pyvis_graph.add_edge(edge[0], edge[1], label=edge[2]["label"])
    return pyvis_graph

def generateGraph():
    triplets = [t.strip() for t in triples_list if t.strip()]
    graph = create_graph_from_triplets(triplets)
    pyvis_network = nx_to_pyvis(graph)

    pyvis_network.toggle_hide_edges_on_drag(True)
    pyvis_network.toggle_physics(False)
    pyvis_network.set_edge_smooth('discrete')

    # Replace single quotes so the page can sit inside the single-quoted srcdoc attribute
    html = pyvis_network.generate_html()
    html = html.replace("'", "\"")

    return f"""<iframe style="width: 100%; height: 600px;margin:0 auto" name="result" allow="midi; geolocation; microphone; camera;
    display-capture; encrypted-media;" sandbox="allow-modals allow-forms
    allow-scripts allow-same-origin allow-popups
    allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
    allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""
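Swapping single quotes for double quotes works for the PyVis output here, but it also rewrites any apostrophes inside node labels. A possible alternative, sketched below with the standard library only, is to escape the whole page with `html.escape` before placing it in `srcdoc`. The helper name `wrap_in_iframe`, its parameters, and the reduced `sandbox` list are assumptions for illustration.

```python
import html


def wrap_in_iframe(page_html, height_px=600):
    """Embed a full HTML page in an iframe through the srcdoc attribute."""
    # quote=True also escapes " and ', so the page cannot end the attribute early.
    escaped = html.escape(page_html, quote=True)
    return (
        f'<iframe style="width: 100%; height: {height_px}px; margin: 0 auto" '
        f'sandbox="allow-scripts allow-same-origin" frameborder="0" '
        f'srcdoc="{escaped}"></iframe>'
    )


snippet = wrap_in_iframe("<p class='note'>Paris & \"France\"</p>")
print(snippet)
```

The browser decodes the entities when it reads `srcdoc`, so the embedded page is rendered with its original quotes intact.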

Use Gradio to display the HTML generated by PyVis:

demo = gr.Interface(
    generateGraph,
    inputs=None,
    outputs=gr.outputs.HTML(),
    title="Knowledge Graph",
    allow_flagging='never',
    live=True,
)
demo.launch(height=800, width="100%")