
AI + Knowledge Graphs = GraphRAG: Building More Accurate Chatbots for Enterprises

Published: 2024-12-09 17:49:21

Author: 活水智能


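Retrieval Augmented Generation (RAG) has advanced considerably since enterprises began adopting it, and organizations keep looking for new ways to get more value out of it. Retrieval pipelines and search algorithms have become faster and more efficient, yet they still struggle with complex tasks such as multi-step reasoning, and with questions whose answers require connecting pieces of information scattered across sources.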

Let's explore this with a real example: "What was the name of Columbus's ship that sank on Christmas Day 1492?"

A standard RAG system would typically follow these steps:

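1. Identify the event: find information about Columbus's ships and the shipwreck.
2. Confirm the date: check which ship was involved in the event on Christmas Day.
3. Determine the name: extract that ship's name.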

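The first step is often where things break down, because basic RAG systems retrieve text mainly by semantic similarity. They are good at finding similar content but fall short when several facts must be chained together to answer a question, and when the key information is spread across different documents they struggle to piece it together. Traditional workarounds, such as manually writing question-answer pairs for common questions, are expensive and impractical.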

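To address these limitations, Microsoft Research proposed GraphRAG, which brings knowledge graphs into both the retrieval and the generation steps. A knowledge graph stores entities as nodes and the relationships between them as edges, giving the data a richer representation; it is like turning a tangled web of information into a well-organized map. Looking at the knowledge graph below, you can see right away how a simple graph traversal makes complex questions much easier to answer. Pretty neat, right?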
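To make the traversal idea concrete, here is a minimal sketch in plain Python, assuming a toy in-memory graph stored as (subject, relation, object) triples rather than a real graph database. The relation names `FLEET_INCLUDED` and `WRECKED_ON` are invented for illustration. The Columbus question becomes two hops: from Columbus to the ships of his fleet, then from each ship to the date it was wrecked.

```python
# Toy knowledge graph stored as (subject, relation, object) triples.
# Relation names are invented for this illustration.
TRIPLES = [
    ("Christopher Columbus", "FLEET_INCLUDED", "Santa Maria"),
    ("Christopher Columbus", "FLEET_INCLUDED", "Nina"),
    ("Christopher Columbus", "FLEET_INCLUDED", "Pinta"),
    ("Santa Maria", "WRECKED_ON", "1492-12-25"),
]


def neighbors(subject, relation):
    """Return every object linked to `subject` by `relation` (one hop)."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def ship_wrecked_on(commander, date):
    """Two hops: commander -> fleet ships -> wreck date, filtered by date."""
    return [
        ship
        for ship in neighbors(commander, "FLEET_INCLUDED")
        if date in neighbors(ship, "WRECKED_ON")
    ]


print(ship_wrecked_on("Christopher Columbus", "1492-12-25"))  # ['Santa Maria']
```

No text similarity is involved: the answer falls out of following explicit edges, which is the property GraphRAG relies on.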

Vector Databases vs. Graph Databases in RAG

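Whether to use a vector database or a graph database in a RAG system depends on the problem you are solving, your architectural requirements, and your performance goals. Here are some points to help you decide: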

Vector databases:

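• Excel at high-dimensional data representation and similarity search.
• Well suited to image processing, recommendation systems, and real-time RAG.
• Scale horizontally as data volume grows.
• Limitation: accuracy can suffer, because approximate nearest neighbor (ANN) algorithms trade exactness for speed and similarity becomes less discriminative in very high dimensions.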
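Similarity search is the core operation a vector database performs. The sketch below does it exactly, by brute force over toy 3-dimensional vectors; the document ids and vector values are invented. ANN indexes exist to approximate this same ranking at scale, which is where the accuracy trade-off in the last bullet comes from.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Exact (brute-force) nearest-neighbour search over a dict of id -> vector."""
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Made-up 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
index = {
    "savings_faq": [0.9, 0.1, 0.0],
    "loan_terms": [0.1, 0.9, 0.1],
    "youth_savings": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index))  # ['savings_faq', 'youth_savings']
```

Note what this cannot do: it ranks chunks by closeness to the query, but it has no notion of a relationship between `savings_faq` and `youth_savings`.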

Graph databases:

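• Focus on managing complex relationships and interconnected data.
• Best suited to social network analysis, fraud detection, and knowledge representation.
• Excel at relationship-based queries and traversals.
• Limitation: can face scalability challenges and latency issues with complex structures.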
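Relationship traversal is what makes graph stores attractive for GraphRAG: given an entity mentioned in a question, the retriever can pull in its neighbourhood as context. A minimal breadth-first sketch, assuming a toy in-memory adjacency list instead of a real database (the node and relationship names are invented for the banking example):

```python
from collections import deque

# Toy adjacency list: node -> list of (relationship, neighbour). Invented data.
GRAPH = {
    "Bank": [("OFFERS", "Youth Savings Account"), ("OFFERS", "Checking Account")],
    "Youth Savings Account": [("HAS_FEATURE", "No Monthly Fee")],
    "Checking Account": [("HAS_FEATURE", "Mobile Banking")],
    "No Monthly Fee": [],
    "Mobile Banking": [],
}


def k_hop(start, k):
    """Breadth-first traversal returning (source, rel, target) edges within k hops."""
    edges, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for rel, nbr in GRAPH.get(node, []):
            edges.append((node, rel, nbr))
            if nbr not in seen:  # enqueue each node at most once
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return edges


for edge in k_hop("Bank", 2):
    print(edge)
```

In a graph database the same idea is expressed declaratively, for example with a variable-length Cypher pattern such as `MATCH (b)-[*1..2]->(x)`.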

FalkorDB

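FalkorDB is a low-latency graph database optimized for GraphRAG applications. It is built on Redis, keeps data in memory, and uses memory efficiently, so queries run faster and with lower latency than in graph databases that rely on disk storage. This makes it well suited to storing and querying complex relationships between data points. It also integrates with AI frameworks such as LangChain and LlamaIndex, which makes it convenient for building AI applications.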

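In this article, I'll show you how to build a GraphRAG-powered chatbot tailored to the BFSI (banking, financial services, and insurance) industry. Using a hypothetical bank as the example, I'll demonstrate how the technique manages complex financial data and answers customer queries.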

Prerequisites

This tutorial was tested with the following Python library versions; check your versions as you follow along:

datasets==3.1.0
falkordb==1.0.9
gradio==5.6.0
langchain-community==0.3.7
langchain-core==0.3.17
langchain-experimental==0.3.3
langchain-google-genai==2.0.4
langchain-openai==0.2.8
langchain-text-splitters==0.3.2
langchain==0.3.7
openai==1.54.4
pypdf==5.1.0

Make sure your API key is available as an environment variable:

import os
os.environ["OPENAI_API_KEY"] = "APIKEY"  # replace with your key, or export it in your shell instead

Building the Knowledge Graph

Setting Up FalkorDB

You can connect to FalkorDB either in the cloud or through a local Docker setup.

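To run FalkorDB locally, make sure Docker is installed, then start FalkorDB with the following command (port 6379 serves the database, port 3000 the FalkorDB browser UI):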

docker run -p 6379:6379 -p 3000:3000 -it --rm falkordb/falkordb:edge

Alternatively, you can start the container from the Docker Desktop console.

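To use the cloud option, create an account and log in to the FalkorDB console. From the dashboard you can create an AWS or Google Cloud instance and obtain its credentials.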

Data Ingestion

Once FalkorDB is running, define the graph database client and connect to it.

import falkordb
from langchain_community.graphs import FalkorDBGraph
from langchain_community.graphs.graph_document import Node, Relationship


# For a local Docker instance (FalkorDBGraph takes database/host/port, not a URL)
graph = FalkorDBGraph(
    database="BFSI", host="localhost", port=6379
)


# For FalkorDB Cloud (use one of the two connections, not both)
graph = FalkorDBGraph(
    host="xxxx.cloud",
    username="your_falkordb_username",
    password="your_secret_password",
    port=52780,
    database="BFSI"
)

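Since we are building a customer-support chatbot, I'll use a bank handbook that contains comprehensive information about the hypothetical bank. This dataset shows how the chatbot handles complex customer queries about the bank's products and services. You can, of course, use your own dataset.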

First, load the PDF files from the data directory.


from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

DOCS_PATH = "./data"

loader = DirectoryLoader(DOCS_PATH, glob="**/*.pdf", loader_cls=PyPDFLoader)

docs = loader.load()

In this tutorial I'll use an OpenAI LLM. Here's how to define it:

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")

You can build the knowledge graph manually or with LangChain modules.

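The manual approach requires splitting documents into chunks, identifying nodes and relationships, and populating the graph with Cypher queries. It works, but it is tedious and time-consuming. Here is an example of the Cypher statements used to create nodes and relationships: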

// Create Nodes for each label
CREATE (p:Program{id:'prog1'})
CREATE (fp:Financialproduct{id:'fin_prod1'})
CREATE (f:Feature{id:'feature1'})
CREATE (org:Organization{id:'org1'})
CREATE (s:Service{id:'service1'})


// Organization Relationships
CREATE (org)-[:MAINTAINS]->(f)
CREATE (org)-[:OFFERS]->(fp)
CREATE (org)-[:PROVIDES]->(f)
CREATE (org)-[:PROVIDES]->(s)
CREATE (org)-[:COMMITTED_TO]->(f)
CREATE (org)-[:DEVELOPS]->(p)
CREATE (org)-[:OFFERS]->(s)
CREATE (org)-[:OFFERS]->(p)


// Financial Product Relationships
CREATE (fp)-[:SECURE]->(org)
CREATE (fp)-[:INCLUDES]->(f)
CREATE (fp)-[:LINKED_TO]->(fp)
CREATE (fp)-[:MANAGED_THROUGH]->(f)
CREATE (fp)-[:HAS_FEATURE]->(f)
CREATE (fp)-[:OFFERS]->(p)


// Feature Relationships
CREATE (f)-[:OFFERED_BY]->(org)
CREATE (f)-[:PARTNERS_WITH]->(org)
CREATE (f)-[:INCLUDES]->(f)
CREATE (f)-[:ENCOURAGES]->(f)
CREATE (f)-[:COVERS]->(f)


// Program Relationships
CREATE (p)-[:INCLUDES]->(f)
CREATE (p)-[:OFFERS]->(fp)
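If you do go the manual route, one way to keep it manageable is to generate the statements from extracted (source, relationship, target) triples. The sketch below is a hypothetical helper, not part of LangChain or FalkorDB. It uses `MERGE` instead of the `CREATE` shown above, so re-running ingestion does not duplicate nodes, and passes ids as query parameters. Because Cypher cannot parameterize labels or relationship types, those are validated before being interpolated.

```python
import re

# Labels and relationship types cannot be query parameters in Cypher,
# so only allow conservative identifiers before interpolating them.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(src_label, src_id, rel_type, dst_label, dst_id):
    """Build one idempotent MERGE statement plus its parameter dict."""
    for name in (src_label, rel_type, dst_label):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    query = (
        f"MERGE (a:{src_label} {{id: $src_id}}) "
        f"MERGE (b:{dst_label} {{id: $dst_id}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src_id": src_id, "dst_id": dst_id}


q, params = triple_to_cypher("Organization", "org1", "OFFERS", "Financialproduct", "fin_prod1")
print(q)
print(params)
```

Each (query, params) pair could then be sent through your graph client, e.g. `graph.query(q, params)`, assuming the client accepts a parameters dict.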

In Microsoft's GraphRAG implementation, the graph-extraction prompt given to the LLM looks like this:

-Goal-
Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.

-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, capitalized
- entity_type: One of the following types: [{entity_types}]
- entity_description: Comprehensive description of the entity's attributes and activities
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>)

3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.

4. When finished, output {completion_delimiter}

<Multishot Examples>

-Real Data-
######################
Entity_types: {entity_types}
Text: {input_text}
######################
Output:
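The prompt asks the model for delimiter-separated records, so the caller has to parse them back into entities and relationships. A minimal parser sketch follows. The concrete delimiter strings (`<|>`, `##`, `<|COMPLETE|>`) are assumed values, since the template above only shows them as placeholders, and the sample output is invented.

```python
# Assumed delimiter values; substitute whatever you fill into the prompt template.
TUPLE_DELIM = "<|>"
RECORD_DELIM = "##"
COMPLETION_DELIM = "<|COMPLETE|>"


def parse_extraction(output):
    """Split the model's raw output into entity and relationship dicts."""
    entities, relationships = [], []
    body = output.replace(COMPLETION_DELIM, "")
    for record in body.split(RECORD_DELIM):
        record = record.strip()
        if not (record.startswith("(") and record.endswith(")")):
            continue  # skip blank or malformed records
        fields = [f.strip().strip('"') for f in record[1:-1].split(TUPLE_DELIM)]
        if fields[0] == "entity" and len(fields) == 4:
            entities.append({"name": fields[1], "type": fields[2], "description": fields[3]})
        elif fields[0] == "relationship" and len(fields) == 5:
            relationships.append({
                "source": fields[1], "target": fields[2],
                "description": fields[3], "strength": float(fields[4]),
            })
    return entities, relationships


sample = (
    '("entity"<|>NORTHWIND BANK<|>ORGANIZATION<|>A retail bank)##'
    '("entity"<|>YOUTH SAVINGS ACCOUNT<|>FINANCIAL_PRODUCT<|>Savings account for minors)##'
    '("relationship"<|>NORTHWIND BANK<|>YOUTH SAVINGS ACCOUNT<|>The bank offers this account<|>8)'
    "<|COMPLETE|>"
)
ents, rels = parse_extraction(sample)
print([e["name"] for e in ents])  # ['NORTHWIND BANK', 'YOUTH SAVINGS ACCOUNT']
print(rels[0]["strength"])        # 8.0
```

Malformed records are skipped rather than raising, since LLM output occasionally drifts from the requested format.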

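To keep things simple, you can hand your LLM to LangChain and let it do the rest: the LLM Graph Transformer module builds the knowledge graph for you. Here's how it works under the hood.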

The LLM Graph Transformer builds the graph in one of two modes:

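1. Tool-based mode: the default, used with any LLM that supports tool calling. In this mode, nodes and relationships are defined as classes that the model fills in as structured output.
2. Prompt-based mode: the fallback when the LLM does not support tool calling. Here the model uses few-shot examples to extract entities and their relationships from the text, and the result is parsed as JSON to create graph nodes and connections.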
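To illustrate the last step of the prompt-based path, the sketch below assumes the model has replied with a JSON list of relations and turns it into graph nodes and edges. The reply and its key names (`head`, `head_type`, `relation`, `tail`, `tail_type`) are hypothetical; LangChain's internal format may differ.

```python
import json

# Hypothetical model reply in prompt-based mode (key names are an assumption).
raw_reply = """[
  {"head": "Youth Savings Account", "head_type": "FinancialProduct",
   "relation": "HAS_FEATURE", "tail": "No Monthly Fee", "tail_type": "Feature"},
  {"head": "Bank", "head_type": "Organization",
   "relation": "OFFERS", "tail": "Youth Savings Account", "tail_type": "FinancialProduct"}
]"""


def to_graph(reply):
    """Turn a JSON list of relation dicts into de-duplicated nodes and a list of edges."""
    nodes, edges = {}, []
    for rel in json.loads(reply):
        nodes[rel["head"]] = rel["head_type"]  # dict keys de-duplicate repeated entities
        nodes[rel["tail"]] = rel["tail_type"]
        edges.append((rel["head"], rel["relation"], rel["tail"]))
    return nodes, edges


nodes, edges = to_graph(raw_reply)
print(len(nodes), len(edges))  # 3 2
```

"Youth Savings Account" appears in both relations but becomes a single node, which is what lets separate facts join up into a connected graph.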

from langchain_experimental.graph_transformers import LLMGraphTransformer


graph_transformer = LLMGraphTransformer(llm=llm)
data = graph_transformer.convert_to_graph_documents(docs)
graph.add_graph_documents(data)

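You can specify custom node types to constrain the graph's structure; if you don't, the LLM chooses appropriate node types based on the content. For example: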

allowed_nodes = ["Organization", "FinancialProduct", "Feature", "Service", "Program"]
graph_transformer = LLMGraphTransformer(llm=llm, allowed_nodes=allowed_nodes)
data = graph_transformer.convert_to_graph_documents(docs)
graph.add_graph_documents(data)

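After creating the graph, you can inspect its schema to verify the structure. Pay attention to the exact label spelling: in this example the product nodes carry the label `Financialproduct` rather than `FinancialProduct`, as the query results below show. The full output can be long, so it isn't included here.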

graph.refresh_schema()
print(graph.schema)

To help you picture the knowledge graph I created, here is a visualization:

Querying the Knowledge Graph

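Creating a graph database is relatively simple, but extracting meaningful information from it requires a query language such as Cypher. Cypher is a declarative query language designed for graph databases; it uses pattern-matching syntax to traverse nodes and relationships efficiently. FalkorDB implements the OpenCypher query language.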

Here is an example Cypher query against the knowledge graph we just created:

results = graph.query("MATCH (sa:Financialproduct) RETURN sa")
content_list = []
for row in results:
    node = row[0]
    print(node)

The query returns all the financial products the bank offers:

(:Financialproduct{id:"Savings Account"})
(:Financialproduct{id:"Savings_Account"})
(:Financialproduct{id:"Checking Account"})
(:Financialproduct{id:"Holiday-Themed Savings Incentives"})
(:Financialproduct{id:"Youth Savings Account"})

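Graph databases handle such queries in different ways, either directly or through integration with frameworks such as LangChain. In LangChain, for example, `FalkorDBQAChain` generates and runs the Cypher query for you. Because it executes LLM-generated queries against your database, you must opt in explicitly with `allow_dangerous_requests=True`: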

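from langchain_community.chains.graph_qa.falkordb import FalkorDBQAChain

# cypher_generation_prompt and chat_prompt are PromptTemplates you define; omit them to use the defaults
chain = FalkorDBQAChain.from_llm(llm=llm, graph=graph, cypher_prompt=cypher_generation_prompt,
                                 qa_prompt=chat_prompt, verbose=True, allow_dangerous_requests=True)
response = chain.invoke({"query": input_user_prompt})  # input_user_prompt: the user's question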

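To make the inner workings clearer, I'll implement a custom query pipeline that shows the underlying mechanics, so you can see what these systems do on the back end.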

Automating Cypher Query Generation

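Generating high-quality Cypher queries takes some prompt engineering. Our implementation uses a carefully designed prompt that combines the database schema (nodes and relationships) with the user's question. There is still room for improvement: you could refine query generation further with OpenAI tool calling or a fine-tuned language model.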
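As one illustration of the tool-calling option, the definition below would let the model return the query as a structured argument rather than free text, which avoids the regex clean-up of Markdown fences used later in the main function. The tool name `run_cypher` and its single `cypher` parameter are hypothetical. You would pass the dict in the `tools` list of a Chat Completions request (or to LangChain's `bind_tools`); the model's tool-call arguments come back as a JSON string.

```python
import json

# Hypothetical tool definition in the OpenAI Chat Completions "tools" format.
CYPHER_TOOL = {
    "type": "function",
    "function": {
        "name": "run_cypher",
        "description": "Run a read-only Cypher query against the bank knowledge graph.",
        "parameters": {
            "type": "object",
            "properties": {
                "cypher": {"type": "string", "description": "A single OpenCypher query."}
            },
            "required": ["cypher"],
        },
    },
}


def extract_cypher(arguments_json):
    """Tool-call arguments arrive as a JSON string; pull out the query text."""
    return json.loads(arguments_json)["cypher"].strip()


print(extract_cypher('{"cypher": " MATCH (p:Financialproduct) RETURN p.id "}'))
# MATCH (p:Financialproduct) RETURN p.id
```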

Here is a function that formats the schema for the prompt:

from typing import Any


def format_schema_for_prompt(schema: Any) -> str:
    """
    Format the graph schema into a clear, LLM-friendly string.

    Args:
        schema: Schema object from the graph database

    Returns:
        Formatted string representation of the schema
    """
    # LangChain's FalkorDBGraph exposes graph.schema as a ready-made string;
    # iterating over a string would yield single characters, so pass it through.
    if isinstance(schema, str):
        return schema
    try:
        nodes = set()
        relationships = []

        # Collect node types and relationship triples from the schema items
        for item in schema:
            if hasattr(item, 'start_node'):
                nodes.add(item.start_node)
                nodes.add(item.end_node)
                relationships.append({
                    'start': item.start_node,
                    'type': item.relationship_type,
                    'end': item.end_node
                })

        # Format the schema information
        formatted_output = "Node Types:\n"
        for node in sorted(nodes):
            formatted_output += f"- {node}\n"

        formatted_output += "\nRelationships:\n"
        for rel in relationships:
            formatted_output += f"- {rel['start']} -[{rel['type']}]-> {rel['end']}\n"

        return formatted_output

    except Exception:
        # Fall back to the raw schema if formatting fails
        return str(schema)

The formatted schema can now be included in the prompt template:

current_schema = graph.schema

formatted_schema = format_schema_for_prompt(current_schema)
 
system_prompt = f"""You are an expert at converting natural language questions into Cypher queries.
        The graph has the following schema:
       
        {formatted_schema}
       
        Return ONLY the Cypher query without any explanation or additional text.
        Make sure to use proper Cypher syntax and casing.
        Use the exact relationship types and node labels as shown in the schema."""

Analyzing the Cypher Query Output

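Once the Cypher query has run, its results are passed back to the LLM in a second call. This two-step approach means users receive clear, contextual answers in the chat instead of raw database results.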

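from typing import List


def format_results_for_llm(results: List) -> str:
    """
    Format results in a way that's optimal for LLM analysis.

    Args:
        results: Processed query results

    Returns:
        Formatted string of results
    """
    # Say so explicitly when nothing matched, instead of handing the LLM an empty string
    if not results:
        return "No results were returned by the query.\n"
    output = ""
    for i, row in enumerate(results, 1):
        output += f"\nItem {i}:\n"
        for item in row:
            # Nodes/edges were converted to {'type', 'properties'} dicts upstream
            if isinstance(item, dict):
                output += f"Type: {item['type']}\n"
                output += "Properties:\n"
                for key, value in item['properties'].items():
                    output += f"  - {key}: {value}\n"
            else:
                # Scalar values (strings, numbers) are printed as-is
                output += f"Value: {item}\n"
        output += "---\n"
    return output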

The analysis prompt can be structured as follows:

"""You are a financial services expert. Based on the graph query results provided,
        give a comprehensive analysis and explanation. Include relevant details about each item and how they relate
        to each other. If appropriate, suggest related products or services that might be relevant to the user.
        Format your response in a clear, structured way."""

Now combine the helper functions in the main function:

import re
from typing import Any, Dict, Optional


def query_graph_with_llm(
    llm,
    graph,
    user_query: str,
    system_prompt: Optional[str] = None,
    analysis_prompt: str = """You are a financial services expert. Based on the graph query results provided,
    give a comprehensive analysis and explanation. Include relevant details about each item and how they relate
    to each other. If appropriate, suggest related products or services that might be relevant to the user.
    Format your response in a clear, structured way."""
) -> Dict[str, Any]:
    """
    Query the knowledge graph using LLM-generated Cypher queries and analyze results.

    Args:
        llm: Language model instance
        graph: FalkorDB graph instance
        user_query: Natural language query from user
        system_prompt: Optional custom prompt for Cypher generation
        analysis_prompt: Prompt for analyzing results

    Returns:
        Dict containing query results, metadata, and analysis
    """
    cypher_query = None
    formatted_schema = None
    try:
        current_schema = graph.schema
        formatted_schema = format_schema_for_prompt(current_schema)

        # Build the default Cypher-generation prompt only if the caller supplied none
        if system_prompt is None:
            system_prompt = f"""You are an expert at converting natural language questions into Cypher queries.
            The graph has the following schema:

            {formatted_schema}

            Return ONLY the Cypher query without any explanation or additional text.
            Make sure to use proper Cypher syntax and casing.
            Use the exact relationship types and node labels as shown in the schema."""

        # Step 1: ask the LLM to translate the question into Cypher
        query_messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Convert this question to a Cypher query: {user_query}"}
        ]
        # invoke() accepts role/content dicts; predict_messages() is deprecated
        cypher_query = llm.invoke(query_messages).content

        # Strip Markdown code fences the model may wrap around the query
        cypher_query = re.sub(r'```(?:cypher)?\s*|\s*```', '', cypher_query).strip()

        # Step 2: run the generated query against the graph
        results = graph.query(cypher_query)

        # Convert nodes/edges to plain dicts; keep scalar values as they are
        processed_results = []
        for row in results:
            row_data = []
            for item in row:
                if hasattr(item, 'properties'):
                    row_data.append({
                        # Nodes carry labels; edges are assumed to expose their type as `relation`
                        'type': item.labels[0] if getattr(item, 'labels', None) else getattr(item, 'relation', 'Unknown'),
                        'properties': dict(item.properties)
                    })
                else:
                    row_data.append(item)
            processed_results.append(row_data)

        results_text = format_results_for_llm(processed_results)

        # Step 3: generate the analysis using the LLM
        analysis_messages = [
            {"role": "system", "content": analysis_prompt},
            {"role": "user", "content": f"User Question: {user_query}\n\nQuery Results:\n{results_text}\n\nPlease provide a comprehensive analysis of these results."}
        ]
        analysis = llm.invoke(analysis_messages).content

        return {
            'success': True,
            'query': cypher_query,
            'raw_results': processed_results,
            'analysis': analysis,
            'error': None,
            'schema_used': formatted_schema
        }

    except Exception as e:
        return {
            'success': False,
            'query': cypher_query,
            'raw_results': None,
            'analysis': None,
            'error': str(e),
            'schema_used': formatted_schema
        }
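Because `query_graph_with_llm` executes whatever Cypher the model returns, it is worth refusing write clauses before calling `graph.query`. Below is a minimal keyword-heuristic sketch; the name `is_read_only` is our own. It is deliberately conservative and is no substitute for connecting with a read-only database user or using a read-only query command (RedisGraph-derived servers such as FalkorDB expose `GRAPH.RO_QUERY`).

```python
import re

# Clauses that modify the graph in OpenCypher; any match means "reject".
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE)


def is_read_only(cypher):
    """Heuristic check that a generated query only reads from the graph."""
    # Blank out quoted string literals so text like 'Set-up fee' is not mistaken for SET.
    without_strings = re.sub(r"'[^']*'|\"[^\"]*\"", "''", cypher)
    return WRITE_CLAUSES.search(without_strings) is None


print(is_read_only("MATCH (p:Financialproduct) RETURN p"))            # True
print(is_read_only("MATCH (n) DETACH DELETE n"))                      # False
print(is_read_only("MATCH (f:Feature {id: 'Set-up fee'}) RETURN f"))  # True
```

In `query_graph_with_llm`, a check such as `if not is_read_only(cypher_query): raise ValueError(...)` placed just before `graph.query(cypher_query)` would route rejected queries into the existing error branch.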

Looks good! Let's test the function.

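query = "What financial products are available for young customers?"
results = query_graph_with_llm(llm, graph, query)
if results["success"]:  # show the generated Cypher and the LLM's analysis
    print(f"Cypher Query:\n{results['query']}\n\nAnalysis:\n{results['analysis']}")
else:
    print("Error:", results["error"])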
Cypher Query:
MATCH (p:Product)<-[:AVAILABLE_FOR]-(c:Customer) WHERE c.age <30 RETURN p

Analysis:
Based on the query results regarding financial products available for young customers, we can analyze and categorize the offerings into several key areas. This analysis will help young customers understand their options and how these products can meet their financial needs.

### 1. **Savings Accounts**
- **Youth Savings Accounts**: These accounts are specifically designed for young customers, often with lower minimum balance requirements and no monthly fees. They typically offer competitive interest rates to encourage saving from an early age.
- **Benefits**: Teaching financial responsibility, earning interest, and building a savings habit.

### 2. **Checking Accounts**
- **Student Checking Accounts**: Tailored for students, these accounts usually come with no monthly maintenance fees and free access to ATMs. They may also offer features like mobile banking and budgeting tools.
- **Benefits**: Easy access to funds, budgeting assistance, and financial management skills.

### 3. **Credit Cards**
- **Secured Credit Cards**: These are ideal for young customers looking to build credit. They require a cash deposit that serves as the credit limit, minimizing risk for the issuer.
- **Student Credit Cards**: Designed for college students, these cards often have lower credit limits and rewards tailored to student spending (e.g., discounts on textbooks or dining).
- **Benefits**: Establishing a credit history, learning responsible credit use, and potential rewards.

### 4. **Investment Accounts**
- **Custodial Accounts**: For minors, these accounts allow parents or guardians to manage investments on behalf of the child until they reach adulthood. They can invest in stocks, bonds, or mutual funds.
- **Robo-Advisors**: Young customers can use robo-advisors to start investing with low fees and minimal initial investment. These platforms often provide automated portfolio management based on risk tolerance.
- **Benefits**: Early exposure to investing, potential for long-term growth, and financial literacy.

### 5. **Student Loans**
- **Federal Student Loans**: These loans are available to students attending college and typically have lower interest rates and flexible repayment options.
- **Private Student Loans**: Offered by banks and credit unions, these loans can help cover education costs not met by federal loans.
- **Benefits**: Access to higher education, potential for future earning increases, and various repayment options.

### 6. **Insurance Products**
- **Health Insurance**: Young customers can often stay on their parents' health insurance plans until age 26, but they may also explore options through school or the marketplace.
- **Renter's Insurance**: For young adults living independently, renter's insurance protects personal belongings and is often affordable.
- **Benefits**: Financial protection against unexpected events and health-related expenses.

### 7. **Financial Education Resources**
- **Workshops and Online Courses**: Many financial institutions offer free resources to educate young customers about budgeting, saving, and investing.
- **Mobile Apps**: Budgeting apps can help young customers track their spending and savings goals.
- **Benefits**: Empowering young customers with knowledge, improving financial literacy, and fostering responsible financial habits.

### **Conclusion and Recommendations**
Young customers have a variety of financial products tailored to their unique needs. It is essential for them to start with basic products like savings and checking accounts to build a solid financial foundation. As they progress, they can explore credit cards and investment accounts to enhance their financial literacy and creditworthiness.

**Related Products/Services Suggestions**:
- **Financial Planning Services**: Consider consulting with a financial advisor to create a personalized financial plan.
- **Budgeting Tools**: Utilize apps or software that help track expenses and savings goals.
- **Scholarship Search Services**: For students, finding scholarships can significantly reduce education costs.

By understanding these products and their interconnections, young customers can make informed decisions that will benefit their financial future.
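The generated query uses `Product`, `Customer` and `AVAILABLE_FOR`, none of which appear among the labels used earlier in this article (`Financialproduct`, `Organization`, `Feature`, ...). It therefore probably matched nothing, and the analysis above looks like the model's general knowledge rather than facts from the bank's graph. A cheap guard is to check generated labels against the schema and retry or refuse when they do not match. The sketch below is hypothetical: its regexes only catch the common `(x:Label)` and `[:TYPE]` forms, and the label lists are copied from the earlier examples rather than read from a live schema.

```python
import re


def unknown_labels(cypher, node_labels, rel_types):
    """Return node labels / relationship types used in `cypher` that the schema lacks."""
    # Node labels look like (x:Label or (:Label; relationship types like [:TYPE or [r:TYPE.
    used_nodes = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher))
    used_rels = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", cypher))
    return sorted((used_nodes - set(node_labels)) | (used_rels - set(rel_types)))


schema_nodes = ["Organization", "Financialproduct", "Feature", "Service", "Program"]
schema_rels = ["OFFERS", "PROVIDES", "HAS_FEATURE", "INCLUDES"]
generated = "MATCH (p:Product)<-[:AVAILABLE_FOR]-(c:Customer) WHERE c.age <30 RETURN p"
print(unknown_labels(generated, schema_nodes, schema_rels))
# ['AVAILABLE_FOR', 'Customer', 'Product']
```

A non-empty result could be fed back to the Cypher-generation prompt ("these labels do not exist; use only ...") for a second attempt.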