AI知识库

53AI知识库

学习大模型的前沿技术与行业应用场景


不到百行代码实现自然语言数据库查询:解密AI Agent的ReAct框架
发布日期:2024-11-25 19:25:59 浏览次数: 1831 来源:非架构


在人工智能的 Agent 设计中,ReAct(Reasoning + Acting)框架以其高效的决策与推理能力受到关注。本篇文章将通过 LangChain 的SQLAgent 实现的自然语言数据库查询的案例来演示 ReAct 框架的基本工作原理。

本文分为以下几个部分:

  1. Agent 的基本概念和 ReAct 框架简介

  2. 案例代码的基本结构和运行说明

  3. 查询结果分析:结合 ReAct 的问题解决方法

  4. 总结与展望

实际案例的运行画面

本文案例运行时使用的大模型是阿里云的qwen-max


一、Agent 的基本概念和 ReAct 框架简介

在人工智能中,Agent 是一种能够感知环境并采取行动以实现目标的系统。LangChain 提供了一种灵活的 Agent 开发框架,通过整合语言模型(LLMs)和工具(如搜索引擎、数据库等)来解决复杂任务。

ReAct 框架是 Agent 设计中的一种关键方法。它将推理(Reasoning)与行动(Acting)相结合,让 Agent 能够通过以下循环来解决问题:

  1. 推理:基于上下文生成合理的假设。

  2. 行动:根据推理采取具体的行动(如查询数据库、调用 API)。

  3. 反馈循环:根据行动结果重新推理和调整。

这一过程使 Agent 能够动态调整其行为,特别适合应对复杂、非结构化的问题。



二、LangChain SQLAgent 自然语言查询演示

案例代码

本部分通过一个案例展示如何使用 LangChain 的 SQLAgent,通过自然语言对数据库进行查询。以下是本文案例执行的 main.py的代码。完整项目代码可以从文末获得。

import os
from dotenv import load_dotenv
from langchain_community.agent_toolkits.sql.base import create_sql_agent
from langchain_community.utilities.sql_database import SQLDatabase
from langchain.agents.agent_types import AgentType
from langchain_openai import AzureChatOpenAI, ChatOpenAI
from langchain_community.callbacks.manager import get_openai_callback
from langchain_community.agent_toolkits import SQLDatabaseToolkit

# 加载环境变量
load_dotenv()

def get_llm():
"""根据环境变量选择使用的LLM"""
model_type = os.getenv("MODEL_TYPE", "azure")# 默认使用azure

if model_type.lower() == "azure":
return AzureChatOpenAI(
openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
openai_api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
temperature=0
)
else:
return ChatOpenAI(
model=os.getenv("MODEL_NAME", "gpt-3.5-turbo"),# 默认使用gpt-3.5-turbo
openai_api_key=os.getenv("OPENAI_API_KEY"),
openai_api_base=os.getenv("OPENAI_API_BASE"),# API基础URL
temperature=0
)

# 使用get_llm()替换原来的llm初始化
llm = get_llm()

# 连接数据库
db = SQLDatabase.from_uri("sqlite:///Chinook.db")

# 创建 SQL 工具包
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# 创建 SQL Agent
agent = create_sql_agent(
llm=llm,
toolkit=toolkit,
agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True
)

def main():
print("欢迎使用自然语言查询系统!")
print("输入 'quit' 或 'exit' 退出程序")

while True:
query = input("\n请输入您的查询: ")

if query.lower() in ['quit', 'exit']:
print("感谢使用!再见!")
break

try:
# 使用计数器追踪 token 使用情况
with get_openai_callback() as cb:
# 使用 invoke 替代 run
result = agent.invoke({"input": query})
print("\n查询结果:")
print(result["output"])
print(f"\nToken 使用情况:")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Tokens: {cb.total_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")
except Exception as e:
print(f"查询出错: {str(e)}")

if __name__ == "__main__":
main()

项目结构说明

项目整体结构设计如下:

.
├── README.md# 项目说明文档
├── .env # 环境变量文件
├── main.py# 主程序入口
├── Chinook.db # 示例数据库
└── Chinook_Sqlite.sql # 数据库初始化脚本
  • main.py: 核心代码,用于调用 LangChain SQLAgent。

  • .env: 用于存储敏感信息(如 API 密钥)。

  • Chinook.db: SQLite 数据库文件,包含示例数据。

  • Chinook_Sqlite.sql: 数据库初始化脚本。


快速开始

  1. 完成环境准备
    请确保已安装依赖、创建数据库并配置环境变量。

  2. 运行应用
    在终端中执行以下命令启动应用:

    python main.py
  3. 输入查询
    在应用运行后,输入自然语言查询,如:

  • “显示所有客户的名字和邮箱”

  • “查询销售额最高的5个员工”


数据库架构概览

Chinook 数据库是一个模拟的音乐销售数据库,包含以下主要表:

表名描述
Customer存储客户信息
Employee存储员工信息
Invoice存储订单信息
Track存储音轨信息
Album存储专辑信息
Artist存储艺术家信息

此架构支持丰富的查询场景,例如客户消费分析、热门艺术家排名等。


三、 查询案例分析

我们运行了上面的 main.py 程序,输入了下面的两个问题。结合系统回答这两个问题过程中的输出结果,我们分析了 ReAct 框架是如何助力大模型完成 Agent 的工作的。

3.1 第一个查询:统计雇员人数

输出结果

欢迎使用自然语言查询系统!
输入 'quit''exit' 退出程序

请输入您的查询: 统计雇员人数


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, TrackThe "Employee" table is likely to contain information about employees. I should check the schema of the "Employee" table to confirm this.

Action: sql_db_schema
Action Input: Employee
CREATE TABLE "Employee" (
"EmployeeId" INTEGER NOT NULL,
"LastName" NVARCHAR(20) NOT NULL,
"FirstName" NVARCHAR(20) NOT NULL,
"Title" NVARCHAR(30),
"ReportsTo" INTEGER,
"BirthDate" DATETIME,
"HireDate" DATETIME,
"Address" NVARCHAR(70),
"City" NVARCHAR(40),
"State" NVARCHAR(40),
"Country" NVARCHAR(40),
"PostalCode" NVARCHAR(10),
"Phone" NVARCHAR(24),
"Fax" NVARCHAR(24),
"Email" NVARCHAR(60),
PRIMARY KEY ("EmployeeId"),
FOREIGN KEY("ReportsTo") REFERENCES "Employee" ("EmployeeId")
)

/*
3 rows from Employee table:
EmployeeIdLastNameFirstName Title ReportsTo BirthDate HireDateAddress CityState CountryPostalCodePhone Fax Email
1 Adams AndrewGeneral Manager None1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW EdmontonAB CanadaT5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457 andrew@chinookcorp.com
2 Edwards Nancy Sales Manager 1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SWCalgary ABCanada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322 nancy@chinookcorp.com
3 Peacock JaneSales Support Agent 2 1973-08-29 00:00:00 2002-04-01 00:00:00 1111 6 Ave SW Calgary ABCanada T2P 5M5 +1 (403) 262-3443 +1 (403) 262-6712 jane@chinookcorp.com
*/
The "Employee" table contains information about employees. To count the number of employees, I can use the `COUNT` function on the `EmployeeId` column.

Action: sql_db_query_checker
Action Input: SELECT COUNT(EmployeeId) AS EmployeeCount FROM Employee;```sql
SELECT COUNT(EmployeeId) AS EmployeeCount FROM Employee;
```The query is correct and ready to be executed.

Action: sql_db_query
Action Input: SELECT COUNT(EmployeeId) AS EmployeeCount FROM Employee;[(8,)]I now know the final answer.

Final Answer: The number of employees is 8.

> Finished chain.

查询结果:
The number of employees is 8.

Token 使用情况:
Prompt Tokens: 132
Completion Tokens: 15
Total Tokens: 147
Total Cost (USD): $0.0008849999999999999

用户问题: 统计公司员工的总人数。

在执行过程中,Agent先识别出需要的表名并获取其架构信息。通过以下步骤解决问题:

  1. 确认表结构: 检索Employee表的架构,确认其包含员工信息。

  2. 验证SQL查询语句: 构建了SELECT COUNT(EmployeeId) AS EmployeeCount FROM Employee;语句并通过语法检查。

  3. 执行查询并返回结果: 查询成功返回结果,表明员工总人数为 8

分析:

  • 过程优点: Agent能够自动推理查询需要的表,并生成正确的SQL语句,确保结果准确无误。

  • 典型性: 该查询属于简单聚合问题,解决逻辑清晰,适合展示ReAct框架如何处理常见问题。


3.2 第二个查询:统计各国艺术家的销量

输出结果

请输入您的查询: 统计各国艺术家的销量


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, TrackTo calculate the sales of artists by country, I need to identify the relevant tables and their relationships. The `Artist` table will provide artist information, and the `Invoice` table will provide sales data. The `Customer` table will link sales to countries.

I will first check the schema of the `Artist`, `Invoice`, and `Customer` tables to understand their structure and relationships.

Action: sql_db_schema
Action Input: Artist, Invoice, Customer
CREATE TABLE "Artist" (
"ArtistId" INTEGER NOT NULL,
"Name" NVARCHAR(120),
PRIMARY KEY ("ArtistId")
)

/*
3 rows from Artist table:
ArtistIdName
1 AC/DC
2 Accept
3 Aerosmith
*/



CREATE TABLE "Customer" (
"CustomerId" INTEGER NOT NULL,
"FirstName" NVARCHAR(40) NOT NULL,
"LastName" NVARCHAR(20) NOT NULL,
"Company" NVARCHAR(80),
"Address" NVARCHAR(70),
"City" NVARCHAR(40),
"State" NVARCHAR(40),
"Country" NVARCHAR(40),
"PostalCode" NVARCHAR(10),
"Phone" NVARCHAR(24),
"Fax" NVARCHAR(24),
"Email" NVARCHAR(60) NOT NULL,
"SupportRepId" INTEGER,
PRIMARY KEY ("CustomerId"),
FOREIGN KEY("SupportRepId") REFERENCES "Employee" ("EmployeeId")
)

/*
3 rows from Customer table:
CustomerIdFirstName LastNameCompany Address CityState Country PostalCodePhoneFax Email SupportRepId
1 LuísGonçalves Embraer - Empresa Brasileira de Aeronáutica S.A.Av. Brigadeiro Faria Lima, 2170São José dos Campos SPBrazil12227-000 +55 (12) 3923-5555+55 (12) 3923-5566 luisg@embraer.com.br3
2 LeonieKöhlerNoneTheodor-Heuss-Straße 34 Stuttgart NoneGermany 70174 +49 0711 2842222 Noneleonekohler@surfeu.de 5
3 FrançoisTremblayNone1498 rue Bélanger MontréalQCCanadaH2G 1A7+1 (514) 721-4711 Noneftremblay@gmail.com 3
*/



CREATE TABLE "Invoice" (
"InvoiceId" INTEGER NOT NULL,
"CustomerId" INTEGER NOT NULL,
"InvoiceDate" DATETIME NOT NULL,
"BillingAddress" NVARCHAR(70),
"BillingCity" NVARCHAR(40),
"BillingState" NVARCHAR(40),
"BillingCountry" NVARCHAR(40),
"BillingPostalCode" NVARCHAR(10),
"Total" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("InvoiceId"),
FOREIGN KEY("CustomerId") REFERENCES "Customer" ("CustomerId")
)

/*
3 rows from Invoice table:
InvoiceId CustomerIdInvoiceDate BillingAddressBillingCity BillingStateBillingCountry BillingPostalCode Total
1 2 2021-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart NoneGermany 70174 1.98
2 4 2021-01-02 00:00:00 Ullevålsveien 14OsloNoneNorway01713.96
3 8 2021-01-03 00:00:00 Grétrystraat 63 BrusselsNoneBelgium 10005.94
*/
To calculate the sales of artists by country, I need to join the `Artist`, `Customer`, and `Invoice` tables. However, I also need to consider the `Track` and `InvoiceLine` tables to link the artists to the invoices.

I will first check the schema of the `Track` and `InvoiceLine` tables to understand their structure and relationships.

Action: sql_db_schema
Action Input: Track, InvoiceLine
CREATE TABLE "InvoiceLine" (
"InvoiceLineId" INTEGER NOT NULL,
"InvoiceId" INTEGER NOT NULL,
"TrackId" INTEGER NOT NULL,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
"Quantity" INTEGER NOT NULL,
PRIMARY KEY ("InvoiceLineId"),
FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
FOREIGN KEY("InvoiceId") REFERENCES "Invoice" ("InvoiceId")
)

/*
3 rows from InvoiceLine table:
InvoiceLineId InvoiceId TrackId UnitPrice Quantity
1 1 2 0.991
2 1 4 0.991
3 2 6 0.991
*/



CREATE TABLE "Track" (
"TrackId" INTEGER NOT NULL,
"Name" NVARCHAR(200) NOT NULL,
"AlbumId" INTEGER,
"MediaTypeId" INTEGER NOT NULL,
"GenreId" INTEGER,
"Composer" NVARCHAR(220),
"Milliseconds" INTEGER NOT NULL,
"Bytes" INTEGER,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("TrackId"),
FOREIGN KEY("MediaTypeId") REFERENCES "MediaType" ("MediaTypeId"),
FOREIGN KEY("GenreId") REFERENCES "Genre" ("GenreId"),
FOREIGN KEY("AlbumId") REFERENCES "Album" ("AlbumId")
)

/*
3 rows from Track table:
TrackId NameAlbumId MediaTypeId GenreId ComposerMillisecondsBytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson343719111703340.99
2 Balls to the Wall 2 2 1 U. Dirkschneider, W. Hoffmann, H. Frank, P. Baltes, S. Kaufmann, G. Hoffmann 3425625510424 0.99
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99
*/
To calculate the sales of artists by country, I need to join the `Artist`, `Customer`, `Invoice`, `InvoiceLine`, and `Track` tables. The `Track` table will link to the `Artist` table through the `Album` table, so I also need to check the schema of the `Album` table.

Action: sql_db_schema
Action Input: Album
CREATE TABLE "Album" (
"AlbumId" INTEGER NOT NULL,
"Title" NVARCHAR(160) NOT NULL,
"ArtistId" INTEGER NOT NULL,
PRIMARY KEY ("AlbumId"),
FOREIGN KEY("ArtistId") REFERENCES "Artist" ("ArtistId")
)

/*
3 rows from Album table:
AlbumId Title ArtistId
1 For Those About To Rock We Salute You 1
2 Balls to the Wall 2
3 Restless and Wild 2
*/
To calculate the sales of artists by country, I need to join the `Artist`, `Customer`, `Invoice`, `InvoiceLine`, `Track`, and `Album` tables. The `Track` table links to the `Album` table, which in turn links to the `Artist` table. The `InvoiceLine` table links to the `Invoice` table, which links to the `Customer` table to get the country information.

I will construct a query to join these tables and calculate the total sales for each artist by country.

Here is the query:

```sql
SELECT
c.Country,
ar.Name AS ArtistName,
SUM(il.UnitPrice * il.Quantity) AS TotalSales
FROM
InvoiceLine il
JOIN
Invoice i ON il.InvoiceId = i.InvoiceId
JOIN
Customer c ON i.CustomerId = c.CustomerId
JOIN
Track t ON il.TrackId = t.TrackId
JOIN
Album al ON t.AlbumId = al.AlbumId
JOIN
Artist ar ON al.ArtistId = ar.ArtistId
GROUP BY
c.Country, ar.Name
ORDER BY
TotalSales DESC

LIMIT 10;...



Action: sql_db_query
Action Input: SELECT c.Country, ar.Name AS ArtistName, SUM(il.UnitPrice * il.Quantity) AS TotalSales FROM InvoiceLine il JOIN Invoice i ON il.InvoiceId = i.InvoiceId JOIN Customer c ON i.CustomerId = c.CustomerId JOIN Track t ON il.TrackId = t.TrackId JOIN Album al ON t.AlbumId = al.AlbumId JOIN Artist ar ON al.ArtistId = ar.ArtistId GROUP BY c.Country, ar.Name ORDER BY TotalSales DESC LIMIT 10;[('USA', 'The Office', 33.82999999999999), ('USA', 'Iron Maiden', 33.65999999999998), ('USA', 'Metallica', 26.729999999999986), ('USA', 'U2', 22.769999999999992), ('Australia', 'Iron Maiden', 17.82), ('Czech Republic', 'Lost', 15.92), ('Ireland', 'Lost', 15.92), ('USA', 'Lost', 15.92), ('Canada', 'Os Paralamas Do Sucesso', 15.840000000000002), ('Portugal', 'Iron Maiden', 15.840000000000002)]I now know the final answer.

Final Answer: The top 10 sales of artists by country are as follows:
1. USA - The Office: $33.83
2. USA - Iron Maiden: $33.66
3. USA - Metallica: $26.73
4. USA - U2: $22.77
5. Australia - Iron Maiden: $17.82
6. Czech Republic - Lost: $15.92
7. Ireland - Lost: $15.92
8. USA - Lost: $15.92
9. Canada - Os Paralamas Do Sucesso: $15.84
10. Portugal - Iron Maiden: $15.84

> Finished chain.

查询结果:
The top 10 sales of artists by country are as follows:
1. USA - The Office: $33.83
2. USA - Iron Maiden: $33.66
3. USA - Metallica: $26.73
4. USA - U2: $22.77
5. Australia - Iron Maiden: $17.82
6. Czech Republic - Lost: $15.92
7. Ireland - Lost: $15.92
8. USA - Lost: $15.92
9. Canada - Os Paralamas Do Sucesso: $15.84
10. Portugal - Iron Maiden: $15.84

Token 使用情况:
Prompt Tokens: 221
Completion Tokens: 113
Total Tokens: 334
Total Cost (USD): $0.0028

输出结果分析示例


用户问题: 统计按国家划分的艺术家总销量。

Agent的处理过程:

  1. 识别表关系:

  • 初步锁定涉及的表,包括Artist(艺术家)、Customer(客户)、Invoice(发票)。

  • 进一步分析发现需要涉及Track(曲目)、InvoiceLine(发票明细)以及Album(专辑)表来建立连接关系。

  • 构建查询逻辑:

    • 通过分析表的外键关系,设计以下SQL查询:

      SELECT 
      c.Country,
      ar.Name AS ArtistName,
      SUM(il.UnitPrice * il.Quantity) AS TotalSales
      FROM
      InvoiceLine il
      JOIN
      Invoice i ON il.InvoiceId = i.InvoiceId
      JOIN
      Customer c ON i.CustomerId = c.CustomerId
      JOIN
      Track t ON il.TrackId = t.TrackId
      JOIN
      Album al ON t.AlbumId = al.AlbumId
      JOIN
      Artist ar ON al.ArtistId = ar.ArtistId
      GROUP BY
      c.Country, ar.Name
      ORDER BY
      TotalSales DESC
      LIMIT 10;
  • 执行并返回结果:

    1. 查询成功运行,返回了按销量排序的前10条结果。例如:

    • 美国(USA) - The Office: $33.83

    • 美国(USA) - Iron Maiden: $33.66

    分析:

    • 复杂性分析: 该问题需要通过多个表的多层级关联实现查询,属于复杂问题。Agent需要推理表关系,并正确地构造多表JOIN操作。

    • 过程优势:

      • 自动识别表关系并动态推理查询路径。

      • 对复杂查询提供详细的可解释步骤,有助于验证正确性。

    • 局限性:

      • 查询中返回的部分字段(如"Lost"艺术家的多次重复出现)提示在真实数据场景下可能需要更细化的逻辑处理。


    总结

    两个案例展示了ReAct框架在简单和复杂问题上的不同表现:

    • 对于简单问题,Agent能够快速给出答案,过程简洁高效。

    • 对于复杂问题,Agent通过动态推理和多表关联构建出精确的查询逻辑,体现出强大的推理能力。

    这种能力使得LangChain的SQL Agent在自动化数据库查询领域具有很大的应用潜力,特别是在处理复杂数据分析任务时具有显著优势。


    四、总结与展望

    本文通过 LangChain 的 SQLAgent 演示了 ReAct 框架的工作原理:推理与行动相结合的循环机制使其能够应对复杂的任务需求。这种机制不仅提升了 Agent 的通用性,还为构建更加智能化的应用提供了思路。

    未来,我们可以在以下方面进一步优化:

    1. 支持多语言:增强 Agent 处理多语言自然查询的能力。

    2. 扩展工具链:结合更多工具(如 Web 搜索、文档解析)扩展应用场景。

    3. 优化推理过程:通过上下文强化推理能力,提高对模糊需求的响应能力。

    LangChain SQLAgent 为开发者提供了一个直观的实验平台,让我们更深入理解 Agent 设计与 ReAct 框架的优势。期待这种技术在更多场景下的落地应用!


    五、源代码

    我已开源本文提到的源代码,以下是代码网址,你可以找到详细的配置和运行方法。

    • https://github.com/surfirst/sqlagent




    53AI,企业落地应用大模型首选服务商

    产品:大模型应用平台+智能体定制开发+落地咨询服务

    承诺:先做场景POC验证,看到效果再签署服务协议。零风险落地应用大模型,已交付160+中大型企业

    联系我们

    售前咨询
    186 6662 7370
    预约演示
    185 8882 0121

    微信扫码

    与创始人交个朋友

    回到顶部

     
    扫码咨询