微信扫码
与创始人交个朋友
我要投稿
Vanna[1] 是一个 MIT 许可的开源 Python RAG(Retrieval-Augmented Generation)框架,可以用来以对话形式与 SQL 数据库交互。
Vanna 提供两种使用方式:
vn.ask("What are the top 10 customers by sales?")
Vanna 的工作原理与通常的 RAG 原理类似,即:
Quickstart With Sample Data[3] 中提供的示例代码需要从 vanna.ai[4] 获得注册邮箱对应的 api_key
:
!pip install vanna
import vanna
from vanna.remote import VannaDefault
vn = VannaDefault(model='chinook', api_key=vanna.get_api_key('my-email@example.com'))
vn.connect_to_sqlite('https://vanna.ai/Chinook.sqlite')
vn.ask("What are the top 10 albums by sales?")
离线环境使用时,可以选择构建自定义类型的 Vanna 对象,避免对 vanna.ai
在线环境的依赖。
在 Quickstart With Your Own Data[5] 中,可以根据部署环境选择实际需要使用的 LLM、向量库 和 数据库类型。
以下以 OpenAI + ChromaDB + MySQL[6] 为例进行说明。
安装依赖(可通过内网源或构建镜像):
$ pip install 'vanna[chromadb,openai,mysql]'
准备向量嵌入模型文件,放至 ~/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz
:
$ wget https://chroma-onnx-models.s3.amazonaws.com/all-MiniLM-L6-v2/onnx.tar.gz
也可从 ModelScope all-MiniLM-L6-v2[7] 下载。
构建 Vanna 实例,使用兼容 OpenAI 接口的本地 LLM:
from openai import OpenAI
client = OpenAI(api_key='sk-xxx', base_url='http://127.0.0.1:19131/v1/')
class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
def __init__(self, config=None):
ChromaDB_VectorStore.__init__(self, config=config)
OpenAI_Chat.__init__(self, client=client, config=config)
vn = MyVanna(config={'model': 'qwen1.5-72b-chat'})
配置数据库连接:
vn.connect_to_mysql(host='my-host', dbname='my-db', user='my-user', password='my-password', port=123)
准备“训练”数据:
# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS")
# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)
print(plan)
执行“训练”:
# If you like the plan, then uncomment this and run it to train
vn.train(plan=plan)
这里的“训练”,实际相当于是对数据进行向量化,并添加至向量库,并不涉及对 LLM 的权重调整。
可随时补充“训练”数据:
# The following are methods for adding training data. Make sure you modify the examples to match your database.
# DDL statements are powerful because they specify table names, colume names, types, and potentially relationships
vn.train(ddl='''
CREATE TABLE IF NOT EXISTS my-table (
id INT PRIMARY KEY,
name VARCHAR(100),
age INT
)
''')
# Sometimes you may want to add documentation about your business terminology or definitions.
vn.train(documentation="Our business defines OTIF score as the percentage of orders that are delivered on time and in full")
# You can also add SQL queries to your training data. This is useful if you have some queries already laying around. You can just copy and paste those from your editor to begin generating new SQL.
vn.train(sql="SELECT * FROM my-table WHERE name = 'John Doe'")
查看“训练数据”:
# At any time you can inspect what training data the package is able to reference
training_data = vn.get_training_data()
print(training_data)
或删除“训练数据”:
# You can remove training data if there's obsolete/incorrect information.
vn.remove_training_data(id='1-ddl')
对话时,vanna 会从“训练”数据中找出 10 个最相关的信息向量,将其作为输入给 LLM 的提示词的一部分,用以辅助生成 SQL:
vn.ask(question='有哪些表')
from vanna.flask import VannaFlaskApp
VannaFlaskApp(vn, allow_llm_to_see_data=True).run(port=8085, host='0.0.0.0')
上面代码会在 8085 端口启动一个 Vanna Flask Web App,更多参数设置可见 Customization[8]。
53AI,企业落地应用大模型首选服务商
产品:大模型应用平台+智能体定制开发+落地咨询服务
承诺:先做场景POC验证,看到效果再签署服务协议。零风险落地应用大模型,已交付160+中大型企业
2025-01-11
蚂蚁集团基于 Ray 构建的分布式 AI Agent 框架
2025-01-10
我们即将进入 Agentic AI 时代 ,而第一个落地就是 Coding Agent
2025-01-10
2025 AI Agent迷局:谁在玩真的,谁在演戏?
2025-01-10
AGI 通用人工智能模型:基础理论与实现路径
2025-01-09
杨芳贤|AI 2.0时代,如何拥抱与驾驭大模型?
2025-01-09
字节为AI埋下了三条主线
2025-01-09
深度长文|AI的“巴别塔”:多Agent协同为何如此之难?
2025-01-08
独家对话阿里云刘伟光:什么是真正的AI云
2024-08-13
2024-05-28
2024-04-26
2024-08-21
2024-06-13
2024-08-04
2024-07-09
2024-09-23
2024-07-18
2024-04-11