Building a Privacy-Preserving Local RAG System with Ollama and Weaviate

Published: 2024-05-25 10:18:45

Author: 二师兄talks

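Introduction

Prototyping an application on top of a large language model (LLM) is fun and easy. Once you try to move it into production at a company, though, you immediately run into challenges such as reducing hallucinations and protecting data privacy. Retrieval-Augmented Generation (RAG) has been shown to reduce hallucinations effectively, and running everything locally is one of the best ways to protect privacy.

This article shows how to implement a RAG-based chatbot in Python in a local environment with no external dependencies, using only the following local components:

A local LLM and embedding model served by Ollama
A local Weaviate vector database instance run through Docker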

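How to set up a local language model with Ollama

Had you known that setting up an AI prototype with Ollama takes less than five minutes, you might have started building AI applications sooner.

Step 1: Download and install Ollama

Download the Ollama build for your operating system from the official website and follow the installation steps.

Step 2: Download the models

Open a terminal and pull the LLM and the embedding model of your choice. This tutorial uses Meta's llama2 as the LLM and all-minilm as the embedding model.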

ollama pull llama2
ollama pull all-minilm

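Other available embedding models include mxbai-embed-large (334M parameters) and nomic-embed-text (137M parameters).

Step 3: Install the Ollama Python library

Because the RAG pipeline is implemented in Python, you need to install the Ollama Python library. This tutorial uses version 0.1.8.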

pip install ollama

Ollama also provides a REST API and a JavaScript library.
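As a rough illustration of the REST API mentioned above, the following sketch builds a request for Ollama's /api/generate endpoint using only the Python standard library. It assumes the Ollama server is listening on its default address, http://localhost:11434. The helper names build_generate_request and generate_via_rest are made up for this example and are not part of any Ollama library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for a single JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_via_rest(model, prompt):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, generate_via_rest("llama2", "Why is the sky blue?") should return the model's answer as a string. The rest of this tutorial uses the Python library instead.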

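How to set up a local vector database instance with Docker

In a local RAG pipeline, you will want to host the vector database locally as well. This section shows how to host the open-source Weaviate vector database locally with Docker.

Step 1: Download and install Docker

Install the docker (Docker 17.09.0 or later) and docker-compose (Docker Compose V2) CLI tools.

Step 2: Start a Docker container with a Weaviate instance

You can now run the following command in a terminal to start a Weaviate instance from the default Docker image.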

docker run -p 8080:8080 -p 50051:50051 cr.weaviate.io/semitechnologies/weaviate:1.24.8
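The container may need a few seconds before it accepts requests. As a hedged sketch, the check below queries Weaviate's readiness endpoint (/v1/.well-known/ready) on the HTTP port mapped by the command above, using only the standard library. The function name weaviate_is_ready and the default timeout are assumptions made for this example.

```python
import urllib.request


def weaviate_is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the Weaviate readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both subclass OSError).
        return False


if __name__ == "__main__":
    print("Weaviate ready:", weaviate_is_ready())
```

If this prints False right after starting the container, waiting a moment and retrying is usually enough.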

Step 3: Install the Weaviate Python client

Because the RAG pipeline is implemented in Python, you also need to install the Weaviate Python client. This tutorial uses version 4.5.5.

pip install -U weaviate-client
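Since the tutorial names specific library versions, it can help to see what is actually installed before debugging API differences. The snippet below is a minimal sketch using importlib.metadata from the standard library. The helper names installed_version and report_versions are hypothetical, and the snippet only reports versions without enforcing them.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions named in this tutorial.
TUTORIAL_VERSIONS = {"ollama": "0.1.8", "weaviate-client": "4.5.5"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report_versions(expected):
    """Map each package name to an (installed, expected) pair."""
    return {name: (installed_version(name), wanted) for name, wanted in expected.items()}


for name, (found, wanted) in report_versions(TUTORIAL_VERSIONS).items():
    print(f"{name}: installed={found}, tutorial used {wanted}")
```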

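How to build the local RAG pipeline

With the setup above complete, you can start implementing the RAG pipeline.

The following example is based on an article from the Ollama blog (https://ollama.com/blog/embedding-models).

Preparation: Import data into the vector database

The first step in building a RAG pipeline is to import your data into the vector database. To do so, you need to prepare your data and embed it.

Below are some sample documents used in the Ollama blog post.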

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]

Next, connect to the locally running vector database instance.

import weaviate

client = weaviate.connect_to_local()

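At startup, the vector database is empty. To populate it, first define the structure that will store your data (here, a collection named docs). Because the sample data is just a simple list of strings, a single property named text with the data type DataType.TEXT is enough.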

import weaviate.classes as wvc
from weaviate.classes.config import Property, DataType

# Create a new data collection
collection = client.collections.create(
    name="docs",  # Name of the data collection
    properties=[
        Property(name="text", data_type=DataType.TEXT),  # Name and data type of the property
    ],
)

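Now you can load the data into the predefined structure. Iterate over the documents and embed each one with Ollama's embeddings() method. Each text is then stored in the vector database together with its embedding.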

import ollama

# Store each document in a vector embedding database
with collection.batch.dynamic() as batch:
    for d in documents:
        # Generate embeddings
        response = ollama.embeddings(model="all-minilm", prompt=d)

        # Add data object with text and embedding
        batch.add_object(
            properties={"text": d},
            vector=response["embedding"],
        )

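Step 1: Retrieve context

At inference time, you want to retrieve additional context for your question. To do so, run a simple similarity search for the question (for example, "What animals are llamas related to?").

For the similarity search, first generate a vector embedding for the search query (here, the question) with the embeddings() method, just as in the import stage. Then pass the resulting embedding to Weaviate's near_vector() method and specify that only the closest result should be returned (limit = 1).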

# An example prompt
prompt = "What animals are llamas related to?"

# Generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
    model="all-minilm",
    prompt=prompt,
)

results = collection.query.near_vector(near_vector=response["embedding"], limit=1)

data = results.objects[0].properties["text"]

Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
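To make the idea of a similarity search concrete, here is a minimal brute-force sketch in plain Python. It is only a conceptual stand-in: Weaviate answers near_vector() queries with a vector index rather than a linear scan. The tiny 3-dimensional vectors and the helper names cosine_similarity and nearest are invented for illustration, and real all-minilm embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, stored, limit=1):
    """Return the `limit` texts whose vectors are most similar to `query`."""
    ranked = sorted(stored, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:limit]]


# Toy (text, vector) pairs standing in for stored documents and their embeddings.
toy_store = [
    ("llamas and camelids", [0.9, 0.1, 0.0]),
    ("llama weight", [0.1, 0.9, 0.0]),
    ("llama lifespan", [0.0, 0.2, 0.9]),
]

print(nearest([0.8, 0.2, 0.1], toy_store, limit=1))  # ['llamas and camelids']
```

Raising limit returns more candidates in descending order of similarity, which mirrors the effect of the limit argument in the query above.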

Step 2: Augment the prompt

Next, fill a prompt template with the original question and the retrieved context:

prompt_template = f"Using this data: {data}. Respond to this prompt: {prompt}"
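If you retrieve more than one document (a limit greater than 1), the same template can be filled with several context passages. The helper below is a hypothetical sketch and not part of the Ollama blog example. The name build_prompt and the choice to join passages with newlines are assumptions.

```python
from typing import Sequence


def build_prompt(question: str, contexts: Sequence[str]) -> str:
    """Fill the tutorial's prompt template with one or more retrieved passages."""
    data = "\n".join(contexts)
    return f"Using this data: {data}. Respond to this prompt: {question}"


example = build_prompt(
    "What animals are llamas related to?",
    ["Llamas are members of the camelid family", "Llamas are vegetarians"],
)
print(example)
```

With a single passage this produces the same string as the one-line template above.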

Step 3: Generate the answer

Finally, use Ollama's generate() method to produce an answer from the augmented prompt.

# Generate a response combining the prompt and the data retrieved in step 1
output = ollama.generate(
    model="llama2",
    prompt=prompt_template,
)

print(output["response"])

Llamas are members of the camelid family, which means they are closely related to other animals in the same family, including:
1. Vicuñas: Vicuñas are small, wild relatives of llamas and alpacas. They are found in the Andean region and are known for their soft, woolly coats.
2. Camels: Camels are large, even-toed ungulates that are closely related to llamas and vicuñas. They are found in hot, dry climates around the world and are known for their ability to go without water for long periods of time.
3. Guanacos: Guanacos are large, wild animals that are related to llamas and vicuñas. They are found in the Andean region and are known for their distinctive long necks and legs.
4. Llama-like creatures: There are also other animals that are sometimes referred to as "llamas," such as the lama-like creatures found in China, which are actually a different species altogether. These creatures are not closely related to vicuñas or camels, but are sometimes referred to as "llamas" due to their physical similarities.
In summary, llamas are related to vicuñas, camels, guanacos, and other animals that are sometimes referred to as "llamas."
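The three steps can also be wrapped into a single function. The sketch below takes the retrieval and generation steps as plain callables, so the control flow is visible and can be exercised without running Ollama or Weaviate. The names rag_answer, fake_retrieve, and fake_generate are hypothetical, and the two stand-in functions are placeholders for the Weaviate query and the Ollama call used in this tutorial.

```python
from typing import Callable


def rag_answer(
    question: str,
    retrieve: Callable[[str], str],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for the question, augment the prompt, and generate an answer."""
    data = retrieve(question)  # Step 1: retrieve context
    prompt = f"Using this data: {data}. Respond to this prompt: {question}"  # Step 2: augment
    return generate(prompt)  # Step 3: generate


# Stand-ins for illustration only. In the tutorial, retrieval would wrap
# collection.query.near_vector(...) and generation would wrap ollama.generate(...).
def fake_retrieve(question: str) -> str:
    return "Llamas are members of the camelid family"


def fake_generate(prompt: str) -> str:
    return f"[model output for: {prompt}]"


print(rag_answer("What animals are llamas related to?", fake_retrieve, fake_generate))
```

Passing the steps in as arguments keeps the pipeline logic separate from the services behind it, which makes it easier to swap models or vector stores later.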

Summary

Using a very simple RAG pipeline as an example, this article walked through building a privacy-preserving local RAG system from local components: language models served by Ollama and a self-hosted Weaviate vector database run through Docker.

Resources:

Ollama download: https://ollama.com/
Docker download: https://www.docker.com/
Weaviate vector database: https://weaviate.io/blog/what-is-a-vector-database
Ollama blog post on embedding models: https://ollama.com/blog/embedding-models
GitHub code: https://github.com/qianniucity/llm_notebooks/blob/main/rag/Ollama_Weaviate_Local_rag.ipynb