我要投稿

llamaindex实战-ChatEngine-Context（上下文）模式

发布日期：2024-12-17 06:47:03 浏览次数： 2273 作者：大数据架构师修行之路

概述

ContextChatEngine 类是一个上下文聊天引擎，目的是：通过检索聊天的上下文信息、设置系统提示使用语言模型（LLM）生成响应，从而提供流畅的聊天体验。

它是一种简单的聊天模式，构建在数据检索器（retriever）之上。对于每个聊天交互：

首先使用用户消息从索引中检索文本
将检索到的文本设置为系统提示中的上下文
返回用户消息的答案

这种方法很简单，适用于与知识库和一般交互直接相关的问题。

实现逻辑

构建和使用本地大模型。这里使用的是gemma2这个模型，也可以配置其他的大模型。
从文档中构建索引
定义一个memory buffer用来保存历史的聊天内容
把索引转换成查询引擎：index.as_chat_engine，并设置chat_mode，和历史消息的内存buffer。

注意：由于检索到的上下文可能会占用大量可用的 LLM 上下文，因此我们要确保为聊天历史记录配置较小的限制：

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

实现代码

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

local_model = "/opt/models/BAAI/bge-base-en-v1.5"
# bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name=local_model)

# ollama
Settings.llm = Ollama(model="gemma2", request_timeout=360.0)

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(data)

from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

# 构建聊天引擎
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Grahams life."
  ),
)

# 测试效果
response = chat_engine.chat("Hello!")
print(response)

response = chat_engine.chat("What did Paul Graham do growing up?")
print(response)

response = chat_engine.chat("Can you tell me more?")
print(response)


print("--------------reset chat-------------------------")
chat_engine.reset()
response = chat_engine.chat("Hello! What do you know?")
print(response)