
A Powerful Tool for LLM RAG: Monitoring, Tracing, and Fine-Tuning LLMs and RAG with LangSmith

Published: 2024-11-19 18:06:32 Views: 2670 Author: AI科技论谈


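In the fast-moving field of AI, large language models (LLMs) have become a mainstay of modern applications, playing an important role in generating responses, enhancing customer interactions, and assisting with content creation. For teams that want to improve their LLM applications further, LangSmith offers clear guidance and strong support.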

This article shows how LangSmith can improve LLM applications and support developers working in AI.

1 Introduction to LangSmith

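LangSmith is a monitoring and optimization tool designed for LLM and RAG systems. It provides real-time insight into application performance, from response time to accuracy, so developers can manage and improve their LLM applications precisely.

The following is a high-level view of the LangChain ecosystem. This article focuses on using LangSmith for debugging, monitoring, and testing.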

Figure: The LangSmith ecosystem

2 Why Choose LangSmith

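DevOps and MLOps transformed web development and data science respectively, and the same ideas are now being applied to managing AI applications. As demand for efficient AI systems grows, integration, tracing, and monitoring matter more and more. LangSmith addresses this need, helping make AI applications more efficient and reliable. The best way to see this is in practice, so let's walk through an example.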

Figure: Key features of LangSmith

3 Integrating LangSmith into an LLM Workflow

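Here we build a RAG-based question-answering system that answers users' questions about geopolitics, drawing on information about the G7 and G20 countries from Wikipedia. The steps are:

Generate question-answer pairs from the source content;
Create a dataset in LangSmith and load the data;
Write a function that processes the data and gets output from the LLM;
Test the accuracy of the LLM's answers with LangSmith's built-in evaluator;
Trace every step and analyze the relevant metrics.

Figure: The Q&A RAG system integrated with LangSmith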

Step 1: Load the libraries

# Install the langchain, langchain_openai, and langchain_core libraries with pip
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import pandas as pd

Step 2: Project setup and API access

Project setup requires two API keys: one for LangSmith and one for the LLM. This example uses the OpenAI API, but you can also try other models such as Gemini or Hugging Face.

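To obtain the API keys:

Log in to LangSmith / OpenAI and open the API settings page.
In the "API keys" section, create a new API key and set the appropriate permissions.
Once the key is generated, copy it and store it securely for later use in your application or integration.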

Step 3: Set the API keys

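When working in Google Colab, we store the API keys in Colab secrets. Setting the environment variable LANGCHAIN_TRACING_V2 to "true" turns on LangSmith tracing, and LANGCHAIN_PROJECT sets the project name under which the runs are tracked in LangSmith.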

from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["LANGSMITH_API_KEY"] = userdata.get('LANGSMITH_API_KEY')
os.environ["LANGCHAIN_TRACING_V2"]="true"
os.environ["LANGCHAIN_PROJECT"] = "DEMO"
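Outside Colab, google.colab.userdata is not available. The following is a minimal sketch, assuming the keys are instead supplied through environment variables or typed in at a prompt. ensure_env is a hypothetical helper name used here for illustration; it is not part of LangSmith or OpenAI.

```python
import os
from getpass import getpass


def ensure_env(name, prompt=getpass):
    """Return os.environ[name], asking the user for it only if it is unset.

    `prompt` is any callable that takes a message and returns a string;
    it defaults to getpass so the key is not echoed to the terminal.
    """
    if not os.environ.get(name):
        os.environ[name] = prompt(f"Enter {name}: ")
    return os.environ[name]


# Example usage (left commented out so nothing blocks on input):
# ensure_env("OPENAI_API_KEY")
# ensure_env("LANGSMITH_API_KEY")
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_PROJECT"] = "DEMO"
```

Keeping the keys out of the notebook source this way avoids committing them by accident.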

Step 4: Data processing

Generate a sample question-answer set from the document content, then build a DataFrame containing the question-answer pairs.

context = """The G7 is a club of Western nations (with Japan given that status as an ally of the West and a major economy) that have dominated the world and its institutions, in some cases for centuries, and retain the ambition to maintain that position by policy coordination amongst themselves and by co-opting rising powers, including India, given the shifts in global power in recent decades.
The G7 recognised that they could not manage the 2008 financial crisis on their own and needed a wider international partnership, but one under their aegis. With this in mind, the G20 forum hitherto at the finance minister level was raised to the summit level. The G20 agenda is, however, shifting increasingly towards the interests and priorities of the developing countries (now being referred to as the Global South). During India’s G20 presidency, with India holding the Voice of the Global South summits before presiding over the G20 and at the conclusion of its work, and with the inclusion of the African Union as a G20 permanent member at India’s initiative, the pro-Global South content of the G20 agenda has got consolidated.
Both the G7 and the G20, however, face challenges from other platforms for consensus-building on global issues. BRICS, a group of non-Western countries, is getting expanded to resist the hegemony of the West that is still expressing itself in the form of sanctions, the weaponising of finance, regime change policies and double standards in addressing issues of democracy and human rights etc. An expanded BRICS will rival both the G7 and the G20 as a platform for promoting multipolarity, a greater role of developing countries in global governance, more equity in international relations, and introducing much-needed reforms in the international system."""


inputs = [
    "What is the G7, and how has it historically positioned itself in global governance?",
    "How did the 2008 financial crisis influence the role of the G20, and how has the agenda shifted?",
    "How has India influenced the G20 agenda during its presidency?",
    "What challenges do the G7 and G20 face from other global platforms?",
    "How does the expansion of BRICS pose a threat to the G7 and G20 in terms of global influence?",
]

outputs = [
    "The G7 is a group of Western nations, including Japan, which has historically dominated global institutions and policymaking. The G7's position stems from the economic and political power of its members, and they have coordinated policies to maintain their influence. In response to shifts in global power, the G7 has also sought to co-opt rising powers, such as India, in its strategic planning.",
    "The G7 recognized that it could not handle the 2008 financial crisis alone and needed broader international cooperation. As a result, the G20, which had previously operated at the finance minister level, was elevated to the summit level to ensure greater global participation under G7 guidance. Over time, the G20's agenda has shifted more towards the interests of developing countries, especially under India’s leadership, where pro-Global South priorities have become more prominent, including the inclusion of the African Union as a permanent member.",
    "During India’s G20 presidency, the country actively promoted the interests of developing countries by holding the 'Voice of the Global South' summits. India also pushed for the inclusion of the African Union as a permanent member of the G20, consolidating the agenda towards addressing the concerns of the Global South, such as greater equity and representation in global governance.",
    "Both the G7 and G20 face challenges from other groups like BRICS, which consists of non-Western countries seeking to resist Western dominance. BRICS has expanded as a counterbalance to the G7's influence, particularly criticizing Western sanctions, financial controls, and regime change policies. An expanded BRICS aims to promote multipolarity, increase the role of developing countries in global governance, and push for reforms in the international system.",
    "The expansion of BRICS is a direct challenge to the G7 and G20 as it aims to offer a platform that promotes multipolarity and reduces Western hegemony. By advocating for greater equity in international relations and pushing for reforms in global governance structures, an expanded BRICS seeks to rival the G7 and G20, providing an alternative consensus-building mechanism for developing nations and non-Western powers.",
]

# Build the dataset: one row per question-answer pair
qa_pairs = [{"question": q, "answer": a} for q, a in zip(inputs, outputs)]
df = pd.DataFrame(qa_pairs)
df

Figure: Data processing output
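One detail worth guarding against when pairing questions with answers: zip silently stops at the shorter list, so a missing answer would drop a question without any error. The following is a minimal sketch of a stricter pairing step; build_qa_pairs is a hypothetical helper introduced here for illustration, not something from the original walkthrough.

```python
def build_qa_pairs(questions, answers):
    """Pair each question with its answer, failing loudly on a length mismatch."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers"
        )
    return [{"question": q, "answer": a} for q, a in zip(questions, answers)]


# Example usage with the lists defined above:
# qa_pairs = build_qa_pairs(inputs, outputs)
# df = pd.DataFrame(qa_pairs)
```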

Step 5: Create a dataset in LangSmith

Using the data from the previous step, create a new dataset in LangSmith and give it a name and a description.

from langsmith import Client

client = Client()
dataset_name = "Geo-politics"

# Create the dataset, then store the question-answer examples in it
dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="QA pairs about Geo-politics model.",
)
client.create_examples(
    inputs=[{"question": q} for q in inputs],
    outputs=[{"answer": a} for a in outputs],
    dataset_id=dataset.id,
)

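After running the code above, the system provides a link for accessing the dataset and tests in LangSmith. Alternatively, go directly to the LangSmith site (https://smith.langchain.com), log in, and open the "Datasets & Testing" tab to continue.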

Figure: Datasets and testing in LangSmith
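Rerunning the dataset-creation cell in the same workspace will typically fail or produce duplicates, because the dataset name is already taken. The following is a minimal sketch of a get-or-create step, assuming the client exposes has_dataset, read_dataset, and create_dataset with a dataset_name keyword, as the langsmith Client does at the time of writing. get_or_create_dataset is a hypothetical helper name.

```python
def get_or_create_dataset(client, name, description=""):
    """Return the dataset called `name`, creating it only if it is missing.

    `client` is assumed to behave like langsmith.Client: it must provide
    has_dataset, read_dataset, and create_dataset taking dataset_name=...
    """
    if client.has_dataset(dataset_name=name):
        return client.read_dataset(dataset_name=name)
    return client.create_dataset(dataset_name=name, description=description)


# Example usage with the client from this step:
# dataset = get_or_create_dataset(client, "Geo-politics",
#                                 "QA pairs about Geo-politics model.")
```

If the dataset already exists, its examples most likely do too, so create_examples should only be called right after a fresh creation.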

Step 6: Generate outputs with the LLM

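With the API keys, dataset, and other configuration in place, we can now write a function that takes an input question and generates a response with an LLM, in this case OpenAI. The function returns a dictionary containing the answer.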

import openai
from langsmith.wrappers import wrap_openai

openai_client = wrap_openai(openai.Client())


def get_response_from_llm(inputs: dict) -> dict:
    """
    Generates answers to user questions based on the provided context
    text using the OpenAI API.

    Parameters:
    inputs (dict): A dictionary with a single key 'question',
    representing the user's question as a string.

    Returns:
    dict: A dictionary with a single key 'answer', containing the
    generated answer as a string.
    """
    # System prompt: instructions plus the source context
    system_msg = (
        "Answer user questions in 2-3 sentences about this "
        f"context: \n\n\n {context}"
    )

    # Combine the system prompt with the user's question
    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": inputs["question"]},
    ]

    # Call OpenAI
    response = openai_client.chat.completions.create(
        messages=messages, model="gpt-3.5-turbo"
    )

    # Return the response text in the output dictionary
    return {"answer": response.choices[0].message.content}

To verify that the LLM's output is accurate and does not stray from the facts, the next step uses LangSmith's built-in evaluation tools.

Step 7: Evaluate the RAG with an LLM

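To evaluate the LLM's performance, we compare its output with the ground truth. There are several ways to do this; for example, cosine similarity can measure how closely the two match, with a higher score meaning a closer match. Here, however, we use LangSmith's built-in cot_qa evaluator, which is designed specifically for question-answering systems.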

from langsmith.evaluation import evaluate, LangChainStringEvaluator

# Evaluator
qa_evaluator = [LangChainStringEvaluator("cot_qa")]
dataset_name = "Geo-politics"

experiment_results = evaluate(
    get_response_from_llm,
    data=dataset_name,
    evaluators=qa_evaluator,
    experiment_prefix="LLM Output",
    # Any experiment metadata can be specified here
    metadata={
        "variant": "stuff website context into gpt-3.5-turbo",
    },
)
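
For comparison, the cosine-similarity approach mentioned at the start of this step can be sketched in a few lines. This is a simplified illustration that uses plain word-count vectors rather than embeddings, so it only measures word overlap, not meaning. It is not what cot_qa does internally, and the function name is chosen here for illustration.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    vec_a = Counter(re.findall(r"\w+", text_a.lower()))
    vec_b = Counter(re.findall(r"\w+", text_b.lower()))
    # Dot product over the words both texts share
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Example usage: score a model answer against its reference answer
# score = cosine_similarity(model_answer, reference_answer)
```

A word-overlap score like this is cheap to compute, but it can rate a fluent wrong answer highly, which is one reason to prefer an LLM-based evaluator such as cot_qa for question answering.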