Monitoring, Tracing and Fine-Tuning LLMs and RAG with LangSmith
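In the fast-moving field of AI, large language models (LLMs) have become a mainstay of modern applications. They generate responses, improve customer interactions and assist with content creation. LangSmith provides the tooling to take LLM applications further by tracing, evaluating and monitoring them.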
This article shows how LangSmith can improve LLM applications and support developers working in AI.
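LangSmith is a monitoring and optimization tool built for LLM and RAG systems. It gives developers real-time insight into application performance, from response time to accuracy, so they can manage their LLM applications precisely and make them more efficient. This article focuses on using LangSmith for debugging, monitoring and testing.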
The figure below gives a high-level view of the LangChain ecosystem.
The LangSmith ecosystem
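DevOps transformed web development and MLOps transformed data science, and the same ideas are now being applied to managing AI applications. As demand for efficient AI systems grows, integration, tracing and monitoring matter more and more. LangSmith addresses this need and helps make AI applications more efficient and reliable. Let's see it in practice.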
Key features of LangSmith
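Here we build a RAG-based question-answering system that answers user questions about geopolitics, using information about the G7 and G20 countries from Wikipedia. The steps are as follows: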
# Install the libraries first, e.g. pip install langchain langchain-openai langchain-core langsmith openai pandas
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import pandas as pd
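To set up the project you need two API keys: one to integrate LangSmith and one for the LLM. We use the OpenAI API here, but you can also try other models, such as Gemini or models from Hugging Face.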
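Once you have both keys, store them where the notebook can read them. Because we are working in Google Colab, we save the API keys in Colab secrets. Setting the environment variable LANGCHAIN_TRACING_V2 to true turns on LangSmith tracing, and LANGCHAIN_PROJECT sets the project name under which the traces appear in LangSmith.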
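from google.colab import userdata
import os
# Read the keys stored in Colab secrets and expose them as environment variables
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["LANGSMITH_API_KEY"] = userdata.get('LANGSMITH_API_KEY')
# Turn on LangSmith tracing and record traces under the "DEMO" project
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "DEMO"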
Next, generate a sample Q&A set from the context text and build a DataFrame of question-answer pairs.
context = """The G7 is a club of Western nations (with Japan given that status as an ally of the West and a major economy) that have dominated the world and its institutions, in some cases for centuries, and retain the ambition to maintain that position by policy coordination amongst themselves and by co-opting rising powers, including India, given the shifts in global power in recent decades.
The G7 recognised that they could not manage the 2008 financial crisis on their own and needed a wider international partnership, but one under their aegis. With this in mind, the G20 forum hitherto at the finance minister level was raised to the summit level. The G20 agenda is, however, shifting increasingly towards the interests and priorities of the developing countries (now being referred to as the Global South). During India’s G20 presidency, with India holding the Voice of the Global South summits before presiding over the G20 and at the conclusion of its work, and with the inclusion of the African Union as a G20 permanent member at India’s initiative, the pro-Global South content of the G20 agenda has got consolidated.
Both the G7 and the G20, however, face challenges from other platforms for consensus-building on global issues. BRICS, a group of non-Western countries, is getting expanded to resist the hegemony of the West that is still expressing itself in the form of sanctions, the weaponising of finance, regime change policies and double standards in addressing issues of democracy and human rights etc. An expanded BRICS will rival both the G7 and the G20 as a platform for promoting multipolarity, a greater role of developing countries in global governance, more equity in international relations, and introducing much-needed reforms in the international system."""
inputs = [
"What is the G7, and how has it historically positioned itself in global governance?",
"How did the 2008 financial crisis influence the role of the G20, and how has the agenda shifted?",
"How has India influenced the G20 agenda during its presidency?",
"What challenges do the G7 and G20 face from other global platforms?",
"How does the expansion of BRICS pose a threat to the G7 and G20 in terms of global influence?"
]
outputs = [
"The G7 is a group of Western nations, including Japan, which has historically dominated global institutions and policymaking. The G7's position stems from the economic and political power of its members, and they have coordinated policies to maintain their influence. In response to shifts in global power, the G7 has also sought to co-opt rising powers, such as India, in its strategic planning.",
"The G7 recognized that it could not handle the 2008 financial crisis alone and needed broader international cooperation. As a result, the G20, which had previously operated at the finance minister level, was elevated to the summit level to ensure greater global participation under G7 guidance. Over time, the G20's agenda has shifted more towards the interests of developing countries, especially under India’s leadership, where pro-Global South priorities have become more prominent, including the inclusion of the African Union as a permanent member.",
"During India’s G20 presidency, the country actively promoted the interests of developing countries by holding the 'Voice of the Global South' summits. India also pushed for the inclusion of the African Union as a permanent member of the G20, consolidating the agenda towards addressing the concerns of the Global South, such as greater equity and representation in global governance.",
"Both the G7 and G20 face challenges from other groups like BRICS, which consists of non-Western countries seeking to resist Western dominance. BRICS has expanded as a counterbalance to the G7's influence, particularly criticizing Western sanctions, financial controls, and regime change policies. An expanded BRICS aims to promote multipolarity, increase the role of developing countries in global governance, and push for reforms in the international system.",
"The expansion of BRICS is a direct challenge to the G7 and G20 as it aims to offer a platform that promotes multipolarity and reduces Western hegemony. By advocating for greater equity in international relations and pushing for reforms in global governance structures, an expanded BRICS seeks to rival the G7 and G20, providing an alternative consensus-building mechanism for developing nations and non-Western powers."
]
# Dataset: one {"question", "answer"} dict per pair
qa_pairs = [{"question": q, "answer": a} for q, a in zip(inputs, outputs)]
df = pd.DataFrame(qa_pairs)
df
Using the data from the previous step, create a new dataset in LangSmith and give it a name and a description.
from langsmith import Client
client = Client()
dataset_name = "Geo-politics"
# Create the dataset in LangSmith, then upload the examples to it
dataset = client.create_dataset(
dataset_name=dataset_name,
description="QA pairs about Geo-politics model.",
)
client.create_examples(
inputs=[{"question": q} for q in inputs],
outputs=[{"answer": a} for a in outputs],
dataset_id=dataset.id,
)
Running the code above gives you a link to the dataset and its tests in LangSmith. Alternatively, go directly to the LangSmith website (https://smith.langchain.com), log in and open the "Datasets & Testing" tab.
Datasets and testing in LangSmith
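With the API keys, the dataset and the rest of the configuration in place, we can now write a function that takes an input question and generates a response with the LLM (OpenAI in this example). The function returns a dictionary containing the answer.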
import openai
from langsmith.wrappers import wrap_openai

# Wrap the OpenAI client so that every call is traced in LangSmith
openai_client = wrap_openai(openai.Client())


def get_response_from_llm(inputs: dict) -> dict:
    """
    Generates answers to user questions based on the provided context
    text using the OpenAI API.

    Parameters:
        inputs (dict): A dictionary with a single key 'question',
            representing the user's question as a string.

    Returns:
        dict: A dictionary with a single key 'answer', containing the
            generated answer as a string.
    """
    # System prompt: answer briefly, based on the context text
    system_msg = (
        f"Answer user questions in 2-3 sentences about this context: \n\n\n {context}"
    )

    # Pass in the context (via the system prompt) and the user's question
    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": inputs["question"]},
    ]

    # Call OpenAI
    response = openai_client.chat.completions.create(
        messages=messages, model="gpt-3.5-turbo"
    )

    # Return the generated answer in a dict keyed by "answer"
    return {"answer": response.choices[0].message.content}
To check that the LLM's output is accurate and does not stray from the facts, the next step uses LangSmith's built-in evaluation tools.
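To evaluate the model, we compare its output with the ground truth. There are several ways to do this. For example, cosine similarity measures how closely the two texts match, and a higher score means a closer match. Here, though, we use cot_qa, an evaluator designed specifically for question answering and available in LangSmith through LangChainStringEvaluator. It uses an LLM that reasons step by step before grading the answer against the reference, which suits this task well.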
from langsmith.evaluation import evaluate, LangChainStringEvaluator
# Evaluator: LangChain's chain-of-thought QA grader ("cot_qa")
qa_evaluator = [LangChainStringEvaluator("cot_qa")]
dataset_name = "Geo-politics"
experiment_results = evaluate(
get_response_from_llm,
data=dataset_name,
evaluators=qa_evaluator,
experiment_prefix="LLM Output",
# Any experiment metadata can be specified here
metadata={
"variant": "stuff website context into gpt-3.5-turbo",
},
)
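The cosine-similarity option mentioned earlier can be sketched without any model. In this version each text becomes a bag-of-words count vector, and the score is the cosine of the angle between the two vectors. This is a simplification chosen so the example runs offline; practical setups more often compare embedding vectors. The tokenizer (lowercase alphanumeric runs) is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts (0.0 to 1.0)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over shared words; a Counter returns 0 for words it lacks.
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # Guard against empty texts, whose vectors have zero length.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


reference = "The G7 is a group of Western nations."
prediction = "The G7 is a club of Western nations."
print(round(cosine_similarity(reference, prediction), 3))
```

The two sentences share 7 of their 8 words, so the score is 7/8. Because this metric only rewards shared wording, a correct paraphrase can score low and a wrong answer that reuses the question's vocabulary can score high. That limitation is one reason to use an LLM-based grader such as cot_qa here.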
Once the evaluate() call has finished, LangSmith records and tracks the results.
Tracing, monitoring and evaluation in LangSmith
Results overview:
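We uploaded five question-answer pairs to LangSmith. The answers generated by the LLM are listed in the third column, and each entry is marked "success", indicating that it matched the reference answer.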
Evaluation with LangChainStringEvaluator
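At a conceptual level, an evaluation run does three things. It calls the target on every dataset example, scores each prediction against its reference, and aggregates the scores into a summary like the one above. The stdlib sketch below illustrates that loop only; it is not LangSmith's implementation. A hypothetical exact-match evaluator (string equality after lowercasing and collapsing whitespace) stands in for the LLM-based cot_qa grader.

```python
from dataclasses import dataclass
from typing import Callable

# (inputs, prediction, reference) -> score in [0, 1]
Evaluator = Callable[[dict, dict, dict], float]


@dataclass
class ExampleResult:
    question: str
    prediction: str
    score: float


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(inputs: dict, prediction: dict, reference: dict) -> float:
    """Hypothetical stand-in for cot_qa: 1.0 if the normalized answers are identical."""
    return float(_normalize(prediction["answer"]) == _normalize(reference["answer"]))


def run_offline_eval(
    examples: list[tuple[dict, dict]],
    target: Callable[[dict], dict],
    evaluator: Evaluator,
) -> tuple[list[ExampleResult], float]:
    """Run the target on each (inputs, reference) pair, score it, and return the mean score."""
    results = []
    for inputs, reference in examples:
        prediction = target(inputs)
        score = evaluator(inputs, prediction, reference)
        results.append(ExampleResult(inputs["question"], prediction["answer"], score))
    mean_score = sum(r.score for r in results) / len(results) if results else 0.0
    return results, mean_score
```

Exact match is far stricter than an LLM grader: a correct answer phrased differently scores 0.0. That strictness is why it appears here only as a placeholder that makes the loop runnable.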
A closer look at LangSmith's output
In-depth analysis:
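Click the first output to see the detailed results. The left side shows the model name, gpt-3.5-turbo, and the right side shows additional metrics such as the timestamp and latency.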
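Each trace entry bundles a call's inputs and outputs with metadata such as a start timestamp and latency. In this walkthrough, LangSmith's `wrap_openai` captures these fields automatically. The stdlib sketch below illustrates the idea with a hypothetical `record_trace` decorator that appends one record per call to an in-memory list, which stands in for a tracing backend. It is an illustration of the concept, not how LangSmith is implemented.

```python
import functools
import time
from datetime import datetime, timezone

TRACE_LOG: list[dict] = []  # in-memory stand-in for a tracing backend (assumption)


def record_trace(func):
    """Record name, start timestamp, latency, inputs, output and error for each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {
            "name": func.__name__,
            "start_time": datetime.now(timezone.utc).isoformat(),
            "inputs": {"args": args, "kwargs": kwargs},
        }
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            entry["outputs"], entry["error"] = result, None
            return result
        except Exception as exc:
            entry["outputs"], entry["error"] = None, repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a record.
            entry["latency_s"] = time.perf_counter() - start
            TRACE_LOG.append(entry)
    return wrapper


@record_trace
def answer(question: str) -> dict:
    # Toy function standing in for an LLM call
    return {"answer": question.upper()}


answer("what is the g7?")
last = TRACE_LOG[-1]
print(last["name"], last["outputs"], f"{last['latency_s']:.6f}s")
```

The `finally` clause records latency on both successful and failing calls, so errors still show up in the trace log.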
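As AI technology advances rapidly, tools like LangSmith are becoming increasingly important. They help developers and data scientists work together to improve the efficiency and stability of AI systems, so that applications keep up with new requirements and deliver a good user experience. With LangSmith's tracing, evaluation and monitoring, developers can take on the challenges of AI development with confidence and get more out of their AI applications.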