
QwQ-32B, a Reasoning Model with Function Call Support: The Era of Deep-Thinking Agents Is Here!

Published: 2025-03-10 20:51:09 Views: 1539 Source: ModelScope community (魔搭)
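Recommendation

The era of deep-thinking agents has arrived, and QwQ-32B opens a new chapter for reasoning models!

Key points:
1. Performance comparison between QwQ-32B and DeepSeek-R1
2. Building a chat API that supports Function Call
3. Generating function arguments and calling external functions

Yang Fangxian (杨芳贤)
Founder of 53A / Tencent Cloud Most Valuable Professional (TVP)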
00

Preface



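Qwen recently released QwQ-32B, a reasoning model whose performance on many benchmarks is comparable to DeepSeek-R1. QwQ integrates tool calling into the reasoning model, so it can think critically while using tools and adjust its reasoning based on the feedback it receives. This makes QwQ well suited to agentic systems. This article shows how to serve QwQ-32B with vLLM and SGLang behind an OpenAI-format chat API, and how to combine it with external functions to extend what the model can do.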


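tools is an optional parameter of OpenAI's Chat Completion API that supplies function specifications. Its purpose is to let the model generate function arguments that conform to those specifications. The API itself never executes a function call; developers have to take the model's output and execute the call themselves.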
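Because the API only returns a function name and a JSON string of arguments, the calling code has to do the dispatch itself. The sketch below illustrates that step under stated assumptions: get_current_weather is a hypothetical stand-in that returns canned data, AVAILABLE_FUNCTIONS and execute_tool_call are illustrative names, and example_call imitates the shape of one tool call rather than a real server response.

```python
import json

def get_current_weather(location, format):
    # Hypothetical local implementation; a real one would query a weather service.
    return {"location": location, "temperature": 12, "unit": format}

# Registry mapping tool names (as declared in the tools list) to local callables.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def execute_tool_call(tool_call):
    """Run one tool call described by the model and return the result as a JSON string."""
    name = tool_call["function"]["name"]
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        # The model emits the arguments as a JSON-encoded string, not a dict.
        arguments = json.loads(tool_call["function"]["arguments"])
        return json.dumps(func(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})

# Assumed example of a tool call as the model might produce it.
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "Glasgow, Scotland", "format": "celsius"}',
    },
}
result = execute_tool_call(example_call)
print(result)
```

Returning errors as JSON instead of raising keeps the conversation going: the error text can be sent back to the model so it has a chance to correct its arguments.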


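Both vLLM and SGLang support the tools parameter of the OpenAI API. Given tools and the function specifications it contains, QwQ can decide when to call a function, which function to call, and with what arguments.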


Note: the test cases in this article follow the OpenAI cookbook: https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models


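This article has two parts:

  • Model deployment: use vLLM, SGLang, and QwQ, with the appropriate launch parameters, to deploy a chat API that supports function calls.

  • Generating function arguments: specify a set of functions and use the API to generate function arguments.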


01

Model deployment



Download the model files

modelscope download --model=Qwen/QwQ-32B --local_dir ./QwQ-32B


Install the environment

pip install vllm
pip install "sglang[all]>=0.4.3.post2"


vLLM launch command

vllm serve /ModelPath/QwQ-32B \
  --port 8000 \
  --reasoning-parser deepseek_r1 \
  --max_model_len 4096 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes


SGLang launch command

python -m sglang.launch_server --model-path /ModelPath/QwQ-32B --port 3001 --host 0.0.0.0 --tool-call-parser qwen25
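The model field in later requests has to match the name the server registered, which is typically the path passed at launch unless it was overridden. A minimal sketch for reading that name from the OpenAI-compatible /v1/models endpoint follows; the base URL assumes the vLLM command above, and first_model_id and fetch_model_id are illustrative helper names. Only the offline sample is executed here, so the block runs without a server.

```python
import json
import urllib.request

def first_model_id(payload):
    """Return the id of the first model in a /v1/models response body."""
    models = payload.get("data", [])
    if not models:
        raise ValueError("server reported no models")
    return models[0]["id"]

def fetch_model_id(base_url="http://localhost:8000/v1", timeout=5):
    """Query a running server; the default URL is an assumption, adjust host and port."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return first_model_id(json.load(resp))

# Offline illustration with a payload shaped like an OpenAI-style model list.
sample = {"object": "list", "data": [{"id": "/ModelPath/QwQ-32B", "object": "model"}]}
print(first_model_id(sample))
```

With a server running, fetch_model_id() can replace the hard-coded model string in the client examples (use port 3001 for the SGLang command).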

Calling the model

Call the locally deployed QwQ model through the OpenAI API format.

Single-turn conversation

from openai import OpenAI

# Point the OpenAI client at the local vLLM API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Request a streamed response (stream=True).
chat_response = client.chat.completions.create(
    model="path/to/QwQ-32B",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,  # enable streaming
)

# Collect the streamed chunks. delta.content is None while reasoning text
# is being streamed, so only non-empty answer text is kept.
contents = []
for e in chat_response:
    if not e.choices:
        continue
    content = e.choices[0].delta.content
    if content:
        contents.append(content)
print("".join(contents))


Multi-turn conversation

from openai import OpenAI

# Initialize the OpenAI client.
client = OpenAI(
    api_key="empty",
    base_url="http://localhost:8000/v1",
)

messages = []
conversation_idx = 1
while True:
    print("=" * 20 + f"Round {conversation_idx}" + "=" * 20)
    conversation_idx += 1
    user_msg = {"role": "user", "content": input("Enter your message: ")}
    messages.append(user_msg)

    # Reset the per-turn state so earlier turns do not leak into this one.
    reasoning_content = ""  # full reasoning text
    answer_content = ""     # full answer text
    is_answering = False    # whether reasoning has ended and the answer has started

    # Create the chat completion request.
    completion = client.chat.completions.create(
        model="path/to/QwQ-32B",  # qwq-32b is the example here; change the model name as needed
        messages=messages,
        stream=True,
    )
    print("\n" + "=" * 20 + "Reasoning" + "=" * 20 + "\n")
    for chunk in completion:
        # A chunk with no choices carries the usage statistics.
        if not chunk.choices:
            print("\nUsage:")
            print(chunk.usage)
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None) is not None:
            # Print the reasoning text.
            print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
        elif delta.content:
            # The answer starts.
            if not is_answering:
                print("\n" + "=" * 20 + "Answer" + "=" * 20 + "\n")
                is_answering = True
            # Print the answer text.
            print(delta.content, end="", flush=True)
            answer_content += delta.content
    # Only the answer, not the reasoning, goes back into the history.
    messages.append({"role": "assistant", "content": answer_content})
    print("\n")
    # print("=" * 20 + "Full reasoning" + "=" * 20 + "\n")
    # print(reasoning_content)
    # print("=" * 20 + "Full answer" + "=" * 20 + "\n")
    # print(answer_content)
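The loop above relies on the server filling reasoning_content, which requires a reasoning parser such as the one passed in the vLLM command. Without one, the reasoning text typically arrives inside content and ends with a closing think tag. The sketch below is a fallback splitter under that assumption; split_reasoning is an illustrative name, and the tag convention should be checked against the actual output of your deployment.

```python
def split_reasoning(text, end_tag="</think>", start_tag="<think>"):
    """Split raw model output into (reasoning, answer).

    Assumes the reasoning section ends with end_tag. The opening tag may be
    absent, because some chat templates place it in the prompt instead.
    """
    head, sep, tail = text.partition(end_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.replace(start_tag, "", 1).strip()
    return reasoning, tail.strip()

raw = "<think>\nThe user greets me.\n</think>\n\nHello! How can I help?"
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```

As in the multi-turn loop, only the answer part would be appended to the message history.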


02

Using tools



First, define the function that calls the model.

from openai import OpenAI

# Point the OpenAI client at the local vLLM API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
MODEL = "path/to/QwQ-32B"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

def chat_completion_request(messages, tools=None, tool_choice=None, model=MODEL):
    # Send tool-related fields only when tools are supplied; honor an explicit
    # tool_choice and fall back to "auto" otherwise.
    kwargs = {}
    if tools:
        kwargs["tools"] = tools
        kwargs["tool_choice"] = tool_choice or "auto"
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs,
        )
        return response
    except Exception as e:
        print("Unable to generate ChatCompletion response")
        print(f"Exception: {e}")
        raise


Next, we define a utility for displaying the conversation, which helps maintain and track the conversation state across calls to the chat completion API.

from termcolor import colored  # pip install termcolor

def pretty_print_conversation(messages):
    # Print each message (given as a dict) in a color chosen by its role.
    role_to_color = {
        "system": "red",
        "user": "green",
        "assistant": "blue",
        "function": "magenta",
    }

    for message in messages:
        if message["role"] == "system":
            print(colored(f"system: {message['content']}\n", role_to_color[message["role"]]))
        elif message["role"] == "user":
            print(colored(f"user: {message['content']}\n", role_to_color[message["role"]]))
        elif message["role"] == "assistant" and message.get("function_call"):
            print(colored(f"assistant: {message['function_call']}\n", role_to_color[message["role"]]))
        elif message["role"] == "assistant" and not message.get("function_call"):
            print(colored(f"assistant: {message['content']}\n", role_to_color[message["role"]]))
        elif message["role"] == "function":
            print(colored(f"function ({message['name']}): {message['content']}\n", role_to_color[message["role"]]))


03

Tool definitions



Here we assume a hypothetical weather API and write function specifications for interacting with it. These specifications are passed to the Chat API so that the model can generate function arguments that conform to them.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "description": "Get an N-day weather forecast",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                    "num_days": {
                        "type": "integer",
                        "description": "The number of days to forecast",
                    },
                },
                "required": ["location", "format", "num_days"],
            },
        },
    },
]
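Since the model only generates arguments, it can be worth checking them against the specification before running anything. The following is a deliberately small sketch, not a full JSON Schema validator: validate_arguments is an illustrative helper that checks only required fields, enum values, and the string and integer types used above, and forecast_parameters repeats the parameters of get_n_day_weather_forecast so the block is self-contained.

```python
import json

def validate_arguments(arguments_json, parameters):
    """Return a list of problems found in a JSON argument string (empty if none)."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    errors = []
    properties = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")

    type_map = {"string": str, "integer": int}
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unexpected argument: {name}")
            continue
        expected = type_map.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (not isinstance(value, expected) or isinstance(value, bool)):
            errors.append(f"{name} should be of type {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors

forecast_parameters = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        "num_days": {"type": "integer"},
    },
    "required": ["location", "format", "num_days"],
}

ok = validate_arguments(
    '{"location": "Glasgow, Scotland", "format": "celsius", "num_days": 5}',
    forecast_parameters,
)
bad = validate_arguments(
    '{"location": "Glasgow, Scotland", "format": "kelvin"}',
    forecast_parameters,
)
print(ok)
print(bad)
```

For production use, a dedicated schema validation library would cover the parts of JSON Schema this sketch ignores.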


If we ask the model about the current weather, it responds with a clarifying question to obtain the parameters it still needs.

messages = []
messages.append({"role": "user", "content": "hi ,can you tell me what's the weather like today"})
chat_response = chat_completion_request(
    messages, tools=tools
)
print(chat_response)
assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
assistant_message


Once we supply the missing parameters in the conversation, the model generates the appropriate function arguments for us.

messages.append({"role": "user", "content": "I'm in Glasgow, Scotland."})
chat_response = chat_completion_request(
    messages, tools=tools
)
assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
assistant_message
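In the OpenAI tool-calling message format, the usual next step after the arguments are generated is to run the function and send the result back as a message with role "tool" that carries the matching tool_call_id. The sketch below only builds that message list and contacts no server; append_tool_results is an illustrative helper, and the tool call and its result are canned examples rather than real model or API output.

```python
import json

def append_tool_results(messages, tool_calls, results):
    """Return a new message list extended with the assistant's tool calls and their results.

    tool_calls is a list of dicts shaped like OpenAI tool calls; results maps
    each tool call id to a JSON-serializable result.
    """
    extended = list(messages)  # leave the caller's list untouched
    extended.append({"role": "assistant", "content": "", "tool_calls": tool_calls})
    for call in tool_calls:
        extended.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return extended

history = [{"role": "user", "content": "What is the weather like in Glasgow, Scotland today?"}]
tool_calls = [{
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "Glasgow, Scotland", "format": "celsius"}',
    },
}]
canned_results = {"call_0": {"temperature": 12, "unit": "celsius"}}

follow_up = append_tool_results(history, tool_calls, canned_results)
for message in follow_up:
    print(message["role"])
```

Passing follow_up to chat_completion_request together with the same tools list would then let the model phrase a final answer from the tool output.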


With different prompts, we can get it to ask different follow-up questions to obtain the function arguments.

messages = []
messages.append({"role": "user", "content": "can you tell me, what is the weather going to be like in Glasgow, Scotland in next x days"})
chat_response = chat_completion_request(
    messages, tools=tools
)
assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
assistant_message

messages.append({"role": "user", "content": "5 days"})
chat_response = chat_completion_request(
    messages, tools=tools
)
chat_response.choices[0]


Parallel function calls

A single question can trigger several function calls in parallel.

messages = []
messages.append({"role": "user", "content": "what is the weather going to be like in San Francisco and Glasgow over the next 4 days"})
chat_response = chat_completion_request(
    messages, tools=tools, model=MODEL
)

assistant_message = chat_response.choices[0].message.tool_calls
assistant_message
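When the model returns several tool calls at once, the caller can run them independently. The sketch below does so with a thread pool, which mainly helps when the tools are I/O-bound, such as HTTP requests. It is an illustration under assumptions: get_n_day_weather_forecast is a hypothetical stand-in, run_tool_calls is an illustrative helper, and the two tool calls are hand-written in the shape the model would return.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def get_n_day_weather_forecast(location, format, num_days):
    # Hypothetical stand-in; a real implementation would call a weather service.
    return {"location": location, "unit": format, "days": num_days}

FUNCTIONS = {"get_n_day_weather_forecast": get_n_day_weather_forecast}

def run_tool_calls(tool_calls, max_workers=4):
    """Execute each tool call in a thread pool and return {tool_call_id: result}."""
    def run(call):
        func = FUNCTIONS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        return call["id"], func(**args)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, one (id, result) pair per call.
        return dict(pool.map(run, tool_calls))

parallel_calls = [
    {
        "id": "call_sf",
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "arguments": '{"location": "San Francisco, CA", "format": "fahrenheit", "num_days": 4}',
        },
    },
    {
        "id": "call_glasgow",
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "arguments": '{"location": "Glasgow, Scotland", "format": "celsius", "num_days": 4}',
        },
    },
]

results = run_tool_calls(parallel_calls)
for call_id, value in results.items():
    print(call_id, value)
```

Keying the results by tool call id makes it straightforward to attach each one to the right tool message when replying to the model.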

