
FastChat: A Handy Tool for Deploying Large Language Models

Published: 2024-06-03 20:11:42 Views: 4106

Author: 大白爱爬山


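FastChat is an open-source toolkit for training and serving large language models. It can deploy commonly used models such as Qwen, Llama, and ChatGLM. It can also serve several different models at the same time, or several copies of the same model, which gives a simple way to distribute an inference service across workers.

Repository: https://github.com/lm-sys/FastChat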

Installation

FastChat can be installed in two ways:

pip package
Install from source

pip package

pip3 install "fschat[model_worker,webui]"

To install all features, you can use the following command:

pip3 install "fschat[all]"

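Install from source

1. Clone the repository and change into its root directory.

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

2. Run the install commands.

pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e ".[model_worker,webui]"

The -e flag means "editable": changes you make to the source take effect immediately, without reinstalling.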
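Either install method can be checked from Python by asking for the installed distribution. The sketch below assumes the distribution is named fschat, as in the pip commands above; the helper name installed_version is only illustrative and is not part of FastChat.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "fschat") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"fschat version: {found}" if found else "fschat is not installed")
```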

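Inference serving

FastChat offers three ways to run inference:

Command line
API
Web GUI

Command line

FastChat can deploy a model from the command line and supports GPU, CPU, XPU, and other devices.

Single-GPU deployment: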

python3 -m fastchat.serve.cli --model-path xxx

For multi-GPU deployment, set the number of GPUs with --num-gpus:

python3 -m fastchat.serve.cli --model-path xxx --num-gpus 2

You can also cap the memory used on each GPU with --max-gpu-memory:

python3 -m fastchat.serve.cli --model-path xxx --num-gpus 2 --max-gpu-memory 8GiB

If the host has no GPU, the model can run on the CPU instead:

python3 -m fastchat.serve.cli --model-path xxx --device cpu

For deployment on other devices, see the instructions in the GitHub repository.
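The command-line variants above differ only in which flags are appended. As a minimal sketch, the helper below assembles the argument list from those options. It uses only the flags shown in this article; the function name build_cli_command is hypothetical, and "xxx" is the same placeholder model path used above.

```python
import shlex
from typing import List, Optional


def build_cli_command(
    model_path: str,
    num_gpus: int = 1,
    max_gpu_memory: Optional[str] = None,
    device: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for fastchat.serve.cli from the options above."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    cmd = ["python3", "-m", "fastchat.serve.cli", "--model-path", model_path]
    if num_gpus > 1:
        cmd += ["--num-gpus", str(num_gpus)]
    if max_gpu_memory is not None:
        cmd += ["--max-gpu-memory", max_gpu_memory]
    if device is not None:
        cmd += ["--device", device]
    return cmd


if __name__ == "__main__":
    # "xxx" is the placeholder model path used throughout this article.
    print(shlex.join(build_cli_command("xxx", num_gpus=2, max_gpu_memory="8GiB")))
    print(shlex.join(build_cli_command("xxx", device="cpu")))
```

The resulting list can be passed to subprocess.run, which avoids shell-quoting problems when a model path contains spaces.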

API

In day-to-day use, large models are usually accessed through an API, which is more convenient. FastChat supports this as well. The deployment steps are:

1. Start the controller.

python3 -m fastchat.serve.controller

2. Start the worker.

Starting a worker is what actually deploys the model. Workers can run on either huggingface/transformers or vLLM. Inference with transformers may be somewhat slower, so vLLM is the usual choice.

pip install vllm

python3 -m fastchat.serve.vllm_worker --model-path xxx

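Larger models may not fit in a single GPU's memory and need several GPUs. In that case, pass --num-gpus and select the devices through the CUDA_VISIBLE_DEVICES environment variable: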

CUDA_VISIBLE_DEVICES=0,1 python3 -m fastchat.serve.vllm_worker --model-path xxx --controller-address http://localhost:21001 --port 31000 --worker-address http://localhost:31000 --num-gpus 2

To spread the inference service across more CUDA devices, start another worker for the same model:

CUDA_VISIBLE_DEVICES=2,3 python3 -m fastchat.serve.vllm_worker --model-path xxx --controller-address http://localhost:21001 --port 31001 --worker-address http://localhost:31001 --num-gpus 2

You can even add a worker for a different model. Each worker on the same host needs its own port:

CUDA_VISIBLE_DEVICES=4,5 python3 -m fastchat.serve.vllm_worker --model-path yyy --controller-address http://localhost:21001 --port 31002 --worker-address http://localhost:31002 --num-gpus 2
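When several workers share one host, each needs a distinct port, a matching worker address, and GPUs that no other worker is using. As a rough sketch, the helper below generates one command per worker and rejects overlapping GPU assignments. WorkerSpec and plan_workers are hypothetical names for illustration; the flags and the controller address are the ones used in the commands above.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class WorkerSpec:
    model_path: str          # placeholder paths such as "xxx" or "yyy"
    gpus: Tuple[int, ...]    # CUDA device indices reserved for this worker


def plan_workers(
    specs: Sequence[WorkerSpec],
    controller: str = "http://localhost:21001",
    base_port: int = 31000,
) -> List[Tuple[Dict[str, str], List[str]]]:
    """Return one (environment, argv) pair per worker, each on its own port."""
    seen_gpus = set()
    plans = []
    for offset, spec in enumerate(specs):
        overlap = seen_gpus.intersection(spec.gpus)
        if overlap:
            raise ValueError(f"GPUs {sorted(overlap)} are assigned to more than one worker")
        seen_gpus.update(spec.gpus)
        port = base_port + offset  # consecutive ports, so no two workers collide
        env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in spec.gpus)}
        argv = [
            "python3", "-m", "fastchat.serve.vllm_worker",
            "--model-path", spec.model_path,
            "--controller-address", controller,
            "--port", str(port),
            "--worker-address", f"http://localhost:{port}",
            "--num-gpus", str(len(spec.gpus)),
        ]
        plans.append((env, argv))
    return plans


if __name__ == "__main__":
    specs = [WorkerSpec("xxx", (0, 1)), WorkerSpec("xxx", (2, 3)), WorkerSpec("yyy", (4, 5))]
    for env, argv in plan_workers(specs):
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
```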

3. Start the RESTful API server.

python3 -m fastchat.serve.openai_api_server --host localhost --port 8000

This exposes an OpenAI-style API on port 8000. The inference service can now be called either through the official openai SDK or directly over HTTP.
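The controller, worker, and API server start independently, and a worker may need some time to load its model, so a client that fires immediately can hit a connection error. One simple guard, sketched below with the standard library only, is to poll a TCP port until it accepts connections. The helper name wait_for_port is an assumption for illustration, and an open port only shows that the process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8000) can gate client calls on the API server started in step 3, and wait_for_port("localhost", 21001) does the same for the controller address used by the workers above.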

With the openai SDK:

import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:8000/v1/"

model = "vicuna-7b-v1.5"
prompt = "Once upon a time"

# create a completion
completion = openai.completions.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)

# create a chat completion
completion = openai.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
)
# print the completion
print(completion.choices[0].message.content)

Over HTTP:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vicuna-7b-v1.5", "messages": [{"role": "user", "content": "Hello! What is your name?"}]}'
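The same HTTP request can be sent from Python without installing the openai package. The following is a minimal sketch using only the standard library; build_chat_request and chat are hypothetical helper names, and the response is assumed to follow the OpenAI chat-completions shape, with the reply in choices[0].message.content, as the SDK example above implies.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str, model: str, messages: List[Dict[str, str]]
) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_text: str, timeout: float = 60.0) -> str:
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(base_url, model, [{"role": "user", "content": user_text}])
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumes an OpenAI-style response body.
    return payload["choices"][0]["message"]["content"]
```

With the services from the steps above running, chat("http://localhost:8000/v1", "vicuna-7b-v1.5", "Hello! What is your name?") would return the reply as a string.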

Web GUI

FastChat also supports access to the inference service through a web interface. The steps are:

1. Start the controller.

python3 -m fastchat.serve.controller

2. Start the worker.

As in the API setup, the model is served through vLLM here.

pip install vllm

python3 -m fastchat.serve.vllm_worker --model-path xxx

3. Start the Gradio web server.

python3 -m fastchat.serve.gradio_web_server

The model can now be reached through the web interface.
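The three steps above have to run in order, because the worker registers with the controller and the web server reads its model list from the controller. As a rough sketch under those assumptions, the script below starts the three processes in sequence with a fixed pause between them. The names web_gui_commands and launch_all are hypothetical, and the fixed delay is a crude stand-in for a real readiness check.

```python
import subprocess
import time
from typing import List, Sequence


def web_gui_commands(model_path: str) -> List[List[str]]:
    """Return the three commands from the steps above, in start order."""
    return [
        ["python3", "-m", "fastchat.serve.controller"],
        ["python3", "-m", "fastchat.serve.vllm_worker", "--model-path", model_path],
        ["python3", "-m", "fastchat.serve.gradio_web_server"],
    ]


def launch_all(commands: Sequence[Sequence[str]], delay_seconds: float = 10.0) -> List[subprocess.Popen]:
    """Start each command as a background process, pausing between launches."""
    procs = []
    for argv in commands:
        procs.append(subprocess.Popen(list(argv)))
        # Give the service time to come up before starting the next one.
        time.sleep(delay_seconds)
    return procs
```

Calling launch_all(web_gui_commands("xxx")) starts all three services; the returned process handles can be terminated later with their terminate() method.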
