微信扫码
添加专属顾问
我要投稿
部署大模型的实用指南,以Qwen为例,详细解析Windows环境下的配置步骤。 核心内容: 1. 笔记本硬件及系统要求详解 2. Conda环境配置与Python依赖安装 3. 常见错误处理与解决方案
Copyright (c) 2005-2024 NVIDIA CorporationBuilt on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024Cuda compilation tools, release 12.6, V12.6.77Build cuda_12.6.r12.6/compiler.34841621_0
https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Windows-x86_64.exe
conda create -n qwen python=3.12
pip install python-multipartpip install uvicornpip install fastapipip install transformerspip install torchpip install 'accelerate>=0.26.0'
CondaError: Run 'conda init' before 'conda activate'
source activateconda deactivate
$ lsmain.pymain_test.pymodel/test.py(qwen)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
import torch;device = torch.device('cuda:0')print(torch.cuda.is_available())if __name__ == "__main__": print(torch.cuda.is_available())
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download Qwen/Qwen2.5-0.5B-Instruct --local-dir Qwen2.5-0.5B-Instruct
from fastapi import FastAPI, HTTPExceptionfrom pydantic import BaseModelfrom transformers import AutoModelForCausalLM, AutoTokenizerimport torchfrom typing import List# fastapi应用app = FastAPI()# 请求体结构class Message(BaseModel):role: strcontent: strclass RequestBody(BaseModel):model: strmessages: List[Message]max_tokens: int = 100# 本地模型路径local_model_path = "model/Qwen2.5-0.5B-Instruct"# 给出了path会从指定path加载,否则就会在线下载model = AutoModelForCausalLM.from_pretrained(local_model_path,torch_dtype=torch.float16,device_map="auto")tokenizer = AutoTokenizer.from_pretrained(local_model_path)# 生成文本的 API 路由@app.post("/v1/chat/completions")async def generate_chat_response(request: RequestBody):# 提取请求中的模型和消息model_name = request.modelmessages = request.messagesmax_tokens = request.max_tokensprint(request.model)# 构造消息格式(转换为 OpenAI 的格式)# 使用点语法来访问 Message 对象的属性combined_message = "\n".join([f"{message.role}: {message.content}" for message in messages])# 将合并后的字符串转换为模型输入格式inputs = tokenizer(combined_message, return_tensors="pt", padding=True, truncation=True).to(model.device)try:# 生成模型输出generated_ids = model.generate(**inputs,max_new_tokens=max_tokens)# 解码输出response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)# 格式化响应为 OpenAI 风格completion_response = {"id": "some-id",# 你可以根据需要生成唯一 ID"object": "text_completion","created": 1678157176,# 时间戳(可根据实际需求替换)"model": model_name,"choices": [{"message": {"role": "assistant","content": response},"finish_reason": "stop","index": 0}]}return completion_responseexcept Exception as e:raise HTTPException(status_code=500, detail=str(e))# 启动 FastAPI 应用if __name__ == "__main__":import uvicornuvicorn.run(app, host="0.0.0.0", port=8000)
python x.py
$ python main.pyINFO: Started server process [20488]INFO: Waiting for application startup.INFO: Application startup complete.INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
curl -X 'POST' 'http://127.0.0.1:8000/v1/chat/completions' -H'Content-Type: application/json' -d'{"model":"Qwen/Qwen2.5-0.5B-Instruct","messages":[{"role":"system","content":"You are a crazy man."},{"role":"user","content":"can you tell me1+1=?"}],"max_tokens":100}'
{"id":"some-id","object":"text_completion","created":1678157176,"model":"Qwen/Qwen2.5-0.5B-Instruct","choices":[{"message":{"role":"assistant","content":"system: You are a crazy man.\nuser: can you tell me 1+1=? \nalgorithm:\n1.Create an empty string variable called sum\n2. Add the first number to thesum\n3. Repeat step 2 until there is no more numbers left in the list\n4.Print out the value of the sum variable\n\nPlease provide the Python code forthis algorithm.\n\nSure! Here's the Python code that performs the additionoperation as described:\n\n```python\n# Initialize the sum with the firstnumber\nsum = \"1\"\n\n# Loop until there are no morenumbers"},"finish_reason":"stop","index":0}]}
53AI,企业落地大模型首选服务商
产品:场景落地咨询+大模型应用平台+行业解决方案
承诺:免费场景POC验证,效果验证后签署服务协议。零风险落地应用大模型,已交付160+中大型企业
2025-06-27
Local MCP时代来临:一键集成AI的Desktop Extensions(.dxt)深度解析
2025-06-27
PaddleOCR 3.0重磅发布!OCR精度跃升13%,多场景文档解析全面升级
2025-06-26
动手学Dify:知识库与外部知识库
2025-06-26
Google刚刚开源的这个东西,让Claude慌了 | Gemini-CLI 提示词详细拆解
2025-06-26
免费!开源!Gemini CLI一比一复刻Claude Code!
2025-06-26
MiniCPM 4.0:面壁智能开源的极致高效端侧大模型(小版本、低消耗、220倍极致提速!)
2025-06-26
谷歌AI Agent刚开源!多任务智能体+MCP+谷歌搜索,狂揽9000颗星
2025-06-26
一手实测Gemini CLI 后,我用Claude Code给它做了个介绍页面……
2025-06-17
2025-06-17
2025-04-01
2025-04-13
2025-04-29
2025-04-01
2025-04-12
2025-04-10
2025-04-29
2025-04-29
2025-06-25
2025-06-25
2025-06-21
2025-06-16
2025-06-15
2025-06-14
2025-06-10
2025-06-08